What is computer vision?
Computer vision is a technology that machines use to automatically recognize images and describe them accurately and efficiently. Today, computer systems have access to a large volume of images and video data sourced from or created by smartphones, traffic cameras, security systems, and other devices. Computer vision applications use artificial intelligence and machine learning (AI/ML) to process this data accurately for object identification and facial recognition, as well as classification, recommendation, monitoring, and detection.
Why is computer vision important?
While visual information processing technology has existed for some time, much of the process required human intervention and was time consuming and error prone. For example, implementing a facial recognition system in the past required developers to manually tag thousands of images with key data points, such as the width of the nose bridge and the distance between the eyes. Automating these tasks required extensive computing power because image data is unstructured and complex for computers to organize. Vision applications were thus expensive and inaccessible to most organizations.
Today, progress in the field combined with a considerable increase in computational power has improved both the scale and accuracy of image data processing. Computer vision systems powered by cloud computing resources are now accessible to everyone. Any organization can use the technology for identity verification, content moderation, streaming video analysis, fault detection, and more.
How does computer vision work?
Computer vision systems use artificial intelligence (AI) technology to mimic the capabilities of the human brain that are responsible for object recognition and object classification. Computer scientists train computers to recognize visual data by feeding them vast amounts of information. Machine learning (ML) algorithms identify common patterns in these images or videos and apply that knowledge to identify new images accurately. For example, if computers process millions of images of cars, they begin to learn the visual patterns that identify a vehicle and can then accurately detect one in a new image. Computer vision relies on technologies such as those described below.
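Before turning to those technologies, here is a minimal sketch of the train-then-predict loop described above. It fits a simple classifier on scikit-learn's bundled handwritten-digit images and then scores it on images it has never seen; the library and classifier choice are illustrative assumptions rather than part of any particular production system.

```python
# A minimal sketch of the train-then-predict workflow: learn patterns from
# labeled images, then classify images the model has never seen.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()                        # 8x8 grayscale images, flattened to 64 values
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=2000)     # learns pixel patterns common to each digit class
model.fit(X_train, y_train)                   # "training" on labeled images

print("Accuracy on unseen images:", model.score(X_test, y_test))
```

A production system would use far more data and a deep learning model, but the workflow is the same: learn patterns from labeled images, then apply them to new ones.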
Deep learning
Deep learning is a type of ML that uses neural networks. Deep learning neural networks are made of many layers of software modules called artificial neurons that work together inside the computer. They use mathematical calculations to automatically process different aspects of image data and gradually develop a combined understanding of the image.
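As a rough sketch of what "many layers of artificial neurons" looks like in code, the example below stacks a few fully connected layers with PyTorch (one of the frameworks mentioned later in this article). The layer sizes and the random stand-in image are assumptions made purely for illustration.

```python
# A minimal sketch of a deep network as stacked layers of artificial neurons.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),                       # turn a 28x28 image into a 784-value vector
    nn.Linear(784, 256), nn.ReLU(),     # first layer of artificial neurons
    nn.Linear(256, 64), nn.ReLU(),      # deeper layer builds on the previous one
    nn.Linear(64, 10),                  # output: one score per image class
)

fake_image = torch.rand(1, 1, 28, 28)   # random stand-in for real image data
scores = model(fake_image)
print(scores.shape)                     # torch.Size([1, 10])
```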
Convolutional neural networks
Convolutional neural networks (CNNs) use a labeling system to categorize visual data and comprehend the whole image. They analyze images pixel by pixel and assign each pixel a value. These values are fed into a mathematical operation called a convolution, which the network uses to make predictions about the picture. Like a human attempting to recognize an object at a distance, a CNN first identifies outlines and simple shapes before filling in additional details like color, internal forms, and texture. Finally, it repeats the prediction process over several iterations to improve accuracy.
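A minimal sketch of that coarse-to-fine idea, again assuming PyTorch: the early convolutional layers respond to edges and simple shapes, and a final layer turns the accumulated features into class predictions. The channel counts and the 32x32 input size are illustrative assumptions.

```python
# A minimal convolutional network: convolutions extract visual features,
# a final linear layer maps them to class predictions.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # pick out edges and outlines
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # combine them into textures and parts
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                                # map features to 10 classes
)

image = torch.rand(1, 3, 32, 32)    # random stand-in for a 32x32 RGB image
print(cnn(image).shape)             # torch.Size([1, 10])
```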
Recurrent neural networks
Recurrent neural networks (RNNs) are similar to CNNs, but can process a series of images to find links between them. While CNNs are used for single image analysis, RNNs can analyze videos and understand the relationships between images.
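One common pattern, sketched below under the assumption of PyTorch and toy-sized inputs, is to extract features from each frame with a CNN and pass the resulting sequence through a recurrent layer so the network can relate the frames to one another.

```python
# A minimal sketch of pairing a CNN with a recurrent layer so a sequence of
# video frames can be analyzed together.
import torch
import torch.nn as nn

frames = torch.rand(1, 8, 3, 32, 32)          # 1 clip of 8 frames, each a 32x32 RGB image

cnn = nn.Sequential(                          # per-frame feature extractor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # -> 16 features per frame
)
rnn = nn.GRU(input_size=16, hidden_size=32, batch_first=True)

features = torch.stack([cnn(frames[:, t]) for t in range(frames.shape[1])], dim=1)
outputs, _ = rnn(features)                    # links the frames across time
print(outputs.shape)                          # torch.Size([1, 8, 32])
```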
What is the difference between computer vision and image processing?
Image processing uses algorithms to alter images, including sharpening, smoothing, filtering, or enhancing. Computer vision is different: it doesn't change an image but instead makes sense of what it sees and carries out a task, such as labeling. In some cases, you can use image processing to modify an image so a computer vision system can better understand it. In other cases, you use computer vision to identify images or parts of an image, and then use image processing to modify the image further.
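The snippet below illustrates that division of labor, assuming the Pillow library and a file named photo.jpg: sharpening is image processing because it changes the pixels, while the commented-out classify() call stands in for a hypothetical computer vision step that would interpret the result.

```python
# Image processing alters the image; computer vision would then interpret it.
from PIL import Image, ImageFilter

original = Image.open("photo.jpg")                  # assumed input file
sharpened = original.filter(ImageFilter.SHARPEN)    # image processing: changes the pixels
sharpened.save("photo_sharpened.jpg")

# labels = classify(sharpened)   # hypothetical computer vision step: labels what the image shows
```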
What are common tasks that computer vision can perform?
Image classification
Image classification enables computers to look at an image and accurately assign it to a class. Computer vision understands these classes, such as trees, planes, or buildings, and labels images accordingly. For example, a camera can recognize faces in a photograph and focus on them.
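A minimal sketch of image classification, assuming the torchvision library and an input file named photo.jpg: a pretrained model assigns the image to one of a fixed set of classes.

```python
# Classify a single image with a pretrained torchvision model.
import torch
from torchvision import models, transforms
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()                   # resizing and normalization for this model

image = Image.open("photo.jpg").convert("RGB")      # assumed input image
with torch.no_grad():
    scores = model(preprocess(image).unsqueeze(0))

class_id = scores.argmax(dim=1).item()
print("Predicted class:", weights.meta["categories"][class_id])
```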
Object detection
Object detection is a computer vision task for detecting and localizing objects within an image. It uses classification to identify the objects it finds and sort them into categories. Object detection is used in industrial and manufacturing processes to control autonomous applications and monitor production lines. Connected home camera manufacturers and service providers also rely on object detection to process live video streams from cameras, detect people and objects in real time, and provide actionable alerts to their end users.
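The sketch below shows object detection with a pretrained torchvision model, which returns a bounding box, a class label, and a confidence score for each object it finds. The model choice, file name, and confidence threshold are illustrative assumptions.

```python
# Detect and localize objects in an image with a pretrained detector.
import torch
from torchvision import models
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

weights = models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = models.detection.fasterrcnn_resnet50_fpn(weights=weights).eval()

image = convert_image_dtype(read_image("street.jpg"), torch.float)   # assumed input image
with torch.no_grad():
    detections = detector([image])[0]     # dict with "boxes", "labels", "scores"

for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.8:                       # keep confident detections only
        print(weights.meta["categories"][int(label)], box.tolist(), float(score))
```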
Object tracking
Object tracking uses deep learning models to identify and follow objects across the frames of a video. The first element of object tracking is object detection: a bounding box is created around the object and it is given an object ID, which can then be tracked through subsequent frames. Object tracking has several real-world applications across multiple industries, such as traffic monitoring in urban environments, human surveillance, and medical imaging.
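A minimal sketch of the tracking step itself, independent of any particular library: given bounding boxes detected in consecutive frames, each box is matched to the most overlapping box from the previous frame so it keeps the same object ID. The toy boxes and the 0.3 overlap threshold are assumptions for illustration.

```python
# Carry object IDs across frames by matching boxes with the highest overlap (IoU).
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def track(previous, detections, next_id):
    """Reuse the ID of the best-overlapping previous box, else assign a new one."""
    tracked = {}
    for box in detections:
        best = max(previous.items(), key=lambda kv: iou(kv[1], box), default=None)
        if best and iou(best[1], box) > 0.3:
            tracked[best[0]] = box        # same object, keep its ID
        else:
            tracked[next_id] = box        # new object enters the frame
            next_id += 1
    return tracked, next_id

frame1 = [(10, 10, 50, 50)]               # toy detections for two consecutive frames
frame2 = [(12, 11, 52, 51), (200, 200, 240, 240)]
tracks, next_id = track({}, frame1, next_id=1)
tracks, next_id = track(tracks, frame2, next_id)
print(tracks)                             # object 1 persists, object 2 is new
```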
Segmentation
Segmentation is a computer vision technique that identifies objects by dividing an image into regions based on its pixels. Segmentation also simplifies an image, for example by producing the shape or outline of an item to determine what it is. In doing so, segmentation also recognizes whether there is more than one object in an image or frame.
For example, if there is a cat and a dog in an image, segmentation can be used to recognize the two animals. Unlike object detection, which builds a box around an object, segmentation classifies individual pixels to determine the shape of an object, making it easier to analyze and label.
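A minimal sketch of semantic segmentation, assuming torchvision and an input file named cat_and_dog.jpg: a pretrained model assigns a class to every pixel, so the cat and dog pixels form separate regions rather than boxes.

```python
# Assign a class label to every pixel with a pretrained segmentation model.
import torch
from torchvision import models
from PIL import Image

weights = models.segmentation.DeepLabV3_ResNet50_Weights.DEFAULT
model = models.segmentation.deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("cat_and_dog.jpg").convert("RGB")   # assumed input image
with torch.no_grad():
    output = model(preprocess(image).unsqueeze(0))["out"]

mask = output.argmax(dim=1)[0]            # per-pixel class IDs forming the segmented regions
print("Classes present:", mask.unique().tolist())
```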
Content-based image retrieval
Content-based image retrieval is an application of computer vision techniques that searches large databases for specific digital images. Rather than relying only on metadata such as tags, descriptions, labels, and keywords, it analyzes the visual content of the images themselves. Semantic retrieval uses queries such as 'find pictures of buildings' to retrieve appropriate content.
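A minimal sketch of content-based retrieval, assuming torchvision and a handful of illustrative file names: each image is summarized as a feature vector from a pretrained model, and the stored image whose vector is most similar to the query is returned.

```python
# Compare images by the similarity of their feature vectors, not their metadata.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
backbone = models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()         # keep the feature vector, drop the classifier head
backbone.eval()
preprocess = weights.transforms()

def embed(path):
    with torch.no_grad():
        return backbone(preprocess(Image.open(path).convert("RGB")).unsqueeze(0))[0]

database = {p: embed(p) for p in ["building1.jpg", "beach.jpg", "forest.jpg"]}   # assumed files
query = embed("new_building.jpg")                                                # assumed query image

best = max(database, key=lambda p: torch.cosine_similarity(database[p], query, dim=0).item())
print("Most similar stored image:", best)
```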
How does AWS help with your computer vision tasks?
AWS provides the broadest and most complete set of artificial intelligence and machine learning (AI/ML) services connected to a comprehensive set of data sources for customers of all levels of expertise.
For customers building on frameworks and managing their own infrastructure, we offer optimized versions of the most popular deep learning frameworks, including PyTorch, MXNet, and TensorFlow. AWS also provides a broad and deep portfolio of compute, networking, and storage infrastructure services for ML, with a choice of processors and accelerators to meet unique performance and budget needs.
For customers who want to create a standard computer vision solution across their business, Amazon SageMaker makes it easy to prepare data and build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows, including no-code offerings for business analysts.
For customers who lack ML skills, need faster time to market, or want to add intelligence to an existing process or application, AWS offers a range of ML-based computer vision services. These services let you easily add intelligence to your applications through pre-trained APIs. Amazon Rekognition automates image and video analysis with ML, analyzing millions of images, live streams, and stored videos in seconds.
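As an illustration, the sketch below calls Amazon Rekognition's DetectLabels API through the boto3 SDK on an image stored in Amazon S3; the bucket name and object key are placeholder assumptions, and valid AWS credentials are required.

```python
# Label an image in Amazon S3 using Amazon Rekognition via boto3.
import boto3

rekognition = boto3.client("rekognition")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photos/street.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```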
Get started with computer vision by creating a free AWS account today.