Amazon Computer Vision

What is computer vision?

Computer vision is a technology that machines use to automatically recognize images and describe them accurately and efficiently. Today, computer systems have access to a large volume of images and video data sourced from or created by smartphones, traffic cameras, security systems, and other devices. Computer vision applications use artificial intelligence and machine learning (AI/ML) to process this data accurately for object identification and facial recognition, as well as classification, recommendation, monitoring, and detection.

Why is computer vision important?

While visual information processing technology has existed for some time, much of the process required human intervention and was time-consuming and error-prone. For example, implementing a facial recognition system in the past required developers to manually tag thousands of images with key data points, such as the width of the nose bridge and the distance between the eyes. Automating these tasks required extensive computing power because image data is unstructured and complex for computers to organize. Vision applications were thus expensive and inaccessible to most organizations.

Today, progress in the field combined with a considerable increase in computational power has improved both the scale and accuracy of image data processing. Computer vision systems powered by cloud computing resources are now accessible to everyone. Any organization can use the technology for identity verification, content moderation, streaming video analysis, fault detection, and more.

Use cases

Governments and enterprises use computer vision to improve the security of assets, sites, and facilities. For example, cameras and sensors monitor public spaces, industrial sites, and high-security environments. They send automatic alerts if something out of the ordinary occurs, such as an unauthorized individual entering a restricted area.

Similarly, computer vision can improve personal safety at home and in the workplace. At home, recognition technology can detect pets in real-time video streams, or visitors and delivered packages on live front-door cameras. In the workplace, it can check that workers are wearing appropriate personal protective equipment, feed warning systems, and generate reports.

Computer vision can analyze images and extract metadata for business intelligence, creating new revenue opportunities and operational efficiencies. For example, it can:

  • Automatically identify quality defects before products leave the factory
  • Detect machine maintenance and safety issues
  • Analyze social media images to discover trends and patterns in customer behavior
  • Authenticate employees with automatic facial recognition

Autonomous vehicle technology uses computer vision to recognize real-time images and build 3D maps from multiple cameras fitted to autonomous transport. It can analyze images and identify other road users, road signs, pedestrians, or obstacles.

In semiautonomous vehicles, computer vision uses machine learning (ML) to monitor driver behavior. For example, it looks for signs of distraction, fatigue, and drowsiness based on the driver's head position, eye tracking, and upper body movement. If the technology picks up on certain warning signs, it alerts the driver and reduces the chance of a driving incident.

From boosting productivity to reducing costs with intelligent automation, computer vision applications enhance the overall functioning of the agricultural sector. Satellite imaging and UAV footage help to analyze vast tracts of land and improve farming practices. Computer vision applications automate tasks like monitoring field conditions, identifying crop disease, checking soil moisture, and predicting weather and crop yields. Animal monitoring with computer vision is another key strategy of smart farming.

Healthcare is one of the leading industries applying computer vision technology. Notably, medical image analysis creates a visualization of organs and tissues to help medical professionals make speedy and accurate diagnoses, resulting in better treatment outcomes and life expectancy. For example:

  • Tumor detection by analyzing moles and skin lesions
  • Automatic X-ray analysis
  • Symptom discovery from MRI scans

How does computer vision work?

Computer vision systems use artificial intelligence (AI) technology to mimic the capabilities of the human brain that are responsible for object recognition and object classification. Computer scientists train computers to recognize visual data by inputting vast amounts of information. Machine learning (ML) algorithms identify common patterns in these images or videos and apply that knowledge to identify unknown images accurately. For example, if computers process millions of images of cars, they will begin to build up identity patterns that can accurately detect a vehicle in an image. Computer vision uses technologies such as those given below.

Deep learning

Deep learning is a type of ML that uses neural networks. Deep learning neural networks are made of many layers of software modules called artificial neurons that work together inside the computer. They use mathematical calculations to automatically process different aspects of image data and gradually develop a combined understanding of the image.
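
As a rough illustration of layered artificial neurons, the sketch below passes four pixel values through two small layers. The weights, the layer sizes, and the names `dense_layer` and `forward` are hypothetical and hand-picked for this example; a real network has far more neurons and learns its weights from training data.

```python
import math

def relu(x):
    # Common activation: pass positive values through, clamp negatives to zero.
    return max(0.0, x)

def sigmoid(x):
    # Squash any value into the range (0, 1) so it reads like a confidence.
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of the inputs, adds its
    bias, and applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical hand-picked weights (assumption for illustration only).
HIDDEN_W = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[1.0, 1.0]]
OUTPUT_B = [-0.5]

def forward(pixels):
    """Run four pixel values through a hidden layer and an output layer."""
    hidden = dense_layer(pixels, HIDDEN_W, HIDDEN_B, relu)
    return dense_layer(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]

striped = forward([1.0, 0.0, 1.0, 0.0])  # both hidden neurons fire
blank = forward([0.0, 0.0, 0.0, 0.0])    # no hidden neuron fires
```

Here the hidden layer responds to simple local contrasts and the output layer combines them into a single score, which is the "combined understanding" idea in miniature.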

Convolutional neural networks

Convolutional neural networks (CNNs) process an image as a grid of pixel values. They slide small filters across the grid and apply a mathematical operation called convolution to produce feature maps, which the network uses to make predictions about the picture. Like a human attempting to recognize an object at a distance, a CNN first identifies outlines and simple shapes before filling in additional details like color, internal forms, and texture. During training, it repeats the prediction process over many iterations to improve accuracy.
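
The convolution operation itself can be sketched in a few lines. The example below is a minimal, unoptimized version that slides a 3x3 filter over a small grayscale image. The filter values and the name `convolve2d` are assumptions for illustration; in a trained CNN the filter values are learned rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding and sum the
    element-wise products at each position. This is the cross-correlation
    form that deep learning libraries usually call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hand-written filter that responds to dark-to-bright vertical edges.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Tiny grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

feature_map = convolve2d(image, VERTICAL_EDGE)
print(feature_map)  # [[0, 27, 27], [0, 27, 27]]
```

The output is zero where the image is flat and large where the window covers the edge, which is how early CNN layers pick out outlines before later layers combine them into shapes.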

Recurrent neural networks

Recurrent neural networks (RNNs) are similar to CNNs, but can process a series of images to find links between them. While CNNs are typically used for single-image analysis, RNNs can analyze video frames in sequence and capture the relationships between them.
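
The key idea is that an RNN carries a hidden state from one frame to the next. The sketch below reduces each frame to a single hypothetical feature value and uses scalar weights chosen for illustration; real RNNs use learned weight matrices and feature vectors, often produced by a CNN.

```python
import math

def rnn_step(hidden, frame_feature, w_hidden, w_input, bias):
    """Combine the previous hidden state with the current frame's feature."""
    return math.tanh(w_hidden * hidden + w_input * frame_feature + bias)

def run_rnn(frame_features, w_hidden=0.5, w_input=1.0, bias=0.0):
    """Process frames in order and return the hidden state after each one."""
    hidden = 0.0
    states = []
    for feature in frame_features:
        hidden = rnn_step(hidden, feature, w_hidden, w_input, bias)
        states.append(hidden)
    return states

# Both sequences end with the same frame (feature 1.0), but the final
# state differs because it also reflects what came before.
after_quiet_frame = run_rnn([0.0, 1.0])[-1]
after_active_frame = run_rnn([1.0, 1.0])[-1]
```

Because the final states differ even though the last frame is identical, the network's output depends on the sequence as a whole rather than on any single image.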

What is the difference between computer vision and image processing?

Image processing uses algorithms to alter images, including sharpening, smoothing, filtering, or enhancing. Computer vision is different in that it doesn't change an image, but instead makes sense of what it sees and carries out a task, such as labeling. In some cases, you can use image processing to modify an image so a computer vision system can better understand it. In other cases, you use computer vision to identify images or parts of an image and then use image processing to modify the image further.
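
To make the distinction concrete, the sketch below pairs a smoothing filter (image in, image out) with a deliberately simple vision check (image in, answer out). The function names, the threshold, and the toy image are assumptions for illustration only.

```python
def smooth(image):
    """Image processing: return a new image where each pixel is the mean
    of itself and its in-bounds neighbors (a 3x3 box blur)."""
    h, w = len(image), len(image[0])
    result = []
    for r in range(h):
        row = []
        for c in range(w):
            neighbors = [
                image[i][j]
                for i in range(max(0, r - 1), min(h, r + 2))
                for j in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(sum(neighbors) / len(neighbors))
        result.append(row)
    return result

def has_bright_region(image, threshold=200):
    """Vision-style task: leave the image unchanged and return a decision."""
    return any(pixel >= threshold for row in image for pixel in row)

# A dark image with a single noisy pixel.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]

raw_answer = has_bright_region(noisy)              # True: fooled by noise
clean_answer = has_bright_region(smooth(noisy))    # False after smoothing
```

This mirrors the first case described above: the processing step alters the image so that the vision step can interpret it more reliably.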

What are common tasks that computer vision can perform?

Image classification

Image classification enables computers to look at an image and assign it to the correct class. A computer vision model learns a set of classes, for instance trees, planes, or buildings, and labels each image accordingly. One example is that a camera can recognize faces in a photograph and focus on them.
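
Production classifiers are usually deep neural networks, but the input-to-label contract can be shown with a much simpler stand-in. The sketch below uses a one-nearest-neighbor rule over raw pixels; the class names and the 2x2 training images are hypothetical.

```python
def flatten(image):
    """Turn a 2D grid of pixels into a flat list."""
    return [pixel for row in image for pixel in row]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(image, labeled_examples):
    """Return the label of the training example closest to the image.
    This is a nearest-neighbor stand-in, not a deep learning model."""
    pixels = flatten(image)
    closest_image, closest_label = min(
        labeled_examples,
        key=lambda pair: squared_distance(pixels, flatten(pair[0])),
    )
    return closest_label

# Hypothetical labeled training data: (image, class label).
TRAINING = [
    ([[1.0, 0.0], [1.0, 0.0]], "vertical stripe"),
    ([[1.0, 1.0], [0.0, 0.0]], "horizontal stripe"),
]

prediction = classify([[0.9, 0.1], [0.8, 0.0]], TRAINING)
print(prediction)  # vertical stripe
```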

Object detection

Object detection is a computer vision task for detecting and localizing objects within images. It uses classification to identify, sort, and organize the objects it finds. Object detection is used in industrial and manufacturing processes to control autonomous applications and monitor production lines. Connected home camera manufacturers and service providers also rely on object detection to process live video streams from cameras to detect people and objects in real time and provide actionable alerts to their end users.
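
One classic way to combine classification with localization is a sliding window: score every patch of the image and report the positions that score highly. The sketch below assumes a trivial scoring rule (mean brightness) in place of a trained classifier, and the names and threshold are illustrative. Modern detectors use neural networks that predict boxes directly.

```python
def detect(image, window=2, threshold=0.75):
    """Slide a square window over the image and return a detection for
    each position whose mean brightness reaches the threshold.
    Boxes are (row, column, height, width)."""
    detections = []
    for r in range(len(image) - window + 1):
        for c in range(len(image[0]) - window + 1):
            patch = [image[r + i][c + j]
                     for i in range(window) for j in range(window)]
            score = sum(patch) / len(patch)
            if score >= threshold:
                detections.append({"box": (r, c, window, window),
                                   "score": score})
    return detections

# A bright 2x2 object in the middle of a dark 4x4 image.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

found = detect(image)
print(found)  # [{'box': (1, 1, 2, 2), 'score': 1.0}]
```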

Object tracking

Object tracking uses deep learning models to identify items belonging to given categories and follow them over time. It has several real-world applications across multiple industries. The first element of object tracking is object detection: each detected object gets a bounding box and an object ID, and can then be tracked from frame to frame. For example, object tracking can be used for traffic monitoring in urban environments, human surveillance, and medical imaging.
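
The ID-assignment step can be sketched with a simple overlap rule: a box in the current frame keeps the ID of the previous-frame box it overlaps most, and otherwise gets a new ID. This greedy intersection-over-union (IoU) matcher is an assumption chosen for brevity, with hypothetical names and thresholds; production trackers add motion models and appearance features.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def track(frames, min_iou=0.3):
    """Assign object IDs to per-frame detections.
    `frames` is a list of lists of boxes; returns one {id: box} dict per frame."""
    next_id = 0
    previous = {}
    history = []
    for boxes in frames:
        current = {}
        unmatched = dict(previous)  # previous objects not yet claimed
        for box in boxes:
            best_id = max(unmatched, key=lambda i: iou(unmatched[i], box),
                          default=None)
            if best_id is not None and iou(unmatched[best_id], box) >= min_iou:
                del unmatched[best_id]      # same object, keep its ID
            else:
                best_id = next_id           # new object enters the scene
                next_id += 1
            current[best_id] = box
        history.append(current)
        previous = current
    return history

# Two objects drift right; the second one leaves before the last frame.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(52, 50, 62, 60), (1, 0, 11, 10)],
    [(2, 0, 12, 10)],
]
history = track(frames)
```

In this run the object that starts at the top left keeps ID 0 across all three frames, even though the detections arrive in a different order in the second frame.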

Segmentation

Segmentation is a computer vision technique that identifies an object by dividing an image into regions based on its pixels. Segmentation also simplifies an image, for example by reducing an item to a shape or outline to help determine what it is. By doing so, segmentation also recognizes if there is more than one object in an image or frame.

For example, if there is a cat and a dog in an image, segmentation can be used to recognize the two animals. Unlike object detection, which builds a box around an object, segmentation tracks pixels to determine the shape of an object, making it easier to analyze and label.
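
A minimal way to see pixel-level grouping is threshold-based segmentation with connected-component labeling, sketched below. The threshold, the 4-neighbor rule, and the toy image are assumptions for illustration; modern segmentation models use neural networks to assign a class to each pixel.

```python
def segment(image, threshold=0.5):
    """Label each connected group of foreground pixels with its own number.
    Returns (label grid, number of regions); background pixels are 0."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    region_count = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] < threshold or labels[r][c]:
                continue  # background, or already part of a region
            region_count += 1
            labels[r][c] = region_count
            stack = [(r, c)]
            while stack:  # flood fill across up/down/left/right neighbors
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and image[ny][nx] >= threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = region_count
                        stack.append((ny, nx))
    return labels, region_count

# Two separate shapes, standing in for the two animals.
image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
labels, region_count = segment(image)
print(region_count)  # 2
```

The label grid records the exact shape of each region, not just a box around it, which is the difference from object detection described above.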

Content-based image retrieval

Content-based image retrieval is an application of computer vision techniques that can search for specific digital images in large databases. It analyzes the visual content of the images themselves, such as colors, shapes, and textures, and can be combined with metadata like tags, descriptions, labels, and keywords. Semantic retrieval uses commands such as 'find pictures of buildings' to retrieve appropriate content.
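
One simple way to compare images by what they contain, rather than by their tags, is to rank them by histogram similarity. The sketch below assumes grayscale images with pixel values from 0 to 255 and a hypothetical in-memory database keyed by name; real systems typically compare learned feature vectors and use an index for large collections.

```python
def histogram(image, bins=4, max_value=256):
    """Fraction of pixels falling into each brightness bin."""
    pixels = [p for row in image for p in row]
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // max_value, bins - 1)] += 1
    return [count / len(pixels) for count in counts]

def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

def search(query, database, top_k=1):
    """Return the names of the top_k images most similar to the query."""
    query_hist = histogram(query)
    ranked = sorted(
        database,
        key=lambda name: similarity(query_hist, histogram(database[name])),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical image database (names and pixel values are made up).
DATABASE = {
    "night_sky": [[10, 20], [30, 5]],
    "snow_field": [[250, 240], [230, 255]],
    "sunset": [[200, 90], [60, 130]],
}

results = search([[15, 25], [0, 40]], DATABASE, top_k=2)
print(results)  # ['night_sky', 'sunset']
```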

How does AWS help with your computer vision tasks?

AWS provides the broadest and most complete set of artificial intelligence and machine learning (AI/ML) services connected to a comprehensive set of data sources for customers of all levels of expertise.

For customers building on frameworks and managing their own infrastructure, we optimize versions of the most popular deep learning frameworks, including PyTorch, MXNet, and TensorFlow. AWS provides a broad and deep portfolio of compute, networking, and storage infrastructure for ML, with a choice of processors and accelerators to meet unique performance and budget needs.

For customers who want to create a standard computer vision solution across their business, Amazon SageMaker makes it easy to prepare data and build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows, including no-code offerings for business analysts.

For customers who lack ML skills, need faster time-to-market, or want to add intelligence to an existing process or application, AWS offers a range of ML-based computer vision services. These services allow you to easily add intelligence to your applications through pre-trained APIs. Amazon Rekognition automates your image and video analysis with ML and analyzes millions of images, live streams, and stored videos in seconds.
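
As a sketch of what calling a pre-trained API can look like, the helper below wraps the Amazon Rekognition `detect_labels` operation from the AWS SDK for Python (boto3). The bucket name, object key, helper name, and confidence threshold are placeholders, and running `main` assumes boto3 is installed and AWS credentials and a region are configured.

```python
def label_image(client, bucket, key, max_labels=10, min_confidence=80.0):
    """Ask Rekognition which labels it sees in an image stored in Amazon S3.
    `client` is expected to be a boto3 Rekognition client.
    Returns a list of (label name, confidence) pairs."""
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"])
            for label in response["Labels"]]

def main():
    # Imported here so the helper above can be read and tested without boto3.
    import boto3

    client = boto3.client("rekognition")
    # Placeholder bucket and key: replace with an image you own.
    for name, confidence in label_image(client, "example-bucket",
                                        "photos/street.jpg"):
        print(f"{name}: {confidence:.1f}%")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub object before pointing it at a real account.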

Get started with computer vision by creating a free AWS account today.