Amazon Web Services (AWS) offers powerful and cost-effective services to help customers process, analyze and extract meaningful data from their video files. Customers who want to obtain a broader understanding of their video libraries can use trusted AWS services to develop solutions that quickly and seamlessly analyze frames in their video files. Developing these solutions can sometimes require extensive knowledge of deep-learning algorithms, depending on the level of analysis required.

This AWS solution helps customers extract frames from their videos and uses machine learning to identify facial, object, and scene-level metadata in those frames, then stores the resulting metadata for future search and analysis. It also provides an image-based search feature that allows customers to search the collected video metadata for faces by supplying an image.

This webpage provides best practices and guidance to consider when extracting metadata from videos, and introduces an AWS solution that combines Amazon Rekognition with the open source tool FFmpeg to extract, store, and search frame-level metadata from videos.

When analyzing videos in the cloud, a few universal best practices will help you build effective video analysis solutions. Consider the following as part of any video processing solution:

  • Leverage video and image analysis services to significantly reduce the time and effort required to process and extract meaningful information from your videos. Services like Amazon Rekognition provide deep image analysis without the need to develop extensive knowledge in deep-learning algorithms.
  • Determine the sampling frequency that makes the most sense for your use case or desired metadata. For example, this solution extracts and analyzes one frame per second.
  • Choose a solution that collects metadata that matches your use case.
  • Clearly define an end-to-end workflow for processing and analyzing your videos, with a well-understood design for redundancy and failure.
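The sampling frequency directly determines how many frames are extracted and sent for analysis, so it is worth estimating before processing a large library. The sketch below is illustrative only: `frames_to_analyze` and its parameters are hypothetical names, not part of the solution, and the result is an estimate rather than an exact FFmpeg frame count.

```python
import math


def frames_to_analyze(duration_seconds: float, interval_seconds: float = 1.0) -> int:
    """Estimate the number of frames sampled from a video.

    interval_seconds is the time between sampled frames; the default of 1.0
    matches the one-frame-per-second rate used by this solution.
    """
    if duration_seconds < 0 or interval_seconds <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    return math.ceil(duration_seconds / interval_seconds)


# Compare sampling intervals for a 10-minute (600-second) video.
for interval in (1, 2, 5):
    count = frames_to_analyze(600, interval)
    print(f"one frame every {interval}s: about {count} frames")
```

Each sampled frame is later analyzed individually, so halving the sampling rate roughly halves the number of image-analysis requests.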

This solution combines trusted AWS services with the open source technology FFmpeg, a fast video converter, to identify and extract frame-level metadata from video files. Extracted video frames are sent to Amazon Rekognition for additional image analysis and metadata extraction. This lets customers add video analysis to their applications on any platform, such as mobile, web, and desktop.


1. When a video is uploaded to the Amazon S3 bucket, the solution triggers the video-processing component, and an Amazon Simple Queue Service (Amazon SQS) message is created and added to an Amazon SQS queue.

2. Amazon Elastic Compute Cloud (Amazon EC2) instances monitor the Amazon SQS queue, download new files, and trigger FFmpeg to extract video frames at one frame per second. An AWS Lambda function posts the frame processing status to an AWS IoT topic. The instances send the frames back to the S3 bucket, and store frame image information in an Amazon DynamoDB table.

3. Each write to the Amazon DynamoDB table triggers an AWS Lambda function that calls Amazon Rekognition to identify metadata for each frame, and saves the metadata in the Amazon DynamoDB table. The function also creates and manages face collections. The results of the image processing are posted to an AWS IoT topic.

4. Another AWS Lambda function creates a list of tags and labels, which are stored in Amazon DynamoDB and Amazon S3, and sent to an Amazon SNS topic to alert subscribers of the new tags.

5. When an image is uploaded to the photo-search Amazon S3 bucket, the solution triggers the face-search component. An AWS Lambda function initiates a search to find matching faces in the previously extracted video metadata.

6. Another AWS Lambda function retrieves the collections from the Amazon DynamoDB table, and searches the Amazon Rekognition collections for the image. The results are stored in an Amazon DynamoDB table, and the status of the search is sent to an AWS IoT topic.
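Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the solution's actual code: the function names, the `frame_%05d.jpg` naming pattern, and the 80% minimum label confidence are hypothetical, and the source does not say which Amazon Rekognition operations the solution calls (`detect_labels` is used here as one plausible choice). The Rekognition client is passed in, for example one created with `boto3.client("rekognition")`.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, output_dir, frames_per_second=1):
    """Build an FFmpeg command that samples frames at a fixed rate."""
    # Hypothetical output naming: frame_00001.jpg, frame_00002.jpg, ...
    pattern = str(Path(output_dir) / "frame_%05d.jpg")
    # The fps video filter keeps the requested number of frames per second.
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={frames_per_second}", pattern]


def extract_frames(video_path, output_dir, frames_per_second=1):
    """Run FFmpeg (must be on PATH) and return the extracted frame files in order."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_ffmpeg_command(video_path, output_dir, frames_per_second)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return sorted(Path(output_dir).glob("frame_*.jpg"))


def label_frame(rekognition_client, bucket, key, min_confidence=80.0):
    """Return (label name, confidence) pairs for one frame stored in Amazon S3.

    min_confidence is an assumed value, not one documented for this solution.
    """
    response = rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```

Keeping the command construction separate from the `subprocess` call makes the sampling rate easy to change and the command easy to inspect without running FFmpeg.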

Deploy Solution
Implementation Guide

What you'll accomplish:

Deploy the Frame-based Analysis for Your Videos solution using AWS CloudFormation. The AWS CloudFormation templates automatically launch and configure the necessary components.

Extract frames from your video files, and store the resulting metadata for future search and analysis.

What you'll need before starting:

An AWS account: You will need an AWS account to begin provisioning resources. Sign up for AWS.

Skill level: This solution is intended for IT infrastructure and security professionals who have practical experience working with web applications and architecting on the AWS Cloud.

Q: When using the face-search feature, how accurate is the match?

The match threshold is currently set at 85%. If the supplied face matches a face in an Amazon Rekognition collection at or above that threshold, the results are stored in an Amazon DynamoDB table.
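To illustrate how an 85% threshold applies, the following sketch calls the Amazon Rekognition `search_faces_by_image` operation with `FaceMatchThreshold` set to 85. It is an assumption that the solution uses this exact call; the function name `search_faces`, the returned dictionary shape, and the bucket, key, and collection arguments are hypothetical. The client is passed in, for example one created with `boto3.client("rekognition")`.

```python
def search_faces(rekognition_client, collection_id, bucket, key, threshold=85.0):
    """Search one face collection for the largest face in an S3-hosted image.

    Only matches with a similarity at or above `threshold` are returned
    by the service.
    """
    response = rekognition_client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        FaceMatchThreshold=threshold,
    )
    # Keep only the fields needed to record a match.
    return [
        {"face_id": match["Face"]["FaceId"], "similarity": match["Similarity"]}
        for match in response["FaceMatches"]
    ]
```

An empty list means no face in the collection met the threshold. Lowering the threshold returns more candidate matches at the cost of more false positives.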

Q: What file types are supported by this solution?

Currently, this solution supports MP4 videos and JPG images. See the implementation guide for details.

Q: How do I search my videos by tag?

Video frame tags can be searched by querying the RVA_VIDEOS_LABELS_TABLE in Amazon DynamoDB, or by creating a custom user interface for interacting with this video frame processing backend. Additionally, you can store the tags in Amazon Elasticsearch Service for future query or analysis.
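As a rough sketch of a tag query, the function below scans the labels table with a DynamoDB filter expression and follows pagination. The table schema is not described here, so the attribute name `labels` is an assumption, as is using `RVA_VIDEOS_LABELS_TABLE` as the literal table name (the deployed name may differ). The client is passed in, for example one created with `boto3.client("dynamodb")`.

```python
def find_frames_with_tag(dynamodb_client, tag,
                         table_name="RVA_VIDEOS_LABELS_TABLE",
                         attribute="labels"):
    """Return all items whose `attribute` contains `tag`.

    `attribute` is a hypothetical attribute name; adjust it to the real schema.
    """
    params = {
        "TableName": table_name,
        "FilterExpression": "contains(#attr, :tag)",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":tag": {"S": tag}},
    }
    items = []
    while True:
        page = dynamodb_client.scan(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # More results remain: continue the scan from where this page ended.
        params["ExclusiveStartKey"] = last_key
```

A scan reads the whole table and filters afterwards, which is acceptable for exploration but slow and costly on large tables. For frequent searches, a table or index keyed on the tag would be a better fit.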

Q: Can I deploy the solution in any AWS Region?

The Frame-based Analysis for Your Videos solution must be deployed in an AWS Region where Amazon Rekognition is currently available. The solution's Amazon S3 buckets are created in the AWS Region where the solution is launched.

