Amazon Rekognition Video for Media Analysis

Streamline quality control, ad insertion, and content production using machine learning.

Viewers are watching more content than ever, with Over-The-Top (OTT) and Video-On-Demand (VOD) platforms in particular providing a rich selection of content choices anytime, anywhere, and on any screen. As content volumes grow, media companies face challenges in preparing and managing content, both of which are crucial to providing a high-quality viewing experience and better monetizing content. Today, companies rely on large teams of trained staff to perform tasks such as finding where the end credits begin in a piece of content, choosing the right spots to insert ads, or breaking up videos into smaller clips for better indexing. These manual processes are expensive, slow, and cannot scale to keep up with the volume of content being produced, licensed, and retrieved from archives daily.

Amazon Rekognition Video makes it easy to automate these operational media analysis tasks by providing fully managed, purpose-built APIs powered by ML. Using these APIs, you can analyze large volumes of videos stored in Amazon S3, detect markers such as black frames or shot changes, and get SMPTE (Society of Motion Picture and Television Engineers) timecodes and timestamps for each detection, all without requiring any machine learning experience. Returned SMPTE timecodes are frame-accurate, which means that Amazon Rekognition Video provides the exact frame number when it detects a relevant segment of video, and handles various video frame rate formats under the hood. Using the frame-accurate metadata from Amazon Rekognition Video, you can either automate certain tasks completely or significantly reduce the review workload of trained human operators so that they can focus on more creative work. This enables you to perform tasks such as content preparation, ad insertion, and adding ‘binge-markers’ to content at scale in the cloud.
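
As a rough sketch of how such an analysis job might be driven from Python: the function below assumes a client object that behaves like the boto3 Rekognition client, exposing start_segment_detection and get_segment_detection. The function name detect_segments, the polling interval, and the retry limit are illustrative assumptions, not part of the service.

```python
import time


def detect_segments(client, bucket, key,
                    segment_types=("TECHNICAL_CUE", "SHOT"),
                    poll_seconds=10.0, max_polls=360):
    """Start a segment detection job on a video in S3 and return its segments.

    `client` is assumed to behave like boto3.client("rekognition").
    """
    job = client.start_segment_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        SegmentTypes=list(segment_types),
    )
    job_id = job["JobId"]

    # The job is asynchronous: poll until it leaves the IN_PROGRESS state.
    for _ in range(max_polls):
        page = client.get_segment_detection(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} still running after {max_polls} polls")

    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(page.get("StatusMessage", "segment detection failed"))

    # Results can be paginated: follow NextToken until it is absent.
    segments = list(page.get("Segments", []))
    while page.get("NextToken"):
        page = client.get_segment_detection(JobId=job_id,
                                            NextToken=page["NextToken"])
        segments.extend(page.get("Segments", []))
    return segments
```

With boto3 installed and AWS credentials configured, a call might look like detect_segments(boto3.client("rekognition"), "my-bucket", "episode.mp4"), where the bucket and object names are placeholders. Polling keeps the sketch short; a production workflow would more likely rely on a completion notification than on a polling loop.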

With Amazon Rekognition Video, you pay only for what you use. There are no minimum fees, licenses, or upfront commitments.


Streamline operational tasks

With the rich metadata returned by Amazon Rekognition Video, you can easily scale and automate manual operational tasks such as preparing content for VOD, inserting ads and creating ‘binge-friendly’ prompts such as skipping to the next episode when end credits start rolling. This saves costs and allows human workforces to focus on higher value tasks.

Get frame accurate results

Amazon Rekognition Video’s media analysis features provide frame-accurate detection results along with SMPTE timecodes. This means that you get the exact frame number when Amazon Rekognition Video detects a specific type of video segment such as end credits. Further, the service automatically handles integer, fractional, and drop-frame rate formats.
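
To illustrate why drop-frame formats need special handling, here is a minimal sketch of the standard SMPTE timecode-to-frame-number conversion. It is not the service's internal implementation; the function name smpte_to_frame_number and the convention that a ';' separator marks drop-frame timecode are assumptions made for this example.

```python
def smpte_to_frame_number(timecode, nominal_fps=30):
    """Convert an SMPTE timecode string to a zero-based frame number.

    A ';' before the frame field is taken to mean drop-frame timecode
    (used with 29.97 and 59.94 fps video); ':' means non-drop-frame.
    nominal_fps is the rounded rate used for counting (30 for 29.97 fps).
    """
    drop_frame = ";" in timecode
    hh, mm, ss, ff = (int(part) for part in timecode.replace(";", ":").split(":"))
    frames = (hh * 3600 + mm * 60 + ss) * nominal_fps + ff

    if drop_frame:
        # Drop-frame skips timecode labels, not pictures: 2 labels per
        # minute at 30 (4 at 60), except in every tenth minute.
        dropped_per_minute = 2 * (nominal_fps // 30)
        total_minutes = hh * 60 + mm
        frames -= dropped_per_minute * (total_minutes - total_minutes // 10)
    return frames


print(smpte_to_frame_number("00:01:00;02"))  # first label after the skipped ones
print(smpte_to_frame_number("00:00:10:00", nominal_fps=25))
```

In drop-frame timecode the label after 00:00:59;29 is 00:01:00;02, so the two skipped labels must be subtracted to recover the true frame index.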

Reduce costs

Amazon Rekognition Video enables you to create reliable and easy-to-use media operations workflows in the cloud without upfront commitments or expensive licenses for on-premises software. You simply pay based on the duration of video that is processed and the features you use.


Black frames detection

Videos often contain a short duration of empty black frames with no audio, which are used as cues to insert advertisements or to demarcate the end of a program segment such as a scene or the opening credits. With Amazon Rekognition Video, you can detect such black frame sequences to automate ad insertion, package content for VOD, and demarcate various program segments or scenes. Black frames with audio (such as fade outs or voiceovers) are considered as content and not returned.
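
A minimal sketch of turning detected black frame sequences into candidate ad cue points. It assumes each segment is a dictionary shaped like the entries returned by segment detection (a Type of TECHNICAL_CUE plus a TechnicalCueSegment with its own Type and Confidence); the helper name black_frame_cues and the 80 percent confidence threshold are illustrative choices.

```python
def black_frame_cues(segments, min_confidence=80.0):
    """Return (start_timecode, end_timecode, duration_ms) for black frame cues."""
    cues = []
    for seg in segments:
        cue = seg.get("TechnicalCueSegment")
        if seg.get("Type") != "TECHNICAL_CUE" or not cue:
            continue  # skip shots and malformed entries
        if cue.get("Type") == "BlackFrames" and cue.get("Confidence", 0.0) >= min_confidence:
            cues.append((seg["StartTimecodeSMPTE"],
                         seg["EndTimecodeSMPTE"],
                         seg["DurationMillis"]))
    return cues


# Hand-written sample data in the assumed response shape.
sample = [
    {"Type": "TECHNICAL_CUE", "StartTimecodeSMPTE": "00:10:00:00",
     "EndTimecodeSMPTE": "00:10:02:00", "DurationMillis": 2000,
     "TechnicalCueSegment": {"Type": "BlackFrames", "Confidence": 99.1}},
    {"Type": "SHOT", "StartTimecodeSMPTE": "00:10:02:00",
     "EndTimecodeSMPTE": "00:10:09:00", "DurationMillis": 7000,
     "ShotSegment": {"Index": 12, "Confidence": 97.0}},
]
print(black_frame_cues(sample))
```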

Credits detection

Amazon Rekognition Video helps you automatically identify the exact frames where the opening and closing credits start and end for a movie or TV show. With this information, you can generate ‘binge markers’ and interactive viewer prompts such as ‘Next Episode’ or ‘Skip Intro’ in VOD applications. Amazon Rekognition Video is trained to handle a wide variety of opening and end credit styles ranging from simple rolling credits to more challenging credits alongside content, credits on scenes, or stylized credits in anime content.
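
One way the credit boundaries could be turned into viewer prompts is sketched below. The segment dictionaries are assumed to follow the segment detection response shape, with OpeningCredits and EndCredits as technical cue types; the function name binge_markers and the output field names are hypothetical.

```python
def _cue_type(segment):
    """Return the technical cue type of a segment, or None for other segments."""
    return segment.get("TechnicalCueSegment", {}).get("Type")


def binge_markers(segments):
    """Derive 'Skip Intro' and 'Next Episode' prompt times from credit segments."""
    opening = [s for s in segments if _cue_type(s) == "OpeningCredits"]
    closing = [s for s in segments if _cue_type(s) == "EndCredits"]

    markers = {}
    if opening:
        # Show 'Skip Intro' when the earliest opening credits begin and
        # jump to the point where they end.
        first = min(opening, key=lambda s: s["StartTimestampMillis"])
        markers["skip_intro"] = {"show_at_ms": first["StartTimestampMillis"],
                                 "jump_to_ms": first["EndTimestampMillis"]}
    if closing:
        # Show 'Next Episode' when the last block of end credits starts.
        last = max(closing, key=lambda s: s["StartTimestampMillis"])
        markers["next_episode"] = {"show_at_ms": last["StartTimestampMillis"]}
    return markers
```

Choosing the last end-credits segment is a simplification; a video with bonus post-credit content may need an extra rule before the prompt is shown.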

Shot detection

A shot is a series of interrelated consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space. With Amazon Rekognition Video, you can detect the start, end, and duration of each shot, as well as a count of all the shots in a piece of content. Shot metadata can be used for applications such as creating promotional videos using selected shots, generating a set of preview thumbnails that avoid transitional content between shots, and inserting ads at shot boundaries, where they don’t disrupt the viewer experience the way a break in the middle of a shot, when someone is speaking, would.
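
A small sketch of how shot metadata might feed these applications. It assumes segments shaped like segment detection results, where shots carry a Type of SHOT and millisecond timestamps; the function name shot_summary and its output keys are illustrative.

```python
def shot_summary(segments):
    """Summarize shots: count, mid-shot thumbnail times, and boundary times."""
    shots = sorted((s for s in segments if s.get("Type") == "SHOT"),
                   key=lambda s: s["StartTimestampMillis"])
    return {
        "count": len(shots),
        # The midpoint of each shot avoids transitional frames at its edges.
        "thumbnail_ms": [(s["StartTimestampMillis"] + s["EndTimestampMillis"]) // 2
                         for s in shots],
        # The start of every shot after the first is a shot boundary,
        # a candidate spot for an ad break.
        "ad_break_candidates_ms": [s["StartTimestampMillis"] for s in shots[1:]],
    }
```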

Color bars detection

Amazon Rekognition Video allows you to detect sections of video that display SMPTE color bars, which are a set of colors displayed in specific patterns to ensure color is calibrated correctly on broadcast monitors, programs, and cameras. This metadata is useful for preparing content for VOD applications by removing color bar segments, or for detecting issues such as loss of broadcast signal in a recording, when color bars are shown continuously as a default signal instead of content.

Slates & studio logos detection

Slates are sections, typically at the beginning of a video, that contain text metadata about the episode, studio, video format, audio channels, and more. Amazon Rekognition can identify the start and end of such slates, making it easy for operators to use the text metadata or to simply remove the slate when preparing content for final viewing. Studio logos are sequences that show the logos or emblems of the production studios involved in making the show. Amazon Rekognition can identify such sequences, making it easy for operators to review them and identify the studios.

Content detection

Content refers to the portions of the TV show or movie that contain the program or related elements. Black frames, credits, color bars, slates, and studio logos are not considered to be content. Amazon Rekognition Video enables you to detect the start and end of each content segment in the video, which enables multiple uses such as finding the program run time or finding certain segments that serve specific purposes. For example, a quick recap of the previous episode at the beginning of the video is a type of content. Similarly, bonus post-credit content can appear after the credits have finished. Some videos may also have ‘textless’ content at the end of the video, which is a copy of all program content that contains overlaid text, but with that text removed to enable internationalization in another language. Once all the content segments are detected with Amazon Rekognition Video, you can apply specific domain knowledge such as ‘my videos always start with a recap’ to further categorize each segment or to send them for human review.
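
The run-time calculation and the domain-knowledge step described above can be sketched as follows. The segment dictionaries are assumed to follow the segment detection response shape with Content and EndCredits as technical cue types; the function name content_report, the starts_with_recap flag, and the label names are hypothetical.

```python
def _cue_type(segment):
    """Return the technical cue type of a segment, or None for other segments."""
    return segment.get("TechnicalCueSegment", {}).get("Type")


def content_report(segments, starts_with_recap=False):
    """Return (total content run time in ms, labelled content segments)."""
    content = sorted((s for s in segments if _cue_type(s) == "Content"),
                     key=lambda s: s["StartTimestampMillis"])
    # Latest point at which any end-credits segment finishes, if there is one.
    credits_end = max((s["EndTimestampMillis"] for s in segments
                       if _cue_type(s) == "EndCredits"), default=None)

    labelled = []
    for index, seg in enumerate(content):
        if credits_end is not None and seg["StartTimestampMillis"] >= credits_end:
            label = "post_credits"  # content that starts after the credits end
        elif starts_with_recap and index == 0:
            label = "recap"  # domain rule: 'my videos always start with a recap'
        else:
            label = "program"
        labelled.append((label, seg["StartTimestampMillis"], seg["DurationMillis"]))

    run_time_ms = sum(seg["DurationMillis"] for seg in content)
    return run_time_ms, labelled
```

Segments whose label is uncertain under rules like these are natural candidates to route to human review.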

Use Cases

Automated ad insertion

You can use Amazon Rekognition Video to detect timecodes of ad insertion markers (a series of black frames with silence) or suitable ad insertion spots (at shot change boundaries). With this metadata, you can then use services like AWS Elemental MediaTailor to stitch ads seamlessly into your content.

Content preparation for VOD

You can leverage Amazon Rekognition Video to prepare archived and third-party content for VOD workflows. By detecting the SMPTE color bars and the beginning of end credits, you can clean up programs for streaming or add interactive user prompts such as ‘Next Episode’ when the end credits start rolling.

Content production

In the production of movies, shows and promotional videos, editors work with large volumes of footage. Using Amazon Rekognition Video, you can break down this source content into its constituent shots, making it easy to choose the best clips for your final edited version.


A+E Networks

A+E Networks® is a collection of culture brands that includes A&E®, HISTORY®, Lifetime®, LMN™, FYI™, Vice TV and BIOGRAPHY®. We are in seven out of 10 American homes, cumulatively reach 335 million people worldwide and have 500+ million digital users.

“A+E Networks receives thousands of hours of new programming each year, with each file going through dozens of automated workflows to get to the right people at the right time. This automation is often hampered, however, by a key challenge – identifying where each segment within the file begins or ends. Our technicians must first view the video file and then manually enter every timecode to enable automated processes like transcode and quality control. With the metadata from Amazon Rekognition Video, we now have the ability to make quick, automated decisions on content as soon as it arrives. Knowing where segments start or stop with data-informed timecodes enables earlier media supply chain decisions - like what length to make a preliminary screener that starts from the first frame after color bars or slate, eliminating slugs and ending before credits. This has the potential to help us improve the quality of our output, save hundreds of work-hours each year, and respond quickly in a highly dynamic content marketplace.”

Promomii is an AI powered video logging and promo generation software company that helps creatives maximize the potential of their videos.

“Editors and producers in the broadcasting and creative video industry spend huge amounts of time going through large volumes of video footage to produce content. This process is monotonous, time-consuming and expensive. Promomii aims to streamline such labor-intensive work by providing accurate and thorough video analysis for our clients, so that they can allocate more resources towards creative work. By combining Amazon Rekognition Video features such as shot detection with PromoMii’s own algorithms, we can quickly and easily provide editors with the most interesting or valuable visual shots during their creative process and help them sell the content better in less time.”

Nomad is a cloud-native intelligent content management platform built on AWS serverless architecture, which seamlessly merges content and asset management with the power of AI/ML into one unified system.

“The Nomad Platform leverages video shot and segment level analysis for detecting, generating, and searching rich metadata for objects, persons, labels, dialogue and screen text. Analyzing the video and detecting the discrete shots accurately has been very challenging, and up to this point, we’ve used an in-house custom shot analyzer to separate the video into the searchable segments. With the new Amazon Rekognition Video features for media analysis, our shot detection accuracy has doubled, and we get the added benefit of detecting other segment types like black frames and end credits automatically. Higher shot detection accuracy and newly detectable segment types in the Nomad Platform allow us to greatly improve the user search experience and substantially reduce customer costs by avoiding additional metadata processing that was required previously.”

Synchronized transforms passive, linear video into ‘Smart-Video’. Our artificial intelligence engine understands the content and context of a video and enriches it with metadata. This metadata then frees the video from linearity making it fully interactive and as powerful as hypertext to meet the demands and expectations of the digital world.

“Today, television channels, driven by the demands of digital consumers, need to adapt traditional, long-form content produced for linear TV into short-form segments to satisfy online consumption. Segmenting and clipping content editorially is important for broadcasters so viewers can directly access the parts that are interesting to them. The Synchronized platform automates the full workflow required to segment, clip and distribute video content for broadcasters. However, accurate, automatic transformation of audiovisual content into editorial segments is an extremely complex task requiring layers of different techniques. But now, by combining Amazon Rekognition Video with our platform’s Smart-Segmentation service, we can significantly accelerate, streamline and automate the creation and delivery of clips accurately to TV editorial teams. They can then manipulate the segments without requiring specialists, and distribute them immediately. This process is not scalable if done manually. In addition, the ability to automatically detect end credits with Amazon Rekognition Video allows us to offer our customers a fully automated, turnkey solution to add features such as “Next Episode” buttons to their content catalogs.”
