AWS for M&E Blog

The next generation of media intelligence with Guidance for Media2Cloud on AWS version 4.0

We are pleased to announce the official release of Guidance for Media2Cloud on AWS version 4.0. This release offers enhancements that assist Amazon Web Services (AWS) customers in monetizing their media archives, generating content metadata more cost effectively, and leveraging generative AI to produce new kinds of content metadata. The new version of Media2Cloud still includes the features that customers have used since its first release in 2018.

What is Guidance for Media2Cloud on AWS?

Guidance for Media2Cloud on AWS is a serverless, end-to-end ingest and analysis solution to move video assets and associated metadata to AWS. During ingest to the cloud, Media2Cloud analyzes videos, images, audio, and documents to extract valuable metadata, including image and video media information, audio transcriptions, and tabular information from scanned documents, powered by AWS artificial intelligence (AI) services. This metadata can be used to make media searchable and enable it to be reused for new purposes. In this latest release, metadata can also be enriched using generative AI to produce image captions, transcript summaries, and content classifications powered by Amazon Bedrock. This enriched metadata allows customers to more easily search, discover, and reuse their media assets.

For AWS customers, the Media2Cloud guidance provides an automated media asset processing pipeline with multiple workflows to effectively manage your videos, photos, podcasts, and documents. As a foundation, the Media2Cloud workflow supports standardized metadata, identifiers, and proxies, and adds machine learning metadata to content. This makes it easier to manage and search for all assets as libraries or archives grow.

For AWS Partners, the guidance provides flexibility to customize based on a customer’s requirements and speed up the ingestion of customers’ content. The output of the solution and AI-generated metadata can be augmented with other datasets and integrated with Media Asset Management platforms. The customizations include tailored image classification and object detection based on brand, identifying the most important key phrases, and face collections.

To learn more about previous releases of Media2Cloud, read the blog posts for Media2Cloud V1, V2, and V3.

What’s new in version 4.0?

We continue to listen to our customers and partners to enhance Media2Cloud features. Jukin Media is a global entertainment company built on the belief that the future of storytelling is user generated. The company produces original content for TV, the web, and emerging platforms, with more than 200 million fans generating over 2 billion video views each month. With TrackIt already in place as a trusted provider, Jukin Media engaged the company to automate its inefficient fulfillment processes. As an AWS Advanced Consulting Partner, TrackIt recommended working with the AWS Media2Cloud offering. This solution enabled Jukin Media to build a serverless ingest workflow to move video assets and associated metadata to AWS.

Version 4.0 of Media2Cloud adds new generative AI-powered plugins and a Shoppable experience, and optimizes existing features for analysis performance and cost.

Dynamic frame analysis

Dynamic frame analysis further optimizes the cost of computer vision analysis of videos. In version 3, Media2Cloud introduced the option to perform frame-based analysis using Amazon Rekognition APIs. Customers can control how often to sample frames based on the resolution and cost requirements of their applications. In version 4.0, frame-based analysis is further optimized using Laplacian variance and perceptual hashing algorithms to understand whether adjacent frames are similar. It runs Amazon Rekognition Image APIs only on frames that are significantly different. As a result, it saves 65% of API requests for fast-motion content and up to 95% for steady content, such as news content.
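The following is a minimal sketch of the idea, not the Media2Cloud implementation. It assumes each frame has already been downscaled to a small grayscale grid (a list of rows of 0-255 values), uses Laplacian variance as a sharpness filter, and uses a simple average hash as the perceptual hash. The function names and the min_sharpness and min_distance thresholds are hypothetical and chosen for illustration only.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values suggest a flat or blurry frame.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def average_hash(gray):
    """Simple perceptual hash: one bit per pixel, set when above the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def select_frames(frames, min_sharpness=10.0, min_distance=4):
    """Return indices of frames worth sending for image analysis.

    A frame is kept when it is sharp enough and its hash differs from the
    last kept frame by at least min_distance bits (assumed thresholds).
    """
    kept, last_hash = [], None
    for i, frame in enumerate(frames):
        if laplacian_variance(frame) < min_sharpness:
            continue  # skip flat or blurry frames
        h = average_hash(frame)
        if last_hash is None or hamming(h, last_hash) >= min_distance:
            kept.append(i)
            last_hash = h
    return kept
```

Only the frames returned by select_frames would then be passed to an image analysis API, which is where the reduction in requests comes from.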

Automatic face indexing

Automatic face indexing makes it easier to find and tag unrecognized faces that are important for your content library. It identifies unrecognized faces during the analysis workflow. The Media2Cloud UI has a new tagging interface that enables users to view unrecognized faces and “tag” them with real names. After a face is tagged once, automatic face indexing propagates the name to all new and previously analyzed content managed by Media2Cloud without needing to re-run the analysis workflow.
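One way to see why tagging does not require re-running analysis is to store stable face identifiers with each asset and resolve names at read time. The sketch below illustrates that pattern under stated assumptions: the FaceIndex class, its fields, and the asset and face identifiers are hypothetical and do not describe the Media2Cloud data model.

```python
from dataclasses import dataclass, field


@dataclass
class FaceIndex:
    # face_id -> display name, filled in when a user tags a face
    names: dict = field(default_factory=dict)
    # asset_id -> set of face_ids seen during analysis
    sightings: dict = field(default_factory=dict)

    def record(self, asset_id, face_id):
        """Called by the analysis step: remember which face appeared where."""
        self.sightings.setdefault(asset_id, set()).add(face_id)

    def tag(self, face_id, name):
        """Called once by the tagging UI; no asset is reprocessed."""
        self.names[face_id] = name

    def people_in(self, asset_id):
        """Resolve names at read time, so old and new assets both benefit."""
        faces = self.sightings.get(asset_id, ())
        return sorted(self.names.get(f, f"unknown:{f}") for f in faces)

    def assets_with(self, name):
        """Find every asset that contains a face tagged with this name."""
        ids = {f for f, n in self.names.items() if n == name}
        return sorted(a for a, faces in self.sightings.items() if faces & ids)
```

Because only the names mapping changes when a face is tagged, content analyzed before or after the tag resolves to the same name.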

Scene change detection

Media2Cloud groups shots into scenes using a combination of AWS AI services, such as the Amazon Rekognition Segment API and the Amazon Transcribe API, and open-source machine learning models. The models generate image embeddings of the frames, and an ephemeral Faiss vector store is used to store and find similar frames. The scenes can then be used to help solve use cases that require finding breaks in the video, such as analyzing ad breaks.
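As a rough illustration of grouping shots by embedding similarity, the sketch below assigns consecutive shots to the same scene when their embeddings are close. It is an assumption-laden simplification: a brute-force cosine comparison stands in for the Faiss vector store, each shot is represented by a single embedding vector, and the threshold and lookback parameters are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_shots(embeddings, threshold=0.8, lookback=2):
    """Group shot indices into scenes.

    A shot joins the current scene if it resembles any of the last
    `lookback` shots already in that scene; otherwise it starts a new one.
    """
    scenes = []
    for i, emb in enumerate(embeddings):
        recent = scenes[-1][-lookback:] if scenes else []
        if any(cosine(emb, embeddings[j]) >= threshold for j in recent):
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes
```

The boundaries between the returned groups are the candidate scene changes. A vector store becomes useful when the number of frames makes pairwise comparison too slow.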

Ad break detection and contextual ads

Building on scene change detection, ad break detection identifies frame-accurate ad break opportunities in video content. Ad breaks detected by Media2Cloud also provide contextual metadata in the form of IAB Content Taxonomy V3 and GARM Taxonomy categories. The categories describe the content before and after the ad break. Ad decision servers can use these descriptions to decide what ad content to serve based on the context of the video. The ad break detection feature also uses Amazon Transcribe, the Amazon Rekognition Label API, and AWS Elemental MediaConvert Loudness (LUFS) output to identify non-intrusive, frame-accurate ad break opportunities.
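To make the combination of signals concrete, the sketch below ranks scene boundaries as ad break candidates: it discards boundaries that fall inside or too close to speech and orders the rest from quietest to loudest. This is a hypothetical heuristic for illustration, not the scoring Media2Cloud uses. The input shapes, the min_pause parameter, and the function name are assumptions.

```python
def rank_ad_breaks(scene_boundaries, speech_segments, loudness, min_pause=0.5):
    """Rank scene boundaries as ad break candidates.

    scene_boundaries: timestamps in seconds.
    speech_segments: (start, end) pairs in seconds, e.g. from a transcript.
    loudness: mapping of timestamp -> loudness in LUFS (more negative is quieter).
    Returns boundaries outside speech, quietest first.
    """
    def in_speech(t):
        # Treat a margin of min_pause seconds around speech as intrusive too.
        return any(
            start - min_pause < t < end + min_pause
            for start, end in speech_segments
        )

    candidates = [t for t in scene_boundaries if not in_speech(t)]
    # Boundaries with no loudness reading default to 0.0 and sort last.
    return sorted(candidates, key=lambda t: loudness.get(t, 0.0))
```

For example, a boundary in the middle of a sentence is dropped even if it is a clean scene change, which matches the goal of non-intrusive breaks.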

Generative AI plugins

A new generative AI-powered analysis derives additional metadata from video transcripts using Amazon Bedrock. It provides a summary of the content and conversation analysis based on the transcription, and performs classification tasks such as identifying genre, sentiment, and MPAA ratings. Using Amazon Bedrock, Media2Cloud also provides scene-based descriptions, taxonomies, brands, and logos.

Shoppable experience

Shoppable experience provides an immersive in-content shopping experience, which can be described as an “X-Ray”-like experience. Using open-source object detection and image classification models, Media2Cloud enables the annotation and display of labels in videos, allowing users to conveniently shop for particular items and find similar products featured in the video. Through automated processes such as label detection, extraction, searching, and matching, the video content is filtered and labeled, expediting labor-intensive manual labeling tasks before generating shoppable metadata.

These new features will continue to help organizations extract value from their media assets on AWS. The addition of generative AI helps organizations make decisions more quickly, and additional features leveraging AI/ML services provide richer metadata and monetization opportunities.

How to get started with Guidance for Media2Cloud on AWS version 4.0

Guidance for Media2Cloud on AWS version 4.0 is available on the AWS Solutions page. The source code is published to the GitHub guidance-for-media2cloud-on-aws repo. Check out the Implementation Guide to start creating the workflow in your AWS account. Visit AWS product pages to learn more about the underlying AI and machine learning services used with Media2Cloud, including Amazon Rekognition, Amazon Transcribe, Amazon Bedrock, and Amazon Comprehend.

Alex Burkleaux

Alex Burkleaux is a Sr. AI/ML Specialist Solutions Architect at AWS. She helps customers use AI Services to build media solutions. Her industry experience includes over-the-top video, database management systems, and reliability engineering.

Jake Izumi

Jake Izumi is a Senior Solutions Architect supporting the NAMER ISV customers at AWS. By utilizing his previous experience supporting corporate growth strategies, Jake works with business and technology leaders to innovate and grow on top of AWS.

Ken Shek

Ken Shek is an AWS Principal Solutions Architect specializing in Data Science and Analytics for the Global Media, Entertainment, Games, and Sports industries. He assists media customers in designing, developing, and deploying workloads on the AWS Cloud using best practices. Passionate about artificial intelligence and machine learning use cases, he has built the Media2Cloud on AWS guidance to help hundreds of customers ingest and analyze content, enriching its value.