AWS for M&E Blog

Using computer vision to automate media content deduplication workflows

This blog was coauthored by Vibhav Gupta (Quantiphi), Noor Hassan (Amazon Web Services), and Liam Morrison (Amazon Web Services).

Introduction

The media and entertainment (M&E) industry is undergoing multiple transformations, driven by changing industry trends across the value chain, from content production and supply chain to broadcast and distribution. These initiatives are accelerating cloud migrations and the proliferation of media assets, which in turn increases storage and networking costs and makes it harder to retrieve relevant content.

In this blog, we present Quantiphi’s artificial intelligence (AI)–based deduplication solution, tailored for media assets, which helps broadcasters reduce costs and shorten backup times by removing an estimated average of 25 percent of duplicate content.

Quantiphi

Quantiphi is an Amazon Web Services (AWS) Advanced Consulting Partner and a member of the AWS Partner Network (APN) with AWS Competencies in Machine Learning (ML), Data & Analytics, Migration, and DevOps. Quantiphi is also a launch partner for AWS Media Intelligence solutions and has multiple AWS Service Delivery designations recognizing its expertise in supporting specific AWS services.

The challenge

Media organizations tend to create multiple versions of the same content during the editorial and postproduction process, inadvertently inflating their storage expenses. These duplicates arise for many reasons: different aspect ratios, resolutions, or audio tracks for the same video asset; the presence of black frames, graphics, text, or other effects; subsets of existing video assets such as proxies and highlights (or clips); and compliance with regional distribution requirements. As organizations grow and add more content and distribution streams, editorial operations expand in step to meet those compliance needs. Duplicate versions of content carry the same metadata. Because metadata tagging has become the norm rather than the exception for media organizations seeking simpler search, duplicate content undermines that effort and significantly increases the time needed to find relevant content.

Therefore, a video deduplication solution becomes critical to identify the primary video asset and reduce storage costs.

Additionally, while reducing costs and providing more effective backup processes are the guiding principles of deduplication solutions, the solution must also work seamlessly with existing media asset management (MAM) systems so that content operations teams see minimal changes to how they manage operations.

Quantiphi’s deduplication solution

Fairly early in our journey, we noticed an increasing trend of content duplication in the cloud across a number of our customers. We believed that if we could help customers identify the volume of duplicate content, we could help them reduce operational costs; manually identifying duplicate assets is simply not feasible when customers hold thousands of them. It is also critical that the solution integrates easily with existing customer systems, without forcing changes to established processes.

Keeping the above in mind, we built our AI-based video deduplication solution. To achieve high accuracy in duplicate content detection, the solution uses video and audio similarity technology to compare individual frames of different assets and provides a duplication score, with a confidence level, between the assets. Over time, we have enhanced our ML models to identify minor differences in content, such as the presence of a network logo in the corner of the frame. Additionally, through multiple customer engagements, we have optimized the solution to remain cost effective and secure as our customers scale.

Finally, to help our customers analyze the results, the solution provides a report that maps similar files to a parent asset in the form of a cluster with details on percent duplication, nature of duplication, reduced storage costs, and more.

So how does the solution work?

The solution follows a three-step process, which we detail in the section below:

  1. Embedding generation – Creation of frame-level audio and video metadata to get enriched information of the media asset
  2. Auto deletion – Deletion of duplicate assets using the frame-level metadata created as part of step one
  3. Asset restoration – Automatic restoration of deleted media assets if required for broadcasting purposes
Solution overview

Embedding generation – As a first step, the solution generates frame-level embeddings for the audio and video content separately. The embeddings are essentially frame-level metadata that provide enriched information for the subsequent processes. The metadata captures details such as the presence of watermarks, subtitles, and even changes in video quality. The objective is to capture data at the most granular level possible and cover different scenarios so that even the most subtle duplicates can be identified. For this purpose, we use different models for generating the audio and video embeddings, and as an end result the solution generates a JSON file that contains essential information such as:

  • Asset duplicate status
  • Timestamp mapping of duplicate segments with respect to the original content
  • Asset metadata like codec information, resolution, frame rate, bitrate, etc.

This information helps in the next steps of the process, which are auto deletion and asset restoration.
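
To make the structure concrete, here is a minimal, hypothetical sketch of this step. The embedding model, helper function, and JSON field names are illustrative assumptions, not Quantiphi’s actual implementation:

```python
# Hypothetical sketch of the embedding-generation step. embed_fn stands in
# for any image-embedding model (for example, a CNN backbone) that maps an
# RGB frame to a fixed-length vector.
import json
import cv2          # OpenCV, used here for frame extraction
import numpy as np

def generate_video_embeddings(video_path, embed_fn):
    """Extract frames and compute one embedding vector per frame."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    embeddings, timestamps = [], []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        embeddings.append(embed_fn(frame))
        timestamps.append(frame_idx / fps)
        frame_idx += 1
    cap.release()
    return np.array(embeddings), timestamps, fps

# The resulting per-asset JSON might resemble the structure described above
# (all asset names and fields below are invented for illustration):
metadata = {
    "asset_id": "episode_042_master.mxf",
    "is_duplicate": True,
    "duplicate_segments": [
        # duplicate span in this asset -> matching span in the original
        {"start": 120.0, "end": 1425.6,
         "maps_to": {"asset": "episode_042_4x3.mxf", "start": 118.2, "end": 1423.8}},
    ],
    "codec": "prores", "resolution": "3840x2160",
    "frame_rate": 25, "bitrate_mbps": 440,
}
with open("episode_042_master.json", "w") as f:
    json.dump(metadata, f, indent=2)
```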

Auto deletion – After generating the JSON file, the solution automates the deletion of duplicate audio and video segments. To ensure that only duplicate content is deleted, the solution subdivides video segments down to one-tenth of a frame when checking audio and video similarity. For example, for content produced at 25 frames per second, each frame corresponds to 0.04 seconds, so the solution compares content at the 0.004-second level to achieve high duplicate detection accuracy. Additionally, the solution can be customized to give customers the ability to decide which duplicate content is deleted and which is stored in an archive. Customers can also define business rules to flag content with exact similarities as well as minor frame-level differences such as the presence of filters, black frames, or watermarks. The solution can be further fine-tuned to handle exceptions for special cases, such as the similar intro song at the beginning of every video asset of a particular show.
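
The following is a simplified sketch of the underlying matching idea, not the production model: it compares per-frame embeddings of two assets with cosine similarity and flags contiguous spans above a threshold. The 0.98 threshold is an assumed value:

```python
# Sketch: flag spans of asset A whose frame embeddings closely match asset B.
import numpy as np

def cosine_similarity_matrix(emb_a, emb_b):
    # Normalize rows, then take the dot product to get a
    # (frames_a, frames_b) similarity matrix.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return a @ b.T

def duplicate_spans(emb_a, emb_b, fps=25.0, threshold=0.98):
    """Return (start, end) times in asset A whose frames match asset B."""
    sim = cosine_similarity_matrix(emb_a, emb_b)
    best = sim.max(axis=1)          # best match in B for each frame of A
    dup = best >= threshold
    spans, start = [], None
    for i, flag in enumerate(dup):
        if flag and start is None:
            start = i               # a duplicate run begins
        elif not flag and start is not None:
            spans.append((start / fps, i / fps))
            start = None
    if start is not None:           # run extends to the end of the asset
        spans.append((start / fps, len(dup) / fps))
    return spans
```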

Asset restoration – Using the solution, customers can also automatically restore duplicate content from archived storage if required. During the embedding-generation process, the solution creates frame-level audio and video metadata. This metadata is stored in a JSON file that is further enriched in step two with the details of the duplicate assets being deleted. The JSON file can then be used to restore assets, because it holds the information necessary to regenerate the duplicate video asset. The entire process completes within 30 minutes for 1-hour, high-quality video content (up to 4K in the Apple ProRes codec, with a 16:9 aspect ratio, in any container format such as MP4, MOV, MXF, or Audio Video Interleave). The content is restored in the same format as the previously deleted duplicate asset.
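
As a hedged illustration of the restoration step, the sketch below reads a mapping JSON and requests a restore of an archived duplicate via the standard Amazon S3 restore_object API. The bucket name and JSON field names are placeholders:

```python
# Illustrative sketch: request restoration of an archived duplicate asset
# from S3 Glacier, guided by the enriched mapping JSON from step two.
import json
import boto3

s3 = boto3.client("s3")

def restore_duplicate(mapping_json_path, bucket="media-archive-bucket"):
    # Assumed JSON layout; field names are placeholders for illustration.
    with open(mapping_json_path) as f:
        mapping = json.load(f)
    key = mapping["asset_id"]      # archived duplicate's object key
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            "Days": 7,             # keep the restored copy for a week
            "GlacierJobParameters": {"Tier": "Standard"},
        },
    )
    return key
```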

Through the above process, the solution helps customers identify and remove duplicate content and save on cloud storage. The metadata generated also provides a clear lineage of parent and child assets and helps in content search. Furthermore, the solution automates postprocessing by comparing every pixel of the unique and duplicate video frames; for audio tracks, the solution compares audio at the one-tenth-of-a-frame level. Prior to deployment, we tested the solution on terabytes of actual media content and achieved over 98 percent precision in identifying duplicates.
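
The pixel-level validation can be pictured with a short sketch like the following; the tolerance value, included to absorb re-encoding noise, is an assumption:

```python
# Sketch: treat two frames as matching when their mean absolute pixel
# difference falls below a small tolerance.
import numpy as np

def frames_match(frame_a, frame_b, tolerance=2.0):
    if frame_a.shape != frame_b.shape:
        return False
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return float(diff.mean()) <= tolerance
```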

Quantiphi built this robust and scalable solution by using a combination of AI/ML services provided by AWS.

Let’s take a look at the solution architecture:

Solution architecture

  1. The solution can be integrated with a MAM application or used alongside a bucket in Amazon Simple Storage Service (Amazon S3), an object storage service. This happens through a user interface (UI) hosted on an instance that automatically scales on Amazon Elastic Compute Cloud (Amazon EC2), a web service that provides secure, resizable compute capacity in the cloud. The UI sits behind an Application Load Balancer, which is ideal for advanced load balancing of Hypertext Transfer Protocol (HTTP) and Secure Hypertext Transfer Protocol (HTTPS) traffic, or users can upload files through an Amazon S3 link.
  2. The uploaded files are queued in Amazon Simple Queue Service (Amazon SQS), a fully managed message queuing service, for processing. Amazon SQS calls AWS Lambda, a serverless, event-driven compute service, which initiates the shots-generation model hosted on Amazon SageMaker, a service that helps data scientists and developers prepare, build, train, and deploy high-quality ML models. Once the results are generated, the metadata is uploaded to the Amazon S3 bucket (see the sketch after this list).
  3. When the results are generated, AWS Lambda queues the information into Amazon SQS, which in turn communicates with another Lambda function to confirm that all the files in the queue have been processed and their metadata generated.
  4. AWS Lambda initiates the video similarity process after all the uploaded videos have completed stage two. The model processes the generated metadata and updates the primary-duplicate mapping information in Amazon Relational Database Service (Amazon RDS). The model notifies AWS Lambda once processing is complete.
  5. All the identified primary-duplicate pairs are queued for further processing, and AWS Lambda invokes the frames validation module to compute the final similarity score between each primary and duplicate pair.
  6. The frames validation module processes the files based on predefined criteria, establishes the final shots mapping between the video pairs, and updates Amazon RDS with similarity scores and time-coded metadata. The module also uploads the mapping JSONs to Amazon S3.
  7. Finally, another Lambda function launches when the mapping JSONs are uploaded; it interacts with the customer application to delete the duplicate content or present the final output in the UI.
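
To make step two more tangible, here is a minimal, hypothetical Lambda handler that consumes the SQS messages, invokes a SageMaker endpoint hosting the shots-generation model, and writes the returned metadata to Amazon S3. The endpoint name, bucket name, and message fields are placeholders, not the solution’s actual resources:

```python
# Hypothetical Lambda handler for step 2 of the architecture.
import json
import boto3

sm_runtime = boto3.client("sagemaker-runtime")
s3 = boto3.client("s3")

ENDPOINT_NAME = "shots-generation-model"    # assumed endpoint name
METADATA_BUCKET = "dedup-metadata-bucket"   # assumed bucket name

def handler(event, context):
    # An SQS-triggered Lambda receives one record per queued message.
    for record in event["Records"]:
        body = json.loads(record["body"])
        video_key = body["s3_key"]          # assumed message field
        # Invoke the hosted model with a pointer to the uploaded video.
        response = sm_runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=json.dumps({"video": video_key}),
        )
        metadata = response["Body"].read()
        # Persist the generated metadata for the downstream steps.
        s3.put_object(
            Bucket=METADATA_BUCKET,
            Key=f"metadata/{video_key}.json",
            Body=metadata,
        )
```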

Customer use case

As part of a transformation initiative to identify new avenues of content monetization for more than 10 PB of video data, the customer, a leading mass media and entertainment conglomerate, is migrating its video assets to the cloud and wants to understand the following:

  • Percentage of duplicate assets in its repository
  • Potential reduction in operational costs

To assist in achieving this goal, Quantiphi worked on a portion of the customer’s video assets to demonstrate how simply duplicate assets can be identified. The deduplication solution generated high-quality embeddings and identified over 35 percent duplicate content.

Going forward, the solution will help the customer identify a huge volume of duplicate assets and provide a path to save on operational costs. We are currently working with them to create a road map for migrating petabytes of their data to AWS.

Summary

The media industry generates large volumes of data across the entire production value chain, which results in a huge amount of duplicate data that increases storage costs. As content moves from production to the editing board, files must be modified for commercials and/or compliance requirements to meet different industry needs. This process creates multiple duplicates, which eventually results in significant costs for data storage, maintenance, and retrieval. In our experience, a media enterprise archive contains more than 25 percent duplicate content. With cloud storage, this deduplication solution can help save around 20 percent in operational costs per month for every petabyte of content.
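
As a rough, back-of-the-envelope illustration of that claim (the per-gigabyte price below is an assumption for illustration only; actual Amazon S3 pricing varies by storage class and region):

```python
# Back-of-the-envelope view of the savings figures quoted above.
PB_IN_GB = 1_000_000              # 1 PB expressed in GB (decimal)
PRICE_PER_GB_MONTH = 0.023        # assumed $/GB-month (illustrative)
DUPLICATE_FRACTION = 0.25         # ~25% duplicate content, per the text

monthly_storage_cost = PB_IN_GB * PRICE_PER_GB_MONTH     # ~$23,000 per PB-month
monthly_savings = monthly_storage_cost * DUPLICATE_FRACTION
print(f"~${monthly_savings:,.0f} saved per PB per month")  # ~$5,750
```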

Therefore, an AI-based deduplication solution helps in reducing annual operational costs and facilitates a neatly indexed archive for simpler search and retrieval.

Reach out to us to learn more about Quantiphi’s solutions.

Noor Hassan

Noor Hassan is a Senior Partner Solutions Architect based in Toronto, Canada, with a background in media broadcast, a focus on media contribution and distribution, and a passion for AI/ML in the media space. Outside of work, Noor enjoys travel, photography, and spending time with loved ones.

Liam Morrison

Liam Morrison is a Principal Solutions Architect, based in New York City. Liam specializes in practical applications for machine learning in the media and entertainment industry.