
Guidance for Media Extraction and Dynamic Content Policy Framework on AWS

Overview

This Guidance demonstrates how to accelerate your content analysis workflows by automating video metadata extraction, intelligence gathering, and content moderation. It enables you to efficiently process large volumes of video content, extract valuable insights, and make data-driven decisions through customizable policy evaluations. By automating these traditionally manual tasks, you can reduce operational costs, improve accuracy, and scale your content analysis capabilities while maintaining secure and reliable operations.

How it works

Extraction of generic metadata

This architecture diagram shows how to use generative AI to extract generic metadata from videos and demonstrates a dynamic policy evaluation analysis.

Diagram of an AWS architecture workflow for video extraction and evaluation services, featuring components like Amazon CloudFront, S3, Cognito, API Gateway, Step Functions, Lambda, DynamoDB, Bedrock, Rekognition, Transcribe, and OpenSearch Service.

RESTful APIs of the extraction service

This architecture diagram illustrates the key RESTful APIs of the extraction service, served through Amazon API Gateway. The UI uses APIs to retrieve data, allowing users to integrate the extraction service into existing workflows.

Flowchart of an AWS-based video processing workflow using services like API Gateway, Lambda, S3, DynamoDB, Step Functions, and others for tasks such as transcription, frame sampling, image analysis, and data storage.

Deploy with confidence

Ready to deploy? The sample code on GitHub includes detailed deployment instructions. You can deploy it as-is or customize it to fit your needs.


Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Amazon CloudWatch provides logging and insights for the services running in AWS Lambda and Step Functions. This Guidance pushes metrics to CloudWatch at various stages to provide observability into the infrastructure, such as Lambda functions, AI/ML services, and S3 buckets.
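As a rough illustration of pushing a per-stage metric, the sketch below builds a payload in the shape that the CloudWatch PutMetricData API accepts. The namespace, metric name, and dimension name are hypothetical and are not taken from the Guidance sample code; the block only constructs the payload and makes no AWS calls.

```python
import json

# Hypothetical namespace, chosen for illustration only.
NAMESPACE = "MediaExtraction/Example"


def build_stage_metric(stage: str, duration_ms: float) -> dict:
    """Build keyword arguments shaped for CloudWatch PutMetricData."""
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {
                # Hypothetical metric and dimension names.
                "MetricName": "StageDuration",
                "Dimensions": [{"Name": "Stage", "Value": stage}],
                "Value": float(duration_ms),
                "Unit": "Milliseconds",
            }
        ],
    }


payload = build_stage_metric("frame-sampling", 1250)
print(json.dumps(payload, indent=2))
```

In a Lambda function with boto3 available, a payload like this could be sent with `boto3.client("cloudwatch").put_metric_data(**payload)`.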

Read the Operational Excellence whitepaper 

This Guidance implements least-privilege AWS Identity and Access Management (IAM) policies and encrypts S3 data using AWS Key Management Service (AWS KMS) keys. User authentication is handled through Amazon Cognito using OAuth patterns for both web application login and API Gateway calls. The OpenSearch Service cluster is deployed in an Amazon Virtual Private Cloud (Amazon VPC) private subnet, accessible only to authorized Lambda functions.
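To make the least-privilege idea concrete, here is a minimal sketch of an IAM policy document that lets a function read objects under a single S3 prefix and decrypt them with one KMS key, and nothing else. The bucket name, prefix, statement IDs, and key ARN are placeholders and assumptions; the actual policies in the sample code may differ.

```python
import json


def build_read_only_policy(bucket: str, prefix: str, kms_key_arn: str) -> dict:
    """Return an IAM policy document scoped to one prefix and one KMS key."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access limited to objects under the given prefix.
                "Sid": "ReadVideoObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Decrypt only with the key that encrypts the bucket.
                "Sid": "DecryptWithBucketKey",
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": kms_key_arn,
            },
        ],
    }


# Placeholder values for illustration only.
policy = build_read_only_policy(
    bucket="example-video-bucket",
    prefix="/uploads/",
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
)
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to a prefix and a single key, rather than using `*`, is what keeps the grant narrow.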

Read the Security whitepaper 

Amazon S3 provides robust data management through versioning, deletion prevention, and Cross-Region Replication capabilities. Serverless services like API Gateway, Lambda, Step Functions, and Amazon Simple Queue Service (Amazon SQS) offer built-in scalability and high availability. The OpenSearch Service deployment supports high availability through multiple Availability Zones, featuring redundant data nodes with replicated shards, helping ensure data persistence and recovery capabilities.

Read the Reliability whitepaper 

Lambda and Step Functions enable efficient parallel processing through concurrent execution of functions and workflow steps. This parallel processing capability improves overall throughput and reduces execution time. The serverless architecture scales automatically with the workload, which helps maintain performance for media processing tasks.
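One common way to express this kind of fan-out in Step Functions is a Map state, which runs the same steps for each item of an input array with a bounded level of concurrency. The sketch below builds such a state in Amazon States Language as a Python dictionary. The state name, the `$.frames` input path, and the Lambda ARN are hypothetical; the Guidance's actual state machine may be structured differently.

```python
import json


def build_parallel_frames_state(lambda_arn: str, max_concurrency: int = 10) -> dict:
    """Return a Map state that invokes one Lambda task per input item."""
    if max_concurrency < 0:
        raise ValueError("max_concurrency must be zero or positive")
    return {
        "Type": "Map",
        # Hypothetical input field holding the list of sampled frames.
        "ItemsPath": "$.frames",
        # Upper bound on parallel iterations; 0 leaves the limit to the service.
        "MaxConcurrency": max_concurrency,
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "INLINE"},
            "StartAt": "AnalyzeFrame",
            "States": {
                "AnalyzeFrame": {
                    "Type": "Task",
                    "Resource": lambda_arn,
                    "End": True,
                }
            },
        },
        "End": True,
    }


# Placeholder ARN for illustration only.
state = build_parallel_frames_state(
    "arn:aws:lambda:us-east-1:111122223333:function:example-analyze-frame",
    max_concurrency=5,
)
print(json.dumps(state, indent=2))
```

Capping `MaxConcurrency` is a design choice that trades some throughput for protection of downstream service quotas.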

Read the Performance Efficiency whitepaper 

Amazon S3 storage classes and lifecycle policies optimize video storage costs, while serverless and AI/ML services operate on a pay-as-you-go model, so you pay only for the services you use. Because the architecture is event-driven, charges apply only when resources are actually used. You can tailor your media workflows cost-effectively by using S3 lifecycle policies to store and archive ingested content, proxies, and metadata.
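As an example of what such a lifecycle policy might look like, the sketch below builds a lifecycle configuration that moves objects under a prefix to an infrequent-access class and later to an archive class. The rule ID, prefix, day counts, and choice of storage classes are assumptions for illustration and are not taken from the sample code.

```python
import json


def build_lifecycle_configuration(
    prefix: str, ia_after_days: int = 30, archive_after_days: int = 90
) -> dict:
    """Return an S3 lifecycle configuration with two storage class transitions."""
    # S3 requires objects to be at least 30 days old before a lifecycle
    # transition to STANDARD_IA.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "archive-ingested-video",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_configuration("ingest/")
print(json.dumps(config, indent=2))
```

With boto3, a dictionary like this could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.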

Read the Cost Optimization whitepaper 

AWS serverless services and AI/ML components allocate compute resources dynamically based on demand, eliminating over-provisioning and reducing resource waste. This approach minimizes energy consumption compared to traditional on-premises servers and maximizes the efficiency of AWS AI services, reducing the environmental impact of backend operations.

Read the Sustainability whitepaper 

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.