What does this AWS Solution do?
This solution creates subtitles for your video-on-demand (VOD) content. Localized video with high-quality transcriptions, subtitles, and translations can extend the reach of video content to new audiences and help all viewers understand it. However, producing accurate, multi-language subtitles is complex and labor-intensive: transcribing, subtitling, translating, and reviewing media assets often consumes many hours of staff time. This solution uses AWS artificial intelligence (AI) services to assist with the subtitle creation process and reduce that effort.
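To illustrate what generating subtitles from a transcription involves, here is a minimal sketch that groups the timed words in an Amazon Transcribe result into WebVTT cues. It assumes the standard Transcribe JSON layout (`results.items`, where `pronunciation` items carry `start_time`/`end_time` strings and `punctuation` items carry none). The function names and the fixed words-per-cue grouping are illustrative assumptions, not the solution's actual implementation.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def transcript_to_webvtt(transcript: dict, max_words: int = 8) -> str:
    """Group timed words from an Amazon Transcribe result into WebVTT cues.

    Hypothetical helper: real subtitle generation would also consider pauses,
    line length, and reading speed, not just a fixed word count.
    """
    cues = []
    words, start, end = [], None, None
    for item in transcript["results"]["items"]:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamps; append it to the preceding word.
            if words:
                words[-1] += text
            continue
        # Close the current cue only when the next word arrives, so trailing
        # punctuation stays attached to the cue it belongs to.
        if len(words) >= max_words:
            cues.append((start, end, " ".join(words)))
            words, start = [], None
        if start is None:
            start = float(item["start_time"])
        end = float(item["end_time"])
        words.append(text)
    if words:
        cues.append((start, end, " ".join(words)))

    lines = ["WEBVTT", ""]
    for number, (cue_start, cue_end, text) in enumerate(cues, 1):
        lines += [
            str(number),
            f"{format_timestamp(cue_start)} --> {format_timestamp(cue_end)}",
            text,
            "",
        ]
    return "\n".join(lines)
```

For example, a transcript containing "Hello, world. Bye" with `max_words=2` yields two cues: "Hello, world." and "Bye". Translated subtitles could then be produced by translating each cue's text while keeping its timestamps.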
Benefits
- Upload and analyze videos, and work with automatically generated video subtitles using a simple web-based user interface.
- Automatically extract valuable metadata from video files using Amazon Rekognition, Amazon Transcribe, Amazon Translate, and Amazon Comprehend.
- Review subtitles and make corrections within the application. Once you are satisfied with the subtitles, rerun the workflow using the corrected input to regenerate downstream results.
- Use the application to generate Amazon Transcribe custom vocabularies and Amazon Translate custom terminologies from the corrections you make to the subtitles. Provide these customizations when you upload a video and configure the automated workflow.
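The page does not describe how corrections become customizations, so the following is only a sketch under stated assumptions. It turns hypothetical correction data into two artifacts: a CSV custom terminology for Amazon Translate (a header row of language codes followed by one term pair per row) and a list of words to add to an Amazon Transcribe custom vocabulary. The function names, the correction inputs, and the "new word in the corrected text" heuristic are all hypothetical.

```python
import csv
import io


def build_terminology_csv(source_lang: str, target_lang: str, corrections: dict) -> str:
    """Build an Amazon Translate custom terminology in CSV format.

    `corrections` is a hypothetical mapping of source term -> corrected translation,
    e.g. collected from edits a reviewer made to translated subtitles.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([source_lang, target_lang])  # header row: language codes
    for source, target in sorted(corrections.items()):
        writer.writerow([source, target])
    return buf.getvalue()


def build_vocabulary_phrases(original: str, corrected: str) -> list:
    """Collect words present in a corrected subtitle but missing from the original.

    Hypothetical heuristic: words the reviewer introduced are likely terms that
    Amazon Transcribe misrecognized and that a custom vocabulary could help with.
    """
    punctuation = ".,?!;:"
    original_words = {w.strip(punctuation).lower() for w in original.split()}
    phrases = []
    for word in corrected.split():
        cleaned = word.strip(punctuation)
        if cleaned and cleaned.lower() not in original_words and cleaned not in phrases:
            phrases.append(cleaned)
    return phrases
```

For example, correcting "we use amazon recognition today" to "We use Amazon Rekognition today." yields `["Rekognition"]`. The CSV text could be uploaded with the Amazon Translate `ImportTerminology` API (format `CSV`), and the word list could be supplied as `Phrases` to the Amazon Transcribe `CreateVocabulary` API.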
AWS Media Insights Engine is a framework that makes it easier for developers to build serverless applications that process video, images, audio, and text with AI and multimedia services on AWS.
AWS Solution overview
This architecture depends on the AWS Media Insights Engine (MIE) development framework, which must be present in your AWS account for the solution to deploy. You can deploy MIE separately beforehand, or have it deployed together with this solution.
The diagram below presents the serverless architecture you can automatically deploy using the solution's implementation guide and accompanying AWS CloudFormation template.

AWS Content Localization solution architecture
The AWS CloudFormation template deploys the following infrastructure:
- An instance of the AWS Media Insights Engine solution, unless you choose to use an existing MIE instance.
- An Amazon CloudFront distribution to serve the solution’s web application.
- An Amazon Simple Storage Service (Amazon S3) web source bucket for hosting the static web application.
- An Amazon Cognito user pool to provide a user directory.
- An Amazon Cognito identity pool to provide federation with AWS Identity and Access Management (IAM) for authentication and authorization to the web application.
- Amazon API Gateway endpoints for the MIE workflow API, the MIE data plane API, and the Amazon OpenSearch Service API.
- An AWS Step Functions workflow created by MIE. The content localization workflow consists of AWS Lambda functions that run jobs in Amazon Transcribe, Amazon Translate, AWS Elemental MediaConvert, and Amazon Polly. The workflow can also optionally run Amazon Rekognition and Amazon Comprehend to provide additional analysis of the input.
- A Lambda function to extract, transform, and load media metadata from the MIE data pipeline into an Amazon OpenSearch Service cluster. This Lambda function is invoked by the MIE data plane DynamoDB stream whenever asset metadata is modified in the MIE data plane.
- An Amazon OpenSearch Service cluster to index media metadata.
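The extract-transform-load Lambda function described above consumes DynamoDB stream records and writes them to OpenSearch. A minimal sketch of that transform step follows: it converts stream records (typed attributes such as `{"S": "..."}` under `Keys` and `NewImage`) into newline-delimited actions for the OpenSearch `_bulk` API. This is not the solution's actual code. The key attribute `AssetId`, the index name `mie`, and the function names are hypothetical, and only a subset of DynamoDB attribute types is handled.

```python
import json


def deserialize(attr: dict):
    """Convert one DynamoDB-typed attribute value (e.g. {"S": "x"}) to plain Python.

    Handles S, N, BOOL, NULL, M, and L only; set types are not supported here.
    """
    kind, value = next(iter(attr.items()))
    if kind == "S":
        return value
    if kind == "N":
        # Numbers arrive as strings; keep integers exact, fall back to float.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if kind == "BOOL":
        return value
    if kind == "NULL":
        return None
    if kind == "M":
        return {k: deserialize(v) for k, v in value.items()}
    if kind == "L":
        return [deserialize(v) for v in value]
    raise ValueError(f"unsupported DynamoDB type: {kind}")


def stream_records_to_bulk(event: dict, index: str = "mie") -> str:
    """Turn DynamoDB stream records into an OpenSearch _bulk request body.

    Assumes the stream is configured to include new images (NEW_IMAGE or
    NEW_AND_OLD_IMAGES) and that "AssetId" (hypothetical) is the table key.
    """
    lines = []
    for record in event["Records"]:
        keys = {k: deserialize(v) for k, v in record["dynamodb"]["Keys"].items()}
        doc_id = keys["AssetId"]
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:  # INSERT or MODIFY: index the latest version of the item
            doc = {k: deserialize(v) for k, v in record["dynamodb"]["NewImage"].items()}
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(doc))
    # The _bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

A handler would pass its stream `event` to `stream_records_to_bulk` and send the result to the cluster's `_bulk` endpoint. Using the DynamoDB key as the document `_id` makes repeated MODIFY events overwrite the same document instead of creating duplicates.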
Content Localization on AWS
Version 2.0.0
Last updated: 02/2022
Author: AWS
Estimated deployment time: 20 min