AWS Machine Learning Blog
Translate video captions and subtitles using Amazon Translate
September 2021: This post and the solution have been updated to use Amazon EventBridge event notifications from Amazon Translate to track batch translation job completion.
Video is a highly effective way to educate, entertain, and engage users. Your company might carry a large collection of videos that include captions or subtitles. To make these videos accessible to a larger audience, you can provide translated captions and subtitles in multiple languages. In this post, we show you how to create an automated and serverless pipeline to translate captions and subtitles using Amazon Translate, without losing their context during translation.
Captions and subtitles help make videos accessible for those hard of hearing, provide flexibility to users in noisy or quiet environments, and assist non-native speakers. Captions or subtitles are normally represented in SRT (.srt) or WebVTT (.vtt) format. SRT stands for SubRip Subtitle, and is the most common file format for subtitles and captions. WebVTT stands for Web Video Text Tracks, and is becoming a popular format for the same purpose.
Multi-language video subtitling and captioning solution
This solution uses Amazon Translate, a neural machine translation service that delivers fast, high-quality, and affordable language translation. Amazon Translate supports the ability to ignore tags and only translate text content in HTML documents. The following diagram illustrates the workflow of our solution.
The workflow includes the following steps:
- Extract caption text from a WebVTT or SRT file and create a delimited text file using an HTML tag.
- Translate this delimited file using the asynchronous batch processing capability in Amazon Translate.
- Recreate the WebVTT or SRT files using the translated delimited file.
We provide a more detailed architecture in the next section.
Solution architecture
This solution is based on an event-driven and serverless pipeline architecture, and uses managed services so that it’s scalable and cost-effective. The following diagram illustrates the serverless pipeline architecture.
The pipeline contains the following steps:
- Users upload one or more caption files in the WebVTT (.vtt) or the SRT (.srt) format to an Amazon Simple Storage Service (Amazon S3) bucket.
- The upload triggers an AWS Lambda function.
- The function extracts text captions from each file, creates a corresponding HTML tag delimited text file, and stores them in Amazon S3.
- The function invokes Amazon Translate in batch mode to translate the delimited text files into the target language.
- When the job is complete, Amazon Translate emits an Amazon EventBridge event, and the configured EventBridge rule triggers a second Lambda function.
- This Lambda function reads the translated delimited files from Amazon S3, creates the caption files in the WebVTT (.vtt) or SRT (.srt) format with the translated text captions, and stores them back in Amazon S3.
We explain Steps 3–6 in more detail in the following sections.
Convert caption files to delimited files
In this architecture, uploading the trigger file (named by the TriggerFileName parameter) triggers the Lambda function <Stack name>-S3CaptionsFileEventProcessor-<Random string>. The function iterates through the WebVTT and SRT files in the input folder and, for each file, extracts the caption text, converts it into a delimited text file using an HTML (<span>) tag, and places it in the captions-in folder of the Amazon S3 bucket. See the following function code:
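As an illustration of this conversion step, here is a minimal sketch. It is not the solution's function code: it uses a small hand-written SRT parser (`parse_srt`, hypothetical) in place of webvtt-py, and it assumes the delimited document wraps each caption in its own `<span>` element inside a bare HTML body.

```python
import html
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING_LINE = re.compile(r"^\S+ --> \S+")


def parse_srt(srt_text: str) -> list[tuple[str, str]]:
    """Hypothetical stand-in for webvtt-py: return (timing_line, caption_text) pairs."""
    cues = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if TIMING_LINE.match(line):
                # Everything after the timing line is caption text (may span lines).
                cues.append((line, "\n".join(lines[i + 1:])))
                break
    return cues


def to_delimited_html(captions: list[str]) -> str:
    """Wrap each caption in <span> so the caption boundaries survive translation."""
    # Escape &, <, > so caption text cannot be mistaken for markup.
    spans = "\n".join(f"<span>{html.escape(text)}</span>" for text in captions)
    return f"<html><body>\n{spans}\n</body></html>\n"
```

For example, `to_delimited_html([text for _, text in parse_srt(content)])` produces one `<span>` per cue, in cue order. Preserving that order is what lets the reassembly step pair each translated span with its original timing line.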
The solution uses the Python library webvtt-py to load, parse, and generate the WebVTT and SRT file formats. All operations related to the library are abstracted within the Captions module, and all Amazon S3 operations are abstracted within the S3Helper module.
Batch translation of delimited files
After the delimited files are stored in the captions-in folder of the Amazon S3 bucket, the Lambda function <Stack name>-S3CaptionsFileEventProcessor-<Random string> starts an Amazon Translate batch job by calling startTextTranslationJob with the following parameters:
- The captions-in folder in the S3 bucket, as the input location for files to be translated
- The captions-out folder in the S3 bucket, as the output location for translated files
- Source language code
- Destination language code
- An AWS Identity and Access Management (IAM) role ARN with the policy permissions needed to read from and write to the S3 bucket
See the following job code:
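A minimal sketch of this call with the AWS SDK for Python (Boto3) might look like the following. The `captions-in/` and `captions-out/` prefixes follow the folder names above; the helper names and the job-name format are hypothetical. The `text/html` content type is what tells Amazon Translate to skip the `<span>` tags. The Translate client is passed in, so in practice you would create it with `boto3.client("translate")`.

```python
import uuid
from typing import Optional


def build_translation_job_request(bucket: str, source_lang: str, target_lang: str,
                                  role_arn: str, job_name: Optional[str] = None) -> dict:
    """Assemble keyword arguments for StartTextTranslationJob (hypothetical helper)."""
    return {
        "JobName": job_name or f"captions-{uuid.uuid4().hex[:8]}",
        # HTML content type tells Amazon Translate to leave the <span> tags alone.
        "InputDataConfig": {"S3Uri": f"s3://{bucket}/captions-in/", "ContentType": "text/html"},
        "OutputDataConfig": {"S3Uri": f"s3://{bucket}/captions-out/"},
        # IAM role that Amazon Translate assumes to read and write the bucket.
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCodes": [target_lang],
        # Idempotency token: retrying with the same token will not start a duplicate job.
        "ClientToken": str(uuid.uuid4()),
    }


def start_caption_translation(translate_client, **kwargs) -> str:
    """Submit the batch job and return the JobId reported by Amazon Translate."""
    request = build_translation_job_request(**kwargs)
    response = translate_client.start_text_translation_job(**request)
    return response["JobId"]
```

`TargetLanguageCodes` is a list in the API; this sketch passes a single code to match the solution's one-target configuration.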
Create WebVTT and SRT files from the delimited files
The EventBridge rule for Amazon Translate job status change notifications triggers the Lambda function <Stack name>-TranslateCaptionsJobEventProcessor-<Random string>. Using the details of the Amazon Translate batch job, the function iterates through each of the translated delimited files generated in the captions-out folder. See the following code:
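The following sketch shows how such a handler might identify the finished job and its output location. It assumes the event's `detail` carries `jobId` and `jobStatus` fields and that `DescribeTextTranslationJob` reports the output S3 URI under `OutputDataConfig`. The function names are hypothetical, and the client is injected (for example, `boto3.client("translate")`).

```python
from typing import Optional

# Statuses after which the job's output folder can be read.
# COMPLETED_WITH_ERROR means some documents failed but others were translated.
FINISHED_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERROR"}


def completed_job_id(event: dict) -> Optional[str]:
    """Return the job ID from an Amazon Translate state-change event, or None."""
    if event.get("source") != "aws.translate":
        return None
    detail = event.get("detail", {})
    if detail.get("jobStatus") not in FINISHED_STATUSES:
        return None
    return detail.get("jobId")


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix


def translated_output_location(translate_client, job_id: str) -> tuple[str, str]:
    """Look up where the job wrote its output via DescribeTextTranslationJob."""
    response = translate_client.describe_text_translation_job(JobId=job_id)
    output_uri = response["TextTranslationJobProperties"]["OutputDataConfig"]["S3Uri"]
    return split_s3_uri(output_uri)
```

With the returned bucket and prefix, the handler can list the translated files (for example, with an S3 `list_objects_v2` paginator) and process each one.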
The solution generates the WebVTT or SRT file using the original WebVTT or SRT file from the input folder for the time markers, but replaces the captions with the translated caption text from the delimited files. See the following code:
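For SRT, the reassembly can be sketched like this. The sketch assumes that the translated document keeps one `<span>` per caption in the original order, and that the original cues have already been parsed into `(timing_line, text)` pairs. `parse_delimited_html` and `rebuild_srt` are hypothetical names. WebVTT output would also need the `WEBVTT` header line, though the timing lines can be reused as-is in either format.

```python
import html
import re

# Non-greedy match of each <span>...</span>; tolerate attributes on the tag.
SPAN = re.compile(r"<span[^>]*>(.*?)</span>", re.DOTALL)


def parse_delimited_html(doc: str) -> list[str]:
    """Extract the translated caption texts, undoing HTML escaping."""
    return [html.unescape(text).strip() for text in SPAN.findall(doc)]


def rebuild_srt(cues: list[tuple[str, str]], translated: list[str]) -> str:
    """Reuse each original timing line with its translated caption text."""
    if len(cues) != len(translated):
        # A mismatch means caption boundaries were lost; don't guess at alignment.
        raise ValueError(f"expected {len(cues)} captions, got {len(translated)}")
    blocks = [
        f"{index}\n{timing}\n{text}\n"
        for index, ((timing, _original), text) in enumerate(zip(cues, translated), start=1)
    ]
    # Blank line between cues, as SRT requires.
    return "\n".join(blocks)
```

Failing loudly on a count mismatch is deliberate: silently shifting captions by one would put every subsequent line on the wrong timestamp.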
The function then writes the new WebVTT or SRT files as S3 objects in the output folder, using the naming convention TargetLanguageCode-<inputFileName>.vtt or TargetLanguageCode-<inputFileName>.srt. See the following code:
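The naming convention and the write can be sketched as follows. The `output/` prefix and the helper names are assumptions (the post only says "output folder"), and the content types are commonly used MIME types for caption files rather than anything the solution specifies. The S3 client is injected, for example `boto3.client("s3")`.

```python
import posixpath

# Assumed output prefix; the post refers to it only as the "output folder".
OUTPUT_PREFIX = "output/"
# Commonly used MIME types for caption files.
CONTENT_TYPES = {".vtt": "text/vtt", ".srt": "application/x-subrip"}


def output_key(input_key: str, target_lang: str) -> str:
    """Map 'input/talk.vtt' to 'output/es-talk.vtt' (TargetLanguageCode-<inputFileName>)."""
    return f"{OUTPUT_PREFIX}{target_lang}-{posixpath.basename(input_key)}"


def write_translated_captions(s3_client, bucket: str, input_key: str,
                              target_lang: str, content: str) -> str:
    """Store the rebuilt caption file in S3 and return its key."""
    key = output_key(input_key, target_lang)
    extension = posixpath.splitext(key)[1].lower()
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=content.encode("utf-8"),
        ContentType=CONTENT_TYPES.get(extension, "text/plain"),
    )
    return key
```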
Solution deployment
You can deploy the solution either by using an AWS CloudFormation template or by cloning the GitHub repository.
Deployment using the CloudFormation template
The CloudFormation template provisions the necessary resources needed for the solution, including the IAM roles, IAM policies, and EventBridge rules. The template creates the stack in the us-east-1 Region.
- Launch the CloudFormation template by choosing Launch Stack:
- For Stack name, enter a unique stack name for this account; for example, translate-captions-stack.
- For SourceLanguageCode, enter the language code for the current language of the caption text; for example, en for English.
- For TargetLanguageCode, enter the language code that you want your translated text in; for example, es for Spanish.
For more information about supported languages, see Supported Languages and Language Codes.
- For TriggerFileName, enter the name of the file that triggers the translation serverless pipeline (the default is triggerfile).
- In the Capabilities and transforms section, select the check boxes to acknowledge that CloudFormation will create IAM resources and transform the AWS Serverless Application Model (AWS SAM) template.
AWS SAM templates simplify the definition of resources needed for serverless applications. When deploying AWS SAM templates in AWS CloudFormation, AWS CloudFormation performs a transform to convert the AWS SAM template into a CloudFormation template. For more information, see Transform.
- Choose Create stack.
Stack creation can take up to 10 minutes, after which the status changes to CREATE_COMPLETE. On the Outputs tab, you can see the name of the newly created S3 bucket along with the other AWS resources created.
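If you prefer to script the deployment, the console steps above roughly correspond to a CloudFormation `CreateStack` call. The sketch below is an assumption-laden outline: `TEMPLATE_URL` is a placeholder, not the real template location; the parameter keys are taken from the console field names above; and it assumes `CAPABILITY_IAM` plus `CAPABILITY_AUTO_EXPAND` (for the AWS SAM transform) are sufficient. If the template names its IAM resources explicitly, CloudFormation requires `CAPABILITY_NAMED_IAM` instead. The client is injected, for example `boto3.client("cloudformation", region_name="us-east-1")`.

```python
# Placeholder: substitute the template URL behind the Launch Stack link.
TEMPLATE_URL = "https://example.com/path/to/template.yaml"


def build_create_stack_request(stack_name: str, source_lang: str, target_lang: str,
                               trigger_file: str = "triggerfile") -> dict:
    """Mirror the console parameters as CreateStack arguments (hypothetical helper)."""
    parameters = {
        "SourceLanguageCode": source_lang,
        "TargetLanguageCode": target_lang,
        "TriggerFileName": trigger_file,
    }
    return {
        "StackName": stack_name,
        "TemplateURL": TEMPLATE_URL,
        "Parameters": [{"ParameterKey": k, "ParameterValue": v} for k, v in parameters.items()],
        # Acknowledge IAM resource creation and the AWS SAM transform.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_AUTO_EXPAND"],
    }


def deploy(cfn_client, **kwargs) -> list:
    """Create the stack, wait for CREATE_COMPLETE, and return its Outputs."""
    request = build_create_stack_request(**kwargs)
    cfn_client.create_stack(**request)
    # The waiter polls until CREATE_COMPLETE and raises if creation fails.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=request["StackName"])
    stack = cfn_client.describe_stacks(StackName=request["StackName"])["Stacks"][0]
    return stack.get("Outputs", [])
```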
Deployment using the GitHub repository
To deploy the solution using GitHub, visit the GitHub repo and follow the instructions in the README.md file. The solution uses AWS SAM to make it easy to deploy in your AWS account.
Test the solution
To test the solution, upload one or more WebVTT (.vtt) or SRT (.srt) files to the input folder. Because this is a batch operation, we recommend uploading multiple files at the same time. The following code shows a sample SRT file:
After you upload all the WebVTT or SRT documents, upload the file that triggers the translation workflow. This file can be a zero-byte file, but the filename should match the TriggerFileName parameter in the CloudFormation stack. The default name for the file is triggerfile.
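The upload order matters: because the trigger file starts processing of whatever is already in the input folder, it should arrive last. A minimal sketch with Boto3, assuming the input folder is the `input/` prefix and using a hypothetical helper name; the client is injected, for example `boto3.client("s3")`.

```python
import os


def upload_and_trigger(s3_client, bucket: str, caption_paths: list[str],
                       trigger_file: str = "triggerfile", input_prefix: str = "input/") -> list[str]:
    """Upload caption files, then the zero-byte trigger file last (hypothetical helper)."""
    keys = []
    for path in caption_paths:
        key = input_prefix + os.path.basename(path)
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    # Upload the trigger only after every caption file, so the pipeline sees them all.
    trigger_key = input_prefix + trigger_file
    s3_client.put_object(Bucket=bucket, Key=trigger_key, Body=b"")
    keys.append(trigger_key)
    return keys
```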
After about 15–20 minutes, check the output folder for the translated WebVTT or SRT files, which follow the naming convention TargetLanguageCode-<inputFileName>.vtt or TargetLanguageCode-<inputFileName>.srt.
The following snippet shows the SRT file translated into Spanish:
You can monitor the progress of the solution pipeline by checking the Amazon CloudWatch logs generated for each Lambda function that is part of the solution. For more information, see Accessing Amazon CloudWatch logs for AWS Lambda.
To translate between a different source and target language pair, update the SOURCE_LANG_CODE and TARGET_LANG_CODE environment variables for the <Stack name>-S3CaptionsFileEventProcessor-<Random string> function, and then trigger the solution pipeline by uploading WebVTT or SRT documents and the TriggerFileName file into the input folder.
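One detail to watch when scripting this change: Lambda's `UpdateFunctionConfiguration` replaces the function's entire set of environment variables, so any variables you omit are removed. A minimal sketch that merges the new language codes into the existing set; the helper name is hypothetical, and `function_name` is the deployed <Stack name>-S3CaptionsFileEventProcessor-<Random string> function. The client is injected, for example `boto3.client("lambda")`.

```python
def update_language_pair(lambda_client, function_name: str,
                         source_lang: str, target_lang: str) -> dict:
    """Change SOURCE_LANG_CODE/TARGET_LANG_CODE without dropping other variables."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # Copy the current variables (the key is absent if none are set).
    variables = dict(config.get("Environment", {}).get("Variables", {}))
    variables.update({"SOURCE_LANG_CODE": source_lang, "TARGET_LANG_CODE": target_lang})
    # UpdateFunctionConfiguration replaces the whole variable set, hence the merge above.
    lambda_client.update_function_configuration(
        FunctionName=function_name, Environment={"Variables": variables}
    )
    return variables
```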
Conclusion
In this post, we demonstrated how to translate video captions and subtitles in WebVTT and SRT file formats using Amazon Translate asynchronous batch processing. This process can be used in several industry verticals, including education, media and entertainment, travel and hospitality, healthcare, finance, and law, or by any organization with a large collection of subtitled or captioned video assets that it wants to offer to its customers in multiple languages.
You can easily integrate the approach into your own pipelines as well as handle large volumes of caption and subtitle text with this scalable architecture. This methodology works for translating captions and subtitles among the more than 70 languages supported by Amazon Translate (as of this writing). Because this solution uses asynchronous batch processing, you can customize your machine translation output using parallel data. For more information on using parallel data, see Customizing Your Translations with Parallel Data (Active Custom Translation). For a low-latency, low-throughput solution translating smaller caption files, you can perform the translation through the real-time Amazon Translate API. For more information, see Translating documents with Amazon Translate, AWS Lambda, and the new Batch Translate API. If your organization has a large collection of videos that need to be captioned or subtitled, you can use this AWS Subtitling solution.
About the Authors
Siva Rajamani is a Boston-based Enterprise Solutions Architect for AWS. He enjoys working closely with customers and supporting their digital transformation and AWS adoption journey. His core areas of focus are serverless, application integration, and security. Outside of work, he enjoys outdoor activities and watching documentaries.
Raju Penmatcha is a Senior AI/ML Specialist Solutions Architect at AWS. He works with education, government, and non-profit customers on machine learning and artificial intelligence related projects, helping them build solutions using AWS. When not helping customers, he likes traveling to new places with his family.