AWS for M&E Blog
Scale global live reach with AWS-powered real-time WebVTT multilingual subtitling
Live streaming now reaches audiences across continents in real time, but language barriers still limit how many people can engage with this content. Traditional subtitle generation methods, which rely on manual translation services or prerecorded content workflows, can’t keep pace with the demands of live broadcasting.
In this post, we share a solution that delivers accurate subtitles in multiple languages for live events, synchronized with the spoken word, without adding video streaming latency. Although many real-time subtitling solutions exist, nearly all face the same fundamental compromise: they must choose between accuracy and latency, and most add significant video streaming delay to achieve acceptable translation quality. This post explores how Amazon Web Services (AWS) breaks this trade-off with automatic real-time multilingual subtitle delivery that maintains broadcast-quality synchronization and contextual accuracy while keeping your video stream at its original latency. The result is live content that is accessible to global audiences without sacrificing viewer experience or cost efficiency.
Challenges in real-time multilingual subtitling
The media industry faces several critical barriers when expanding live content to international markets:
- Cost prohibitive translation services – Professional translation costs escalate rapidly as content libraries and target languages grow, making global expansion financially challenging for many organizations.
- Scalability limitations – Traditional subtitle workflows can’t handle sudden demand surges during major live events, sports finals, or breaking news coverage.
- Trade-offs between quality and speed – Manual translation offers quality but introduces delays that make live subtitling impossible, whereas automated solutions often lack contextual accuracy.
- Technical complexity – Integrating multiple translation services, synchronization systems, and delivery networks requires significant technical expertise and infrastructure investment.
- Limited language support – Many existing solutions support only major languages, leaving substantial global audiences underserved.
These challenges prevent organizations from maximizing their content’s global potential and accessing new revenue streams from international markets.
AWS services for real-time multilingual subtitling
AWS provides a comprehensive suite of services that work together seamlessly to deliver real-time multilingual subtitles without compromising quality or scalability. The solution uses:
- AI-powered speech recognition – Amazon Transcribe provides real-time speech-to-text conversion with support for custom vocabularies and domain-specific language models—essential for handling industry terminology, proper nouns, and specialized content.
- Advanced language translation – Amazon Bedrock with Amazon Nova foundation models (FMs) delivers context-aware translations that maintain the original content’s meaning and tone across multiple target languages, going beyond literal word-for-word translation.
- Scalable media processing – AWS Elemental Media Services handle live stream ingestion, transcoding, and delivery with built-in redundancy and automatic scaling to meet demand fluctuations.
- Real-time data management – Amazon DynamoDB provides millisecond-latency storage and retrieval for time-synchronized subtitle segments, keeping subtitles accurately synchronized even during high-traffic events.
- Global content delivery – Amazon CloudFront intelligently routes requests and delivers subtitles with minimal latency to viewers worldwide. Its caching mechanisms reduce the cost of repeatedly constructing subtitle files.
What does the solution architecture look like?
The end-to-end workflow creates two parallel processing paths, one for video delivery and another for subtitle delivery, which converge at the content delivery layer.
Live stream ingestion and processing
The workflow begins when your live stream reaches AWS Elemental MediaConnect through a supported streaming protocol. MediaConnect acts as a reliable transport service, providing built-in redundancy and monitoring capabilities essential for live broadcasting. From MediaConnect, the stream splits into two distinct paths: video processing and subtitle generation.
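To make the ingestion step concrete, the following minimal sketch creates a MediaConnect flow with boto3. The flow name, SRT listener protocol, ingest port, and CIDR allow list are illustrative assumptions, not values taken from the open source solution.

import boto3

mediaconnect = boto3.client("mediaconnect")

# Create a flow that receives the contribution feed over SRT.
# All values below are placeholders, not the repository's defaults.
flow = mediaconnect.create_flow(
    Name="live-subtitling-ingest",
    Source={
        "Name": "primary-source",
        "Protocol": "srt-listener",
        "IngestPort": 5000,
        "WhitelistCidr": "203.0.113.0/24",  # restrict to your encoder's network
    },
)
print(flow["Flow"]["FlowArn"])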
Path 1: Video processing follows this flow:
- The stream flows to AWS Elemental MediaLive for transcoding into adaptive bitrate (ABR) format. This provides optimal video quality based on viewer device capabilities and network conditions.
- The processed stream is sent to AWS Elemental MediaPackage for packaging and delivery preparation.
Path 2: Subtitle generation follows this flow:
- A parallel stream copy is sent to Amazon Transcribe for real-time speech-to-text conversion (see the sketch after this list).
- The transcribed text passes to Amazon Bedrock with Amazon Nova for translation.
- Custom vocabulary support handles industry-specific terminology and proper nouns accurately.
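The following sketch shows what the real-time transcription step can look like with the open source amazon-transcribe Python SDK. The Region, the 16 kHz PCM audio format, and the custom vocabulary name live-event-terms are assumptions for illustration; the actual solution wires this logic to audio extracted from the parallel stream copy.

import asyncio
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent

class SegmentHandler(TranscriptResultStreamHandler):
    """Collects finalized (non-partial) transcript segments."""
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        for result in transcript_event.transcript.results:
            if not result.is_partial:
                for alternative in result.alternatives:
                    # Hand the finalized segment to the sentence-boundary
                    # logic described in the next section.
                    print(result.start_time, alternative.transcript)

async def transcribe(audio_chunks):
    client = TranscribeStreamingClient(region="us-east-1")
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=16000,
        media_encoding="pcm",
        vocabulary_name="live-event-terms",  # hypothetical custom vocabulary
    )

    async def send_audio():
        async for chunk in audio_chunks:
            await stream.input_stream.send_audio_event(audio_chunk=chunk)
        await stream.input_stream.end_stream()

    await asyncio.gather(send_audio(),
                         SegmentHandler(stream.output_stream).handle_events())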
Intelligent subtitle processing
The solution detects sentence boundaries (such as periods, question marks, and exclamation points) in the transcribed text to identify complete thoughts. Rather than translating individual words or phrases, the subtitle processing workflow waits for these sentence delimiters from Amazon Transcribe before triggering translation (a minimal sketch follows the list below).
This approach provides several advantages:
- Contextual accuracy – Amazon Nova foundation models can analyze complete thoughts, resulting in more natural translations
- No additional latency – The solution keeps subtitles in sync with spoken words without adding video streaming latency, a critical differentiator from most existing solutions, which add latency to gain accuracy
- Language structure optimization – Particularly beneficial for language pairs with different word orders (for example, subject-verb-object compared to subject-object-verb structures)
- Improved readability – Subtitles appear as complete, coherent thoughts rather than fragmented phrases
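A minimal sketch of this buffering and translation step follows. The on_transcript_segment helper, the Amazon Nova model ID, the prompt wording, and the inference settings are all assumptions for illustration; the repository may use a different model or prompt.

import boto3

bedrock = boto3.client("bedrock-runtime")
SENTENCE_ENDINGS = (".", "?", "!")
pending_fragments = []

def on_transcript_segment(text, target_language="es"):
    """Buffer transcript fragments and translate only complete sentences."""
    pending_fragments.append(text.strip())
    joined = " ".join(pending_fragments)
    if not joined.endswith(SENTENCE_ENDINGS):
        return None  # wait for a sentence delimiter from Amazon Transcribe
    pending_fragments.clear()
    return translate(joined, target_language)

def translate(sentence, target_language):
    # Model ID is a placeholder; any Amazon Nova model enabled in your
    # account and Region can be substituted.
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",
        messages=[{
            "role": "user",
            "content": [{
                "text": f"Translate the following subtitle into {target_language}, "
                        f"preserving meaning and tone. Return only the translation.\n"
                        f"{sentence}"
            }],
        }],
        inferenceConfig={"temperature": 0.2, "maxTokens": 256},
    )
    return response["output"]["message"]["content"][0]["text"]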
Dynamic subtitle assembly and delivery
Translated subtitle segments are stored in Amazon DynamoDB with precise timestamps for real-time retrieval and synchronization. The delivery mechanism routes media and subtitle requests differently.
For media requests, video content routes directly from Amazon CloudFront to MediaPackage for optimal streaming performance.
For subtitle requests, WebVTT subtitle files follow a more complex path (a Lambda sketch follows this list):
- CloudFront routes requests to Amazon API Gateway
- AWS Lambda functions retrieve timestamped subtitle templates from MediaPackage
- Lambda queries DynamoDB for appropriate translated content based on requested language and timestamp
- Dynamic subtitle insertion occurs in real time
- Complete WebVTT files return to viewers synchronized with the video
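The following sketch illustrates the Lambda assembly step behind API Gateway. The table name, key schema, and query parameters are illustrative assumptions, not the repository's actual contract; in the real solution, the timecode comes from the value stamped into the subtitle template by Lambda@Edge (described in the next section).

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("SubtitleSegments")  # table name is an assumption

def lambda_handler(event, context):
    params = event.get("queryStringParameters") or {}
    language = params.get("lang", "es")    # requested subtitle language
    timecode = params.get("timecode", "")  # segment creation time

    # Fetch the translated cues for this language and segment window.
    # The composite key (language, timecode) is illustrative only.
    items = table.query(
        KeyConditionExpression=Key("language").eq(language)
        & Key("timecode").eq(timecode)
    )["Items"]

    cues = "\n\n".join(
        f"{item['start']} --> {item['end']}\n{item['text']}" for item in items
    )
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/vtt"},
        "body": f"WEBVTT\n\n{cues}\n",
    }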
This architecture keeps subtitles synchronized with video content while providing flexibility to serve multiple languages without duplicating video streams.
Key services and concepts
We use Lambda@Edge as a cost-efficient reverse proxy that intercepts each VTT chunk and inserts the chunk's creation time (timecode) into it. This enables subsequent processes to fetch the appropriate subtitles, generated by the automatic transcription and translation workflow, from Amazon DynamoDB.
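A minimal sketch of this interception follows, assuming an origin request trigger with the CloudFront include body option enabled and a WebVTT NOTE comment as the timecode convention. Both details are assumptions; the repository may embed the timecode differently.

import base64
from datetime import datetime, timezone

def lambda_handler(event, context):
    """Origin request trigger (with 'include body' enabled) that stamps
    each WebVTT chunk pushed by MediaLive with its creation time."""
    request = event["Records"][0]["cf"]["request"]
    if not request["uri"].endswith(".vtt"):
        return request

    body = request["body"]
    chunk = base64.b64decode(body["data"]).decode("utf-8")

    # Embed the timecode as a WebVTT NOTE comment so the downstream Lambda
    # function can look up the matching translations in DynamoDB.
    timecode = datetime.now(timezone.utc).isoformat()
    stamped = chunk.replace("WEBVTT", f"WEBVTT\nNOTE timecode:{timecode}", 1)

    body["action"] = "replace"
    body["encoding"] = "base64"
    body["data"] = base64.b64encode(stamped.encode("utf-8")).decode("utf-8")
    return request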
When the solution is deployed, it creates two Amazon CloudFront distributions. The first distribution acts as a reverse proxy that uses one of the HLS ingest endpoints from MediaPackage as its origin and triggers Lambda@Edge to intercept and modify, in real time, the VTT chunks generated by MediaLive.
The following figure shows the Lambda@Edge association configuration in CloudFront, which enables real-time VTT chunk interception.
Figure 1: Amazon CloudFront with Lambda@Edge association
The second CloudFront distribution handles traditional media delivery to viewers and has two origins, configured on the Behaviors tab (a configuration sketch follows Figure 2):
- *.vtt requests route to the subtitle origin (API Gateway), which invokes AWS Lambda to retrieve the appropriate subtitle in the requested language in real time
- Other requests (default) route to the video and audio origin (MediaPackage endpoints), which provides real-time streaming to end users
The following figure shows the behaviors configured for specific path patterns.
Figure 2: Behavior configuration in Amazon CloudFront
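For reference, the following fragment shows how these two behaviors could look in a boto3 DistributionConfig. Origin IDs and TTL values are placeholders chosen for illustration, not values taken from the repository.

# Illustrative fragment of a CloudFront DistributionConfig (boto3 shapes).
# Short TTLs reflect that each VTT file is rebuilt continuously as the
# live event progresses.
cache_behaviors = {
    "Quantity": 1,
    "Items": [{
        "PathPattern": "*.vtt",
        "TargetOriginId": "subtitle-api-gateway-origin",
        "ViewerProtocolPolicy": "redirect-to-https",
        "MinTTL": 0,
        "DefaultTTL": 1,
        "MaxTTL": 2,
    }],
}

default_cache_behavior = {
    "TargetOriginId": "mediapackage-media-origin",
    "ViewerProtocolPolicy": "redirect-to-https",
}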
For the MediaLive HLS group destination URL, the format should be ‘{CloudFrontUrl}/in/v2/{HLS ingest endpoint id}/{HLS ingest endpoint id}/channel’. The credentials required to access the group destination are automatically generated when the CloudFormation stack is deployed.
Note: We’ve achieved the best results with a 4-second HLS segment length. Other segment lengths will also work, but they need to be tested.
The following screenshot shows the MediaLive HLS group destination configuration with the CloudFront URL format for the MediaPackage ingestion endpoint. The credentials section and HLS settings are automatically generated.
Figure 3: MediaLive HLS group destination configuration
The complete solution architecture is illustrated in the following figure.
Figure 4: High-level architecture of the solution
User experience
From the viewer perspective, the solution provides seamless, broadcast-quality subtitle delivery. Users can select from multiple subtitle languages through standard video player controls; language selection is clearly displayed, and switching between languages occurs instantly without interrupting video playback.
The subtitle display maintains professional broadcast standards:
- Proper timing – Subtitles appear and disappear in sync with spoken content
- Natural phrasing – Complete sentences improve readability and comprehension
- Consistent formatting – The WebVTT standard ensures compatibility across devices and players
This solution uses the latency inherent in streaming formats such as HLS and Dynamic Adaptive Streaming over HTTP (DASH), typically about 15 seconds (for example, a player buffering three 4-second segments plus encoding and packaging overhead), to process and optimize translations in parallel, so no additional end-to-end latency is introduced to the viewer experience.
For scenarios with rapid speech patterns or minimal pauses—such as fast-paced sports commentary—the solution can use the AWS Elemental MediaConnect buffer delay to accumulate longer transcription segments, providing large language models (LLMs) with additional context for more accurate translations.
From an accuracy perspective, the solution employs the next-generation Amazon Transcribe speech foundation model (FM), which delivers high-accuracy transcriptions optimized through custom vocabulary support and domain-specific tuning. Combined with the context-aware translation capabilities of Amazon Bedrock, the solution maintains both linguistic precision and the original tone of the content across supported languages. The solution also automatically scales to handle varying loads, from intimate webinars to global sporting events with millions of concurrent viewers.
Key benefits of the solution
The solution offers several benefits:
- Cost efficiency – Eliminates expensive manual translation services while providing superior scalability. Pay-as-you-go pricing means costs scale with actual usage rather than peak capacity planning.
- Global market access – Instantly expand content reach to new international markets without traditional localization barriers. Support for multiple languages enables simultaneous global distribution.
- Broadcast quality results – AI-powered translation maintains context and tone while delivering subtitles with broadcast-standard timing and synchronization.
- Automatic scalability – Infrastructure automatically scales to meet demand without manual intervention, handling everything from small corporate events to major international broadcasts.
- Future-proof architecture – Built on AWS services, the solution automatically benefits from ongoing AI improvements and new language support without requiring architectural changes.
- Rapid deployment – Use existing AWS infrastructure and services to implement the solution quickly, without building custom translation or delivery systems.
Getting started with real-time multilingual subtitling
The solution is now available as an open source implementation on GitHub, providing complete deployment guides and customization options. The repository includes:
- Complete infrastructure as code (IaC) templates
- Step-by-step deployment instructions
- Customization guides for different use cases
- Performance optimization recommendations
To deploy the real-time multilingual subtitling solution, follow these steps:
1. Review the architecture documentation and assess your specific requirements
2. Clone the solution repository:
git clone https://github.com/aws-samples/sample-auto-multilingual-subtitle-for-live-event.git
3. Deploy the solution in a development environment for testing
4. Customize language selections and subtitle formatting for your brand
5. Integrate with your existing live streaming workflow
6. Monitor performance and optimize for your audience patterns
Conclusion
Real-time multilingual subtitling helps media organizations expand globally. By combining AWS Elemental Media Services with AI-powered transcription and translation through Amazon Transcribe and Amazon Bedrock, organizations can eliminate traditional barriers to international market entry.
The solution delivers broadcast-quality results while automatically scaling to meet demand, reducing costs significantly compared to traditional translation services. Most importantly, it opens new revenue opportunities by making live content accessible to global audiences in real time.
Whether you’re broadcasting live sports, corporate events, news, or entertainment content, you can reach every audience, in every market, as events unfold by using AWS-powered real-time multilingual subtitling.
For technical implementation guidance and to explore how this solution can transform your content delivery strategy, connect with your AWS solutions architect or visit the AWS for Media & Entertainment solutions page.
Further reading
- GitHub repo – Sample Auto Multilingual Subtitle for Live Event
- Supported languages and language-specific features in the Amazon Transcribe Developer Guide
- Custom vocabularies in the Amazon Transcribe Developer Guide
- Custom language models in the Amazon Transcribe Developer Guide
- Deliver video streaming with CloudFront and AWS Media Services
- IBC 2025 demo – Unlocking new audience markets with automatic localization for live sports



