AWS for M&E Blog

Multi-language automatic captions and audio dubbing made possible for live events with AWS Media Services and SyncWords

This post was co-authored by Giovanni Galvez, VP of Business Development and Strategy, SyncWords.

Introduction

Adding captions to live streaming events is not new to the industry. There are well-established workflows that use specialized hardware encoders to embed captions into video feeds. These workflows can create challenges for live events, which we explain in further detail throughout this post.

The common methods to embed closed captions into live streams are embedding captions using the EIA-608 and CEA-708 standards, or using RTMP onCaptionInfo messages to carry base64-encoded 608/708 caption data.

The challenge is that you have to embed the captions before you can process the live stream for distribution. This normally calls for an on-premises single channel caption encoder device as part of the live event streaming workflow. It also requires extra coordination and planning, as the event may require auto-generated captions or a live stenographer.

The other challenge is that 608/708 embedded captions do not support more than two languages in practice, especially for languages outside of the seven languages supported by 608. You might argue that the 708 protocol natively supports multiple languages, which is true from a protocol perspective. In practice, however, 708 does not have broad support across the end-to-end system, so it is rare to see an implementation that uses 708 to its full extent. This means that if there is a requirement to stream three or more languages, with support for Unicode character sets such as Chinese, Japanese, and Korean, then the traditional 608/708 method of encoding is a non-starter.

As Over The Top (OTT) streaming gained popularity following Apple's introduction of HTTP Live Streaming (HLS) in 2009, it became possible to deliver multiple-language closed captions through sidecar caption files, such as those in the Web Video Text Tracks (WebVTT) format.
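To make the sidecar idea concrete, the following is a minimal Python sketch of how caption cues might be serialized as WebVTT text. The function names `format_timestamp` and `build_webvtt` and the `(start, end, text)` cue shape are assumptions made for illustration, not part of any AWS or SyncWords API. The sketch also leaves out the `X-TIMESTAMP-MAP` header that segmented WebVTT in HLS typically carries.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to a WebVTT timestamp of the form HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues) -> str:
    """Serialize (start_seconds, end_seconds, text) tuples as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_webvtt([(0.0, 2.5, "Welcome to the live event."),
                        (2.5, 5.0, "欢迎收看直播活动。")]))
```

Because WebVTT files are UTF-8 text, a separate file per language can carry any Unicode script, which is what removes the character-set limits described above.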

Live captioning with SyncWords

This blog post presents a solution that removes the challenges of using embedded captions. With a simple API integration between your video processing and packaging workflow and the AI tools from SyncWords, a cloud-based video software company, you can enable closed captions, subtitle translation, and AI dubbing as an option without disrupting your existing live video workflow. SyncWords creates a secondary master manifest that includes the caption information while leaving your original master manifest intact, providing a reliable backup option.
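As an illustration of what a secondary master manifest with caption information can look like, the following Python sketch derives a new HLS master playlist that declares WebVTT subtitle renditions, and leaves the input text untouched. It relies only on the standard HLS tags `EXT-X-MEDIA` and `EXT-X-STREAM-INF`. The function name, the `tracks` dictionary keys, and the playlist URIs are hypothetical, and this is not a description of how SyncWords builds its manifests.

```python
def add_subtitle_renditions(master: str, tracks, group_id: str = "subs") -> str:
    """Return a copy of an HLS master playlist that advertises subtitle tracks.

    tracks is a list of dicts with hypothetical keys: name, language, uri.
    """
    media_lines = [
        "#EXT-X-MEDIA:TYPE=SUBTITLES,"
        f'GROUP-ID="{group_id}",NAME="{t["name"]}",LANGUAGE="{t["language"]}",'
        f'AUTOSELECT=YES,DEFAULT=NO,URI="{t["uri"]}"'
        for t in tracks
    ]
    out, inserted = [], False
    for line in master.splitlines():
        if line.startswith("#EXT-X-STREAM-INF:"):
            if not inserted:
                # Declare the renditions once, ahead of the first variant stream.
                out.extend(media_lines)
                inserted = True
            if "SUBTITLES=" not in line:
                # Associate every variant with the subtitle group.
                line = f'{line},SUBTITLES="{group_id}"'
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = (
        "#EXTM3U\n"
        "#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720\n"
        "video_720p.m3u8\n"
    )
    print(add_subtitle_renditions(original, [
        {"name": "English", "language": "en", "uri": "captions_en.m3u8"},
        {"name": "Japanese", "language": "ja", "uri": "captions_ja.m3u8"},
    ]))
```

Since the function returns a new string, the original manifest can keep being served unchanged as the backup.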

SyncWords provides a platform for streaming live captioning and real-time translation using AI. Their goal is to drive product innovation and simple integration to help customers provide accurate, reliable, real-time multiple language captioning and dubbing services for live events.

The following is a typical live event streaming workflow running on AWS using AWS Elemental Media Services. Without changes to your existing workflow, SyncWords can ingest your live event playlist. Using the SyncWords Live AI Captioning service, a modified live playlist with captions can be written to your origin, such as AWS Elemental MediaPackage or an Amazon S3 bucket. You can then provide viewers with the new playlist, which carries synchronized, multiple-language captions and/or audio dubbing enabled via SyncWords.

SyncWords Diagram

Enhancing live stream captioning and translation synchronization

In live event captioning, there is typically a delay of 3 to 4 seconds between the spoken, on-screen dialogue and the appearance of captions on the screen. This delay occurs because the captioning system, typically specialized hardware, can generate the caption text only after each sentence has been spoken. SyncWords developed a solution to address this challenge and synchronize the captions and translated voice with the live stream.

By leveraging the inherent latency of HTTP Live Streaming (HLS), SyncWords calculates the precise timing for the captions and translations to appear in sync for viewers. This synchronization provides a consistent experience for any number of viewers watching the live OTT programming. With this approach, captions and translations are integrated into the live stream without the noticeable delay, which improves the overall viewing experience.
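The reasoning behind this can be shown with some simple arithmetic. The Python sketch below assumes a simplified model in which a viewer plays the stream a fixed number of seconds behind live, and the caption text becomes available some seconds after the words were spoken. The `Cue` fields and the `caption_lag` function are hypothetical names used only for illustration, and the model is not SyncWords' actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    text: str
    spoken_at: float  # seconds on the live timeline when the words were spoken
    ready_at: float   # seconds on the live timeline when the caption text was available


def caption_lag(cue: Cue, stream_latency: float) -> float:
    """Seconds by which the caption trails the speech for a viewer.

    The viewer hears the words at spoken_at + stream_latency. If the caption is
    ready before that moment, it can be timed to the speech and the lag is zero.
    """
    viewer_hears_at = cue.spoken_at + stream_latency
    return max(0.0, cue.ready_at - viewer_hears_at)


if __name__ == "__main__":
    cue = Cue("Good evening.", spoken_at=10.0, ready_at=13.5)
    print(caption_lag(cue, stream_latency=20.0))  # typical HLS latency absorbs the delay
    print(caption_lag(cue, stream_latency=2.0))   # very low latency leaves a visible lag
```

Under these assumptions, a 3 to 4 second captioning delay disappears whenever the stream latency is at least as large, which is commonly the case for segment-based HLS delivery.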

Through real-time synchronization, SyncWords gives viewers instant access to accurate and synchronized captions and translations. This not only benefits individuals who rely on captions for accessibility but also enhances the overall engagement and comprehension for all viewers. The HLS solution between AWS Elemental Media Services and SyncWords eliminates the frustration and inconvenience caused by delayed captioning, ensuring a seamless and inclusive viewing experience for live events.

The benefits of the workflow are:

  • No change to your current live event pipeline
  • No disruption to your live-event pipeline: multiple-language captioning is enabled via a separate path, as an add-on option
  • You can construct a redundant playlist where the primary playlist includes the generated captions, while your original playlist serves as the secondary playlist, which enhances the reliability of your live event
  • Add multiple-language closed captions to your live event without the heavy lift of deploying complex solutions
  • Use captions as a service to avoid maintaining complex caption-related workflows
  • Add synchronized, multiple-language captions to your live event workflow with a simple integration
  • Generate live, translated voice dubbing to your workflow in addition to live captioning
  • Scale captioning to unlimited channels of live programming without the need for specialized live captioning hardware, which is cost effective
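The redundant playlist benefit can be sketched as a simple selection routine: try the captioned playlist first, and fall back to the original playlist if the captioned one cannot be fetched or does not look like a valid HLS playlist. In the Python sketch below, `choose_playlist`, `looks_like_playlist`, and the injected `fetch` callable are hypothetical names, and the URLs in the usage example are placeholders. Many HLS players also handle redundancy natively through backup variants listed in the manifest, so treat this as an illustration of the idea rather than a required component.

```python
from typing import Callable, Iterable, Optional


def looks_like_playlist(body: str) -> bool:
    """Every HLS playlist must begin with the #EXTM3U tag."""
    return body.lstrip().startswith("#EXTM3U")


def choose_playlist(urls: Iterable[str], fetch: Callable[[str], str]) -> Optional[str]:
    """Return the first URL, in priority order, whose content is a usable playlist.

    fetch is any callable that returns the playlist text for a URL and raises
    an exception on failure (for example, a thin wrapper around an HTTP client).
    """
    for url in urls:
        try:
            body = fetch(url)
        except Exception:
            continue  # treat any fetch error as "unavailable" and try the next URL
        if looks_like_playlist(body):
            return url
    return None


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request; only the original playlist is up.
        if "captioned" in url:
            raise ConnectionError("captioned playlist unavailable")
        return "#EXTM3U\n#EXT-X-VERSION:3\n"

    print(choose_playlist(
        ["https://example.com/captioned/master.m3u8",
         "https://example.com/original/master.m3u8"],
        fake_fetch,
    ))
```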

Call to action

To gain further insights into creating a live event pipeline using AWS, please refer to our Live Streaming on AWS solutions page. Additionally, if you’re interested in getting started with AWS Elemental Media Services, please visit the product page. For live event captioning solutions and integration documentation, please visit SyncWords.

Chris Zhang

Chris Zhang is a Solutions Architect for AWS Elemental