AWS for M&E Blog

Configure live captions with SyncWords and AWS Elemental Media Services

As a content creator, producer, or broadcaster, it can be daunting to provide quick and accurate transcriptions, dubbing, or translations for live streaming. In this post, I’ll discuss how to configure services that can give you that cutting edge, saving you both time and money.

Introduction

Consumers all over the world have access to more content than ever before, including content created in foreign languages. Some consumers rely on accessibility services such as captions and subtitles (which we’ll refer to collectively as captions), originally designed for the hard of hearing, and audio descriptions for people with limited vision.

Beyond this, captions are also useful in many environments regardless of hearing or eyesight (for example, in public places where audio is not suitable). Legislation is also being put in place mandating that broadcasters provide such accessibility services for more of their content.

There are many workflows and solutions for video on demand (VOD) content to provide captions and foreign language translations. Now there is also a new, straightforward way to do the same for live streaming without much heavy lifting. I’ll explain how to configure Amazon Web Services (AWS) Elemental services and the live captioning service (from SyncWords) to enable automatic, AI-generated live captions and dubbing, including translations into multiple languages.

A challenge content creators, providers, and broadcasters face with live broadcasts is generating accurate and timely captions, as the window from camera capture to screen presentation is very short. One common approach today is to have transcribers listen to the audio, type what is being said, and feed that text into the media workflow. This can be costly, subject to human error, and cause difficulties synchronizing the text with the audio.

During live broadcasting, the presentation of the captions typically lags four to eight seconds behind the video/audio, while words and parts of sentences may be misspelled or missed completely. These issues cause a poor end-user experience. One example is live news, where the presenter quickly moves from one story to another: if the accessibility service (for example, captions) is several seconds delayed, it may present text about a story after the video has already moved on to the next one.

Timing and accuracy are key elements to provide a good experience for the users of accessibility services. This is especially true for mobile viewers who consume live streaming with captions and subtitles on the go and in noisy environments.

In addition to timing and accuracy challenges, broadcasters often encounter significant hurdles when inserting live captions into their broadcasts. It often requires specialized hardware upstream from the video contribution stage. It may also rely on software-based encoders to insert user data into transport streams according to specific TV protocols such as EIA-608, Teletext, digital video broadcasting (DVB) subtitling, and others.

The solution

AWS Elemental MediaLive and AWS Elemental MediaPackage, combined with SyncWords, offer a streamlined approach to achieving accurate, synchronized, and translated captions. By leveraging the structure of a live HTTP Live Streaming (HLS) manifest, broadcasters can seamlessly integrate AI capabilities into their live streams. This eliminates the need for cumbersome hardware and streamlines the entire process.

I’ll describe how to configure a cloud-centered workflow for live captions based on the use of AWS Elemental Link (encoding devices), AWS Elemental MediaLive, SyncWords’ AI Captioning Service and AWS Elemental MediaPackage.

Prerequisites

The workflow uses the MediaLive and MediaPackage services, and I have assumed you are familiar with setting up these services in the AWS Management Console. If this is not the case, please refer to Getting started with AWS Elemental MediaLive and Getting Started with AWS Elemental MediaPackage.

You’ll also want to confirm you have an account with SyncWords. Contact SyncWords directly or purchase through the AWS Marketplace.

The workflow overview depicts AWS Elemental Link, a small hardware encoder ideal for transporting video with high resiliency and built-in security, including video encryption and key rotation. In this example, the contribution source feed is connected to the Link device, which encodes and pushes your source stream directly into MediaLive over an Ethernet connection. Using a Link unit is not mandatory, but it offers a plug-and-play way to get your source visible as an input in MediaLive. Other inputs accepted by MediaLive can be found in the MediaLive User Guide section Setup: Creating inputs.

The workflow overview shows how the contribution source content flows into AWS Elemental MediaLive and from there into the SyncWords service for transcription, and optionally translation and dubbing. The output from SyncWords then flows into AWS Elemental MediaPackage and into Amazon CloudFront CDN for distribution to end clients.

Figure 1: Workflow overview.

Configuration

This example workflow consists of MediaLive, SyncWords and MediaPackage services. Log into your AWS Management Console and access AWS Elemental MediaPackage to start.

Step 1: Start by creating a channel in MediaPackage (v2 is used here):

  • Create your Channel Group.
  • Then create your Channel. Note that when creating your MediaPackage v2 channel you will need to create a custom channel policy that allows the stream from SyncWords access. Instructions on how to define this policy can be found in Publishing HLS streams from SyncWords Live to your AWS MediaPackage v2 Channel.
  • Within your channel, create an HLS Origin endpoint by clicking the “Create endpoint” button. This is the endpoint the stream from SyncWords will push to.
    • On the “Settings” page of your endpoint you will find the MediaPackage HLS ingest endpoint URL needed to configure your SyncWords HLS Output in Step 2. Make a note of it.
    • For your HLS Origin endpoint, under “Segment settings”, confirm the “Use audio rendition group” box is selected.

Note: Setup in MediaPackage v1 is different, as you do not set up a custom channel policy. Instead, use the URL, Username and Password from your MediaPackage v1 HLS Ingest endpoint when configuring your SyncWords HLS Output in Step 2.
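If you prefer to script Step 1 rather than use the console, the sketch below shows one way to do it with boto3. It is a minimal sketch under stated assumptions: the group/channel names (captions-group, captions-channel) and the SyncWords principal ARN are hypothetical placeholders — SyncWords’ policy guide specifies the real principal and policy to use.

```python
import json

# Placeholder principal - SyncWords' documentation specifies the real
# account/role ARN that must be allowed to push into your channel.
SYNCWORDS_PRINCIPAL = "arn:aws:iam::111122223333:root"

def build_channel_policy(channel_arn: str,
                         principal: str = SYNCWORDS_PRINCIPAL) -> str:
    """Resource policy granting SyncWords permission to push HLS
    segments into the MediaPackage v2 channel."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowSyncWordsIngest",
            "Effect": "Allow",
            "Principal": {"AWS": principal},
            "Action": "mediapackagev2:PutObject",
            "Resource": channel_arn,
        }],
    })

# Illustrative boto3 calls (they require AWS credentials, so they are
# commented out here):
# import boto3
# emp = boto3.client("mediapackagev2")
# emp.create_channel_group(ChannelGroupName="captions-group")
# ch = emp.create_channel(ChannelGroupName="captions-group",
#                         ChannelName="captions-channel")
# emp.put_channel_policy(ChannelGroupName="captions-group",
#                        ChannelName="captions-channel",
#                        Policy=build_channel_policy(ch["Arn"]))
# emp.create_origin_endpoint(ChannelGroupName="captions-group",
#                            ChannelName="captions-channel",
#                            OriginEndpointName="hls-endpoint",
#                            ContainerType="TS",
#                            # "Use audio rendition group" from Step 1:
#                            Segment={"TsUseAudioRenditionGroup": True})
```

The policy builder is separated from the API calls so you can review the generated JSON before applying it to your channel.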

Step 2: Configure your SyncWords channel:

  • Log into your account with SyncWords.
  • Navigate to Services or Events and click “Create Service” or “Schedule an Event”. I’m using the Service setup here, as shown in Figure 2.
  • Define the Service Name and click “Create Service”.
Figure 2: Screenshot of SyncWords’ “Create Service” setting, where the service is named and created.

  • Create an input endpoint for MediaLive to stream HLS to as an “HLS Push” type. This will automatically generate a unique URL endpoint, username and password to access the SyncWords service endpoint.
  • Note: You will need the URL, username and password settings when configuring MediaLive in Step 3. Once you are streaming from MediaLive into this endpoint you should see the Connection Status change from “Not Connected” to “Connected”.
Figure 3: Screenshot of SyncWords’ “Input Media” settings. The input format is set to HLS Push, and the generated endpoint URL, username, and password needed for the connection are shown.

  • In the “Transcript” section (Figure 4), first set the Transcript Type to “Automated (AI)”. Then select the Speech Engine and the Source Language to be used for the recognition of your source audio. If required, use Add Dictionary for any custom terminology the speech engine may have problems with (for example, names and abbreviations).
  • Select any additional settings under “More Options”. Note: Options may vary depending on the Speech Engine selected.
Figure 4: Screenshot of SyncWords’ “Transcript” settings. The transcript type is set to Automated (AI), the source language to English, and the speech engine to Speechmatics; no dictionary was added, and Filter Profanity and Filter Filler Words are checked under More Options.

  • In the “Translations” section (Figure 5), add the languages you wish to translate into. This defines which output languages will be presented to the end-user client as captions. Select your preferred engine to perform the translations from the MT Engine drop-down list.
  • Select the “Audio” box for those languages you want dubbed audio tracks created for, and select your required dubbed voice settings (Dialect, Gender, Speaker and Speaking Rate).
  • Set the volume of the original audio to be mixed in with the dubbed tracks, as a percentage between 0 and 50.
  • To increase the accuracy of automatic translations, add custom glossaries and fill in the fields: source language, target language, and custom glossary. Glossaries may be needed for terms that are difficult to translate or have no translated version in the target language. A glossary preset can also be created, as explained in How to Create a Translation Glossary.
Figure 5: Screenshot of SyncWords’ “Translations” settings. Spanish uses the Amazon MT engine with the Text and Audio boxes checked (Gender female, Speaker Lucia, Speaking Rate 1.0); Mandarin Chinese uses the DeepL MT engine with Text and Audio checked (Gender male, Speaker cmn-CN-Standard, Speaking Rate 1.2); Arabic, Norwegian, Urdu, and Vietnamese each use the Amazon MT engine for text translation only. The original audio volume is set to 20%, and no additional glossaries were added.

  • Create your HLS output in the “Outputs” section (Figure 6). Select “AWS MediaPackage” and “v2” from the drop-downs.
  • Copy the HLS ingest endpoint URL from your MediaPackage v2 Channel “Settings” page (see Step 1).
  • Set your buffer size. Note: For streams including translations, at least a full sentence is needed before a translation can be processed, because different languages place words in different orders (for example, in German the verb often comes at the end of the sentence).
Figure 6: Screenshot of SyncWords’ “HLS Output” settings, which point to the relevant packager/origin. The Destination Type is AWS MediaPackage, the AWS MediaPackage Version is V2, the Endpoint URL is entered, and the Buffer is set to 30 before clicking Save.

  • Click “Save” (the button will change to “Saved”). If any mandatory settings are missing, a warning saying “Some data is invalid.” will be shown and a red circle with an exclamation mark will indicate where it is missing. You may need to scroll up to find it.
  • Your SyncWords service is now configured and ready to be started.

Step 3: Configure your MediaLive channel to push HLS into your SyncWords channel:

  • Create your MediaLive Input and Channel.
  • Add an HLS Output group and select “Create parameter” under the “Credentials (optional)” section. Use the SyncWords auto-generated Input Media settings for URL, Username and Password values from Step 2. Give the parameter a name. Once the MediaLive channel is created this parameter will be stored automatically in AWS Systems Manager Parameter Store. The credentials are needed for the MediaLive channel to be accepted as a source into the SyncWords channel you created in Step 2.
Figure 7: Screenshot of AWS Elemental MediaLive “HLS output group” settings, with the SyncWords source URL, username, and password entered. For the password, Create parameter is selected with the name SyncWords-Channel; the name will vary depending on what you called your SyncWords channel.

  • Continue setting up your HLS group and HLS outputs in your MediaLive channel. Click the Create channel button once configured.
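For channels built programmatically, the credentials described in Step 3 map to an OutputDestination entry in the MediaLive channel configuration, where PasswordParam names the Parameter Store entry rather than carrying the password itself. A minimal sketch, using hypothetical values (the syncwords-dest ID, the example ingest URL, and the SyncWords-Channel parameter name):

```python
def build_hls_push_destination(dest_id: str, syncwords_url: str,
                               username: str, password_param: str) -> dict:
    """OutputDestination entry for a MediaLive channel that pushes HLS
    to the SyncWords ingest endpoint. PasswordParam is the name of an
    AWS Systems Manager Parameter Store entry, not the password itself."""
    return {
        "Id": dest_id,
        "Settings": [{
            "Url": syncwords_url,            # from SyncWords "Input Media"
            "Username": username,            # from SyncWords "Input Media"
            "PasswordParam": password_param, # Parameter Store name
        }],
    }

# Storing the SyncWords password as a secure parameter first
# (illustrative; requires AWS credentials):
# import boto3
# boto3.client("ssm").put_parameter(Name="SyncWords-Channel",
#                                   Value="<password from SyncWords>",
#                                   Type="SecureString")

# This dict would go into create_channel(Destinations=[...]):
dest = build_hls_push_destination("syncwords-dest",
                                  "https://ingest.example.syncwords.com/hls",
                                  "user123", "SyncWords-Channel")
```

Keeping the password in Parameter Store mirrors what the console does automatically when you click “Create parameter”.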

Step 4: Start services and monitor:

  • Start your MediaLive channel and click “Start Service” in your SyncWords service setup. Wait for your MediaLive channel state to change to “Running” and for the connection status in the SyncWords GUI to state it is “Connected”. Note: The start-up phase can take a few minutes.
  • Monitor your output by clicking the “Preview” link in your MediaPackage Origin endpoints section. When your channel is running you should see a list of subtitle and audio languages. Select from your captions and dubbed language tracks in your preferred client (typically located in the bottom right-hand corner of the playback client).

Note: If you require standalone subtitle files (SRT or VTT) from the SyncWords live workflow, these will be available directly in the SyncWords Live dashboard once the live service has been archived.
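Beyond previewing in a player, you can confirm the captions and dubbed tracks made it into the stream by inspecting the multivariant playlist MediaPackage serves: each language should appear as an #EXT-X-MEDIA entry. A small sketch of such a check (the sample manifest below is fabricated for illustration):

```python
import re

def list_renditions(manifest: str) -> list:
    """Return (TYPE, LANGUAGE) pairs for the #EXT-X-MEDIA entries in a
    multivariant HLS playlist - the caption and dubbed-audio renditions
    added by SyncWords should each appear here."""
    pairs = []
    for line in manifest.splitlines():
        if line.startswith("#EXT-X-MEDIA:"):
            mtype = re.search(r'TYPE=([A-Z]+)', line)
            lang = re.search(r'LANGUAGE="([^"]+)"', line)
            pairs.append((mtype.group(1) if mtype else "?",
                          lang.group(1) if lang else "?"))
    return pairs

# Fabricated sample playlist with one dubbed audio and one subtitle track:
sample = """#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",LANGUAGE="es",NAME="Spanish",URI="a.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",LANGUAGE="en",NAME="English",URI="s.m3u8"
"""
print(list_renditions(sample))  # [('AUDIO', 'es'), ('SUBTITLES', 'en')]
```

In practice you would fetch the playlist from your CloudFront distribution URL instead of using a hardcoded sample.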

Clean up

Stop or remove any MediaLive and SyncWords channels not in use to avoid unnecessary resource usage and costs.

Conclusion

There is a direct need for more, and better-quality, accessibility services like captions and subtitles. You can enrich your content with the creation of live captions and dubbed audio tracks through the use of AWS Elemental Media Services and SyncWords AI-powered captions and translations service.

Try it out for yourself and experience the efficiency of this implementation of transcription, translation, and dubbing in a live broadcast cloud environment. You’ll also notice the accuracy of the captions and translations, and how promptly they are presented to the end user.

Contact an AWS Representative to learn how we can help accelerate your business.

Further reading

Hein Nergaard

Hein Nergaard is a Senior Specialist Solutions Architect at AWS. Hein has 24+ years of experience in media and broadcasting. He advises Media and Entertainment customers on implementing their cloud and on-premises solutions, specializing in AWS Elemental Media Services.