AWS for M&E Blog
Configure live captions with SyncWords and AWS Elemental Media Services
As a content creator, producer, or broadcaster, it can be daunting to provide quick and accurate transcriptions, dubbing, or translations for live streaming. In this post, I’ll discuss how to configure services that could give you the cutting edge you need, saving you both time and money.
Introduction
Consumers all over the world have access to more content than ever before, including content created in foreign languages. Some consumers rely on accessibility services such as captions and subtitles (which we’ll refer to collectively as captions), originally designed for people who are hard of hearing, and audio descriptions for people with limited vision.
Beyond this, captions are also useful in many environments regardless of hearing or eyesight (for example, in public places where audio is not suitable). Legislation is also being put in place mandating that broadcasters provide such accessibility services for more of their content.
There are many workflows and solutions for video on demand (VOD) content to provide captions and foreign language translations. Now there is also a new, straightforward way to do the same for live streaming without much heavy lifting. I’ll explain how to configure Amazon Web Services (AWS) Elemental services and the live captioning service (from SyncWords) to enable automatic, AI-generated live captions and dubbing, including translations into multiple languages.
A challenge content creators, providers, and broadcasters face with live broadcasts is generating accurate and timely captions, as the time from camera capture to screen presentation is very short. One way to do this today is to have transcribers listen to the audio, type what is being said, and then feed that text into the media workflow. This can be costly, subject to human error, and cause difficulties synchronizing the text with the audio.
During live broadcasting, the presentation of the captions typically lags four to eight seconds behind the video/audio, while words and parts of sentences may be misspelled or missed completely. These issues cause a poor end-user experience. One example is live news, where the presenter quickly moves from one story to another. If the accessibility service (for example, captions) is several seconds delayed, it may present the text about a story after the video has already moved on to the next one.
Timing and accuracy are key elements to provide a good experience for the users of accessibility services. This is especially true for mobile viewers who consume live streaming with captions and subtitles on the go and in noisy environments.
In addition to timing and accuracy challenges, broadcasters often encounter significant hurdles when inserting live captions into their broadcasts. Doing so often requires specialized hardware upstream of the video contribution stage. It may also rely on software-based encoders to insert user data into transport streams according to specific TV protocols such as EIA-608, Teletext, Digital Video Broadcasting (DVB) subtitling, and others.
The solution
There’s a solution: AWS Elemental MediaLive and AWS Elemental MediaPackage, together with SyncWords, offer a streamlined approach to achieving accurate, synchronized, and translated captions. By leveraging the structure of a live HTTP Live Streaming (HLS) manifest, broadcasters can seamlessly integrate AI capabilities into their live streams. This eliminates the need for cumbersome hardware and streamlines the entire process.
I’ll describe how to configure a cloud-centered workflow for live captions based on the use of AWS Elemental Link (encoding devices), AWS Elemental MediaLive, SyncWords’ AI Captioning Service and AWS Elemental MediaPackage.
Prerequisites
The workflow uses the MediaLive and MediaPackage services, and I have assumed you are familiar with setting up these services using the AWS Console. If this is not the case, please refer to Getting started with AWS Elemental MediaLive and Getting Started with AWS Elemental MediaPackage.
You’ll also want to confirm you have an account with SyncWords. Contact SyncWords directly or purchase through the AWS Marketplace.
The workflow overview depicts AWS Elemental Link, a small hardware encoder ideal for transporting video with high resiliency and built-in security, including video encryption and key rotation. In this example, the contribution source feed is connected to the Link device, which encodes and pushes your source stream directly into MediaLive over an Ethernet connection. Using a Link unit is not mandatory, but it is a great option because it makes your source visible as an input in MediaLive with plug-and-play simplicity. Other inputs accepted by MediaLive can be found in the MediaLive User Guide section Setup: Creating inputs.
Configuration
This example workflow consists of the MediaLive, SyncWords, and MediaPackage services. To start, log into your AWS Management Console and access AWS Elemental MediaPackage.
Step 1: Start by creating a channel in MediaPackage (v2 is used here):
- Create your Channel Group.
- Then create your Channel. Note that when creating your MediaPackage v2 channel you will need to create a custom channel policy to allow the stream from SyncWords access. Instructions on how to define your policy are available in Publishing HLS streams from SyncWords Live to your AWS MediaPackage v2 Channel.
- Within your channel, create an HLS Origin endpoint by clicking the “Create endpoint” button. This is the endpoint the stream from SyncWords will push to.
- On the “Settings” page of your endpoint you will find the MediaPackage HLS ingest endpoint URL needed to configure your SyncWords HLS Output in Step 2. Make a note of it.
- For your HLS Origin endpoint, under “Segment settings”, verify that the “Use audio rendition group” box is selected.
Note: Setup in MediaPackage v1 is different, as you do not set up a custom channel policy. Instead, use the URL, Username and Password from your MediaPackage v1 HLS Ingest endpoint when configuring your SyncWords HLS Output in Step 2.
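The custom channel policy mentioned in Step 1 is a standard IAM resource policy attached to the MediaPackage v2 channel. The sketch below builds one in Python; the SyncWords principal ARN, account IDs, and channel names are placeholders, and the exact statement SyncWords requires is given in their documentation referenced above.

```python
import json

# Placeholder values -- substitute the principal SyncWords provides and
# your own account, channel group, and channel names.
SYNCWORDS_PRINCIPAL = "arn:aws:iam::111111111111:root"  # hypothetical
CHANNEL_ARN = (
    "arn:aws:mediapackagev2:us-east-1:222222222222:"
    "channelGroup/my-group/channel/my-channel"
)

def build_channel_policy(principal_arn: str, channel_arn: str) -> str:
    """Return a channel policy document, as JSON, for the console's policy editor."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSyncWordsIngest",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                # PutObject is the ingest action for MediaPackage v2 channels.
                "Action": "mediapackagev2:PutObject",
                "Resource": channel_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)

print(build_channel_policy(SYNCWORDS_PRINCIPAL, CHANNEL_ARN))
```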
Step 2: Configure your SyncWords channel:
- Log into your account with SyncWords.
- Navigate to Services or Events and click “Create Service” or “Schedule an Event”. I’m using the Service setup here, as shown in Figure 2.
- Define the Service Name and click “Create Service”.
- Create an input endpoint for MediaLive to stream HLS to as an “HLS Push” type. This will automatically generate a unique URL endpoint, username and password to access the SyncWords service endpoint.
- Note: You will need the URL, username and password settings when configuring MediaLive in Step 3. Once you are streaming from MediaLive into this endpoint you should see the Connection Status change from “Not Connected” to “Connected”.
- In the “Transcript” section (Figure 4), first set the Transcript Type to “Automated (AI)”. Then select the Speech Engine and the Source Language to be used for the recognition of your source audio. If required, use the Add Dictionary for any custom terminology the speech engine may have problems with (for example, names and abbreviations).
- Select any additional settings under “More Options”. Note: Options may vary depending on the Speech Engine selected.
- In the “Translations” section (Figure 5), add the languages you wish to translate into. This defines which output languages will be presented to the end-user client as captions. Select your preferred engine to perform the translations from the MT Engine drop-down list.
- Select the “Audio” box for those languages you want dubbed audio tracks created for, and select your required dubbed voice settings (Dialect, Gender, Speaker and Speaking Rate).
- Set the volume of the original audio to be mixed in with the dubbed tracks as a percentage between 0 and 50.
- To increase the accuracy of automatic translations, add custom glossaries and fill in the fields: source language, target language, and custom glossary. Glossaries may be needed for terms that are difficult to translate or have no translated version in the target language. A glossary preset can also be created; how to do this is explained in How to Create a Translation Glossary.
- Create your HLS output in the “Outputs” section (Figure 6). Select “AWS MediaPackage” and “v2” from the drop-downs.
- Copy the HLS ingest endpoint URL from your MediaPackage v2 Channel “Settings” page (see Step 1).
- Set your buffer size. Note: For streams including translations, at least a full sentence is needed before a translation can be processed, because different languages place words in different orders (for example, in German the verb often comes at the end of the sentence).
- Click “Save” (the button will change to “Saved”). If any mandatory settings are missing, a warning saying “Some data is invalid.” will be shown, and a red circle with an exclamation mark will indicate where the missing setting is. You may need to scroll up to find it.
- Your SyncWords service is now configured and ready to be started.
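The 0-50 volume setting from the dubbing step above amounts to attenuating the original audio and summing it under the dubbed track. The following sketch illustrates the concept only; it is not SyncWords’ actual mixing code.

```python
def mix_dubbed_audio(original, dubbed, original_pct):
    """Mix the original audio under a dubbed track.

    original, dubbed: equal-length lists of PCM samples (floats in [-1, 1]).
    original_pct: volume of the original audio, 0-50 percent, mirroring the
    SyncWords setting described above.
    """
    if not 0 <= original_pct <= 50:
        raise ValueError("original audio volume must be between 0 and 50")
    gain = original_pct / 100.0
    # Sum the attenuated original with the dubbed track, clamping to [-1, 1].
    return [max(-1.0, min(1.0, d + gain * o)) for o, d in zip(original, dubbed)]

# Example: original audio mixed in at 25% under the dubbed voice.
mixed = mix_dubbed_audio([0.8, -0.4], [0.1, 0.2], 25)
```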
Step 3: Configure your MediaLive channel to push HLS into your SyncWords channel:
- Create your MediaLive Input and Channel.
- Add an HLS Output group and select “Create parameter” under the “Credentials (optional)” section. Use the SyncWords auto-generated Input Media settings for URL, Username and Password values from Step 2. Give the parameter a name. Once the MediaLive channel is created this parameter will be stored automatically in AWS Systems Manager Parameter Store. The credentials are needed for the MediaLive channel to be accepted as a source into the SyncWords channel you created in Step 2.
- Continue setting up your HLS group and HLS outputs in your MediaLive channel. Click the Create channel button once configured.
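Under the hood, the console steps above map onto MediaLive’s channel destination structure, where the password is referenced by its Systems Manager Parameter Store name rather than stored in the channel itself. The sketch below shows that shape; the URL, username, and parameter name are hypothetical placeholders for the values SyncWords generated in Step 2.

```python
def build_syncwords_destination(url, username, password_param):
    """Return a MediaLive-style output destination entry for an HLS push."""
    return {
        "Id": "syncwords-hls",
        "Settings": [
            {
                "Url": url,                       # SyncWords HLS Push ingest URL
                "Username": username,             # auto-generated by SyncWords
                "PasswordParam": password_param,  # Parameter Store name, not the password itself
            }
        ],
    }

# All three values below are placeholders for illustration only.
destination = build_syncwords_destination(
    "https://ingest.example-syncwords.com/push/abc123/stream",
    "example-user",
    "/medialive/syncwords/password",
)
```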
Step 4: Start services and monitor:
- Start your MediaLive channel and click “Start Service” in your SyncWords service setup. Wait for your MediaLive channel state to change to “Running” and for the connection status in the SyncWords GUI to show “Connected”. Note: The start-up phase can take a few minutes.
- Monitor your output by clicking the “Preview” link in your MediaPackage Origin endpoints section. When your channel is running, you should see a list of subtitle and audio languages. Select from your caption and dubbed-language tracks in your preferred client (the selector is typically in the bottom right-hand corner of the playback client).
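One way to verify the captions and dubbed tracks are present without a full player is to list the #EXT-X-MEDIA entries in the endpoint’s multivariant playlist. The sample manifest below is illustrative, not actual MediaPackage output.

```python
import re

# A minimal multivariant (master) HLS playlist with two subtitle
# renditions and one dubbed audio rendition, for illustration.
SAMPLE_MANIFEST = """\
#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="sub_en.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="German",LANGUAGE="de",URI="sub_de.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="German (dubbed)",LANGUAGE="de",URI="aud_de.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=5000000,SUBTITLES="subs",AUDIO="aud"
video.m3u8
"""

def list_renditions(manifest: str):
    """Return (type, language, name) for each #EXT-X-MEDIA entry."""
    renditions = []
    for line in manifest.splitlines():
        if not line.startswith("#EXT-X-MEDIA:"):
            continue
        # Parse the attribute list: KEY=value pairs, quoted or bare.
        attrs = dict(re.findall(r'([A-Z-]+)=("[^"]*"|[^,]*)',
                                line[len("#EXT-X-MEDIA:"):]))
        renditions.append((
            attrs.get("TYPE", "").strip('"'),
            attrs.get("LANGUAGE", "").strip('"'),
            attrs.get("NAME", "").strip('"'),
        ))
    return renditions

for kind, lang, name in list_renditions(SAMPLE_MANIFEST):
    print(kind, lang, name)
```

Against a live channel, you would fetch the playlist from the MediaPackage endpoint URL and feed its text to the same function.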
Note: If you require standalone subtitle files (SRT or VTT) from the SyncWords live workflow, these will be available directly in the SyncWords Live dashboard once the live service has been archived.
Clean up
Stop or remove any MediaLive and SyncWords channels not in use to avoid unnecessary resource usage and costs.
Conclusion
There is a direct need for more, and better-quality, accessibility services like captions and subtitles. You can enrich your content by creating live captions and dubbed audio tracks with AWS Elemental Media Services and SyncWords’ AI-powered captioning and translation service.
Try it out for yourself and experience the efficiency of this implementation of transcription, translation, and dubbing in a live broadcast cloud environment. You’ll also notice the high accuracy of the captions and translations, and how promptly they are presented to the end user.
Contact an AWS Representative to learn how we can help accelerate your business.