AWS for M&E Blog

Audio description mixing now available with AWS Elemental MediaConvert

Introduction

AWS Elemental MediaConvert, a cloud-based video transcoding service from Amazon Web Services (AWS), now features the ability to create Broadcast Mix audio description (AD) outputs from Receiver Mix AD sources. By interpreting control track data in the source, MediaConvert dynamically mixes AD commentary into audio outputs during the transcode process.

A recent consultation by the UK’s communications regulator (Ofcom) highlights the need to increase and enhance accessibility services for over-the-top (OTT) platforms. Our customers asked us to simplify workflows to meet their growing accessibility requirements. This feature expands access for blind or partially sighted audiences, and simplifies AD creation for broadcasters.

MediaConvert transcodes file-based content into live stream assets quickly and reliably by combining advanced video and audio capabilities with a simple user interface and pay-as-you-go pricing. With cloud-based transcoding, you can focus on delivering compelling media experiences and worry less about maintaining video processing infrastructure.

What is audio description?

Audio description provides spoken commentary that describes the visual elements of a scene to its audience. This includes facial expressions, body language, or subtle visual cues that are integral to the storyline and that partially sighted audiences may otherwise miss. Commentary is delivered during passages of reduced dialogue to prevent the overlap of multiple voices. In addition, the main program audio volume may be reduced to make the commentary more comprehensible, using an audio effect commonly known as ‘ducking’.

How is audio description delivered?

AD is usually delivered in one of two formats.

Broadcast Mix, also known as Pre-Mixed Audio Description, is the most compatible. AD is mixed into the source or added during transcoding for OTT workflows. It presents users with a selectable AD option in the language menu.

Receiver Mix, also known as Studio Mix, is commonly seen in direct-to-home (DTH) workflows such as cable or terrestrial television. The TV or set-top box (STB) directly mixes a separate AD track with the original program audio. This requires significant device compatibility testing but provides granular control of the AD, such as the ability to choose the overall volume level of AD relative to the program audio.

With this new feature release, Receiver Mix can now be converted to Broadcast Mix using MediaConvert.

What is an AD receiver mix track?

Defined in the BBC white paper WHP 198, a Receiver Mix AD track is a stereo audio pair that contains descriptive commentary in the left channel and a control track in the right channel. The control track signals how the main program audio should adjust to accommodate the AD track, including ‘ducking’. The control track is encoded as audio and sounds like a loud ‘warbling’ tone, so it can be very unpleasant to listen to. Be sure to mute this track in the final mix!

Control data is created during program production and conveys editorial decisions that define the listening experience. For example, the fade value defines how much the program audio should be reduced for the AD to be clearly audible. Louder passages require further reduction than quiet ones. The fade value can be sampled many times per second to create a gradual fade, which avoids abrupt changes in audio levels and provides a pleasant listening experience. The trade-off is that a gradual fade reduces the time available for a passage of commentary. Pan data determines where to position AD in the sound field, for example “the left ear”.
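
To make the fade idea concrete, here is a minimal Python sketch of ducking. It is an illustration only: it does not decode the control track, and it is not how MediaConvert implements the mix. The names `db_to_gain`, `fade_ramp`, and `broadcast_mix` are hypothetical, and the fade values are assumed to be already available as per-sample attenuation in dB.

```python
import math


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def fade_ramp(target_db: float, steps: int) -> list:
    """Return attenuation values (in dB) that rise linearly to target_db.

    Spreading the reduction over several steps gives a gradual fade
    rather than an abrupt drop in program level.
    """
    return [target_db * (i + 1) / steps for i in range(steps)]


def broadcast_mix(program, commentary, fade_db):
    """Duck the program audio, then add the commentary on top.

    Assumption: all three sequences are equal-length, per-sample values,
    and fade_db holds non-negative attenuation amounts in dB.
    """
    if not (len(program) == len(commentary) == len(fade_db)):
        raise ValueError("inputs must have the same length")
    return [
        p * db_to_gain(-f) + c
        for p, c, f in zip(program, commentary, fade_db)
    ]


# Program audio is untouched where the fade is 0 dB and reduced where
# commentary is present and the fade is 6 dB.
mixed = broadcast_mix([0.5, 0.5], [0.0, 0.2], [0.0, 6.0])
```

In this sketch, a larger fade value pushes the program audio further down under the commentary, and `fade_ramp` shows why a gentler fade takes more samples, and therefore more time, to reach the target level.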

How do I create a broadcast mix AD output from a receiver mix source in MediaConvert?

Consider the following scenario. A source MXF file contains a video track with separate left and right main program audio tracks. The AD Receiver Mix is provided as a sidecar WAV file, external_audio.wav. The desired output contains an original stereo version and a Broadcast Mix stereo AD version.

Figure 1 depicts two sources on the left. Input 1 contains video, program audio left and program audio right. Audio 2 contains audio description audio and data channels. The right box depicts the desired output containing a program Stereo Mix and an audio description Stereo Mix.

Fig. 1: An example stereo AD workflow.

Follow these steps to implement this scenario. They assume you have a basic understanding of how to set up and configure a MediaConvert job.

Configuring the input:

  1. Open the AWS Elemental MediaConvert console.
  2. In the Navigation pane, choose Jobs, Create Job.
  3. For Input file URL, enter the location of your source file.
  4. In the Audio selectors pane, choose Add audio selector.
    1. For Audio Selector 1 Selector type, choose Track. For Tracks, enter 1,2.
    2. For Audio Selector 2 Selector type, choose Track. For Tracks, enter 1,2.
      1. Select External file and enter the Amazon S3 location of your AD source.
  5. Choose Add audio selector group and add both audio selectors.

Figure 2 depicts where to define the external .wav file containing Receiver Mix audio description.

Fig. 2: Setting up audio selectors and audio selector group.
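
For readers who submit jobs as JSON rather than through the console, the input steps above could be sketched as the following Python dictionary. This is a hedged sketch, not an exported job: the field names follow the MediaConvert job settings JSON as I understand it, the S3 locations are placeholders, and `build_input` is a hypothetical helper. Compare it against a job exported from your own console before relying on it.

```python
# Assumption: placeholder S3 locations; substitute your own.
SOURCE_MXF = "s3://amzn-s3-demo-bucket/input/source.mxf"
AD_SIDECAR = "s3://amzn-s3-demo-bucket/input/external_audio.wav"


def build_input(source_url: str, ad_url: str) -> dict:
    """Build one entry for the job's Inputs list (hypothetical helper)."""
    return {
        "FileInput": source_url,
        "AudioSelectors": {
            # Main program audio: tracks 1 and 2 of the source file.
            "Audio Selector 1": {
                "SelectorType": "TRACK",
                "Tracks": [1, 2],
            },
            # Receiver Mix AD: read from the external sidecar WAV file.
            "Audio Selector 2": {
                "SelectorType": "TRACK",
                "Tracks": [1, 2],
                "ExternalAudioFileInput": ad_url,
            },
        },
        "AudioSelectorGroups": {
            # The group presents both selectors to the output as one source.
            "Audio Selector Group 1": {
                "AudioSelectorNames": ["Audio Selector 1", "Audio Selector 2"],
            },
        },
    }


job_input = build_input(SOURCE_MXF, AD_SIDECAR)
```

Grouping the two selectors is what lets a single output address the program audio and the AD pair together as four input channels.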

Configuring the output:

  1. In the Output groups pane, choose Add, File group.
  2. For Destination, enter an Amazon S3 output destination.
  3. In the Outputs pane, choose Details.
  4. In the Encoding settings pane, choose Add audio.
  5. For Audio 1 Audio source, choose Audio Selector 1.
  6. For Audio 2 Audio source, choose Audio Selector Group 1.
    1. Select Advanced, turn on Manual audio remixing.
    2. For Audio description audio channel, enter 3.
    3. For Audio description data channel, enter 4.
    4. For Input channels, choose 4.
    5. For channel mapping, enter 0dB to pass through audio, and -60dB to mute a channel. Refer to the following screenshot.
  7. For Video Max bitrate, enter any valid value in bits/s. This is required to create the job.

Figure 3 depicts where to define the AD channels and set up remixing, as described in the preceding steps.

Fig. 3: Setting up AD and remix controls.
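
The remix settings for the AD output could be sketched in Python as follows. Treat this as an assumption-laden illustration: the field names follow the MediaConvert job settings JSON as I understand it, codec settings are omitted, and the channel mapping values are my reading of the steps above (program left and right passed through at 0 dB, channels 3 and 4 muted at -60 dB) rather than a copy of the screenshot. `build_ad_output_audio` and `mapping_is_consistent` are hypothetical helpers.

```python
def build_ad_output_audio(source_name: str = "Audio Selector Group 1") -> dict:
    """Build the audio description entry for the Broadcast Mix output."""
    return {
        "AudioSourceName": source_name,
        "RemixSettings": {
            "ChannelsIn": 4,   # program L, program R, AD audio, AD data
            "ChannelsOut": 2,  # stereo Broadcast Mix
            "AudioDescriptionAudioChannel": 3,
            "AudioDescriptionDataChannel": 4,
            "ChannelMapping": {
                "OutputChannels": [
                    # Output left: pass program L, mute everything else.
                    {"InputChannels": [0, -60, -60, -60]},
                    # Output right: pass program R, mute everything else.
                    {"InputChannels": [-60, 0, -60, -60]},
                ]
            },
        },
    }


def mapping_is_consistent(audio: dict) -> bool:
    """Check the mapping has one row per output and one gain per input."""
    remix = audio["RemixSettings"]
    rows = remix["ChannelMapping"]["OutputChannels"]
    return len(rows) == remix["ChannelsOut"] and all(
        len(row["InputChannels"]) == remix["ChannelsIn"] for row in rows
    )


ad_audio = build_ad_output_audio()
```

Muting channels 3 and 4 in the mapping keeps the raw commentary and the control tone out of the direct pass-through path; the commentary reaches the output through the audio description channel settings instead.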

The following image shows the audio ducking in action. The individual components are separated for illustration purposes, as the final output will be mixed. The ‘Original’ track highlighted on top has a consistent amplitude. The ‘Original-dipped’ track shows how the program audio gain is reduced when ‘Commentary’ is present.

Figure 4 is a visual waveform that depicts the program audio reduced during passages of narration.

Fig. 4: Demonstrating how the program audio levels are adjusted to mix in the commentary.

Conclusion

In this post, I described how to create a simple audio description Broadcast Mix in MediaConvert. It is possible to create multiple AD versions for stereo and Dolby, use embedded or external audio input types, and combine audio selectors into audio selector groups for greater customization. This feature expands accessibility, allowing blind or partially sighted audiences to enjoy more content.

You can read more about audio descriptions in the MediaConvert documentation. I encourage you to head to the AWS console and give it a try!

Ben Marshall

Ben Marshall is an Enterprise Account Engineer at AWS Elemental, specialising in Media Services. He helps Enterprise customers leverage advanced media technologies to enhance their streaming capabilities, and successfully deliver high profile events.