AWS Media Blog

Using .smil files with AWS Elemental MediaPackage VOD

Multi-language and multi-subtitle support is here

New enhancements to AWS Elemental MediaPackage VOD let you apply custom labels to your audio and subtitle/caption tracks so that name and language information passes through to the end user. This is now supported when ingesting assets via a .smil file, a simple XML document designed for presenting multimedia. This post is an advanced tutorial on the new enhancements: how to configure them and what to expect in your HLS and DASH-ISO outputs. It covers .smil ingestion specifically, not HLS ingestion, which has its own required configuration.

Avoiding the Undefined

Customers moving thousands of media assets to Amazon Simple Storage Service (Amazon S3) often find metadata inconsistencies between their videos, and even within a single title's mp4 renditions. The dreaded 'undefined' often appears when selecting VOD subtitles/captions, and even when switching between AAC and AC3 audio, where the language is not defined. Updating the language metadata inside the renditions themselves requires generating a new file with ffmpeg or a similar tool, overwriting the old one. Sanitizing an entire library this way can be tedious and costly. If you generate your ABR ladder of mp4 files using AWS Elemental MediaConvert, pay close attention to the languageCodeControl and customLanguageCode settings in the documentation to ensure that the language is set correctly in your outputs.
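As a sketch, forcing a configured language in a MediaConvert job's audio description settings might look like the following fragment. This is only the language-related portion of an AudioDescriptions entry; the codec settings and other required job fields are omitted:

```json
{
  "AudioDescriptions": [
    {
      "LanguageCodeControl": "USE_CONFIGURED",
      "CustomLanguageCode": "eng"
    }
  ]
}
```

With LanguageCodeControl set to USE_CONFIGURED, the output's language metadata comes from CustomLanguageCode rather than following whatever (possibly undefined) language is in the input.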

New Additions to the .smil

The three main additions to the .smil are ‘audioName’, ‘subtitleName’, and ‘systemLanguage’. Let’s dig into each one and see what they do. We are going to assume we have an mp4 video that has three AAC audio tracks: Track 1 is English, Track 2 is Spanish, and Track 3 is French. None of the languages have been defined via metadata inside the mp4. We have three matching .srt sidecar subtitle/caption files.

The following image shows the MediaInfo for a sample file, showing one video track and three audio tracks.

Creating the .smil

On the <video> element, add audioName="English,Spanish,French". Since there are three audio tracks, MediaPackage applies the labels in the order they are written, from left to right; MediaPackage acts as a passthrough for whatever text you enter. In the HLS output, this corresponds to the NAME attribute on the #EXT-X-MEDIA:TYPE=AUDIO lines in the master manifest. The first audio track gets DEFAULT=YES,AUTOSELECT=YES added to its line; all others are set to NO. In DASH-ISO, this corresponds to the <Label> element under each audio <AdaptationSet> node.

On the <video> element, add systemLanguage="eng,spa,fra". These system languages should follow the three-letter ISO 639-2 specification; two-letter ISO 639-1 codes are not as reliable across HLS players. For HLS, MediaPackage passes these through to the LANGUAGE attribute on the #EXT-X-MEDIA:TYPE=AUDIO lines in the master manifest. For DASH-ISO, they pass through to the lang attribute on the audio <AdaptationSet> node.
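Putting the audio attributes above together, the audio rendition group in the HLS master manifest might look like the following. The GROUP-ID and URI values here are illustrative, not what MediaPackage will literally emit:

```
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="English",LANGUAGE="eng",DEFAULT=YES,AUTOSELECT=YES,URI="audio_1.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="Spanish",LANGUAGE="spa",DEFAULT=NO,AUTOSELECT=NO,URI="audio_2.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="French",LANGUAGE="fra",DEFAULT=NO,AUTOSELECT=NO,URI="audio_3.m3u8"
```

Note how NAME comes from audioName, LANGUAGE from systemLanguage, and only the first track is the default.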

On each of the three <textstream> elements for the subtitles/captions, add a systemLanguage attribute with the corresponding language code. For HLS, this passes through to the LANGUAGE attribute on the #EXT-X-MEDIA:TYPE=SUBTITLES lines in the master manifest. For DASH-ISO, the language codes pass through to the lang attribute on the subtitle <AdaptationSet> node.

On each of the three <textstream> elements for the subtitles/captions, add a single name: subtitleName="English", subtitleName="Spanish", or subtitleName="French". For HLS, this passes through to the NAME attribute on the #EXT-X-MEDIA:TYPE=SUBTITLES lines in the master manifest. The first subtitle/caption track gets DEFAULT=YES,AUTOSELECT=YES added to its line; all others are set to NO. For DASH-ISO, the names pass through to the <Label> element under each subtitle/caption <AdaptationSet> node.
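The subtitle/caption lines in the HLS master manifest would then follow the same pattern as the audio lines. Again, the GROUP-ID and URI values are illustrative:

```
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="eng",DEFAULT=YES,AUTOSELECT=YES,URI="subtitle_1.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Spanish",LANGUAGE="spa",DEFAULT=NO,AUTOSELECT=NO,URI="subtitle_2.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="French",LANGUAGE="fra",DEFAULT=NO,AUTOSELECT=NO,URI="subtitle_3.m3u8"
```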

Final Touches

It's critical to enable the "Use audio rendition groups" feature on your HLS Packaging Group. Without that feature enabled, there is no language field in the manifest to override, so the player cannot present the proper information to the viewer. You can read more about it in the documentation.

The following image shows the Use audio rendition groups option selected.

The following image displays the expected output in a sample player showing three audio languages and three subtitle languages.

This .smil format is also helpful when you have multiple tracks of the same language but want to distinguish the film's "English" from the "Director's Commentary English". It's also important to note that, when these settings are used, they override any language metadata present in the mp4 itself.

The following is our finished sample .smil file. Have fun exploring these new features and improving your end users’ viewing experience.

Sample .smil

<?xml version="1.0" encoding="utf-8"?>
<smil>
  <body>
    <switch>
      <video name="example_1080.mp4" systemLanguage="eng,spa,fra" audioName="English,Spanish,French"/>
      <video name="example_720.mp4" systemLanguage="eng,spa,fra" audioName="English,Spanish,French"/>
      <video name="example_540.mp4" systemLanguage="eng,spa,fra" audioName="English,Spanish,French"/>
      <video name="example_360.mp4" systemLanguage="eng,spa,fra" audioName="English,Spanish,French"/>
      <textstream src="english.srt" systemLanguage="eng" subtitleName="English"/>
      <textstream src="spanish.srt" systemLanguage="spa" subtitleName="Spanish"/>
      <textstream src="french.srt" systemLanguage="fra" subtitleName="French"/>
    </switch>
  </body>
</smil>
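Given the sample .smil above, the resulting DASH-ISO MPD would contain one <AdaptationSet> per audio and subtitle track. The following excerpt is a sketch of how the .smil attributes map into the MPD; the Representation ids, codecs, and bandwidths are illustrative, and the actual elements MediaPackage emits will contain more detail:

```xml
<!-- Audio: lang comes from systemLanguage, Label from audioName -->
<AdaptationSet mimeType="audio/mp4" lang="eng" segmentAlignment="true">
  <Label>English</Label>
  <Representation id="audio-eng" codecs="mp4a.40.2" bandwidth="128000"/>
</AdaptationSet>

<!-- Subtitles: lang comes from systemLanguage, Label from subtitleName -->
<AdaptationSet mimeType="application/mp4" lang="eng">
  <Label>English</Label>
  <Representation id="subs-eng" codecs="stpp" bandwidth="0"/>
</AdaptationSet>
```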