AWS Media Blog

Getting started with optical character recognition in AWS Elemental Live and AWS Elemental MediaLive

What is OCR?

Optical character recognition (OCR) is the ability of software to recognize characters in an image and convert them to text. Once converted, the characters are treated as specific letters and numbers and can be manipulated like any other text. Recent research in artificial intelligence (AI), computer vision, and pattern recognition has enhanced OCR so that software can recognize not only individual characters (the letter A, for example) but also complete words. Recognizing entire words makes OCR both faster and more accurate in a near-real-time environment.

AWS Elemental Live, which encodes live video on premises for events and 24/7 streams, and AWS Elemental MediaLive, a broadcast-grade live video processing service, now use OCR technology to convert DVB-Sub or SCTE-27 captions and subtitles at the input to WebVTT. The WebVTT captions and subtitles can be used in HLS or DASH outputs in AWS Elemental Live and in HLS outputs in MediaLive. Customers no longer have to use an external system or service for this conversion. With AWS Elemental Live and MediaLive, the conversion is done at the same time as the video encoding.

This post focuses on how to set up and use OCR on AWS Elemental Live and MediaLive to generate text-based subtitles for over-the-top (OTT) output formats such as HLS and DASH.

Why do customers need OCR?

DVB-Sub has long been a popular and effective way for broadcasters to include captions and subtitles. However, DVB-Sub captions and subtitles aren’t supported in the more recent OTT output formats such as Apple HLS. HLS does support WebVTT, which is a text-based format for captions and subtitles.

This is where OCR comes into play. It can convert both DVB-Sub and SCTE-27 image-based captions and subtitles into the WebVTT captions and subtitles that HLS and DASH formats support. And by using AI and pattern recognition, OCR can perform the conversion in near real time.
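To make "text-based" concrete, here is a minimal sketch of what a WebVTT document looks like: a `WEBVTT` header followed by cues, each with a timing line and the subtitle text. The helper names and the sample cue are hypothetical and only illustrate the format. This is not how AWS Elemental Live or MediaLive generate their output.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # header line, then a blank line before the first cue
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical cue text, for illustration only.
    print(build_webvtt([(1.0, 4.0, "Hello, this is a subtitle.")]))
```

Because the cue payload is plain text, a player can restyle, search, or index it, which is not possible with the bitmap images carried in DVB-Sub or SCTE-27.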

To perform an OCR conversion, you must specify the language of the captions and subtitles. OCR pattern recognition works most effectively when the language is identified because different languages have different combinations of letters. In addition, many languages have characters that are unique to that language. When the OCR tool encounters one of these characters, it will recognize the character (or an entire word that contains this character) much faster when the language is known.

How to use OCR in AWS Elemental Live

In one AWS Elemental Live event, you can convert any number of source captions and subtitles to WebVTT. AWS Elemental Live supports more than 100 languages (AWS Elemental Live Supported OCR languages).

To learn how AWS Elemental Live works and how to set up a live event, please refer to https://docs.aws.amazon.com/elemental-live/latest/ug/how-live-works.html.

To use OCR, use software version 2.22.0 GA or higher and opt to install the OCR package as part of the installer process:

Setting up OCR Conversion in Versions 2.22.0 to 2.23.0

Setting up OCR Conversion in Versions 2.23.1 and later

Once you have this, follow these steps:

Step 1: set up live event input

  • Start creating or editing an AWS Elemental Live event.
  • In the event, add an input for the source that contains the DVB-Sub or SCTE-27 captions and subtitles.
  • In that input, choose Advanced. Choose Add Caption Selector, then for Source, choose DVB-Sub source or SCTE-27 source, as appropriate. Enter the PID that holds the captions and subtitles, or enter the language code for the captions and subtitles that you want AWS Elemental Live to look for.

AWS Elemental Live Input Caption Selector - Source drop-down selected to "DVB-Sub", sample PID 563

Step 2: set up the live event output

Now move over to the output side.

  • Find the HLS or DASH output where you want to set up the WebVTT captions and subtitles encode.
  • In the Streams section, choose the + sign beside Caption to add a captions and subtitles section.
  • In Caption Source, choose the selector that you created in the input section.
  • In Destination Type, choose WebVTT. The Optical Character Recognition field will appear.
  • In Optical Character Recognition, choose the language that applies to the captions and subtitles.

AWS Elemental Live Output Caption configuration : Caption Source - Caption Selector 1 Destination Type - WebVTT

  • Complete the Advanced fields to include optional language data in the encode.

(Screenshots from The Tick, available on Prime Video)

Screenshot from The Tick : - Tick talking to Arthur, English DVB-Sub subtitle superimposed at the bottom

Image-based DVB-Sub subtitles at the input (AWS Elemental Live and AWS Elemental MediaLive, English)

Screenshot from The Tick : - Tick talking to Arthur, English WebVTT subtitle superimposed at the bottom

Text-based WebVTT subtitles at the output (AWS Elemental Live and AWS Elemental MediaLive, English) 

How to use OCR in AWS Elemental MediaLive

In a MediaLive channel, you can convert a maximum of three sets (different languages) of source captions and subtitles to WebVTT.

MediaLive supports conversion of six languages (Dutch, English, French, German, Portuguese, and Spanish), with plans to add more languages over time (AWS Elemental MediaLive supported OCR languages).
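The two limits above can be checked before you build a channel. The following is a minimal sketch with a hypothetical helper; the language list and the limit of three are taken from this post, so check the MediaLive documentation for current values.

```python
# Values taken from this post; they may change as MediaLive adds languages.
MEDIALIVE_OCR_LANGUAGES = {
    "Dutch", "English", "French", "German", "Portuguese", "Spanish",
}
MAX_OCR_CONVERSIONS_PER_CHANNEL = 3


def check_ocr_plan(languages):
    """Return a list of problems with a planned set of OCR conversions.

    `languages` holds one entry per set of source captions and subtitles
    that you plan to convert to WebVTT in a single channel.
    """
    problems = []
    if len(languages) > MAX_OCR_CONVERSIONS_PER_CHANNEL:
        problems.append(
            f"{len(languages)} conversions requested; the maximum per channel "
            f"is {MAX_OCR_CONVERSIONS_PER_CHANNEL}"
        )
    for language in languages:
        if language not in MEDIALIVE_OCR_LANGUAGES:
            problems.append(f"{language} is not in the supported OCR language list")
    return problems


if __name__ == "__main__":
    print(check_ocr_plan(["English", "Spanish", "Norwegian"]))
```

An empty list means the plan fits within the limits described here. Languages outside the list, such as the Norwegian and Russian examples later in this post, need AWS Elemental Live.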

Log in to the AWS console using your AWS account and access the MediaLive service. You will need to create an input and a channel in MediaLive (https://docs.aws.amazon.com/medialive/latest/ug/how-medialive-works-channels.html).

Step 1: set up live input

  • Create an input for the source that contains the DVB-Sub or SCTE-27 captions and subtitles. Start creating or editing a channel. Attach the input that you created.
  • In the Input Attachments, choose the input that contains the DVB-Sub or SCTE-27 captions and subtitles. In the General Input settings section, choose Add Caption Selectors.
  • For Selector Settings, choose DVB-Sub source or SCTE-27 source, as appropriate.
  • Identify the specific language to extract from the source. Enter either the PID or the three-letter language code that applies to the captions and subtitles language.
  • In OCR Language, choose the language of the captions and subtitles. In the output, MediaLive will use the OCR library for this language to perform the conversion to WebVTT.

AWS Elemental MediaLive Input Caption Selector configuration : - Captions Selectors 1 - Caption Selector Name "Subtitles" - Selector Settings submenu : - DVB Sub source dropdown selected - PID 460 - OCR Language SPA dropdown selected - Language Code blank

  • Complete other fields in this caption selector as appropriate.
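If you create channels through the API or an SDK rather than the console, the caption selector above corresponds roughly to the following structure. This is a sketch under assumptions: the helper is hypothetical, and the field names (`SelectorSettings`, `DvbSubSourceSettings`, `Scte27SourceSettings`, `Pid`, `OcrLanguage`) follow the MediaLive API's caption selector shape as we understand it, so verify them against the current API reference. The values match the screenshot above.

```python
# Maps the console's source choice to the (assumed) API settings key.
SOURCE_SETTINGS_KEYS = {
    "dvb-sub": "DvbSubSourceSettings",
    "scte-27": "Scte27SourceSettings",
}


def build_ocr_caption_selector(name, source, pid, ocr_language):
    """Build a caption selector dict for an image-based caption source.

    name:         the caption selector name, referenced later by the output
    source:       "dvb-sub" or "scte-27"
    pid:          the PID that carries the captions and subtitles
    ocr_language: the three-letter OCR language code, for example "SPA"
    """
    if source not in SOURCE_SETTINGS_KEYS:
        raise ValueError(f"unsupported source for OCR: {source}")
    return {
        "Name": name,
        "SelectorSettings": {
            SOURCE_SETTINGS_KEYS[source]: {
                "Pid": pid,
                "OcrLanguage": ocr_language,
            }
        },
    }


if __name__ == "__main__":
    # Values from the console screenshot: selector "Subtitles", PID 460, SPA.
    print(build_ocr_caption_selector("Subtitles", "dvb-sub", 460, "SPA"))
```

A dict like this would sit in the list of caption selectors inside the input settings of the input attachment. The sketch only builds the data and makes no AWS calls.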

(Screenshots from The Tick, available on Prime Video)

Screenshot from The Tick : - Tick

Image-based DVB-Sub subtitles at the input (AWS Elemental Live and AWS Elemental MediaLive, Portuguese)

Screenshot from The Tick : - Tick

Text-based WebVTT subtitles at the output (AWS Elemental Live and AWS Elemental MediaLive, Portuguese) 

Step 2: set up and edit the live channel

  • Attach the previously configured input to the live channel that you are creating or editing.
  • Choose an existing output group of type HLS or AWS Elemental MediaPackage (which prepares and protects your video for delivery over the internet), or create a new output group. Find the output where you want to set up the WebVTT captions and subtitles encode. Please note that WebVTT captions and subtitles need to have their own output. They cannot be mixed with video or audio.
  • In the Stream settings section, choose the Add caption button at the top of the section, then choose Create a new caption encode.

AWS Elemental MediaLive live channel Stream Settings Add Video dropdown Add Audio dropdown Add Caption dropdown, "Create a new caption encode selected"

  • Choose the captions and subtitles encode button on the left side. In Caption Selector Name, choose the selector that you created in the input section.
  • In Caption Settings, choose WebVTT destination. More fields will appear.
  • Ignore Style Control because it doesn’t apply when converting from DVB-Sub or SCTE-27. Optionally, complete the Additional settings to insert language data in the encode.

AWS Elemental MediaLive live channel Stream Settings Add Video dropdown Add Audio dropdown Add Caption dropdown Caption 1 configuration options: - Caption Description Name - caption_3hnn2 - Caption Selector Name dropdown - Subtitles - Caption Settings dropdown - WebVTT destination selected - Style Control dropdown - NO_STYLE_DATA selected
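The output-side steps can be sketched as data in the same way. The helpers below are hypothetical, and the field names (`CaptionSelectorName`, `DestinationSettings`, `WebvttDestinationSettings`, `LanguageCode`, `LanguageDescription`, `CaptionDescriptionNames`) are assumptions based on the MediaLive API's caption description and output shapes, so check them against the current API reference. The names match the screenshot above.

```python
def build_webvtt_caption_description(name, selector_name,
                                     language_code=None,
                                     language_description=None):
    """Build a caption encode that converts the selected source to WebVTT."""
    description = {
        "Name": name,
        # Must match the caption selector name created on the input side.
        "CaptionSelectorName": selector_name,
        # Style control is left unset: it does not apply when converting
        # from DVB-Sub or SCTE-27.
        "DestinationSettings": {"WebvttDestinationSettings": {}},
    }
    # Optional language data, as in the console's Additional settings.
    if language_code is not None:
        description["LanguageCode"] = language_code
    if language_description is not None:
        description["LanguageDescription"] = language_description
    return description


def build_captions_only_output(output_name, caption_description_name):
    """Build an output fragment that carries only the WebVTT encode.

    No video or audio description is referenced, because WebVTT needs
    its own output. Output settings for the output group are omitted here.
    """
    return {
        "OutputName": output_name,
        "CaptionDescriptionNames": [caption_description_name],
    }


if __name__ == "__main__":
    encode = build_webvtt_caption_description("caption_3hnn2", "Subtitles")
    output = build_captions_only_output("webvtt_out", encode["Name"])
    print(encode)
    print(output)
```

The output name `webvtt_out` is a placeholder. The key design point is that the output references the caption description by name and nothing else, which mirrors the console rule that WebVTT cannot share an output with video or audio.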

(Screenshots from The Tick, available on Prime Video)

Screenshot from The Tick : - Tick talking to Arthur, Norwegian DVB-Sub subtitle superimposed at the bottom

Image-based DVB-Sub subtitles at the input (AWS Elemental Live only, Norwegian)

Screenshot from The Tick : - Tick talking to Arthur, Norwegian WebVTT subtitle superimposed at the bottom

Text-based WebVTT subtitles at the output (AWS Elemental Live only, Norwegian) 

Screenshot from The Tick : - Tick talking to Arthur, Russian DVB-Sub subtitle superimposed at the bottom

Image-based DVB-Sub subtitles at the input (AWS Elemental Live only, Russian)

Screenshot from The Tick : - Tick talking to Arthur, Russian WebVTT subtitle superimposed at the bottom

Text-based WebVTT subtitles at the output (AWS Elemental Live only, Russian) 

Conclusion

In this post, we presented the new optical character recognition feature released in both AWS Elemental Live software and the AWS Elemental MediaLive cloud-based service. We explained how to configure events and channel inputs and outputs and showed some output examples in various languages and character sets. For further details on these topics, please get in touch with your AWS Elemental account team.

Learn more

If you would like to explore more live streaming workflows built on AWS, visit Live streaming on AWS. If you are interested in additional applications for video streaming, processing, and delivery using AWS Services, visit Media Services on AWS.

Pedro Feliciano

Pedro Feliciano is a Senior Solutions Architect for AWS Elemental. He has spent the last couple of decades working in the broadcast and OTT media industry on a broad range of video processing and distribution solutions, helping customers deploy them on premises, in the cloud, or in hybrid environments.