Detect silent audio tracks in VOD content with AWS Elemental MediaConvert

Introduction

In file-based video processing workflows, content providers often standardize on the mezzanine file format used to share content with their customers and partners. When sharing video-on-demand (VOD) content, produced mezzanine files often have same number of audio tracks regardless of how many are in the actual content. For example, MXF media assets may be produced with eight mono audio tracks for content that contains only 2.0 audio, so only two tracks are actually used while the remaining tracks are filled with silent audio.

To build an automated video processing workflow that makes intelligent decisions based on audio, we need to detect silent audio tracks and their position in the source mezzanine assets. From the example above, if six audio tracks are identified as silent, the asset would be processed as a stereo 2.0 asset. On the other hand, if six tracks have audio and two are silent, the asset should be sent to a 5.1 workflow for processing.

In this post, we will create a workflow using AWS Elemental MediaConvert to analyze audio tracks of media assets and measure their loudness. The workflow is automated by AWS Lambda functions and triggered by file uploads to Amazon S3.

Loudness measurement in MediaConvert

The Audio Normalization feature of MediaConvert makes it easy to correct and measure audio loudness levels, supporting the ITU-R BS.1770-1, -2, -3, and -4 standard algorithms. MediaConvert configuration allows the selected algorithm to only produce loudness measurements. It also allows logging loudness levels and storing these logs in S3.

We will be using MediaConvert to analyze loudness levels of the first 60s of media source assets. The workflow is flexible, so we can analyze more than 60s if required. The produced loudness logs provide the Input Max Loudness value for each audio track that we will compare to a threshold of -50 dB. If the Input Max Loudness value of an audio track is lower than -50 dB, that track is considered silent audio.

Solution overview

The workflow diagram for this solution is shown below:

The workflow works as follows:

A new source asset is uploaded to an Amazon S3 bucket.
S3 triggers the first AWS Lambda function (LambdaMediaConvert).
LambdaMediaConvert first runs MediaInfo on the source in order to determine the number of audio tracks, pushes a new job to MediaConvert, and stores job data in Amazon DynamoDB.
MediaConvert measures the Loudness of each audio track and saves Loudness logs into an S3 output bucket. As soon as the job completes, MediaConvert sends a “MediaConvert Event” to Amazon CloudWatch.
Using an Events rule, CloudWatch filters on “MediaConvert Event” and triggers the second Lambda function (LambdaLoudnessResults).
LambdaLoudnessResults collects Loudness logs from S3 to determine silent audio tracks by comparing Loudness values to a specific threshold, then updates DynamoDB with the results.

All steps required to build this workflow on AWS are explained in the post.

Step 1: Create a MediaConvert job template

The job template will be used by the function we will create called “LambdaMediaConvert” to build loudness measurement job settings. To create the MediaConvert template, log into the AWS Management Console, go to the MediaConvert service, and select the region you would like to work in.

Note: All AWS services in this workflow must run in the same region.

In the MediaConvert console, chose Job templates from the menu on the left and click Create template.
Enter a template name. This name will be referenced in LambdaMediaConvert (Step 6).
In Inputs section, click on Add button to add new inputs:
- Under Video selector -> Timecode source, select “start at 0”
- In Audio selector 1 -> Selector type, select “Track” and enter “1”

- Click on Add input clip and enter “00:01:00:00” for End timecode. This timecode corresponds to 1 minute, which means only the first 60 seconds of the source file will be processed by the template. You can adjust the End timecode value for your needs, or skip the Input clips configuration in order to analyze the loudness of the complete source file.

In Output groups, click on Add and select File group.
- Click on Output 1 and set:
  - Name modifier: “_240p”
  - Extension: “mp4”
  - Please note that we have to output a video in the MediaConvert job, so we will configure low resolution and bitrate video settings to reduce processing time and cost.
  - Under Encoding settings -> Video, set Resolution (w x h) to 320×240 and Bitrate (bits/s) to “200k”.
  - Under Encoding settings, click on Audio 1 and expand Advanced.
  - Enable Audio normalization and set:
    - Algorithm: “ITU-R BS.1770-2: EBU R-128”
    - Algorithm control: “Measure only”
    - Loudness logging: “Log”
    - Peak calculation: “None”

In Output groups click on Add and select File group
- Click on Output 1 and set:
  - Name modifier: “_240p”
  - Extension: “mp4”
  - Please note that we have to output a video in the MediaConvert job, so we will configure low resolution and bitrate video settings to reduce processing time and cost.
  - Under Encoding settings -> Video, set Resolution (w x h) to 320×240 and Bitrate (bits/s) to “200k”.
  - Under Encoding settings, click on Audio 1 and expand Advanced.
  - Enable Audio normalization and set:
    - Algorithm: “ITU-R BS.1770-2: EBU R-128”
    - Algorithm control: “Measure only”
    - Loudness logging: “Log”
    - Peak calculation: “None”

Click Create at the bottom of the page to create the job template.

Note: The job template is created intentionally with one audio input/output only. In the LambdaMediaConvert code, we will duplicate the audio configuration to match the number of audio tracks available in the source media asset.

Step 2: Create DynamoDB table

Open DynamoDB service console, expand the left-side menu, select Tables and click Create table:

Enter a Table name. This name will be referenced in LambdaMediaConvert.
For Primary key, enter “assetid”.
Click Create to create the table.

Step 3: Create S3 buckets

Open the Amazon S3 service console and create two buckets in the region you have chosen for the workflow. One bucket will be used for ingest and the second as the destination. Choose unique names for the ingest and destination buckets:

Click + Create bucket
Enter a bucket name for ingest bucket.
Select the “Region”.
Click the Create button.
Repeat for destination bucket.

Note: Keep these bucket names handy as we will need them when we configure the Lambda functions.

Step 4: Create Lambda layers

An AWS Lambda Layer is a ZIP archive that can contain libraries, a custom runtime, or other dependencies for Lambda functions. For this workflow, we will create one layer for Python 3.7 boto3 SDK and another for MediaInfo.

In order to package a layer, we need to make sure the dependency is placed in the right folder inside the ZIP file, as stated in Including Library Dependencies in a Layer. During run time, layers are extracted to “/opt” directory in the function execution environment.

4.1 Create Python 3.7 boto3 SDK layer

It is highly recommended that you use the latest version of the SDK when building Lambda functions to access the most recent features and API commands added to AWS services. In fact, the boto3 version included in Lambda Python Runtime is usually slightly behind. For more details, you can double check the AWS Lambda Runtimes page.

In order to create the boto3 layer ZIP file, the required steps are similar to the ones explained in AWS Lambda Deployment Package in Python. The only difference is that we are going to package dependencies (boto3) without the function code. In addition, boto3 will be placed under “/python” folder inside the ZIP file. Below are the “Bash” commands we will use to create the boto3 layer package.

Note: If you are a Mac OSX user who installed Python 3.7 using Homebrew, there is a known issue that might prevent “pip” from installing packages. A potential workaround is available on Stack overflow here. Pay special attention to delete the suggested “~/.pydistutils.cfg” file after boto3 is installed as it might break regular pip operations.

# make sure to use the right version of pip
$ pip3 --version
pip 19.1 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)

# create working folders
$ mkdir -p boto3/python
$ cd boto3/python

# for OSX and Homebrew users, create ~/.pydistutils.cfg before running following command and delete it afterwards
$ pip3 install boto3 --target ./
…

# verify the installed version of boto3
$ ls
…
boto3-1.9.136.dist-info
…

# create the zip file in parent folder
$ cd ..
$ zip -r ../boto3.1.9.136.zip .

After the layer ZIP file is ready, we will create the Lambda layer. Open Lambda service console in the working AWS region, navigate to the left-side menu, and click on “Layers”. Click on “Create layer”, then upload the layer ZIP file created earlier and enter additional details. For the license, use the location “https://aws.amazon.com/apache-2-0/”. Finally, click on the “Create” button.

4.2 Create MediaInfo layer

MediaInfo is used to extract the number of audio tracks to be analyzed in the source file. We are using the code and procedure introduced in Extracting Video Metadata using Lambda and MediaInfo post. However, in this case, we will package MediaInfo as a dependency in a Lambda layer. Here is how to proceed:

Follow Step 1 in Extracting Video Metadata using Lambda and MediaInfo in order to compile MediaInfo for Amazon Linux.
Once MediaInfo executable is ready, place it in the “/bin” folder inside the layer ZIP file as follows:

Note: If you are using an older pre-compiled version of MediaInfo, make sure it has support of “JSON” output format. Otherwise, please consider compiling a new one.

# compiled mediainfo is placed in current working directory 

$ mkdir -p mediainfodir/bin
$ mv mediainfo mediainfodir/bin/
$ cd mediainfodir
$ zip -r ../mediainfo.zip .

Next, create the layer in Lambda console. You can use following link location for the license “https://mediaarea.net/en/MediaInfo/License”.

Step 5: Create IAM roles for MediaConvert and Lambda functions

When creating IAM roles, provide the minimum rights required for a service to perform its job.

5.1 Create IAM role for MediaConvert

All steps required to create IAM role for MediaConvert are explained in the user guide here. After the role is created, note the role ARN as it will be required below.

5.2 Create IAM role for Lambda

For simplicity, we will create one IAM role for both Lambda functions. The permissions required include: Read objects from the ingest S3 bucket, read and delete objects in the destination S3 bucket, store logs to CloudWatch logs, create job and get job template from MediaConvert, store metadata in DynamoDB table, and Lambda with IAM permission to pass a role to MediaConvert.

Open the IAM service console, click on Roles from left-side menu, and choose Create role.
Choose AWS service role type, and then choose Lambda service that will use the role and click on Next: Permissions.
Under Attach permissions policies, choose Create policy and select JSON tab. Note that Create policy will open a new tab in the browser.
Copy the below policy JSON in the editor with following modifications:
- Replace “INGEST_BUCKET” with S3 ingest bucket name you created in step 3.
- Replace “MC_ROLE_ARN” with the ARN of MediaConvert role created in 5.1.
- Replace “DYNAMODB_TABLE_ARN” with the ARN of DynamoDB table created in step 2.
- Replace “DEST_BUCKET” with S3 destination bucket name you created in step 3.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "iam:PassRole",
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
                "mediaconvert:GetJobTemplate",
                "dynamodb:UpdateItem",
                "mediaconvert:CreateJob"
            ],
            "Resource": [
                "arn:aws:mediaconvert:*:*:jobTemplates/*",
                "arn:aws:mediaconvert:*:*:presets/*",
                "arn:aws:mediaconvert:*:*:queues/*",
                "DYNAMODB_TABLE_ARN",
                "MC_ROLE_ARN",
                "arn:aws:s3:::INGEST_BUCKET/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "mediaconvert:DescribeEndpoints",
                "logs:CreateLogGroup",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::DEST_BUCKET/*"
        }
    ]
}

Click Review policy.
Enter a name and description for the policy, then choose Create policy.
Switch back to the previous tab in the browser where Create role / Attach permissions policies is still open.
Choose Filter policies and select Customer managed from the drop-down box.

Click on Filter policies to hide the drop-down box and select the policy name that we created above. (You can also search for the policy name using the Search area.)
Select Next: Tags, where you can optionally add a tag for workflow name.
Select Next: Review and enter Role Name. Here, we can optionally update the description before choosing Create Role.
Note the role name you used, as it will be required later.

Step 6: Create the first Lambda function: LambdaMediaConvert

6.1 Create Lambda Function

Open Lambda service console and choose Create a function. Alternatively, you can expand the left-side menu, choose Functions and click the Create function button.
Choose Author from scratch, input Function name and select “Python 3.7” for Runtime.
For Permissions, expand Choose or create an execution role. Select Use an existing role under Execution role, then select the role we created in 5.2 under Existing role. Finally, choose Create function.

6.2 Add Layers

In the Designer area, choose Layers under the function name.

Under Designer, find the Layers section. Choose Add a layer.
Select the boto3 layer created in 4.1 under Compatible layers, select the Version, then choose Add.
Repeat the same steps to add the MediaInfo layer.

Click on the Save button in upper righthand corner to save changes to the function.

6.3 Add S3 trigger

In the Designer area and on the left side under Add triggers, look up S3 and choose it.
The Configure triggers section will show up under the Designer area. Select the ingest bucket name for Bucket.
For Event type, select All object create events.
You can optionally control which files to trigger the workflow by specifying a Prefix for subfolder path within the bucket, as well as file extension (i.e. .mxf) under Suffix.
Make sure Enable trigger is checked and choose Add.
Save changes with the Save button.

6.4 Add the function code

In Designer area, choose the function name to show the Function code section and other configuration sections

In Function code editor, override the default code with the following code:

import uuid
import os
import subprocess
import boto3
import logging
import json
import copy

# The number of seconds that the Signed URL is valid
SIGNED_URL_EXPIRATION = 300

logger = logging.getLogger('boto3')
logger.setLevel(logging.INFO)


def lambda_handler(event, context):

    assetID = str(uuid.uuid4())

    sourceS3Bucket = event['Records'][0]['s3']['bucket']['name']
    sourceS3Key = event['Records'][0]['s3']['object']['key']
    sourceS3 = 's3://'+ sourceS3Bucket + '/' + sourceS3Key
    logger.info("S3 object source URI: {}".format(sourceS3))

    # Mediainfo
    signedUrl = get_signed_url(SIGNED_URL_EXPIRATION, \
        sourceS3Bucket, sourceS3Key)
    logger.debug("S3 Signed URL: {}".format(signedUrl))

    miOut = subprocess.check_output(
        ["/opt/bin/mediainfo", "--full", "--output=JSON", signedUrl]
    )
    mi = json.loads(miOut.decode('utf8'))
    logger.debug("Media Info Output: {}".format(json.dumps(mi)))

    # Audio silent detection using Mediaconvert
    audioCount = int( mi['media']['track'][0]['AudioCount'] )
    logger.info("Number of Audio tracks: {}".format(audioCount))

    if audioCount == 0:
        logger.warning("The source file has no audio tracks. Exiting...")
        return 0

    dest = 's3://' + os.environ['DESTINATION_BUCKET'] + \
        '/audio_logging/' + assetID + '/audio'
    logger.info("Destination path: {}".format(dest))

    # DynamoDB table name
    tableName = os.environ['DYNAMODB_TABLE_NAME']

    try:
        # Get MediaConvert endpoint and push the job
        region = os.environ['AWS_DEFAULT_REGION']
        mc_client = boto3.client('mediaconvert', region_name=region)
        endpoints = mc_client.describe_endpoints()

        # Load MediaConvert client for the specific endpoiont
        client = boto3.client('mediaconvert', region_name=region,
            endpoint_url=endpoints['Endpoints'][0]['Url'], verify=False)

        # Get Job_Template Settings
        jobTemplate = client.get_job_template(
            Name=os.environ['MC_LOUDNESS_TEMPLATE']
        )

        jobSettings = build_mediaconvert_job_settings(
            sourceS3, dest, audioCount, jobTemplate["JobTemplate"]["Settings"]
        )
        logger.debug("job settings are: {}".format(json.dumps(jobSettings)))

        mediaConvertRole = os.environ['MC_ROLE_ARN']
        jobMetadata = {
            "AssetID": assetID,
            "Workflow": "SilentDetection",
            'AudioCount': str(audioCount),
            "Source": sourceS3,
            "DynamoTable": tableName
        }

        # Push the job to MediaConvert service
        job = client.create_job(Role=mediaConvertRole, \
            UserMetadata=jobMetadata, Settings=jobSettings)
        logger.debug("Mediaconvert create_job() response: {}".format( \
            json.dumps(job, default=str)))

    except Exception as e:
       logger.error("Exception: {}".format(e))
       return 0

    # Store Asset ID and Media Info in DynamoDB
    dynamo = boto3.resource("dynamodb")
    dynamoTable = dynamo.Table(tableName)
    dynamoTable.put_item(
        Item={
            'assetid':assetID,
            'mediainfo':json.dumps(mi),
            'audiocount':audioCount,
            'source':sourceS3
        }
    )

    return 1

def get_signed_url(expires_in, bucket, obj):
    """
    Generate a signed URL
    :param expires_in:  URL Expiration time in seconds
    :param bucket:      S3 Bucket
    :param obj:         S3 Key name
    :return:            Signed URL
    """
    s3_cli = boto3.client("s3")
    presigned_url = s3_cli.generate_presigned_url(
        'get_object', Params={'Bucket': bucket, 'Key': obj}, \
        ExpiresIn=expires_in)

    return presigned_url


def build_mediaconvert_job_settings(source, destination, audio_count, \
    template_settings):
    """
    Build MediaConvert job settings based on provided template by replicating
    audio input selectors and output configuration for audio_count times
    :param source:          S3 source
    :param destination:     S3 destination where loudness logs will be stored
    :param audio_count:     The number of audio tracks to analyze loadness for
    :param template_settings: The MediaConvert template used to analyze audio
    :return:                MediaConvert job settings
    """
    job_settings = template_settings
    job_settings["Inputs"][0]["FileInput"] = source
    job_settings["OutputGroups"][0]["OutputGroupSettings"] \
        ["FileGroupSettings"]["Destination"] = destination

    input_audio_selector = copy.deepcopy(
        job_settings["Inputs"][0]["AudioSelectors"]["Audio Selector 1"]
    )

    output_audio_description = copy.deepcopy(
        job_settings["OutputGroups"][0]["Outputs"][0]["AudioDescriptions"][0]
    )

    job_settings["Inputs"][0]["AudioSelectors"] = {}
    job_settings["OutputGroups"][0]["Outputs"][0]["AudioDescriptions"] = []

    #for each audio track, create the input selector and the output description
    for ii in range(1,audio_count+1):
        selector_name = "Audio Selector " + str(ii)

        ias = copy.deepcopy(input_audio_selector)
        ias["Tracks"][0] = ii
        job_settings["Inputs"][0]["AudioSelectors"][selector_name] = ias

        oad = copy.deepcopy(output_audio_description)
        oad["AudioSourceName"] = selector_name
        job_settings["OutputGroups"][0]["Outputs"][0] \
            ["AudioDescriptions"].append(oad)


    return job_settings

Save changes.

6.5 Other configurations

Scroll down to the Environment variables section and add the following variables with their corresponding values:
- DESTINATION_BUCKET: name of the destination bucket (step 3)
- DYNAMODB_TABLE_NAME: name of DynamoDB table (step 2)
- MC_LOUDNESS_TEMPLATE: name of MediaConvert job template (step 1)
- MC_ROLE_ARN: ARN of MediaConvert execution role (step 5.1)
Here we can add an optional workflow name tag in Tags section.
In Basic settings section, update the Timeout to 1 min 0 sec.
Save changes.

Step 7: Create the second Lambda function: LambdaLoudnessResults

From the menu, choose Functions and click Create function button.
Choose Author from scratch.
Input Function name.
Select Python 3.7 for Runtime.
For Permissions, select the role created in 5.2, as we did for the first function.
Choose Create Function.
Repeat the steps from 6.2 to add boto3 layer only. The MediaInfo layer is not required for the second function.
Click on function name to show the Function code section and replace the code in the editor with the following code:

import os
import boto3
import logging
import json

SILENT_THRESHOLD = -50 # silent audio has max loudness lower than threshold

logger = logging.getLogger('boto3')
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Cloudwatch event: {}".format(json.dumps(event)))

    #read job details from the event
    assetID = event["detail"]["userMetadata"]["AssetID"]
    sourceS3 = event["detail"]["userMetadata"]["Source"]
    audioCount = int(event["detail"]["userMetadata"]["AudioCount"])
    destinationVideo = event["detail"]["outputGroupDetails"][0] \
        ["outputDetails"][0]["outputFilePaths"][0]
    tableName = event["detail"]["userMetadata"]["DynamoTable"]

    #loudness logs will be stored in same folder as destinationVideo
    #loudness logs name pattern: <videoFileName>_loudness.<trackNumber>.csv
    videoBaseName = os.path.basename(destinationVideo) # ex: video_1.mp4
    ext = videoBaseName.split('.')[-1] # -> mp4
    videoFileName = videoBaseName[:-1-len(ext)] # -> video_1

    # destinationVideo ~= s3://bucket/path/to/file.ext
    tt = destinationVideo.split('/')
    bucket = tt[2]
    s3Path = os.path.dirname(destinationVideo)[destinationVideo.index(tt[3]):]

    # get loudness logs from S3
    s3 = boto3.client('s3')

    # clean up temporary video file from S3
    videoKey = s3Path + '/' + videoBaseName
    s3.delete_object(Bucket=bucket, Key=videoKey)

    maxLoudness = []
    loudnessMap = []

    for ii in range(audioCount):
        suffix = '.' + str(ii+1)
        if audioCount == 1:
            suffix = ''
        s3Key = s3Path + '/' + videoFileName + '_loudness' + suffix + '.csv'
        localFile = '/tmp/loudness' + suffix + '.csv'
        s3.download_file(bucket, s3Key, localFile)

        f = open(localFile)
        lines=f.readlines()
        inputMaxLoudness = float( lines[-1].split(',')[6] )
        logger.info( 'Input Max Loudness reading for track ' + str(ii+1) + \
            ' is: ' + str(inputMaxLoudness) )
        maxLoudness.append(inputMaxLoudness)
        loud = 0
        if inputMaxLoudness > SILENT_THRESHOLD:
            loud = 1
        loudnessMap.append(loud)
        f.close()

        #clean it up!
        s3.delete_object(Bucket=bucket, Key=s3Key)

    logger.info("Input Max Loudness: {}".format(maxLoudness))
    logger.info("Input Loudness Map: {}".format(loudnessMap))

    #Update DynamoDB table with loudness results
    dynamo = boto3.resource("dynamodb")
    dynamoTable = dynamo.Table( tableName )
    dynamoTable.update_item(
        Key = {'assetid': assetID},
        AttributeUpdates={
            'maxloudness' : {
                'Value' : json.dumps(maxLoudness),
                'Action': 'PUT'
            },
            'loudnessmap' : {
                'Value' : json.dumps(loudnessMap),
                'Action': 'PUT'
            }
        }
    )

    return 1

Increase the Timeout to 1 min 0 sec.
Save changes.

Step 8: Configure CloudWatch Events rule to trigger LambdaLoudnessResults

Open CloudWatch service console and choose Rules from the left-side menu (located under Events).
Choose Create rule.
In the Event Source section,
- Choose Event Pattern.
- Click on Build event pattern … and select Custom event pattern.
- Copy the following JSON code into the editor:

{
  "source": [
    "aws.mediaconvert"
  ],
  "detail": {
    "status": [
      "COMPLETE"
    ],
    "userMetadata": {"Workflow": ["SilentDetection"]}
  }
}

In the Targets section,
- Click on Add target.
- Lambda function will be selected as target by default. For Function, select the name of second Lambda function from the drop-down list.

Choose Configure details.
Enter Rule name and click Create rule.

Running the workflow and showing results

To trigger the workflow, I uploaded a video source file that includes 8 mono audio tracks to the S3 ingest bucket. The results are:

The duration of the first Lambda function call was 15,259 ms, as shown in the CloudWatch logs.
MediaConvert took 14s to complete the job (2s in queue and 12s in transcoding). Keep in mind that we processed only the first 60s of the source.
The duration of the second Lambda function call was 3,839 ms.
The results stored in the DynamoDB table are:
- audiocount: 8 (tracks)
- loudnessmap: [1, 1, 0, 0, 0, 0, 0, 0] (showing that first 2 tracks have audio and the remaining tracks are silent)
- maxloudness: [-12.766523, -14.114044, -143.268118, -143.268118, -143.268118, -143.268118, -143.268118, -143.268118] (max loudness values)

The MediaInfo and file source URI were also stored in the table.

Conclusion

In this post, we learned how to create an automated workflow to analyze audio tracks of source media assets and detect silent tracks using MediaConvert and Lambda. The results are stored in DynamoDB, making it simple to integrate with other workflows such as the Video on Demand on AWS solution or the Media Analysis Solution.

If you have comments or suggestions, please submit them in the comments section below.

AWS for M&E Blog