Leverage AWS Serverless Application Model to run MediaInfo at scale

Media & Entertainment (M&E) companies often have a large number of media files stored in Amazon Simple Storage Service (Amazon S3). Before using these files in media processing applications, however, they often need technical metadata such as video codec, frame rate, audio channels, and duration. MediaInfo, a unified display of the most relevant video and audio file data, is a popular tool for extracting this metadata, but it does not natively support batch processing at scale.

This blog post describes how to build an AWS Serverless Application Model (AWS SAM) application to run MediaInfo on Amazon S3 objects at scale. The application uses a loosely coupled architecture composed of AWS Lambda, Amazon Simple Queue Service (Amazon SQS), and Amazon S3. Using AWS SAM lets us develop and run our functions both locally and in Amazon Web Services (AWS). The AWS SAM template includes all necessary resources and grants least-privilege permissions as needed.

Overview of solution

Target technology stack

We use the following AWS services in this solution:

Amazon Simple Storage Service – Amazon S3 is an object storage service. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.
Amazon Simple Queue Service – Amazon SQS offers a secure, durable, and available hosted queue that lets you integrate and decouple distributed software systems and components.
AWS Lambda – AWS Lambda is a compute service that lets you run code without provisioning or managing servers.
AWS Identity and Access Management (IAM) – AWS IAM is a web service that helps you securely control access to AWS resources.

Target Architecture

The following diagram illustrates the resources that are created with the AWS SAM template.

Figure: AWS SAM template diagram

The diagram also shows the flow of events when the producer AWS Lambda function is invoked.
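As a rough illustration of that flow, the sketch below shows how a producer might turn a list of S3 object keys into Amazon SQS message batches, and how a consumer might read them back from the SQS event that Lambda delivers. The function names and the JSON message shape (`bucket` and `key` fields) are assumptions made for this example and are not taken from the repository. The only fixed facts it relies on are that SendMessageBatch accepts at most 10 entries per call and that an SQS-triggered Lambda event carries each message under `Records[].body`.

```python
import json

# SendMessageBatch accepts at most 10 entries per call.
SQS_MAX_BATCH_ENTRIES = 10


def to_sqs_batches(bucket, keys):
    """Group object keys into lists of SendMessageBatch entries.

    The message body shape ({"bucket": ..., "key": ...}) is hypothetical.
    """
    entries = [
        {"Id": str(i), "MessageBody": json.dumps({"bucket": bucket, "key": key})}
        for i, key in enumerate(keys)
    ]
    return [
        entries[i:i + SQS_MAX_BATCH_ENTRIES]
        for i in range(0, len(entries), SQS_MAX_BATCH_ENTRIES)
    ]


def objects_from_sqs_event(event):
    """Extract (bucket, key) pairs from the SQS event a consumer Lambda receives."""
    bodies = (json.loads(record["body"]) for record in event["Records"])
    return [(body["bucket"], body["key"]) for body in bodies]
```

Each list returned by `to_sqs_batches` could be passed as the `Entries` argument of an SQS `send_message_batch` call, and the consumer would run MediaInfo once for each pair returned by `objects_from_sqs_event`.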

Automation and scale

AWS SAM provides the automation we need to build and deploy our resources (Lambda, IAM, and Amazon SQS) easily. The solution scales automatically by combining the serverless architecture of Lambda with Amazon SQS. The default configuration for this solution is 50 ReservedConcurrentExecutions on the BatchMediaInfoConsumer Lambda function and a BatchSize of 10 on the SQSMediaInfoEvent Amazon SQS queue. With this configuration, each invocation runs MediaInfo on up to 10 objects, and up to 50 invocations run concurrently, so at most 500 objects are processed at the same time. You can change either or both of these settings based on your requirements.
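To get a feel for how these two settings interact, the following sketch works through the arithmetic. The function name `estimate_capacity` is made up for this example. It assumes every invocation receives a full batch, which Amazon SQS does not guarantee, so the invocation and wave counts are lower bounds rather than predictions.

```python
import math


def estimate_capacity(total_objects, batch_size=10, reserved_concurrency=50):
    """Estimate best-case processing figures for the consumer function.

    Defaults mirror the solution's defaults: BatchSize of 10 and
    50 ReservedConcurrentExecutions. Assumes full batches (an idealization).
    """
    # Most objects that can be processed at the same moment.
    max_in_flight = batch_size * reserved_concurrency
    # Fewest invocations needed if every batch is full.
    min_invocations = math.ceil(total_objects / batch_size)
    # Fewest rounds of fully concurrent invocations.
    min_waves = math.ceil(min_invocations / reserved_concurrency)
    return {
        "max_in_flight": max_in_flight,
        "min_invocations": min_invocations,
        "min_waves": min_waves,
    }


print(estimate_capacity(1200))
```

For 1,200 objects with the default settings, this gives at most 500 objects in flight, at least 120 invocations, and at least 3 waves of concurrent invocations.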

Prerequisites

For this walkthrough, you should have the following:

An AWS account to deploy the resources into
The AWS Command Line Interface (AWS CLI)
The AWS Serverless Application Model Command Line Interface (AWS SAM CLI)
Docker installed on your local machine
Either Python 3.8 or Python 3.9 installed on your system

After you have installed all of the prerequisites, clone the repo to your local machine.

Walkthrough

Building the MediaInfo Lambda layer

The first step is to build the Lambda layer for MediaInfo using Docker and the Dockerfile in the root of the repository you cloned.

Run the following commands on your local machine:

1. docker build --tag pymediainfo-layer-factory:latest . --no-cache
2. docker run --rm -it -v "$(pwd)":/data pymediainfo-layer-factory cp /packages/pymediainfo-layer.zip /data

You should now have a file called pymediainfo-layer.zip in the same directory.

Modifying mediainfo-producer.py

The file “mediainfo-producer.py” in the src directory currently only checks for files with the following extensions: “.mp4”, “.mxf”, “.mov”, “.wav”, “.stl”, and “.scc”. If you need to run MediaInfo on files with a different extension, you can add any file extension that MediaInfo supports to the following line:

    analyze_file_extensions = [".mp4", ".mxf", ".mov", ".wav", ".stl", ".scc"]
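As an illustration of how such a list can be used, the sketch below filters S3 object keys by extension. The helper name `should_analyze` and the case-insensitive comparison are assumptions made for this example. The actual check in mediainfo-producer.py may differ.

```python
import os

analyze_file_extensions = [".mp4", ".mxf", ".mov", ".wav", ".stl", ".scc"]


def should_analyze(key, extensions=analyze_file_extensions):
    """Return True if the object key ends with one of the listed extensions.

    Hypothetical helper: lowercases the extension so "clip.MP4" also matches.
    """
    _, ext = os.path.splitext(key)
    return ext.lower() in extensions


# Supporting another format only requires extending the list.
extended = analyze_file_extensions + [".mkv"]
print(should_analyze("videos/clip.MP4"))           # True
print(should_analyze("videos/clip.mkv"))           # False
print(should_analyze("videos/clip.mkv", extended))  # True
```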

Building and deploying the AWS SAM stack

After we have built our MediaInfo Lambda layer, we need to build and deploy our AWS SAM stack to our AWS account.

Run the following commands on your local machine:

1. sam build
2. sam deploy --guided

Running sam deploy with the --guided flag will prompt you for a few input parameters that AWS SAM needs.

Stack Name: enter the name you would like your AWS CloudFormation stack to have.
AWS Region: enter the region in which you would like your CloudFormation stack to be deployed.
Parameter Stage: enter the name of the environment you will be deploying to. This parameter is appended to the names of the resources that are created.
The next four parameters are the S3 bucket names and prefixes for your content and for your output.
Confirm changes before deploy: This will prompt you to approve or deny any changes that AWS SAM is going to make to your stack before deploying those changes.
Allow SAM CLI IAM role creation: We will need to explicitly give AWS SAM permission here to create the required IAM roles for our two Lambda functions.
Disable rollback: Either enable rollback on CloudFormation Create/Update failure or disable rollback.
Save arguments to configuration file: Enter Y here if you would like to save these arguments to a configuration file, so that in the future you can run sam deploy instead of sam deploy --guided.
SAM configuration file: the name of the file where the arguments will be saved. It is samconfig.toml by default.
SAM configuration environment: the name of the environment section in the configuration file under which the arguments will be saved. It is default by default. Note that this is not an AWS credentials profile.

The stack has been deployed and the application is ready to run when the AWS SAM CLI returns a successful result for sam deploy --guided. The output should look similar to the following screenshot:

Figure: Screenshot of the AWS SAM CLI output for the batch MediaInfo deployment

Invoking our producer Lambda function

Finally, you can invoke our producer Lambda function with the following command:

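aws lambda invoke --region [REPLACE REGION] --function-name [REPLACE FUNCTION NAME] --payload '{}' response.json

When running the previous command, make sure to replace the region and function name with the corresponding values from your AWS account. If you are using AWS CLI version 2, you may also need to add --cli-binary-format raw-in-base64-out so that the JSON payload is accepted as plain text.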
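If you would rather invoke the producer from Python, the following sketch does the same thing with the AWS SDK for Python (Boto3). It assumes that boto3 is installed and that your AWS credentials are configured. The helper names are made up for this example, and the function name you pass in must be the one created in your account.

```python
import json


def build_invoke_kwargs(function_name, payload=None):
    """Build the arguments for the Lambda Invoke API call (empty JSON payload by default)."""
    return {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse",  # synchronous, like the CLI command
        "Payload": json.dumps(payload or {}).encode("utf-8"),
    }


def invoke_producer(region, function_name):
    """Invoke the producer function and return (status code, decoded response)."""
    import boto3  # imported here so build_invoke_kwargs has no third-party dependency

    client = boto3.client("lambda", region_name=region)
    response = client.invoke(**build_invoke_kwargs(function_name))
    body = response["Payload"].read() or b"null"
    return response["StatusCode"], json.loads(body)
```

For example, invoke_producer("us-east-1", "my-producer-function") would invoke a function with that placeholder name in that Region and return the status code along with whatever the function returned.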

Cleaning up

To avoid incurring future charges, delete the resources that we created with AWS SAM. To do so, run sam delete, which deletes the stack.

Conclusion

In this blog post, we explained how to run MediaInfo on objects stored in an Amazon S3 bucket at scale. We accomplished this by using Docker and AWS SAM to efficiently build both the MediaInfo layer, used by the AWS Lambda function, and the resources needed to run MediaInfo at scale. MediaInfo is a valuable tool for anyone working in media and entertainment and we welcome feedback on how it can be used even more effectively.

AWS for M&E Blog