AWS Storage Blog

Transcoding video files with S3 Batch Operations

The ability to work quickly in bulk—to work smarter rather than harder—is vital for the practical storage management of modern video archives, which can easily be petabytes of data. S3 Batch Operations is an Amazon S3 feature that can perform actions like copying or tagging objects across millions or billions of objects with a single request or a few clicks in the S3 console. All you provide is the list of objects and S3 Batch Operations handles the rest of the work, including managing retries and displaying progress.

Many media companies store their extensive video repositories in S3. Typical practice suggests storing current videos in S3 Standard and archiving older videos in S3 Glacier. For new content, Lambda invocations can be triggered by S3 events, such as putting a new object into a bucket. For existing objects, S3 Batch Operations provides a managed solution for triggering Lambda functions and performing other large-scale tasks in S3.

In this post, I review how to use S3 Batch Operations to trigger a video transcoding job using AWS Lambda, either from video stored in S3 or video requiring a restore from Amazon S3 Glacier.

Create a Lambda transcoding workflow

A typical transcoding workflow takes an existing video file and converts it into multiple file types for video-on-demand playout. This example uses the VOD Automation workflow post to create the transcoding workflow. I'll modify this workflow to support video objects already stored in S3, as well as video objects archived in S3 Glacier that must be restored before they can be transcoded for playout.

First, set up the VOD automation workflow using the post linked above. When you reach step 2, creating the Lambda function, return to this post. You must change the Lambda function so that it works with S3 Batch Operations: the modified function reads task metadata from new fields in the event JSON passed to Lambda and sends a job response back to S3 Batch Operations. For more, see Invoking a Lambda Function from Amazon S3 Batch Operations.

The VOD Automation post uses an S3 trigger in Lambda to point to the bucket ingesting video files. In the Lambda function you create for S3 Batch Operations, do not set up an S3 trigger, as your S3 Batch Operations job will invoke the Lambda function directly.

S3 trigger in Lambda for VOD Automation post

In the convert.py file, add the following lines of code at the beginning to extract the S3 Batch Operations job ID and the task-specific information, such as the source bucket ARN and object key.

First, the S3 Batch Operations job sends job parameters to Lambda in the event JSON, including jobId, invocationId, and invocationSchemaVersion. The function reads these parameters with the following three lines:

jobId = event['job']['id']
invocationId = event['invocationId']
invocationSchemaVersion = event['invocationSchemaVersion']

The S3 bucket and key for each object arrive in the tasks list of the event JSON, so a few more lines extract this information directly from the event. Add the following lines:

task = event['tasks'][0]
taskId = task['taskId']
sourceS3BucketArn = task['s3BucketArn']
sourceS3Key = task['s3Key']

Code for Lambda to read the correct parameters from S3 Batch Operations
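
Two details are worth noting: the event supplies the bucket as an ARN rather than a plain name, and object keys in the manifest arrive URL encoded. Here is a minimal sketch of how you might derive the s3:// input URI that the MediaConvert job settings expect; the sourceS3 variable name is illustrative rather than part of the original workflow:

from urllib import parse

# Object keys arrive URL encoded in the S3 Batch Operations event
sourceS3Key = parse.unquote_plus(sourceS3Key)

# The bucket ARN looks like arn:aws:s3:::my-video-bucket; the name is the last segment
sourceS3Bucket = sourceS3BucketArn.split(':::')[-1]

# Build the s3:// input URI for the MediaConvert job settings
sourceS3 = 's3://' + sourceS3Bucket + '/' + sourceS3Key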

Each invocation of Lambda from S3 Batch Operations also needs a results dictionary that includes taskId, resultCode, and resultString. I set resultCode and resultString depending on the response from the AWS Elemental MediaConvert job. For example, if an exception occurs when submitting the MediaConvert job, I set resultCode to 'PermanentFailure' and resultString to the exception message. Then, at the end of the MediaConvert submission, I've added the following code block to add the results to the results array:

finally:
   results.append({
      'taskId': taskId,
      'resultCode': resultCode,
      'resultString': resultString,
   })
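
For context, here is a minimal sketch of how that error handling might wrap the MediaConvert submission, ending with the same finally block shown above. The endpoint, role, and job settings come from the VOD Automation workflow; the values below are placeholders rather than the workflow's actual configuration:

import boto3

# Placeholders; the VOD Automation workflow supplies these from environment
# variables and its job settings template
mediaConvertEndpoint = 'https://abcd1234.mediaconvert.us-east-1.amazonaws.com'
mediaConvertRole = 'arn:aws:iam::111122223333:role/MediaConvertRole'
jobSettings = {}

results = []
resultCode = 'Succeeded'
resultString = 'MediaConvert job submitted'

try:
    # Client created against the account-specific endpoint, as in the VOD workflow
    mediaconvert = boto3.client('mediaconvert', endpoint_url=mediaConvertEndpoint)
    mediaconvert.create_job(Role=mediaConvertRole, Settings=jobSettings)
except Exception as e:
    # Report the failure back to S3 Batch Operations
    resultCode = 'PermanentFailure'
    resultString = str(e)
finally:
    results.append({
        'taskId': taskId,
        'resultCode': resultCode,
        'resultString': resultString,
    })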

When triggering Lambda with S3 Batch Operations, the function must return a response containing specific data fields that tell S3 Batch Operations whether each task executed successfully. In this case, Lambda returns whether the MediaConvert job was successfully submitted:

return {
   'invocationSchemaVersion': invocationSchemaVersion,
   'treatMissingKeysAs': 'PermanentFailure',
   'invocationId': invocationId,
   'results': results
}

Create an S3 Batch Operations job using the console

Now that the transcode workflow is complete, set up your S3 Batch Operations job. A job refers collectively to the list—called a manifest—of objects provided, the operation performed, and the parameters specified for the S3 Batch Operations job.

To transcode a set of video objects, use the Invoke AWS Lambda function operation and specify the Lambda function that you created to invoke a MediaConvert job.

Let's get started creating our job to transcode these objects (a boto3 equivalent of these console steps follows the list):

  1. In the S3 console, choose Batch Operations in the left navigation pane, under Buckets.
  2. Choose Create Job.
  3. Choose the appropriate Region for your S3 bucket. Under Choose manifest, select CSV as the manifest format and enter the path to the manifest file in your S3 bucket. If your manifest contains version IDs, make sure to check that box, and choose Next.
  4. Select Invoke AWS Lambda function, and select the Lambda function you created—in this case VODLambdaConvertBatch. Select the function version if you need a different version from $LATEST.
  5. Choose Next.
  6. In Step 3 of the setup wizard, give your job a description and priority level, and choose a report type and destination.
  7. Choose an IAM role for your S3 Batch Operations job to assume. To learn more about the necessary role and invoking Lambda functions in S3, see Invoking a Lambda Function from Amazon S3 Batch Operations.
    • Select a role with s3:GetObject and s3:GetObjectVersion permissions for the object source buckets, as well as for the bucket that holds the manifest file.
    • The role also needs s3:PutObject for the destination bucket for the job completion report.
    • Lastly, the role needs the lambda:InvokeFunction permission for the Lambda function that it invokes. In this case, the role name is S3BatchLambdaRole.
  8. Choose Next.
  9. In Step 4, review and verify your job parameters before choosing Create Job.
  10. After S3 finishes reading your job's manifest, it moves the job to the Awaiting your confirmation state. From here, you can check the number of objects in the manifest and choose Confirm the job to run it.
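
If you prefer to script job creation, the same job can be created with the AWS SDK. Here is a minimal boto3 sketch of the equivalent CreateJob call; the account ID, ARNs, ETag, and bucket names are placeholders:

import boto3

s3control = boto3.client('s3control')

response = s3control.create_job(
    AccountId='111122223333',   # placeholder account ID
    ConfirmationRequired=True,  # job waits in the Awaiting your confirmation state
    Priority=10,
    RoleArn='arn:aws:iam::111122223333:role/S3BatchLambdaRole',
    Operation={
        'LambdaInvoke': {
            'FunctionArn': 'arn:aws:lambda:us-east-1:111122223333:function:VODLambdaConvertBatch'
        }
    },
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            'Fields': ['Bucket', 'Key']
        },
        'Location': {
            'ObjectArn': 'arn:aws:s3:::my-manifest-bucket/manifest.csv',
            'ETag': 'example-manifest-etag'  # ETag of the manifest object in S3
        }
    },
    Report={
        'Bucket': 'arn:aws:s3:::my-report-bucket',
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'batch-reports',
        'ReportScope': 'FailedTasksOnly'
    }
)
print('Created job:', response['JobId'])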

After the job starts running, you can check its object-level progress through the console dashboard view or by selecting the specific job. As each Lambda invocation occurs, S3 writes logs to CloudWatch Logs.

  1. In the Lambda console, select your Lambda function, and choose Monitoring. You can see Lambda Invoke metrics along with an easy navigation button to View logs in CloudWatch.
  2. When the S3 Batch Operations job completes, view the Successful and Failed object counts to confirm that everything performed as expected. For the details on failed objects, see your job report.
  3. To monitor the video transcode jobs themselves, navigate to the MediaConvert console and check the transcoding job for each file.

Changes for S3 Glacier Restore

Often, companies store older video objects in S3 Glacier, but must run the same transcode jobs to prepare the video for playout. The workflow is similar to the earlier steps, with a few small changes.

  1. When creating your Lambda function using the VOD Automation workflow, configure an S3 trigger on the S3 bucket that receives your restored videos. For the event type, choose Restore from Glacier Completed. Create a separate Lambda function for triggering with S3, as the function that you created earlier contains specific additions for S3 Batch Operations reporting (a sketch of this restore-triggered function follows the list).
  2. Create the S3 Batch Operations job as before, with the following changes:
    1. In step 2, instead of selecting Invoke AWS Lambda function, select Restore.
    2. Under Restore options, select the number of days that the restored objects should remain available. The original objects remain in S3 Glacier. After the selected number of days, S3 removes the restored object, leaving only the S3 Glacier copy. For a transcoding job, you only have to restore objects long enough to complete the transcode job, at which point you or S3 can remove the restored object.
    3. Select the retrieval time required, either Bulk or Standard, and choose Next.
    4. In step 3, use the same options described before, except the IAM role. In the case of an S3 Glacier restore job, the IAM role also needs s3:RestoreObject permissions for the bucket containing the S3 Glacier objects.
  3. Complete the job creation and confirmation steps as before.
  4. When the S3 Batch Operations job runs, it submits the restore requests, and S3 restores the objects from S3 Glacier based on the retrieval time selected. Each completed object restore then triggers the new Lambda function, which runs the VOD automation workflow and completes the transcode jobs in MediaConvert.
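
As a reference for the restore-triggered function in step 1, here is a minimal sketch of the event handling. The submit_transcode_job helper is a hypothetical stand-in for the MediaConvert submission logic from the VOD Automation workflow:

from urllib.parse import unquote_plus

def submit_transcode_job(input_uri):
    # Stand-in for the MediaConvert submission from the VOD Automation workflow
    print('Submitting transcode for', input_uri)

def handler(event, context):
    # Each record corresponds to one 'Restore from Glacier Completed' event
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        # Hand the restored object to the VOD transcode logic
        submit_transcode_job('s3://' + bucket + '/' + key)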

Things to note

Keep a few critical pieces of information in mind as you explore the capabilities of S3 Batch Operations.

Lambda concurrent invocations:

When you use S3 Batch Operations with a Lambda function, each object causes a separate Lambda invocation. If your S3 Batch Operations job is large, it can drive many concurrent invocations of your Lambda function at the same time, causing a spike in Lambda concurrency.

Each AWS account has a Lambda concurrency limit per Region, so you should review the AWS Lambda function scaling documentation. A best practice for using Lambda functions with S3 Batch Operations is to set a concurrency limit on the Lambda function itself. This keeps your batch job from consuming most of your Lambda concurrency and potentially throttling other functions in your account.
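
For example, you might reserve a fixed slice of concurrency for the batch function before starting a large job. A sketch using boto3; the function name and limit are illustrative:

import boto3

lambda_client = boto3.client('lambda')

# Cap the batch function at 50 concurrent executions so a large job
# cannot consume the account's Regional concurrency on its own
lambda_client.put_function_concurrency(
    FunctionName='VODLambdaConvertBatch',
    ReservedConcurrentExecutions=50
)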

Lambda event triggers vs. invoke Lambda from S3 Batch Operations:

When creating a Lambda function for use with S3 Batch Operations, you must create a new Lambda function as S3 Batch Operations sends specific task data to Lambda and expects result data back.

This is different from a typical Lambda function using event triggers, which don't require returned result data. To learn more about these differences, see Invoking a Lambda Function from Amazon S3 Batch Operations.

S3 Event Notifications:

When using S3 event notifications, delivery typically occurs within seconds, but can sometimes take minutes. On rare occasions, events might be lost.

To verify that all objects have been restored from S3 Glacier and processed by S3 Batch Operations, compare the number of items in your inventory with the number of results from the S3 Batch Operations job. For more information, see Configuring Amazon S3 Event Notifications.
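
One way to perform that comparison is to read the job's progress summary with the SDK and check it against your manifest's object count. A minimal sketch; the account ID and job ID are placeholders:

import boto3

s3control = boto3.client('s3control')

job = s3control.describe_job(
    AccountId='111122223333',        # placeholder
    JobId='00e1234a-example-job-id'  # placeholder
)['Job']

summary = job['ProgressSummary']
print('Total tasks:', summary['TotalNumberOfTasks'])
print('Succeeded:  ', summary['NumberOfTasksSucceeded'])
print('Failed:     ', summary['NumberOfTasksFailed'])
# Compare TotalNumberOfTasks with the object count in your manifest or inventory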

S3 Batch Operations failure threshold:

If more than 50% of a job's object operations fail after more than 1,000 operations have been attempted, the job automatically fails. Check your final report to identify the cause of the failures.

S3 Batch Operations Cost:

S3 Batch Operations jobs are charged $0.25 per job plus $1.00 per million object operations performed. For example, a job over 2 million video objects incurs $0.25 + (2 × $1.00) = $2.25 in S3 Batch Operations charges. This is in addition to any charges associated with the operation that S3 Batch Operations performs on your behalf, including data transfer, requests, and other charges; in this workflow, that includes the Lambda invocations and MediaConvert transcoding. To better understand the total cost, test the workflow against a small number of objects to confirm cost expectations before scaling the job up to a large inventory of objects.

Conclusion

This post reviewed the process of setting up S3 Batch Operations to trigger a repeated Lambda function to transcode video across hundreds, thousands, or millions of files in an archive. It also looked at how you can use this same process, with a few modifications, to pull older, archived files from S3 Glacier for transcoding.

Now Available

S3 Batch Operations is available in all commercial AWS Regions except Asia Pacific (Osaka). S3 Batch Operations is also available in both of the AWS GovCloud (US) Regions.

Hopefully, you have found this post informative and the proposed solution intriguing. I welcome your feedback and comments.

Chris McPeek

Chris is a Principal Solutions Architect for AWS. Chris enjoys working with AWS customers on technical strategies for application modernization and helping them dream big about using AWS to reach their business objectives. Prior to AWS, Chris spent a number of years in network engineering, leading tech teams at a global non-profit and a DC-based startup. Outside of work, he enjoys being with his family, hiking, and flying, as well as spending many hours in the saddle of his road bike, cycling around the DC area.