Build a scalable file-based noise reduction solution with Amazon Voice Focus AMI

In November 2021, the Amazon Chime SDK team launched the Amazon Voice Focus AMI to help customers reduce background noise and improve the quality of their audio content. Amazon Voice Focus is an award-winning, deep-learning noise suppression algorithm used in Amazon Chime SDK meetings. It is now packaged as an Amazon Linux 2 (AL2) Amazon Machine Image (AMI). The Amazon Voice Focus AMI helps builders, content creators, and media producers reduce background noise such as fans, lawnmowers, and barking dogs, as well as foreground noise such as typing and shuffling papers.

In this blog post, we show you how to build a scalable application with the Amazon Voice Focus AMI that asynchronously applies noise reduction to media files and scales using a predefined auto scaling policy.

Overview

This solution creates an Amazon Simple Storage Service (Amazon S3) bucket with an input folder and an output folder. Any file uploaded to the input folder is detected automatically, and Amazon Voice Focus noise reduction is applied to it. When processing finishes or a scaling event occurs, a notification message is sent to an Amazon Simple Notification Service (Amazon SNS) topic.

Note: Deploying and using the solution created in this post can incur AWS charges.

High-level design of scalable file-based noise reduction solution.

Prerequisites

An AWS account subscribed to the Amazon Voice Focus AMI (AL2) in AWS Marketplace, which is provided through a private offer.
Node/npm installed.
Python3 installed.
AWS Command Line Interface (AWS CLI) installed.
AWS credentials configured for the account and Region in which this solution will be deployed.
Latest version of AWS Cloud Development Kit (AWS CDK) installed.

Walkthrough

Deploy solution

To deploy the stack, run the following bash commands:

git clone https://github.com/aws-samples/amazon-chime-sdk-ml
cd amazon-chime-sdk-ml/Amazon\ Voice\ Focus\ AMI\ \(Amazon\ Linux\ 2\)/VoiceFocusAMIAsyncFileProcessingCDK/
npm install && cdk bootstrap && cdk deploy --require-approval never

To receive notifications:

An Amazon SNS topic is created during deployment. Scaling events and processing status notifications are sent to this topic. To receive these messages, subscribe to the topic using the SNSARN output shown in the console, following the steps in Subscribing to an Amazon SNS topic.
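As an alternative to the console steps, the subscription can also be created programmatically. The following is a minimal sketch using boto3, assuming an email subscription; the function names and the example ARN and address are illustrative and are not part of the sample repository. Replace the ARN with the SNSARN output from your own stack.

```python
def build_subscribe_request(topic_arn: str, email: str) -> dict:
    """Build keyword arguments for the SNS Subscribe API call."""
    if not topic_arn.startswith("arn:"):
        raise ValueError(f"not an ARN: {topic_arn!r}")
    return {
        "TopicArn": topic_arn,
        "Protocol": "email",
        "Endpoint": email,
        # Ask SNS to return the ARN even while confirmation is pending.
        "ReturnSubscriptionArn": True,
    }


def subscribe_email(topic_arn: str, email: str) -> str:
    """Subscribe an email address to the topic and return the subscription ARN."""
    import boto3  # imported lazily so the helper above works without boto3

    # An SNS topic ARN has the form arn:aws:sns:<region>:<account>:<name>.
    region = topic_arn.split(":")[3]
    sns = boto3.client("sns", region_name=region)
    response = sns.subscribe(**build_subscribe_request(topic_arn, email))
    return response["SubscriptionArn"]


if __name__ == "__main__":
    # Placeholder values: substitute the SNSARN output and your own address.
    print(subscribe_email(
        "arn:aws:sns:us-east-1:123456789012:example-topic",
        "you@example.com",
    ))
```

With the email protocol, Amazon SNS sends a confirmation message to the address, and notifications are delivered only after the link in that message is followed.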

Process Media Files

This solution allows you to process media files in audio/wav and video/mp4 formats.

After your stack is deployed, an Amazon S3 bucket is created and its InputS3URI path is shown as an AWS CloudFormation output in the AWS console. To use this file processing infrastructure, upload media files in a supported format to the InputS3URI path. You can upload them manually in the AWS console or use the AWS CLI with the following command. (We use the sample audio file here for testing, but feel free to replace it with your own media file.)

aws s3 cp audio/example_16khz_1min.wav s3://voice-focus-processing-bucket-<aws_account_number>-<aws_region>/input/

After you upload your media file, the Amazon Voice Focus AMI workers will start noise-reduction processing on the file. When the processing is finished, a file with the same name will be generated in the output folder, and notification messages will be sent to the Amazon SNS topic created.

You can then download the processed file from the output path, either in the AWS console or with the following AWS CLI command:

aws s3 cp s3://voice-focus-processing-bucket-<aws_account_number>-<aws_region>/output/example_16khz_1min.wav example_16khz_1min_output.wav
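The upload and download steps can also be scripted. The sketch below uses boto3 and assumes the bucket name follows the pattern shown in the commands above and that the output object keeps the input file name, as described earlier. The function names and the waiter timing values are illustrative choices, not part of the sample repository.

```python
import os


def bucket_name(account_id: str, region: str) -> str:
    """Bucket name pattern taken from the CLI commands in this post."""
    return f"voice-focus-processing-bucket-{account_id}-{region}"


def input_key(local_path: str) -> str:
    """Object key under the input folder for a local file."""
    return "input/" + os.path.basename(local_path)


def output_key(local_path: str) -> str:
    """Object key under the output folder; the file name is unchanged."""
    return "output/" + os.path.basename(local_path)


def process_file(local_path: str, account_id: str, region: str, dest_path: str) -> None:
    """Upload a file, wait for the processed result, and download it."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    bucket = bucket_name(account_id, region)
    s3.upload_file(local_path, bucket, input_key(local_path))

    # Poll until the output object exists (assumed: every 10 s, up to 15 min).
    # The waiter raises botocore.exceptions.WaiterError if it times out.
    waiter = s3.get_waiter("object_exists")
    waiter.wait(
        Bucket=bucket,
        Key=output_key(local_path),
        WaiterConfig={"Delay": 10, "MaxAttempts": 90},
    )
    s3.download_file(bucket, output_key(local_path), dest_path)


if __name__ == "__main__":
    # Placeholder account number and Region: substitute your own values.
    process_file("audio/example_16khz_1min.wav", "123456789012", "us-east-1",
                 "example_16khz_1min_output.wav")
```

Polling the output key is only one way to detect completion; subscribing to the Amazon SNS topic avoids polling altogether.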

Test Fleet Scaling

To test the scalability of the Amazon Voice Focus AMI workers, we provide the test_scaling.py script in the example’s root folder. This Python script automatically finds the resources that this solution deployed in your AWS account, uploads an example media file to your Amazon S3 input path, and then sends multiple processing requests to the Amazon Simple Queue Service (Amazon SQS) queue in a very short time. The running instances in the Auto Scaling group pick up and process these requests, which raises CPUUtilization and thereby triggers scaling events.

To test the scalability of this solution, run the following bash commands:

pip3 install boto3
python3 test_scaling.py

To verify the scaling activity of the Auto Scaling group, you can either follow Verifying a scaling activity for an Auto Scaling group to view the activities in the console, or run the following AWS CLI describe-scaling-activities command.

aws autoscaling describe-scaling-activities --auto-scaling-group-name voice-focus-asg
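The same information is available from boto3, which can be convenient when checking scaling activity from a script. This is a minimal sketch that assumes the group name voice-focus-asg from the command above; the function names and the one-line summary format are illustrative.

```python
def summarize_activities(activities: list) -> list:
    """Turn scaling activity records into one readable line each."""
    lines = []
    for activity in activities:
        start = activity.get("StartTime")
        # boto3 returns datetime objects; fall back to str() for other values.
        when = start.isoformat() if hasattr(start, "isoformat") else str(start)
        status = activity.get("StatusCode", "Unknown")
        description = activity.get("Description", "")
        lines.append(f"{when} [{status}] {description}")
    return lines


def fetch_activities(group_name: str = "voice-focus-asg", max_records: int = 20) -> list:
    """Fetch recent scaling activities for the Auto Scaling group."""
    import boto3  # imported lazily so the helper above works without boto3

    autoscaling = boto3.client("autoscaling")
    response = autoscaling.describe_scaling_activities(
        AutoScalingGroupName=group_name,
        MaxRecords=max_records,
    )
    return response["Activities"]


if __name__ == "__main__":
    for line in summarize_activities(fetch_activities()):
        print(line)
```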

Scaling activities will also be sent to the Amazon SNS topic created by this solution. You can receive these messages automatically by subscribing to it.

How it works

At the provisioning stage, this solution builds an FFmpeg artifact from its official public GitHub repository using AWS CodeBuild, and uploads the output artifact to the Amazon S3 resource folder. Each Amazon Voice Focus AMI instance uses this FFmpeg artifact to perform media format transcoding.

The solution is designed so that any file uploaded to the Amazon S3 input folder triggers the AWS Lambda function BucketPutHandler, which sends a processing request message to Amazon SQS. The solution creates a fleet of instances running the Amazon Voice Focus AMI to asynchronously process request messages from Amazon SQS. Each Amazon Voice Focus worker runs the script worker.py, which spawns one process per CPU on the instance to maximize utilization of its computing resources. After each processing request is completed, the worker uploads the result to the Amazon S3 output folder and sends a notification to the Amazon SNS topic. All processing logs are uploaded to Amazon CloudWatch at one-minute intervals.
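To illustrate the first step of this flow, here is a rough sketch of how a Lambda function could turn an Amazon S3 event into Amazon SQS messages. It is not the BucketPutHandler code from the repository: the message body format, the QUEUE_URL environment variable, and the function names are assumptions made for this example. Only the S3 event layout and the boto3 send_message call are standard.

```python
import json
import os
from urllib.parse import unquote_plus


def extract_requests(event: dict) -> list:
    """Pull (bucket, key) pairs out of an S3 event notification."""
    requests = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        requests.append({
            "bucket": s3_info["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3_info["object"]["key"]),
        })
    return requests


def handler(event, context):
    """Hypothetical Lambda entry point: queue one message per uploaded file."""
    import boto3  # imported lazily so extract_requests works without boto3

    sqs = boto3.client("sqs")
    queue_url = os.environ["QUEUE_URL"]  # assumed environment variable name
    requests = extract_requests(event)
    for request in requests:
        # The JSON body shape is an assumption, not the repository's format.
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(request))
    return {"queued": len(requests)}
```

Keeping the event parsing separate from the queue call makes the parsing easy to test locally with a hand-written event, without any AWS access.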

The fleet of Amazon Voice Focus AMI instances runs in an Auto Scaling group that uses a target tracking scaling policy based on average CPU utilization, and it tries to keep utilization at the target value of 75%. When the average CPUUtilization of the fleet is greater than 75% for 3 data points within 3 minutes, a scale-out event is triggered and more instances are launched in an attempt to lower CPUUtilization. Conversely, when the average CPUUtilization is below 52.5% for 15 data points within 15 minutes, a scale-in event is triggered and one or more instances in the Auto Scaling group are terminated.
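The two thresholds can be easier to reason about with a small model. The sketch below is a deliberately simplified illustration of the rules stated above, assuming one data point per minute; it is not how Amazon CloudWatch evaluates alarms in practice (for example, it ignores missing data and cooldowns). Note that the scale-in threshold in this post, 52.5%, equals 0.7 times the 75% target.

```python
TARGET = 75.0               # target average CPU utilization, in percent
SCALE_IN_THRESHOLD = 52.5   # scale-in threshold from this post (0.7 * TARGET)
SCALE_OUT_POINTS = 3        # 3 data points within 3 minutes
SCALE_IN_POINTS = 15        # 15 data points within 15 minutes


def decide(datapoints: list) -> str:
    """Classify per-minute average CPU readings (oldest first)."""
    recent_out = datapoints[-SCALE_OUT_POINTS:]
    if len(recent_out) == SCALE_OUT_POINTS and all(p > TARGET for p in recent_out):
        return "scale-out"
    recent_in = datapoints[-SCALE_IN_POINTS:]
    if len(recent_in) == SCALE_IN_POINTS and all(p < SCALE_IN_THRESHOLD for p in recent_in):
        return "scale-in"
    return "no-change"


if __name__ == "__main__":
    print(decide([80.0, 82.0, 90.0]))   # three readings above the target
    print(decide([40.0] * 15))          # fifteen readings below 52.5%
    print(decide([60.0] * 15))          # between the two thresholds
```

The asymmetry (3 minutes to scale out, 15 minutes to scale in) means the fleet grows quickly under load but shrinks cautiously, and readings between 52.5% and 75% leave the fleet size unchanged.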

This auto scaling policy is just an example. You can edit or customize the auto scaling policy for your own use case.
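One possible way to experiment with a different target is the boto3 put_scaling_policy call, sketched below. The policy name and the 60% target are hypothetical, and the group name voice-focus-asg is taken from the CLI command earlier in this post. Because the stack is deployed with AWS CDK, changing the policy in the CDK code and redeploying is likely the more durable approach, since out-of-band changes can drift from the deployed stack.

```python
def build_policy_request(target_cpu_percent: float,
                         group_name: str = "voice-focus-asg",
                         policy_name: str = "example-cpu-target-tracking") -> dict:
    """Build keyword arguments for a target tracking policy on average CPU."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": policy_name,  # hypothetical name for this example
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def apply_policy(target_cpu_percent: float) -> str:
    """Create or update the policy and return its ARN."""
    import boto3  # imported lazily so the helper above works without boto3

    autoscaling = boto3.client("autoscaling")
    response = autoscaling.put_scaling_policy(**build_policy_request(target_cpu_percent))
    return response["PolicyARN"]


if __name__ == "__main__":
    print(apply_policy(60.0))
```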

Clean up

To clean up this demo, please run the cdk destroy command or delete the stack in AWS CloudFormation.

Conclusion

This solution shows how you can build a scalable application with the Amazon Voice Focus AMI. While this example focuses on file-based processing, the same primitives can be used to implement other real-life applications, such as applying Amazon Voice Focus noise reduction to live streams and real-time video chat.

Please see our GitHub repo for the example code: https://github.com/aws-samples/amazon-chime-sdk-ml
