Background noise suppression on the PSTN using the Amazon Chime SDK

Many people have experienced frustrating background noise on a meeting or phone call, whether it’s talking to someone who is driving, a customer support agent in a large noisy office, or a courier delivering food on a windy night. At best, it is distracting and unproductive. We are excited to announce Amazon Voice Focus for calls to or from the public telephone network.

Amazon Voice Focus is an award-winning deep-learning noise suppression application designed to reduce unwanted noise in audio in real time without adding perceivable delay for the call participants. It reduces the sound levels of noises that can intrude on a call, such as wind, fans, running water, lawnmowers, barking dogs, typing, and shuffling papers. Instead of detecting the noise, we trained a machine learning model to recognize speech; whatever is left over is the noise we try to suppress.

Developers can now integrate this functionality into their own applications using Amazon Chime SDK Public Switched Telephone Network (PSTN) audio. Amazon Chime SDK PSTN audio makes it easier for developers to build customized telephony applications using the agility and operational simplicity of serverless AWS Lambda functions. You can use the Amazon Chime SDK to build conversational self-service applications to reduce call resolution times and automate informational responses.

In this blog, we walk through how to implement the Amazon Voice Focus action in a demo that creates a call between two parties and lets you turn Amazon Voice Focus on and off, so you can hear the improvement it makes to voice clarity throughout a call. The voice application is written as an AWS Lambda function in Python. The demo provisions a phone number that, when called, answers with a greeting asking for a destination phone number to connect to. Once the two parties are connected, Amazon Voice Focus noise suppression is enabled by default, but it can be disabled and re-enabled throughout the call using your keypad.

Amazon Voice Focus Demo App Architecture

High-level design of the Amazon Chime SDK noise suppression solution with Voice Focus

The AWS elements used in this demo are:

Amazon Chime SDK PSTN audio
Amazon Simple Storage Service (Amazon S3)
AWS Lambda

You can learn more about how to build PSTN-enabled voice applications in the Amazon Chime SDK by consulting our documentation. Please note your AWS account will be charged for usage of these services.

The architecture includes the following components:

ChimeSdkPstnLambda – AWS Lambda function written in Python that is associated with the Amazon Chime SDK PSTN audio and used to handle the call flow logic
Amazon Chime SDK PSTN audio – routes call events to your AWS Lambda function
Phone Number – A number provisioned to use with the SIP Media Application rule
wavFiles – Amazon S3 bucket to store wav files for playing customized messages

Prerequisites

Node.js v16+ and npm installed
AWS Command Line Interface (AWS CLI) installed
AWS Credentials configured for the account/region that will be used for this demo
Permissions to create Amazon Chime SIP Media Applications and Phone Numbers (ensure your Service Quota in us-east-1 or us-west-2 for Phone Numbers, Voice Connectors, SIP Media Applications and SIP Media Application Rules has not been reached)

Walkthrough

Once you have installed the dependencies, clone the repo and run the deploy script:

git clone https://github.com/aws-samples/amazon-chime-sdk-pstn-audio-voice-focus.git
cd amazon-chime-sdk-pstn-audio-voice-focus
./deploy.sh  # accept the prompts for CDK deployment

After successfully deploying the CDK components, take note of the ChimeSdkPstnVoiceFocusStack.inboundPhoneNumber in the output of the deployment script. This is the phone number provisioned as described in the Resources Created section; it will be used as the inbound phone number in our demo.

Now it’s as simple as making a test call to the phone number. You should be greeted with a “please enter the number you wish to call” message, at which point you enter an 11-digit US number. You will then be connected to that number and, once the call is answered, both participants can converse with Amazon Voice Focus enabled by default. Throughout the call, either participant can press 0 on their keypad to disable Amazon Voice Focus for the audio stream they send to the other participant, so that background noise is no longer suppressed. Pressing 1 on the keypad re-enables it.

These benefits are most noticeable when at least one of the call participants is in a noisy environment, but you can simulate noise by tapping or scratching the microphone on your device. The effect is especially clear if you have two phones and can speak and listen on both ends of the call yourself: tap the microphone on one phone and you won’t hear it on the other phone’s speaker. Press 0, and you will suddenly hear the noise that was being reduced. I have multiple phones and am based in Ireland, so when I call a US number there is the typical background hiss of a long-distance international call. When Amazon Voice Focus is enabled, I can no longer hear the hiss and the audio is much clearer.

How it Works

The ChimeSdkPstnLambda is the main call routing logic in this demo, and has a simple flow:

For a new call, play greeting and collect the entered destination number
Bridge the caller to the destination number
Enable Amazon Voice Focus for both call participants
Enable a listener to support either participant toggling Amazon Voice Focus off or on again

When a call is placed to the phone number, a SIP Media Application rule will route the call to the SIP Media Application. This will invoke your ChimeSdkPstnLambda function with a NEW_INBOUND_CALL event which calls the new_call_handler function. In turn this will execute the play_and_get_digits_action function which responds to the AWS Lambda invocation with the PlayAudioAndGetDigits action, with parameters set for the wav file to play, the failure wav file to be played for an incorrect entry, the regex pattern to match against what the user enters, and the Call-ID of the inbound caller to take the action on.
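The post does not list new_call_handler or play_and_get_digits_action, so the following is only a minimal sketch of how they might look, not the code from the repository. The bucket name, wav file keys, digit pattern, the lambda_handler dispatch and the shape of the response helper are all assumptions made for illustration.

```python
# Hypothetical values for illustration; the deployed stack defines its own.
WAV_BUCKET = 'example-wav-files-bucket'
GREETING_KEY = 'greeting.wav'
FAILURE_KEY = 'failure.wav'
# Assumed pattern: country code 1 followed by 10 digits (an 11-digit US number).
DESTINATION_REGEX = r'^1\d{10}$'


def response(*actions):
    # Assumed helper: wrap zero or more actions in the response envelope.
    return {'SchemaVersion': '1.0', 'Actions': list(actions)}


def play_and_get_digits_action(call_id):
    # Play the greeting, then collect digits that match the pattern.
    return {
        'Type': 'PlayAudioAndGetDigits',
        'Parameters': {
            'CallId': call_id,
            'InputDigitsRegex': DESTINATION_REGEX,
            'AudioSource': {
                'Type': 'S3', 'BucketName': WAV_BUCKET, 'Key': GREETING_KEY,
            },
            'FailureAudioSource': {
                'Type': 'S3', 'BucketName': WAV_BUCKET, 'Key': FAILURE_KEY,
            },
            'MinNumberOfDigits': 11,
            'MaxNumberOfDigits': 11,
            'Repeat': 3,
        },
    }


def new_call_handler(event):
    # On a new inbound call the only participant is the caller.
    call_id = event['CallDetails']['Participants'][0]['CallId']
    return response(play_and_get_digits_action(call_id))


def lambda_handler(event, context):
    if event['InvocationEventType'] == 'NEW_INBOUND_CALL':
        return new_call_handler(event)
    # Other event types belong to later steps; this sketch returns no actions.
    return response()
```

Keeping each action in its own small builder function, as the post does for VoiceFocus, makes the handler for each event type a one-line composition of actions.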

If the caller enters a conforming 11-digit number, your ChimeSdkPstnLambda function is triggered with an ACTION_SUCCESSFUL event that contains the destination number the caller entered. We can now use that information to respond with a CallAndBridge action. When that succeeds, the call is established and we can turn Amazon Voice Focus on for both call legs using the enable_voicefocus function. We do this by creating a list of actions to return:

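def enable_voicefocus(event):
    # Build one VoiceFocus action per call leg (caller and callee)
    actions = []
    for call in event['CallDetails']['Participants']:
        # Pass a boolean, matching how control_voicefocus sets the flag
        actions.append(voicefocus_action(call['CallId'], True))
    # Return every action in a single response
    return response(*actions)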

The VoiceFocus action requires just two parameters: Enable, set to True or False, and the Call-ID of the participant. Both Call-IDs are available in the invocation event we receive after the two participants have been bridged:

def voicefocus_action(call_id, enabled):
    # enabled: True turns Amazon Voice Focus on for this call leg, False turns it off
    return {
        'Type': 'VoiceFocus',
        'Parameters': {
            'Enable': enabled,
            'CallId': call_id,
        },
    }
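The CallAndBridge step mentioned earlier is not listed in this post. As a rough sketch, it could look like the following. The helper names call_and_bridge_action and bridge_handler are hypothetical, as are the 30-second timeout, the choice to present the original caller's number as the caller ID, and the shape of the response helper.

```python
def response(*actions):
    # Assumed helper: wrap zero or more actions in the response envelope.
    return {'SchemaVersion': '1.0', 'Actions': list(actions)}


def call_and_bridge_action(caller_id, destination):
    # Hypothetical helper: dial `destination` and bridge it to the inbound leg.
    return {
        'Type': 'CallAndBridge',
        'Parameters': {
            'CallTimeoutSeconds': 30,
            'CallerIdNumber': caller_id,
            'Endpoints': [
                {'BridgeEndpointType': 'PSTN', 'Uri': destination},
            ],
        },
    }


def bridge_handler(event):
    # Digits collected by PlayAudioAndGetDigits arrive in ActionData.
    digits = event['ActionData']['ReceivedDigits']
    # Assumption: reuse the inbound caller's number as the outbound caller ID.
    caller_id = event['CallDetails']['Participants'][0]['From']
    # The caller typed 11 digits without a plus sign; convert to E.164.
    return response(call_and_bridge_action(caller_id, '+' + digits))
```

Once the bridge succeeds, the invocation event lists both participants, which is what enable_voicefocus iterates over.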

If Amazon Voice Focus has been enabled successfully, we get another ACTION_SUCCESSFUL event in response to the action. Now that Amazon Voice Focus is enabled for both participants, we want to give them the ability to disable and re-enable background noise suppression using DTMF via their keypads. We do this by sending a ReceiveDigits action to collect key presses and notify us when a participant presses a key. We then set the status of Amazon Voice Focus on that participant’s audio stream accordingly:

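def control_voicefocus(call_id, event):
    # '0' disables Amazon Voice Focus, '1' re-enables it.
    # Assumes the ReceiveDigits pattern only lets 0 or 1 through.
    if event['ActionData']['ReceivedDigits'] == '0':
        enabled = False
    elif event['ActionData']['ReceivedDigits'] == '1':
        enabled = True
    return response(voicefocus_action(call_id, enabled))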
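The ReceiveDigits action that feeds control_voicefocus is not shown in this post. A minimal sketch, under stated assumptions, might look like this. The helper names, the single-digit pattern and the two timing values are illustrative choices, and reading the CallId from the echoed action parameters assumes the DIGITS_RECEIVED event carries them in ActionData.

```python
def receive_digits_action(call_id):
    # Listen for a single 0 or 1 key press on this participant's call leg.
    return {
        'Type': 'ReceiveDigits',
        'Parameters': {
            'CallId': call_id,
            'InputDigitsRegex': '^[01]$',
            'InBetweenDigitsDurationInMilliseconds': 1000,
            'FlushDigitsDurationInMilliseconds': 10000,
        },
    }


def enable_digit_listeners(event):
    # One listener per call leg, so either participant can toggle.
    actions = [
        receive_digits_action(participant['CallId'])
        for participant in event['CallDetails']['Participants']
    ]
    return {'SchemaVersion': '1.0', 'Actions': actions}


def pressed_by(event):
    # Assumption: the event echoes the ReceiveDigits parameters, so the
    # CallId identifies which participant pressed the key.
    return event['ActionData']['Parameters']['CallId']
```

With a pattern that only matches 0 or 1, any other key press never reaches control_voicefocus, which is why that function only needs two branches.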

We finish up the call flow when one participant hangs up, and we hang up the second participant using the Hangup action.
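As a sketch of that final step, a handler for the HANGUP event could hang up whichever leg is still connected. The helper names are hypothetical, and filtering on a participant Status of 'Connected' is an assumption about the event contents rather than something confirmed in this post.

```python
def hangup_action(call_id):
    # Hang up a single call leg.
    return {
        'Type': 'Hangup',
        'Parameters': {
            'CallId': call_id,
            'SipResponseCode': '0',
        },
    }


def hangup_handler(event):
    # Hang up every participant that is still connected (assumed Status field).
    actions = [
        hangup_action(participant['CallId'])
        for participant in event['CallDetails']['Participants']
        if participant.get('Status') == 'Connected'
    ]
    return {'SchemaVersion': '1.0', 'Actions': actions}
```

When the last leg disconnects, no participant is still connected, so the handler returns an empty action list and the call flow ends.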

Sequence Diagram

Cleanup

To clean up this demo, execute: cdk destroy

Conclusion

This demo has showcased how the Amazon Chime SDK PSTN audio VoiceFocus action can suppress background noise on phone calls for improved speech clarity, alongside other actions such as CallAndBridge, PlayAudioAndGetDigits, PlayAudio, ReceiveDigits and Hangup, in a simple Python Lambda function. You can use the technology and development patterns from this demo to implement solutions that enable remote working, support mobile workers with multiple contact phone numbers, build Interactive Voice Response (IVR) menus, and augment cloud-based or on-premises call centers. VoiceFocus is just one of a number of supported PSTN actions currently available.

Refer to our GitHub repo for the project code: https://github.com/aws-samples/amazon-chime-sdk-pstn-audio-voice-focus

Additional reading:

To read more about the Amazon Chime SDK PSTN audio refer to https://docs.aws.amazon.com/chime/latest/dg/build-lambdas-for-sip-sdk.html
