Building a PSTN call answering system with the Amazon Chime SDK, Amazon Lex and Amazon Polly

The Amazon Chime SDK Public Switched Telephone Network (PSTN) Audio service makes it easy for developers to build customized telephony applications using the agility and operational simplicity of serverless AWS Lambda functions. You can use the PSTN Audio service to build conversational self-service applications that reduce call resolution times and automate informational responses.

In this blog, we will teach you how to build a conversational interactive voice response (IVR) system for a fictitious flower store that accepts orders over the phone. The voice application we build supports automatic speech recognition (ASR) and natural language understanding (NLU) using Amazon Lex, the same proven technology that powers Alexa. This example voice application is implemented as AWS Lambda functions written in JavaScript.

Overview

High-level design of the Amazon Chime SDK – Amazon Lex-based Answering System

The AWS elements used in this demo are:

Amazon Chime SDK PSTN Audio service
AWS Lambda
Amazon Simple Storage Service (Amazon S3)
Amazon Lex
Amazon Polly

You can learn more about how to build PSTN-enabled voice applications in the Amazon Chime SDK by consulting our documentation.

Resources Created

appLambda – A Lambda function written in JavaScript that processes events sent from the PSTN Audio service. It also sends text strings to Amazon Polly for creating voice prompts, and sends audio data to the lexLambda function.
lexLambda – A Lambda function written in JavaScript that interacts with Amazon Lex to get the next voice prompt to play for the caller
S3 bucket – An S3 bucket to store all the audio files interchanged throughout this demo
SIP media application – A managed object that specifies an AWS Lambda function to invoke
SIP Rule – A managed object that specifies a phone number to trigger on and which SIP media application managed object to use to invoke an AWS Lambda function
Phone Number – An Amazon Chime SDK PSTN phone number provisioned for receiving phone calls

Pre-requisites:

node V12+/npm installed
AWS CLI installed
Node Version Manager (nvm) installed
The following node modules installed: typescript aws-sdk aws-cdk (using nvm)
AWS Credentials configured for the account/region that will be used for this demo
Permissions to create Amazon Chime SIP media applications and phone numbers (ensure that your Service Quotas in us-east-1 or us-west-2 for phone numbers, Voice Connectors, SIP media applications, and SIP rules have not been reached)
Deployment must be done in us-east-1 or us-west-2 to align with PSTN audio resources
Configure the Amazon Lex “Order Flowers” bot (see below)

Configuration of Amazon Lex (V1)

This demo uses the Amazon Lex “Order Flowers” sample bot. To set up this bot, navigate in your browser to the Amazon Lex console for the region you are deploying to (for example, us-west-2). This demo uses Amazon Lex V1, so be sure to use the Amazon Lex V1 console. Then do the following:

Select the “Order Flowers” demo.
Accept the default bot name of “OrderFlowers”
Choose what language you want the bot to understand.
Select if you would like sentiment analysis to be provided by Amazon Lex.
Select the appropriate response for COPPA.
Amazon Lex will use the recorded audio provided to improve its service. You can disable this by selecting “No” for Advanced Options. For more information, read “AI services opt-out policies.”
Accept the default Confidence Score Threshold.
When you have entered all the values, click the Create button.
You will be taken to a page to configure the bot. For this demo we will use all the default values. The bot will take a few moments to build.
When the bot is ready, click the Publish button.
You will be prompted to enter an Alias. Enter “PROD” (in all caps).
Click Publish (again) and your bot will be available in a few moments.
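
Once the bot is published, the bot name and alias chosen above are what a Lambda function would use to address it. The following is a minimal sketch, not the demo's actual code, of how the request parameters for the Amazon Lex V1 runtime PostContent operation might be assembled. The helper name buildPostContentParams, the use of the call ID as the Lex user ID, and the choice of the 8 kHz LPCM content type are assumptions made for illustration.

```javascript
// Hypothetical helper: shapes a request for the Lex V1 runtime PostContent
// operation. It does not call AWS; it only builds the parameter object.
const BOT_NAME = "OrderFlowers"; // default bot name accepted during setup
const BOT_ALIAS = "PROD";        // alias entered when publishing the bot

function buildPostContentParams(callId, audioBuffer) {
  return {
    botName: BOT_NAME,
    botAlias: BOT_ALIAS,
    // Assumption: one Lex conversation per phone call, keyed by call ID.
    // Lex user IDs allow only letters, digits, '.', '_', ':' and '-',
    // so any other character is replaced with '-'.
    userId: String(callId).replace(/[^0-9a-zA-Z._:-]/g, "-"),
    // Assumption: the recording was already converted to raw 16-bit,
    // 8 kHz, mono, little-endian PCM before being sent.
    contentType:
      "audio/lpcm; sample-rate=8000; sample-size-bits=16; channel-count=1; is-big-endian=false",
    // Ask Lex to return its reply as text, so it can be passed to Polly.
    accept: "text/plain; charset=utf-8",
    inputStream: audioBuffer,
  };
}
```

With the AWS SDK for JavaScript v2, an object like this could be passed to the postContent method of an AWS.LexRuntime client. The response includes a message string with the bot's reply and a dialogState value that indicates whether more slots still need to be collected.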

Demo Code Deployment Instructions

In a terminal, perform the following steps:

git clone https://github.com/aws-samples/amazon-chime-sdk-lex-pstn-demo.git
cd amazon-chime-sdk-lex-pstn-demo
./deploy.sh

Terminal Output

This application implements full deployment automation using the AWS CDK. When you deploy it, all needed resources will be created in your AWS account. When the script completes you will get something like this in your terminal:

✅  ChimeSdkPstnCdkLexDemo

Outputs:
ChimeSdkPstnCdkLexDemo.chimeProviderLog = /aws/lambda/ChimeSdkPstnCdkLexDemo-chimeSdkPstnProviderLambaEA-V8PYYKxUA2Z1
ChimeSdkPstnCdkLexDemo.chimeSdkPstnInfoTable = ChimeSdkPstnCdkLexDemo-callInfo84B39180-KMIWJRX121XK
ChimeSdkPstnCdkLexDemo.inboundPhoneNumber = ***** PHONE NUMBER HERE *****
ChimeSdkPstnCdkLexDemo.lambdaARN = arn:aws:lambda:us-west-2:<account number>:function:ChimeSdkPstnCdkLexDemo-ChimeSdkPstnLambda94B9E76E-8vv9dzwffup3
ChimeSdkPstnCdkLexDemo.lambdaLayerArn = arn:aws:lambda:us-west-2:<account number>:layer:appLambdaLayer43BBEA22:56
ChimeSdkPstnCdkLexDemo.lambdaLexLog = /aws/lambda/ChimeSdkPstnCdkLexDemo-ChimeSdkLexLambda18EF42AF-y4mC76QEMJj5
ChimeSdkPstnCdkLexDemo.lambdaLog = /aws/lambda/ChimeSdkPstnCdkLexDemo-ChimeSdkPstnLambda94B9E76E-8vv9dzwffup3
ChimeSdkPstnCdkLexDemo.phoneID = <PHONE ID>
ChimeSdkPstnCdkLexDemo.region = us-west-2
ChimeSdkPstnCdkLexDemo.sipRuleID = c55e5922-25bf-42d5-a3f8-e65bd314cc34
ChimeSdkPstnCdkLexDemo.sipRuleName = ChimeSdkPstnCdkLexDemo
ChimeSdkPstnCdkLexDemo.smaID = bcf784f1-c902-4b76-a1e5-7b3ae664483e
ChimeSdkPstnCdkLexDemo.wavFilesBucketName = chimesdkpstncdklexdemo-wavfiles98e3397d-ji6r5dxk3wb8

Stack ARN:arn:aws:cloudformation:us-west-2:<account number>:stack/ChimeSdkPstnCdkLexDemo/f8298a50-48c2-11ec-84f8-02b5c6242747

The phone number is on the line shown with ***** PHONE NUMBER HERE *****.
To interact with the system, call that number and the app will respond with voice prompts.

Walkthrough

This application uses the Amazon Chime SDK PSTN Audio service with a single phone number. Call processing is implemented in two Lambda functions written in JavaScript. Amazon Polly generates voice prompts and Amazon Lex interprets caller requests. Recordings of caller requests and system audio responses are stored in an Amazon S3 bucket in the account.
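
To make the prompt-generation part concrete, here is a rough sketch of one way a Polly result could be turned into a file the PSTN Audio service can play. The service plays WAV files (PCM, mono, 8 kHz, 16-bit), while Polly's "pcm" output format is raw audio with no header. The function names buildPollyParams and wrapPcmAsWav and the "Joanna" voice are assumptions for illustration; the demo repository may handle this differently.

```javascript
// Hypothetical helper: parameters for Polly's SynthesizeSpeech operation.
// The voice is an assumed value; any voice available to you would work.
function buildPollyParams(text) {
  return {
    Text: text,
    OutputFormat: "pcm", // raw 16-bit signed little-endian mono samples
    SampleRate: "8000",  // matches the 8 kHz telephony audio format
    VoiceId: "Joanna",
  };
}

// Prepends a standard 44-byte RIFF/WAVE header to raw 16-bit mono PCM.
function wrapPcmAsWav(pcm, sampleRate = 8000) {
  const channels = 1;
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4);          // remaining file size
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);                      // fmt chunk size
  header.writeUInt16LE(1, 20);                       // format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // bytes per second
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);              // audio data size
  return Buffer.concat([header, pcm]);
}
```

In a Lambda function, the AudioStream returned by Polly would be passed to wrapPcmAsWav and the result written to the S3 bucket with a key ending in .wav.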

The basic sequence of API operations is shown below:

Detailed Call Sequence Diagram

1. A caller dials the application phone number causing the PSTN audio service to invoke appLambda with a NEW_INBOUND_CALL event.
2. appLambda creates a welcome message string and calls the Amazon Polly API to generate a voice prompt. That voice prompt recording is stored in an S3 bucket created in your account for this demo.
3. appLambda then replies to the PSTN audio service with instructions to do two things: play the voice prompt audio file, and record the caller’s response.
4. The PSTN Audio service performs those actions and then sends ACTION_SUCCESSFUL to appLambda when the recording is complete. In this demo the “detection threshold” period of silence at the end of the caller speaking is set to 2 seconds, and the level of background noise that is treated as “silence” is set to a threshold value of 200. The maximum length of the recording is 15 seconds. These values can be changed by editing them in the code file src/index.js, and the ranges of allowed values are described in the documentation.
5. The appLambda passes the event to the lexLambda for processing.
6. To let the caller know that action is happening, appLambda sends an action to the PSTN Audio service to play an audio file to “mask” the processing latency.
7. The lexLambda reads the recording from S3, processes it into a suitable format and then sends that recording to Amazon Lex. Amazon Lex responds with a text string of the reply phrase and the lexLambda makes a call to Amazon Polly to encode it into a voice recording, which is stored back in the S3 bucket.
8. The lexLambda then calls UpdateSipMediaApplicationCall with the details of the call that needs updating and the location of the new audio file to use for a follow-on prompt. If Amazon Lex determined that the bot has collected all needed data, then it also sets an ‘endFlag.’
9. The PSTN Audio service then sends a CALL_UPDATE_REQUESTED event to appLambda with the details passed to it by the UpdateSipMediaApplicationCall call.
10. appLambda then repeats the same sequence, collecting data for each slot in the Amazon Lex Bot. If the ‘endFlag’ is set then the appLambda plays a final goodbye recording and hangs up.
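
Steps 3 and 4 above can be sketched as a single Lambda response that chains two actions. This is an illustrative outline only: the function name buildPlayAndRecordResponse, the "#" terminator, and the recording prefix are assumptions, and the actual values live in src/index.js in the demo repository. The recording limits mirror the ones described in step 4.

```javascript
// Recording limits described in the walkthrough (step 4).
const RECORDING = {
  maxSeconds: 15,        // maximum length of the recording
  silenceSeconds: 2,     // silence that ends the recording
  silenceThreshold: 200, // noise level treated as "silence"
};

// Hypothetical helper: builds the response appLambda returns to the
// PSTN Audio service to play a prompt and then record the caller.
function buildPlayAndRecordResponse(callId, bucket, promptKey) {
  return {
    SchemaVersion: "1.0",
    Actions: [
      {
        Type: "PlayAudio",
        Parameters: {
          CallId: callId,
          AudioSource: { Type: "S3", BucketName: bucket, Key: promptKey },
        },
      },
      {
        Type: "RecordAudio",
        Parameters: {
          CallId: callId,
          DurationInSeconds: RECORDING.maxSeconds,
          SilenceDurationInSeconds: RECORDING.silenceSeconds,
          SilenceThreshold: RECORDING.silenceThreshold,
          RecordingTerminators: ["#"], // assumed: caller may press # to finish
          RecordingDestination: {
            Type: "S3",
            BucketName: bucket,
            Prefix: `${callId}-`, // assumed naming scheme for recordings
          },
        },
      },
    ],
  };
}
```

Because the actions run in order, the service invokes the Lambda function again only after the recording finishes, which is when the ACTION_SUCCESSFUL event from step 4 arrives.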

Please be aware that, for the sake of brevity, one call to the appLambda was omitted. When the masking audio playback is interrupted as a result of the PSTN Audio service receiving the UpdateSipMediaApplicationCall API call, the service invokes appLambda with an ACTION_INTERRUPTED event. In this demo the appLambda just ignores that call. More sophisticated applications, however, could use that information to take a different action or generate necessary logs.
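
The hand-off between the two functions, including the ignored ACTION_INTERRUPTED event, might look roughly like the sketch below. It assumes the UpdateSipMediaApplicationCall API operation is used for the update, and the names buildCallUpdateParams, handleEvent, promptKey, and endFlag-as-string are illustrative choices, not taken from the demo code. Values in Arguments are strings, so the flag is passed as "true" or "false".

```javascript
// lexLambda side (hypothetical): parameters for UpdateSipMediaApplicationCall.
function buildCallUpdateParams(smaId, transactionId, promptKey, done) {
  return {
    SipMediaApplicationId: smaId,
    TransactionId: transactionId,
    Arguments: { promptKey: promptKey, endFlag: done ? "true" : "false" },
  };
}

// appLambda side (hypothetical): reacts to the events discussed above.
function handleEvent(event, bucket) {
  const callId = event.CallDetails.Participants[0].CallId;
  const respond = (actions) => ({ SchemaVersion: "1.0", Actions: actions });

  switch (event.InvocationEventType) {
    case "ACTION_INTERRUPTED":
      // The masking audio was cut short by the update; nothing to do.
      return respond([]);

    case "CALL_UPDATE_REQUESTED": {
      // Arguments set by lexLambda arrive inside ActionData.
      const args = event.ActionData.Parameters.Arguments;
      const play = {
        Type: "PlayAudio",
        Parameters: {
          CallId: callId,
          AudioSource: { Type: "S3", BucketName: bucket, Key: args.promptKey },
        },
      };
      const next =
        args.endFlag === "true"
          ? { Type: "Hangup", Parameters: { CallId: callId, SipResponseCode: "0" } }
          : {
              Type: "RecordAudio",
              Parameters: {
                CallId: callId,
                DurationInSeconds: 15,
                SilenceDurationInSeconds: 2,
                SilenceThreshold: 200,
                RecordingDestination: { Type: "S3", BucketName: bucket },
              },
            };
      return respond([play, next]);
    }

    default:
      // Other event types are outside the scope of this sketch.
      return respond([]);
  }
}
```

A more sophisticated handler could log the ACTION_INTERRUPTED event or branch on it instead of returning an empty action list.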

Cleanup

To clean up this demo and avoid incurring further charges do the following:

1. In the terminal, from the folder where you deployed the demo, run:

make destroy

The CloudFormation stack created by the CDK will be destroyed, removing all the allocated resources.

Conclusion

In this blog, you learned how to build a conversational interactive voice response (IVR) system using the Amazon Chime SDK PSTN Audio service, Amazon Lex, and Amazon Polly. You can use these techniques to build your own system to reduce customer call resolution times and automate informational responses on your customers' calls.

For more information, see:

Project GitHub repository
Chime SDK PSTN Audio documentation
