
Summarization of call recordings using a Large Language Model with Amazon Chime SDK Call Analytics

The Amazon Chime SDK is a set of composable APIs that enable builders to add communications capabilities to their applications. The call analytics feature of the Amazon Chime SDK helps enterprises record, transcribe, and analyze customer conversations. With call analytics, it is easy to record calls, automatically transcribe conversations, and extract insights such as real-time tonal sentiment and speaker identity. Customers can choose to enable recording, transcription, or both for their voice calls, and the recordings and transcripts are made available in the customer’s data lake. In this blog post, we show you how to extract conversation summaries from a call recording or call transcript.

It is valuable to have a brief summary that is automatically generated at the end of each call, as this saves time that meeting participants or contact center agents would have otherwise spent writing up their notes from the call. It can also serve as a record that can be referenced afterward for training or recall. In this post, we show how builders can use the output from Amazon Chime SDK call analytics to automatically generate a brief call summary using a Large Language Model (LLM).

We will discuss the implementation of summarization in two ways. First, we walk through an Amazon SageMaker notebook that introduces users to the LLM inference process and to experimentation with prompt engineering. Then, we discuss what an actual deployment looks like via Amazon Chime SDK Voice Connector.

Preparing the Large Language Model and Prompt Engineering

As we will use an LLM to extract summaries, we first need to prepare the LLM and try out some examples. We have provided a notebook that walks the user through the basic steps of preparing the LLM, loading a transcript, preparing the prompt for the LLM, submitting the prompt to the model, and saving the output as a summary.

Several LLM options are available to users. We used Cohere in the notebook because it is available as a SageMaker Model Package. To gain access to the LLM used in this blog, subscribe to the foundation model: in the AWS console, go to SageMaker and choose JumpStart -> Foundation Models in the left navigation pane. You will either already have access and see the available models, or you will need to request access and wait 24 hours. Once you have access to Foundation Models, subscribe to cohere-gpt-medium; this provides access to the LLM used here.
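
The notebook then deploys the subscribed model package as a SageMaker endpoint. A minimal sketch of that deployment is shown here; the model package ARN, endpoint name, and instance type are placeholders that you would replace with the values from your subscription and the instance type recommended on the model listing:

import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Placeholder ARN -- copy the real one from your cohere-gpt-medium subscription
model_package_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/cohere-gpt-medium-placeholder"

model = ModelPackage(
    role=role,
    model_package_arn=model_package_arn,
    sagemaker_session=session,
)

# Instance type is an assumption; use the type recommended for the model package
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="cohere-gpt-medium-endpoint",
)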

The SageMaker notebook allows users to test access to the LLM and engineer prompts specific to their problem. Start a notebook instance of size ml.t3.medium and, once it is available, open JupyterLab on the instance. Clone the repository using the Git dropdown menu in JupyterLab.

This clones the entire repository, so navigate to the notebook using the JupyterLab file browser. After opening the notebook, choose the conda_python3 kernel to launch it.

Included in the repository with the notebook demo is the transcript for a sample call about a customer whose car has been damaged.

We can try different prompts to make the LLM extract the information about why the user is calling; i.e., we can do prompt engineering. For the summarization use case and the Cohere model, we got the best results in our tests with prompts consisting of (1) the call transcript in dialogue format and (2) the specific question we wanted answered. A simple example is provided here:

Prompt

Speaker 0: Thank you for calling Apple Impound. This is Katherine, how can I help you?
Speaker 1: Hi Katherine, my name is Jack. Um I picked up my car from your pound shop recently. It it got taken in one bad day and uh and when I got it back out of the uh the lot I was driving it back and the service light came on I was calling to see if you happen to know anything about that.
Speaker 0: Ok, I’m so sorry to hear that Jack. Um do you happen to have your inbound case number handy so I can look it up.
Speaker 1: Yeah give me a second uh I gotta dig through my email real quick to pull that up.
Speaker 0: No problem.
Speaker 1: Uh ok, found it.
Speaker 0: Mhm.
Speaker 1: The in the sorry the impound case number is 777-7777.
Speaker 0: OK, so just reading them back. So it is 777-7777.
Speaker 1: Yes.
Speaker 0: OK, Great. Uh it was it the 2015 Camry?
Speaker 1: Yes that’s the one.
Speaker 0: Ok. Uh give me one second let me look into its intake uh how it went in the intake test? Ok, so it says on intake uh, we have some scratches on the passenger side door but other than that, everything looks fine. All the light, uh all the electrical and the uh, engine lights checked out just fine. Um So, ok, so it does look like it came to us in just fine working condition other than the scratches. Um So, ok, so here’s what we’re gonna do. Uh I know no one likes having their cot impounded. Everyone thinks with the bad guys, but we do our best here at Apple to uh make this as positive and experience to people as necessary. So we actually are partnered with Apple Body Shop and so you can go take them, take your car over there and they’ll check it out for free, just give them your impound case number and they’ll check it out for free and then let us know if uh in their assessment, uh there’s something that went wrong and then we’ll cover the cost of the fix if it sounds like it went wrong when it was impounded with us. How does that sound to you dad?
Speaker 1: Um That sounds
Speaker 0: ok, wonderful. Uh Would you like the contact information for Apple Body Shop or did you just wanna Google it yourself?
Speaker 1: Uh If I could get that from you, that would be
Speaker 0: ok, great. Uh Are you ready to take the phone number?
Speaker 1: I am.
Speaker 0: OK. So you can reach them at 888-888-8888 and just explain that your car was from Apple impound. Uh give them your case number and then they should give you the next steps from there.
Speaker 1: Ok. Can do thank you so much.
Speaker 0: Yeah, my pleasure. Is there anything else I can help you with?
Speaker 1: I think that’s it for now.
Speaker 0: Ok. Well, thanks for giving us a call and I hope the rest of your day goes well.
Speaker 1: Oh, thank you so much. You have a good one too. Ok.
Speaker 0: Ok. Not a problem. Bye bye
Speaker 1: bye.

What is the customer calling about and what are the next steps?

This prompt is sent to the Large Language Model with two hyperparameters, temperature and max_tokens. The temperature parameter controls the randomness of the response, with 0 being not random and 1.0 being maximum randomness. The hyperparameter max_tokens controls the maximum output size from the LLM, though users still have some control over the actual output size by changing the question asked (e.g. “In one sentence, what is the customer’s problem?”). Tokens are representations of language components, with each LLM having slightly different definitions; a single token can represent a word, word part, character, or punctuation. The full prompt and response must be less than the model’s token limit, which is 2048 for the LLM we are using.
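
As a sketch, submitting the prompt with these hyperparameters through the boto3 SageMaker runtime client could look like the following. The endpoint name and transcript file are placeholders, and the exact request and response fields depend on the model package you subscribed to, so confirm them against its documentation:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical file name; the notebook shows how the sample transcript is loaded
transcript = open("example_transcript.txt").read()

payload = {
    "prompt": transcript + "\n\nWhat is the customer calling about and what are the next steps?",
    "max_tokens": 200,   # cap on the size of the generated summary
    "temperature": 0,    # 0 = deterministic, 1.0 = maximum randomness
}

response = runtime.invoke_endpoint(
    EndpointName="cohere-gpt-medium-endpoint",   # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)

result = json.loads(response["Body"].read())
# The response shape varies by model package; Cohere-style packages return a
# "generations" list with a "text" field per generation.
summary = result["generations"][0]["text"]
print(summary)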

LLM Output

The output generated with the above prompt, with max_tokens set to 200 and temperature set to 0, is given here:

The customer is calling about their 2015 Camry that they picked up from Apple Impound. The next steps are to take the car to Apple Body Shop and have them check it out for free, just give them the impound case number.

Summaries are not the only use case for this method. Other questions about the call can also be answered by the LLM. For example, we can replace the prompt with a question more specific to the automotive use case.

Prompt

… (same transcript as above)

What is the customer’s name and what is the make and model of their car?

LLM Output

The customer’s name is Jack and the make and model of their car is a 2015 Camry.

As users refine their questions against call examples, two additional details can help engineer the most useful prompts.

First, the Cohere LLM used here was only able to answer one question per submission; to ask multiple questions about the same call, the transcript must be resubmitted with each question. More complex LLMs can answer multiple questions per transcript submission. Second, if the call is longer than approximately 10 minutes, it must be partitioned into multiple submissions to the LLM, with each submission returning its own summary; a final submission to the LLM then combines the partial summaries, as sketched below. Once the full call has been summarized, relevant metadata is attached and the result is saved; the CDK application covered in the next section posts the result to the data lake for downstream use.
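
A rough sketch of that chunk-and-combine approach is shown here. It assumes the same placeholder endpoint name and request format as the earlier example, and it splits the transcript on speaker turns using a character budget as a simple proxy for the model's token limit:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def summarize(text, question):
    """Send one chunk plus a question to the endpoint (placeholder name and fields)."""
    body = json.dumps({"prompt": f"{text}\n\n{question}", "max_tokens": 200, "temperature": 0})
    response = runtime.invoke_endpoint(
        EndpointName="cohere-gpt-medium-endpoint",
        ContentType="application/json",
        Body=body,
    )
    return json.loads(response["Body"].read())["generations"][0]["text"]

def chunk_transcript(transcript, max_chars=6000):
    """Split a dialogue transcript on speaker turns so each chunk stays under a
    rough character budget (a stand-in for the model's 2048-token limit)."""
    chunks, current = [], ""
    for turn in transcript.splitlines():
        if current and len(current) + len(turn) > max_chars:
            chunks.append(current)
            current = ""
        current += turn + "\n"
    if current:
        chunks.append(current)
    return chunks

def summarize_long_call(transcript, question):
    chunks = chunk_transcript(transcript)
    if len(chunks) == 1:
        return summarize(chunks[0], question)
    # Summarize each chunk separately, then merge the partial summaries
    partial = [summarize(chunk, question) for chunk in chunks]
    return summarize("\n".join(partial), "Combine these partial summaries into one summary of the call.")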

Obtaining the call transcript

There are two approaches to obtaining the call transcript, depending on your call analytics configuration choices.

Generating a call transcript through Amazon Chime SDK call analytics with Amazon Transcribe Call Analytics enabled

This approach applies to call analytics customers who use transcription through the integration with Amazon Transcribe or Amazon Transcribe Call Analytics. You can attach an Amazon Chime SDK call analytics configuration that enables Amazon Transcribe Call Analytics to an Amazon Chime SDK Voice Connector. With this option, call transcripts are automatically generated and stored in an S3 data lake in your account. Given a call's unique transaction ID, obtained from an EventBridge notification, the transcript can be extracted from the data lake via an Amazon Athena SQL query, as illustrated in the accompanying notebook. This transcript can then be embedded in the summarization prompt to the LLM.
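
As a sketch, retrieving a transcript from the data lake by transaction ID could look like the following. The database, table, and column names are placeholders, so substitute the names created by your call analytics data lake deployment (the notebook shows the exact query):

import time
import boto3

athena = boto3.client("athena")

transaction_id = "your-transaction-id"   # from the EventBridge notification

# Placeholder database/table/column names -- match them to your data lake schema
query = f"""
SELECT *
FROM call_analytics_db.transcribe_call_analytics
WHERE transactionid = '{transaction_id}'
ORDER BY utterance_order
"""

execution = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://your-athena-results-bucket/"},
)

# Poll until the query finishes, then read the utterances from the result set
query_id = execution["QueryExecutionId"]
while athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]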

Generating a call transcript via an AWS Lambda function that calls Amazon Transcribe on a call recording generated through Amazon Chime SDK call analytics

This approach is applicable to Amazon Chime SDK call analytics customers who have only enabled call recording (and not Amazon Transcribe services). In this approach, you turn on call recording on an Amazon Chime SDK Voice Connector, and define an AWS Lambda function that begins transcribing a call when a call recording is delivered into the data lake. The transcript from this series of steps can be embedded in the summarization prompt to the LLM.

We describe this approach in more detail below.

Telephony and Recording

Now that we have a model that we can use to summarize calls, let’s see how to use it with your existing telephony infrastructure. This blog assumes that you have existing telephony infrastructure that can create a Session Initiation Protocol (SIP) based media recording (SIPREC) session, as outlined in RFC 7866: Session Recording Protocol. If you don’t have something that can create that SIPREC session, the demo can be deployed with a PBX that can be used for testing, or files can be uploaded directly to an Amazon Simple Storage Service (Amazon S3) bucket.

In this demo, we will deploy a serverless application that uses Amazon Chime SDK call analytics to record calls to an S3 bucket.

When the SIPREC or PSTN session has started and Real-time Transport Protocol (RTP) is delivered to the Amazon Chime SDK Voice Connector, a notification is sent to Amazon EventBridge, Amazon Simple Notification Service (SNS), or Amazon Simple Queue Service (SQS). In this demo, we use the Amazon Chime SDK call analytics configuration associated with the Amazon Chime SDK Voice Connector to record the call. When a call arrives, Amazon Chime SDK call analytics takes all of the RTP that has been delivered to the Amazon Chime SDK Voice Connector and writes it to the designated S3 bucket, using the callId as the key name. In this demo, we record all calls observed by the Amazon Chime SDK Voice Connector; however, customization can be applied at this step. For example, some calls can be ignored, or calls for one group can be delivered to one S3 bucket while calls for another group are delivered to a different S3 bucket. To ensure that the entire duration of the call is recorded, data retention for the Amazon Kinesis Video Stream has been set to 24 hours.
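
If you want to react to these notifications from EventBridge, a minimal sketch of a rule that forwards Voice Connector streaming status events to a Lambda function is shown below. The Lambda ARN is a placeholder, and you should confirm the event source and detail-type values against the Amazon Chime SDK documentation for your configuration:

import json
import boto3

events = boto3.client("events")

# Pattern follows the Amazon Chime SDK Voice Connector streaming notification
# format; confirm the detail-type against the documentation for your setup.
events.put_rule(
    Name="voice-connector-streaming-status",
    EventPattern=json.dumps({
        "source": ["aws.chime"],
        "detail-type": ["Chime VoiceConnector Streaming Status"],
    }),
    State="ENABLED",
)

# Route matching events to a Lambda function (placeholder ARN) that can track
# call start/stop or kick off downstream processing.
events.put_targets(
    Rule="voice-connector-streaming-status",
    Targets=[{
        "Id": "streaming-status-handler",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:handle-streaming-status",
    }],
)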

Transcribing and Summarizing

Now that we have a call recording saved to an S3 bucket as a wav file, we can begin the process of transcribing and summarizing the contents of that call.

When the wav file is delivered to the S3 bucket by the Amazon Chime SDK call analytics recorder, it will trigger a Lambda function to begin transcribing the call.  In this case, we are using Amazon Transcribe to produce the transcript that is then delivered to the transcribeOutput prefix of the S3 bucket we are using.  This in turn triggers another Lambda function that will begin the call summarization process.
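
A rough sketch of that first Lambda function, assuming the standard S3 event trigger and the bucket and prefix names used in this demo, might look like this:

import urllib.parse
import boto3

transcribe = boto3.client("transcribe")

def handler(event, context):
    # The S3 event carries the bucket and key of the recording that was just written
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    job_name = key.split("/")[-1].replace(".wav", "")
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="wav",
        LanguageCode="en-US",
        OutputBucketName=bucket,
        OutputKey=f"transcribeOutput/{job_name}.json",   # prefix used in this demo
        Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    )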

With the SageMaker endpoint active, a request will be made using the output of the transcription along with a prompt.  The included example prompt is “What is the customer calling about and what are the next steps?”  This prompt can be customized to your specific use case.  The result of this request will be written to the bucket in the summaryOutput prefix as a JSON file.
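
A sketch of that summarization Lambda function, triggered when the transcript JSON lands in the transcribeOutput prefix, could look like the following; the endpoint name and response fields are the same assumptions as in the earlier examples:

import json
import boto3

s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")

PROMPT = "What is the customer calling about and what are the next steps?"

def handler(event, context):
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Read the Amazon Transcribe output and pull out the full transcript text
    transcribe_output = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
    transcript = transcribe_output["results"]["transcripts"][0]["transcript"]

    response = runtime.invoke_endpoint(
        EndpointName="cohere-gpt-medium-endpoint",   # placeholder endpoint name
        ContentType="application/json",
        Body=json.dumps({"prompt": f"{transcript}\n\n{PROMPT}", "max_tokens": 200, "temperature": 0}),
    )
    result = json.loads(response["Body"].read())
    summary = result["generations"][0]["text"]       # response shape depends on the model package

    # Write the summary next to the recording under the summaryOutput prefix
    summary_key = key.replace("transcribeOutput/", "summaryOutput/")
    s3.put_object(Bucket=bucket, Key=summary_key, Body=json.dumps({"summary": summary}))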

This information can be captured and displayed in a customized user interface.

Details for deploying this demo can be found in the associated GitHub repository.

Conclusion

In this blog, we have explained how to use an LLM with Amazon Chime SDK call analytics to summarize calls. Please examine our GitHub repository to learn more. After the notebook and code examples for this blog were developed, Amazon announced Amazon Bedrock and the associated Titan LLMs. Stay tuned for examples of how to integrate Amazon Bedrock with Amazon Chime SDK call analytics.

Court Schuett

Court Schuett is the Lead Evangelist for the Amazon Chime SDK with a background in telephony and now loves to build things that build things. Court is focused on teaching developers and non-developers alike how to build with AWS.

Narasimha Chari

Chari is a Principal Product Manager for the Amazon Chime SDK service team where he focuses on machine learning applications to audio and video communications and analytics. Outside of work, Chari enjoys spending time with his family, and going for runs in the hills.

Umut Isik

Umut is a Principal Scientist at AWS. He works on Generative AI applications to meetings and phone calls.