Get higher quality call recordings using Amazon Chime SDK call analytics

Voice enhancement for call recordings using machine-learning-powered denoising and a new speech superresolution model

Why is it important to have higher quality audio recordings?

Organizations record phone calls and archive call recordings for a variety of reasons including regulatory compliance and customer service quality control. For example, in many regulated industries, such as financial services, organizations are required to comprehensively record and archive calls. Contact centers may record calls for quality assessment, and quality management staff can often spend several hours a day listening to call recordings.

For these use cases where call recordings are extensively human-monitored, standard telephony audio has two key drawbacks. First, audio transmission over the public switched telephone network (PSTN) uses a narrowband format: frequencies below 300 Hz and above 3400 Hz are filtered out, which removes the naturalness and fullness of the voice and results in a muffled quality. Second, either end of a phone call may be corrupted by noise such as traffic, office equipment, or background babble, which makes it harder for a listener to focus on the conversation. In combination, these drawbacks can lead to listening fatigue and reduced productivity for employees such as contact center managers and compliance officers, who often have to review hours of recordings each day.

What is voice enhancement for call recordings?

Amazon Chime SDK call analytics now includes voice enhancement for call recordings. This optional new capability uses machine learning to transform narrowband telephony audio recordings into higher quality recordings. It employs a deep neural network (DNN) architecture that has been trained to help reduce noise and to restore the low- and high-frequency content that the telephony network removes from the voice signal. The end result is a clearer, higher-definition audio recording and an improved listening experience. The denoising capability is based on our award-winning Amazon Voice Focus technology, and the frequency restoration uses a new speech superresolution approach we recently developed.

Figure 1: Schematic representation of the energy spectra of narrowband and wideband speech.

Figure 1 depicts typical energy spectra for narrowband and wideband speech signals. The superresolution model takes narrowband speech content containing frequency components between 300 Hz and 3.4 kHz as input. From the narrowband signal, the model generates frequency content between 0 and 300 Hz and between 3.4 kHz and 8 kHz. This additional frequency content is added to the narrowband input content to form an output wideband signal.
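To make the band layout concrete, the following minimal sketch measures how much energy a signal carries in each of the three bands described above. It is not the superresolution model: the "narrowband" and "wideband" signals are synthetic sine tones chosen for illustration, the 16 kHz sample rate is an assumption, and the naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math

SAMPLE_RATE = 16000  # assumed sample rate; its Nyquist limit is 8 kHz
N = 160              # 10 ms of audio, so each DFT bin is 100 Hz wide

# Band edges in Hz, taken from the description above.
BANDS = {
    "low (0-300 Hz)": (0.0, 300.0),
    "telephony (300 Hz-3.4 kHz)": (300.0, 3400.0),
    "high (3.4-8 kHz)": (3400.0, 8000.0),
}


def tone(freq_hz):
    """Return N samples of a unit-amplitude sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(N)]


def dft(samples):
    """Naive O(N^2) discrete Fourier transform, adequate for a tiny demo."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def band_profile(samples):
    """Return the spectral energy of `samples` in each band of BANDS."""
    n = len(samples)
    spectrum = dft(samples)
    profile = {}
    for name, (low_hz, high_hz) in BANDS.items():
        profile[name] = sum(
            abs(spectrum[k]) ** 2
            for k in range(n // 2 + 1)
            if low_hz <= k * SAMPLE_RATE / n < high_hz
        )
    return profile


# Stand-in for a telephony signal: energy only inside 300 Hz-3.4 kHz.
narrowband = tone(1000.0)

# Stand-in for an enhanced signal: the same content plus low and high bands.
wideband = [a + b + c for a, b, c in zip(tone(100.0), tone(1000.0), tone(6000.0))]

for label, signal in (("narrowband", narrowband), ("wideband", wideband)):
    print(label)
    for band, energy in band_profile(signal).items():
        print(f"  {band}: {energy:.1f}")
```

The narrowband stand-in shows energy only in the telephony band, while the wideband stand-in shows energy in all three, which is the difference Figure 1 depicts schematically.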

Examples demonstrating the improvement in audio quality

The examples below demonstrate the improvement that can be achieved by voice enhancement of call recordings. Each of the examples on the left is a telephony signal with narrowband frequency content, as illustrated by the spectrogram. Each of the examples on the right is our enhanced version, where the spectrogram shows that the higher and lower frequency content missing from the narrowband telephony signal has been generated by our model. We recommend listening to these examples using a high-quality set of headphones to fully appreciate the difference in audio fidelity. These examples are illustrative only: the level of improvement will vary depending on the use case, the recording and playback setups, ambient noise, and other factors.

Example 1: Input call audio

Example 1: Enhanced call recording

Example 2: Input call audio

Example 2: Enhanced call recording

How to get started with voice enhancement for recordings

Voice enhancement for recordings is now available to Amazon Chime SDK call analytics customers using the call recording feature in US East (N. Virginia) and US West (Oregon) AWS Regions. The voice enhancement capability is optional and you can enable it via the Amazon Chime SDK call analytics APIs or via the AWS console, provided call recording is enabled.

You can use the Amazon Chime SDK console to enable voice enhancement for recordings. For example, you can edit an existing call analytics configuration that includes recording, and select the check box to “Activate voice enhancement”.

Figure 3: Activating voice enhancement for recordings from the Amazon Chime SDK console

Alternatively, you can enable voice enhancement programmatically using APIs. For example, if you have an existing call analytics recording configuration, you can activate voice enhancement for call recordings by using the UpdateMediaInsightsPipelineConfiguration API and adding a VoiceEnhancementSinkConfiguration element. You should also configure S3RecordingSink and s3:GetObject permissions in the Call Analytics Resource Role to store the enhanced recordings.
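As a rough illustration of the programmatic path, the sketch below uses the AWS SDK for Python (boto3) to read an existing configuration, append a voice enhancement element, and write it back. The helper names `with_voice_enhancement` and `enable_voice_enhancement` are hypothetical, and the element shape (`Type`, `VoiceEnhancementSinkConfiguration`, `Disabled`) reflects our understanding of the API; please verify field names against the current API reference before relying on it.

```python
def with_voice_enhancement(elements):
    """Return a copy of `elements` with voice enhancement switched on.

    Hypothetical helper. It refuses to proceed unless an S3RecordingSink
    element is present, because voice enhancement applies to call recordings.
    """
    if not any(e.get("Type") == "S3RecordingSink" for e in elements):
        raise ValueError("voice enhancement requires an S3RecordingSink element")

    # Drop any existing voice enhancement element so the result has exactly one.
    updated = [dict(e) for e in elements if e.get("Type") != "VoiceEnhancementSink"]
    updated.append(
        {
            "Type": "VoiceEnhancementSink",
            "VoiceEnhancementSinkConfiguration": {"Disabled": False},
        }
    )
    return updated


def enable_voice_enhancement(identifier, region="us-east-1"):
    """Enable voice enhancement on an existing call analytics configuration.

    `identifier` is the name or ARN of your configuration. Requires boto3
    and AWS credentials, so it is imported here rather than at module level.
    """
    import boto3

    client = boto3.client("chime-sdk-media-pipelines", region_name=region)
    current = client.get_media_insights_pipeline_configuration(
        Identifier=identifier
    )["MediaInsightsPipelineConfiguration"]

    request = {
        "Identifier": identifier,
        "ResourceAccessRoleArn": current["ResourceAccessRoleArn"],
        "Elements": with_voice_enhancement(current["Elements"]),
    }
    # Carry over real-time alert rules, if any, so the update does not drop them.
    if "RealTimeAlertConfiguration" in current:
        request["RealTimeAlertConfiguration"] = current["RealTimeAlertConfiguration"]

    client.update_media_insights_pipeline_configuration(**request)
```

Keeping the element-building step separate from the AWS call means you can inspect or unit test the resulting `Elements` list without touching a live configuration.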

After your call ends, an enhanced audio file (with the suffix “_enhanced.wav” or “_enhanced.ogg”) will be generated. The original recording is preserved alongside the enhanced recording, so you retain access to the unmodified audio. The enhanced recording and the original recording are both stored in the same Amazon S3 bucket and in the same format. Amazon S3 storage costs apply to both the original and enhanced recordings.
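Because both files land in the same bucket, downstream tooling may need to tell them apart. The following minimal sketch separates a list of object keys into original and enhanced recordings using only the suffixes mentioned above. The function name and the example keys are hypothetical, and the sketch makes no assumption about how the rest of the file name is formed.

```python
ENHANCED_SUFFIXES = ("_enhanced.wav", "_enhanced.ogg")
AUDIO_SUFFIXES = (".wav", ".ogg")


def split_recordings(keys):
    """Split S3 object keys into (originals, enhanced) lists by suffix."""
    enhanced = [k for k in keys if k.endswith(ENHANCED_SUFFIXES)]
    originals = [
        k
        for k in keys
        if k.endswith(AUDIO_SUFFIXES) and not k.endswith(ENHANCED_SUFFIXES)
    ]
    return originals, enhanced


# Hypothetical keys, for illustration only.
example_keys = [
    "recordings/call-a.wav",
    "recordings/call-a_enhanced.wav",
    "recordings/call-b.ogg",
    "recordings/notes.txt",
]
originals, enhanced = split_recordings(example_keys)
print("originals:", originals)
print("enhanced:", enhanced)
```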

Learn more

To learn more about Amazon Chime SDK call analytics, recording, and voice enhancement, review the following resources:

Narasimha Chari

Chari is a Principal Product Manager for the Amazon Chime SDK service team, where he focuses on machine learning applications to audio and video communications and analytics. Outside of work, Chari enjoys spending time with his family and going for runs in the hills.

Mike Goodwin

Mike is an Applied Science Senior Manager for the Amazon Chime SDK. His team focuses on machine learning and signal processing solutions for audio and video workloads. In his spare time he enjoys running, kayaking, and playing guitar.

Erfan Soltanmohammadi

Erfan is an Applied Scientist for the Amazon Chime SDK with a passion for developing machine learning and signal processing solutions that enhance audio quality for real-time communication systems.