Coming soon – Amazon Transcribe to Identify Speakers Based on Channels

Posted on: Jul 17, 2018

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for you to add a speech-to-text capability to your applications. You can use Amazon Transcribe to create text transcripts of audio and video files. Coming soon, Amazon Transcribe will support a feature called channel synthesis to better handle audio in which each speaker is recorded on a different channel. For example, a stereo track might store the interviewer on the left channel and the interviewee on the right.

Contact centers stand to benefit significantly from the channel synthesis feature when they transcribe multi-channel customer call recordings. Typically, an agent and a caller are recorded on separate channels, which are then merged into a single audio file. For instance, contact center applications such as Amazon Connect store agent and customer audio on separate channels of a stereo recording. The agent audio is stored in the right channel. All incoming audio, such as the end customer's voice, is stored in the left channel. Contact centers can submit the single audio file to Amazon Transcribe, which will identify the two channels, split them out, transcribe each speaker per channel, and then produce a coherent merged transcript with channel labels. Using the channel labels, contact centers can better identify and analyze what each speaker says, with higher accuracy and efficiency. Moreover, customers no longer need to submit each channel’s recording as an individual audio file for transcription, which reduces both the overall cost and the workload for contact centers.
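
To make the final merge step concrete, here is a minimal Python sketch of how per-channel transcripts could be combined into one transcript with channel labels. It is an illustration only and does not call Amazon Transcribe or reflect its internals. The `Segment` type, the `merge_channels` and `render` functions, the "left"/"right" labels, and the sample timings are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """One transcribed utterance (hypothetical structure for this sketch)."""
    start: float  # seconds from the start of the recording
    end: float
    text: str


def merge_channels(channels: Dict[str, List[Segment]]) -> List[Tuple[str, Segment]]:
    """Interleave per-channel segments into one list ordered by start time."""
    labeled = [(label, seg) for label, segs in channels.items() for seg in segs]
    # sorted() is stable, so segments with equal start times keep input order.
    return sorted(labeled, key=lambda pair: pair[1].start)


def render(merged: List[Tuple[str, Segment]]) -> str:
    """Format the merged transcript with a channel label on each line."""
    return "\n".join(f"[{label}] {seg.text}" for label, seg in merged)


# Assumed layout, following the description above:
# customer audio on the left channel, agent audio on the right.
channels = {
    "left": [
        Segment(0.0, 1.5, "Hi, I need help."),
        Segment(4.0, 5.0, "Thanks."),
    ],
    "right": [
        Segment(1.8, 3.7, "Sure, how can I help?"),
    ],
}

print(render(merge_channels(channels)))
```

Ordering by start time is the simplest way to interleave two speakers. A real system would also need to decide how to handle overlapping speech, which this sketch does not attempt.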