Amazon Transcribe

Automatic speech recognition

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capabilities to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.

Amazon Transcribe can be used for many common applications, including transcribing customer service calls and generating subtitles for audio and video content. The service can transcribe audio files stored in common formats, such as WAV and MP3, and returns a timestamp for every word so you can easily locate the audio in the original source by searching for the text. Amazon Transcribe is continually learning and improving to keep pace with the evolution of language.

AWS re:Invent 2017 Introducing Amazon Transcribe

Key Features

Easy-to-Read Transcriptions

Most speech recognition systems output a string of text without punctuation. Amazon Transcribe uses deep learning to add punctuation and formatting automatically, so that the output is more reader-friendly and can be used without any further editing.

Support for Telephony Audio

Recorded audio from phone conversations is typically low quality. Amazon Transcribe has been specifically designed to provide high accuracy when working with telephony-quality audio, enabling use cases such as transcribing customer service calls.

Multiple Languages

Amazon Transcribe can automatically transcribe US English and Spanish speech. Support for more languages is coming soon.


Simple-to-Use API

The Amazon Transcribe API makes it easy to convert speech to text. No complicated programming is required. Just call the API with a few lines of code, and Transcribe will return the text from your audio file stored in Amazon S3.
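
As a rough illustration of what "a few lines of code" might look like, the sketch below assembles a transcription request for an audio file in Amazon S3 and submits it through a client object. It assumes the AWS SDK for Python (boto3) and its `start_transcription_job` operation. The helper names `build_job_request` and `start_job`, the job name, and the bucket are hypothetical, and the exact parameters and URI form accepted may differ in your SDK version.

```python
def build_job_request(job_name, s3_uri, media_format="mp3", language_code="en-US"):
    """Assemble the parameters for a transcription job (names assume boto3)."""
    return {
        "TranscriptionJobName": job_name,      # hypothetical, must be unique per account
        "LanguageCode": language_code,         # US English by default
        "MediaFormat": media_format,           # for example "wav" or "mp3"
        "Media": {"MediaFileUri": s3_uri},     # location of the audio file in S3
    }


def start_job(client, job_name, s3_uri, **options):
    """Submit the job through any client exposing start_transcription_job."""
    request = build_job_request(job_name, s3_uri, **options)
    return client.start_transcription_job(**request)
```

With boto3 installed and credentials configured, a call such as `start_job(boto3.client("transcribe"), "example-job", "s3://example-bucket/call.mp3")` would submit the job. Passing the client in as an argument keeps the request-building logic easy to test without network access.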

Support for Custom Vocabulary (Coming soon)

Amazon Transcribe gives you the ability to expand and customize your speech recognition vocabulary. You can add new words (along with their pronunciations) to the base vocabulary and generate highly accurate transcriptions specific to your use case, even when the utterances include specialized terminology, jargon, or unique product names. This feature saves you time by reducing the need for corrections down the road.

Timestamp Generation

Amazon Transcribe returns a timestamp for each word, so that you can easily locate the audio in the original recording by searching for the text.
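
To make the idea of searching by text concrete, here is a minimal sketch that looks up a word in a transcript and returns the times at which it was spoken. The transcript layout used here (a `results.items` list whose entries carry `type`, `start_time`, `end_time`, and `alternatives`) is an assumption about the JSON output, and `find_word` and the sample data are hypothetical.

```python
def find_word(transcript, word):
    """Return (start, end) times in seconds for each occurrence of `word`."""
    target = word.lower()
    hits = []
    for item in transcript["results"]["items"]:
        # Assumed layout: punctuation items carry no timestamps, so skip them.
        if item.get("type") != "pronunciation":
            continue
        content = item["alternatives"][0]["content"]
        if content.lower() == target:
            hits.append((float(item["start_time"]), float(item["end_time"])))
    return hits


# Hypothetical sample in the assumed layout.
sample = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.00", "end_time": "0.50",
             "alternatives": [{"content": "My"}]},
            {"type": "pronunciation", "start_time": "1.50", "end_time": "2.00",
             "alternatives": [{"content": "refund"}]},
            {"type": "punctuation",
             "alternatives": [{"content": "."}]},
        ]
    }
}

matches = find_word(sample, "Refund")
```

Each returned pair can be used to seek directly to that position in the original recording.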


Recognize Multiple Speakers (Coming soon)

Amazon Transcribe can recognize when the speaker changes and attribute the transcribed text appropriately. This can significantly reduce the amount of work needed to transcribe audio with multiple speakers, such as telephone calls, interviews, and television shows.

Use Cases

Amazon Transcribe can provide transcription for a wide range of use cases including customer service, subtitling, search and compliance.

Improving Customer Service

By converting audio input into text, Amazon Transcribe lets you build text analytics applications that can search and analyze voice input. Customer contact centers can use Amazon Transcribe to transcribe voice-based interactions, and mine the data for insights using other AWS services like Amazon Comprehend to extract meaning and intent from conversations.

Captioning/Subtitling Workflows

Amazon Transcribe can help content producers and media distributors improve reach and accessibility by automatically generating time-stamped subtitles that can be displayed along with the video content.
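
One way such a workflow could turn per-word timestamps into subtitles is sketched below. It groups words into fixed-size cues and writes them in the widely used SubRip (SRT) format. The `(text, start, end)` tuple input, the function names, and the seven-words-per-cue default are illustrative assumptions, not part of the service.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group (text, start, end) word tuples into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from the first word's start to the last word's end.
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)
```

A production captioning pipeline would likely also break cues at sentence boundaries and limit line length, which this sketch omits for brevity.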

Cataloging Audio Archives

The service enables you to transcribe audio and video assets into fully searchable archives for compliance monitoring and risk management. Customers can use Amazon Transcribe to convert audio to text, and use Amazon Elasticsearch Service to index and perform text-based search across their audio/video library.

Customer References

RingDNA is an enterprise sales acceleration engine and voice communications platform. Inside sales teams use RingDNA to dramatically increase productivity, engage in smarter sales conversations, gain predictive sales insight and coach reps to success faster than ever before. 

“RingDNA is an end-to-end communications platform for sales teams. Hundreds of enterprise organizations use RingDNA to dramatically increase productivity, engage in smarter sales conversations, gain predictive sales insights, improve their win rate and coach reps to success faster than ever before. A critical component of RingDNA’s Conversation AI requires best of breed speech-to-text to deliver transcriptions of every phone call. RingDNA is excited about Amazon Transcribe since it provides high-quality speech recognition at scale, helping us to better transcribe every call to text.”

Howard Brown – CEO & Founder, RingDNA

Isentia, headquartered in Sydney, Australia, is a leading media-intelligence provider for the Asia-Pacific region. The company operates from 18 offices across the region and supports more than 5,000 clients worldwide, including 84 of the world’s top 100 brands. Isentia’s products help customers make more informed and timely business and communication decisions.

“At Isentia, we enable customers to analyze and monitor the media coverage for their brands. We create more than 13K summaries per day from radio and TV content. With Amazon Transcribe, we can transcribe all the audio/video content that we monitor and analyze the text data with Amazon Comprehend. Features like timestamps and punctuation make it very easy for us to search through the data and drill down and present key insights for our customers to review.”

Andrea Walsh – CIO, Isentia
