Amazon Transcribe Documentation

Amazon Transcribe is an automatic speech recognition service that makes it easy to add speech-to-text capabilities to any application. Transcribe’s features enable you to ingest audio input, produce transcripts that are easy to read and review, improve accuracy with customization, and filter content to help ensure customer privacy.

Audio inputs

Transcribe is designed to process live and recorded audio or video input to provide high-quality transcriptions for search and analysis. We also offer services that are designed specifically to understand customer calls (Amazon Transcribe Call Analytics) and medical conversations (Amazon Transcribe Medical).

Streaming & batch transcription

You can process your existing audio recordings or stream the audio for real-time transcription. Using a secure connection, you can send a live audio stream to the service, and receive a stream of text in response.
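
As a rough illustration of the batch path only, the sketch below assembles keyword arguments for the boto3 `start_transcription_job` call. The helper name `build_batch_request`, the job name, and the bucket are placeholders introduced for this example, and the S3 check reflects the assumption that batch jobs read media from Amazon S3.

```python
def build_batch_request(job_name, media_uri, language_code="en-US", output_bucket=None):
    """Assemble keyword arguments for a batch transcription job (hypothetical helper)."""
    # Assumption: batch jobs read their media from an S3 location.
    if not media_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI for the media file")
    params = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
    }
    if output_bucket:
        # Optional: write the transcript to a bucket you own.
        params["OutputBucketName"] = output_bucket
    return params


request = build_batch_request("example-job", "s3://example-bucket/call.wav")
# With credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("transcribe").start_transcription_job(**request)
```

Building the arguments in a separate function keeps the request easy to inspect and test before anything is sent to the service.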

Domain-specific models

Select a model that is tuned to telephone calls or multimedia video content. For example, Transcribe adapts to low-fidelity phone audio common in contact centers.

Automatic language identification

With Amazon Transcribe, you can identify the dominant language in an audio file and generate a transcription in that language. This is useful when your media library contains audio files in different languages. You can also use this feature to classify media content and to verify that the main spoken language in your videos and podcasts is correctly labeled.
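
The label check described above might look like the following sketch. It assumes the job was started with language identification enabled and that the job description returned by the service carries `LanguageCode` and `IdentifiedLanguageScore` fields. The helper name `check_language_label`, the sample response, and the 0.5 threshold are illustrative assumptions, not recommended values.

```python
def check_language_label(job_response, expected_language, min_score=0.5):
    """Compare a catalog label with the language identified for a job (hypothetical helper)."""
    job = job_response["TranscriptionJob"]
    detected = job.get("LanguageCode")
    score = job.get("IdentifiedLanguageScore", 0.0)
    return {
        "detected": detected,
        "score": score,
        # The label counts as confirmed only if the languages match
        # and the identification score clears the chosen threshold.
        "label_ok": detected == expected_language and score >= min_score,
    }


# Hand-written sample shaped like a job description; not real service output.
sample_response = {
    "TranscriptionJob": {
        "TranscriptionJobName": "podcast-episode-1",
        "LanguageCode": "es-ES",
        "IdentifiedLanguageScore": 0.93,
    }
}

result = check_language_label(sample_response, expected_language="es-ES")
```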

Easy-to-read transcripts

Amazon Transcribe enables you to produce accurate transcripts that are easy to read, review, and integrate into your specific applications. 

Punctuation & number normalization

Amazon Transcribe adds punctuation and number formatting to produce high-quality and easily readable transcriptions. 

Timestamp generation

Amazon Transcribe returns a timestamp for each word, so that you can easily find a word or phrase in the original recording or add subtitles to video.
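
To show how per-word timestamps can be used to locate a word, the sketch below walks the `results.items` list of a transcript. The sample transcript is hand-written for this example and only mimics the shape of the output JSON; the helper names `word_timings` and `find_word` are hypothetical.

```python
def word_timings(transcript):
    """Return (word, start_seconds, end_seconds) for each spoken word."""
    timings = []
    for item in transcript["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # punctuation items carry no timestamps
        word = item["alternatives"][0]["content"]
        timings.append((word, float(item["start_time"]), float(item["end_time"])))
    return timings


def find_word(transcript, target):
    """Return the start time of every occurrence of target (case-insensitive)."""
    target = target.lower()
    return [start for word, start, _ in word_timings(transcript) if word.lower() == target]


# Hand-written sample; not real service output.
sample_transcript = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.41",
             "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": ","}]},
            {"type": "pronunciation", "start_time": "0.52", "end_time": "0.98",
             "alternatives": [{"confidence": "0.97", "content": "world"}]},
        ]
    }
}

print(find_word(sample_transcript, "world"))  # [0.52]
```

The same start and end times could be grouped into cues when generating subtitles.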

Recognize multiple speakers

Amazon Transcribe can recognize and attribute speaker changes in the text to capture scenarios like telephone calls, meetings, and television shows more accurately. 

Channel identification

Contact centers can submit a single audio file in which each party is recorded on a separate channel, and Amazon Transcribe can produce a single transcript annotated with channel labels.
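
The two ways of separating voices described above, speaker recognition and channel identification, correspond to different entries in the `Settings` argument of a batch request. The sketch below builds either one. The helper name `build_separation_settings` is hypothetical, and the choice to treat the two modes as mutually exclusive reflects my understanding that a single request uses one or the other.

```python
def build_separation_settings(mode, max_speakers=2):
    """Return a Settings dict for speaker labels or channel identification."""
    if mode == "speaker":
        # Attribute speech to individual speakers within one audio channel.
        return {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers}
    if mode == "channel":
        # Transcribe each channel separately and label the merged transcript.
        return {"ChannelIdentification": True}
    raise ValueError("mode must be 'speaker' or 'channel'")


meeting_settings = build_separation_settings("speaker", max_speakers=4)
call_settings = build_separation_settings("channel")
# Either dict would be passed as Settings=... alongside the other job arguments.
```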

Customize your output

Accuracy is critical, and we provide many options to customize transcripts to your specific business needs and vernacular. Transcribe also provides alternative transcriptions for each sentence, so you can quickly choose the option that best fits your content and domain. This is useful for human-in-the-loop subtitling workflows.

Custom vocabulary

With custom vocabulary, you can add new words to the base vocabulary to generate more accurate transcriptions for domain-specific words and phrases like product names, technical terminology, or names of individuals.
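
A custom vocabulary can be supplied as a list of phrases. The sketch below prepares arguments for the boto3 `create_vocabulary` call. The helper name and the example terms are made up, and joining multi-word terms with hyphens is an assumption about the expected phrase format that should be checked against the current API reference.

```python
def build_vocabulary_request(name, terms, language_code="en-US"):
    """Assemble keyword arguments for creating a custom vocabulary (hypothetical helper)."""
    # Assumption: multi-word entries are written with hyphens instead of spaces.
    phrases = ["-".join(term.split()) for term in terms]
    return {
        "VocabularyName": name,
        "LanguageCode": language_code,
        "Phrases": phrases,
    }


vocabulary_request = build_vocabulary_request(
    "example-product-terms", ["Amazon Transcribe", "diarization"]
)
# Could be submitted with boto3.client("transcribe").create_vocabulary(**vocabulary_request),
# then referenced by name in the Settings of later transcription jobs.
```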

Custom language models

When needed, you can build and train your own custom language model (CLM) for your use case and domain by submitting a corpus of text data to Amazon Transcribe. A CLM can improve speech recognition accuracy by adapting the service to your own data.

User safety & privacy features

Transcribe can help you mask or remove words that are sensitive or unsuitable for your audience from transcription results.

Vocabulary filtering

You can specify a list of words to remove from transcripts with vocabulary filtering. For example, you can specify a list of profane or offensive words and Amazon Transcribe can remove them from transcripts.

Automatic content redaction

When instructed, Amazon Transcribe can help customers identify and redact sensitive personally identifiable information (PII) from transcripts in supported languages. This allows contact centers to more easily review and share the transcripts for customer experience insight and agent training.
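
Both privacy controls in this section, vocabulary filtering and PII redaction, are requested per job. The sketch below gathers the relevant arguments for a batch request; `build_privacy_options` and the filter name are hypothetical, and the accepted values listed in the two sets reflect my reading of the boto3 parameters rather than anything stated in this document.

```python
FILTER_METHODS = {"remove", "mask", "tag"}
REDACTION_OUTPUTS = {"redacted", "redacted_and_unredacted"}


def build_privacy_options(filter_name=None, filter_method="mask",
                          redact_pii=False, redaction_output="redacted"):
    """Return the job arguments that control word filtering and PII redaction."""
    options = {}
    if filter_name:
        if filter_method not in FILTER_METHODS:
            raise ValueError(f"unknown filter method: {filter_method}")
        # The filter itself is a named word list created ahead of time.
        options["Settings"] = {
            "VocabularyFilterName": filter_name,
            "VocabularyFilterMethod": filter_method,
        }
    if redact_pii:
        if redaction_output not in REDACTION_OUTPUTS:
            raise ValueError(f"unknown redaction output: {redaction_output}")
        options["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": redaction_output,
        }
    return options


privacy_options = build_privacy_options(
    filter_name="example-profanity-filter", filter_method="remove", redact_pii=True
)
# These entries would be merged into the other start_transcription_job arguments.
```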

Amazon Transcribe Call Analytics

Extract conversation insights like call sentiment and speech loudness to improve agent productivity and customer experience with Amazon Transcribe Call Analytics.

Extract detailed call analytics & conversation insights

Using the power of machine learning, you can quickly apply speech-to-text and natural language processing capabilities to uncover valuable conversation insights. You can then integrate insights such as customer and agent sentiment, detected issues, and speech characteristics like non-talk time, interruptions, and talk speed into your inbound and outbound call analytics applications. This can help your supervisors more readily identify potential customer issues, agent coaching opportunities, and call trends.
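
Insights such as agent and customer sentiment depend on the service knowing which audio channel belongs to which participant. The sketch below assembles arguments for the boto3 `start_call_analytics_job` call under that assumption. The helper name, job name, bucket, and role ARN are placeholders.

```python
def build_call_analytics_request(job_name, media_uri, role_arn, agent_channel=0):
    """Assemble keyword arguments for a Call Analytics job on two-channel audio."""
    if agent_channel not in (0, 1):
        raise ValueError("agent_channel must be 0 or 1")
    customer_channel = 1 - agent_channel
    return {
        "CallAnalyticsJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        # IAM role the service assumes to read the media file.
        "DataAccessRoleArn": role_arn,
        "ChannelDefinitions": [
            {"ChannelId": agent_channel, "ParticipantRole": "AGENT"},
            {"ChannelId": customer_channel, "ParticipantRole": "CUSTOMER"},
        ],
    }


analytics_request = build_call_analytics_request(
    "example-call-analytics",
    "s3://example-bucket/support-call.wav",
    "arn:aws:iam::123456789012:role/ExampleTranscribeRole",
    agent_channel=1,
)
```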

Improve compliance & monitoring with automated call categorization

Monitor your calls at scale to help track compliance with company policies or regulatory requirements. Build and train your own custom categories based on your specified criteria (e.g. words/phrases or conversation characteristics). 

Produce rich call transcripts

Give your agents access to the conversation details from past interactions. The turn-by-turn transcripts provide insights such as customer sentiment, detected issues, and interruptions.

Protect sensitive customer data

Conversations often contain sensitive customer data such as names, addresses, credit card numbers, and social security numbers. Transcribe Call Analytics helps customers identify and redact this information from both the audio and text.

Amazon Transcribe Medical

Easily transcribe your medical conversations with Transcribe Medical, a HIPAA-eligible automatic speech recognition (ASR) service.

Dictation mode

Transcribe single-speaker audio commonly found in medical dictation use cases. 

Conversational mode

Transcribe multi-speaker conversational audio, such as conversations between clinicians and patients.

Medical specialties

Transcribe speech to text across a diverse range of medical specialties. 

Batch API

Transcribe recorded medical audio files at scale with high concurrency. 

Streaming API

Transcribe audio streams in near real time.

Custom vocabulary

Boost transcription accuracy by using custom vocabulary for potentially out-of-lexicon terminology.  

Channel identification

Concurrently transcribe multi-channel audio and get one final coherent transcript. 

Speaker diarization

Separate speech from different speakers within any mono-channel audio. 
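
To make the dictation and conversational modes listed above concrete, the sketch below assembles arguments for the boto3 `start_medical_transcription_job` call. The helper name, job name, and buckets are placeholders, and `PRIMARYCARE` is used only as an example specialty value; check the API reference for the values available to you.

```python
MEDICAL_TYPES = {"dictation": "DICTATION", "conversation": "CONVERSATION"}


def build_medical_request(job_name, media_uri, output_bucket, mode="conversation"):
    """Assemble keyword arguments for a batch medical transcription job."""
    if mode not in MEDICAL_TYPES:
        raise ValueError("mode must be 'dictation' or 'conversation'")
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Specialty": "PRIMARYCARE",  # example value, see lead-in
        "Type": MEDICAL_TYPES[mode],
    }


dictation_request = build_medical_request(
    "example-dictation",
    "s3://example-bucket/dictation.wav",
    "example-output-bucket",
    mode="dictation",
)
```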

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.