Q: What is Amazon Transcribe?
Amazon Transcribe is an AWS service that makes it easy for customers to convert speech to text. Using Automatic Speech Recognition (ASR) technology, customers can use Amazon Transcribe for a variety of business applications, including transcription of voice-based customer service calls, generation of subtitles for audio/video content, and text-based content analysis of audio/video content.
Q: How does Amazon Transcribe interact with other AWS products?
Amazon Transcribe converts audio input into text, which opens the door for various text analytics applications on voice input. For instance, by using Amazon Comprehend on the converted text data from Amazon Transcribe, customers can perform sentiment analysis or extract entities and key phrases. Similarly, by integrating with Amazon Translate and Amazon Polly, customers can accept voice input in one language, translate it into another, and generate voice output, effectively enabling multilingual conversations. It is also possible to integrate Amazon Transcribe with Amazon Elasticsearch to index and perform text-based search across an audio/video library.
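As a rough sketch of the chain described above, the function below takes text that has already been transcribed, checks its sentiment, translates it, and synthesizes speech. It assumes boto3-style client objects for Amazon Comprehend, Amazon Translate, and Amazon Polly are passed in. The function name, the default language codes, and the default voice are illustrative choices, not something this FAQ prescribes.

```python
def analyze_and_speak(text, comprehend, translate, polly,
                      source_lang="en", target_lang="es", voice_id="Lucia"):
    """Run transcribed text through sentiment analysis, translation and speech.

    The three clients are expected to behave like boto3 clients, for example
    boto3.client("comprehend"), boto3.client("translate"), boto3.client("polly").
    Language codes and voice_id are assumptions chosen for illustration.
    """
    # Sentiment of the original (source-language) text.
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=source_lang)

    # Translate the text into the target language.
    translated = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )["TranslatedText"]

    # Turn the translated text back into audio.
    speech = polly.synthesize_speech(
        Text=translated, OutputFormat="mp3", VoiceId=voice_id
    )

    return {
        "sentiment": sentiment["Sentiment"],
        "translated_text": translated,
        "audio_stream": speech["AudioStream"],
    }
```

Passing the clients in as arguments keeps the function easy to try out with stand-in objects before any AWS credentials are configured.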
Q: What else should I know before using the Amazon Transcribe service?
The Amazon Transcribe service is designed to handle a wide range of speech and acoustic characteristics, including variations in volume, pitch, and speaking rate. The quality and content of the audio signal (including but not limited to factors such as background noise, overlapping speakers, accented speech, or switches between languages within a single audio file) may affect the accuracy of the service output. We are constantly updating the service to improve its ability to accommodate additional acoustic variation and content types.
Using Amazon Transcribe
Q: How will developers access Transcribe?
The easiest way to get started with Amazon Transcribe is to submit a job using the console to transcribe an audio file. You can also call the service directly from the AWS Command Line Interface, or use one of the supported SDKs of your choice to integrate with your applications. Either way, you can start using Amazon Transcribe to generate automated transcripts for your audio files with just a few lines of code.
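For readers who prefer the SDK route, here is a minimal sketch using the AWS SDK for Python (boto3). The job name and the S3 URI are hypothetical placeholders, and the helper names are made up for this example. The request is built separately from the call, so it can be inspected without credentials.

```python
def build_transcription_request(job_name, media_uri, media_format="mp3",
                                language_code="en-US"):
    """Return keyword arguments for the StartTranscriptionJob operation."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }


def submit_job(request):
    """Submit the request with boto3 (needs boto3 installed and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("transcribe")
    return client.start_transcription_job(**request)


# Hypothetical job name and bucket, for illustration only.
request = build_transcription_request(
    "example-job", "s3://example-bucket/call.mp3"
)
```

Calling submit_job(request) would start the job. The transcript becomes available once the job completes.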
Q: What kind of inputs does Amazon Transcribe support?
Amazon Transcribe supports both 16 kHz and 8 kHz audio streams and multiple audio encodings, including WAV, MP3, MP4, and FLAC.
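If you want to check a WAV file locally before uploading it, a sketch like the following reads the sample rate with Python's standard wave module and compares it with the two rates named in this answer. The helper names are made up for this example, and the in-memory file exists only so the snippet can run on its own.

```python
import io
import wave

# The two sample rates named in the answer above: 8 kHz and 16 kHz.
SAMPLE_RATES_NAMED_ABOVE = {8000, 16000}


def wav_sample_rate(wav_file):
    """Return the sample rate in Hz of a WAV file (path or file object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getframerate()


def is_named_rate(rate_hz):
    """True if the rate is one of the two rates named in this FAQ."""
    return rate_hz in SAMPLE_RATES_NAMED_ABOVE


def make_silent_wav(rate_hz, seconds=1):
    """Build a small mono 16-bit WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * seconds)
    buffer.seek(0)
    return buffer


rate = wav_sample_rate(make_silent_wav(8000))
```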
Q: Does Amazon Transcribe support real-time transcriptions?
Yes. Amazon Transcribe enables users to open a bidirectional stream over HTTP/2. Users can send an audio stream to the service and receive a text stream in return, in real time.
Q: What encoding does real-time transcription support?
Streaming transcription currently supports 16-bit Linear PCM encoding.
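To make "16-bit Linear PCM" concrete, the sketch below packs floating-point samples into signed 16-bit integers and splits the resulting bytes into chunks that could be sent over a stream. Little-endian byte order and the chunk size are assumptions made for this example. The function names are illustrative.

```python
import struct


def floats_to_pcm16(samples):
    """Encode floats in [-1.0, 1.0] as signed 16-bit PCM bytes.

    Values outside the range are clipped. Little-endian byte order is an
    assumption made for this sketch.
    """
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def chunk_bytes(data, chunk_size):
    """Yield successive chunks of at most chunk_size bytes."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


pcm = floats_to_pcm16([0.0, 1.0, -1.0])
chunks = list(chunk_bytes(pcm, 4))
```

Each sample occupies two bytes, so the three samples above produce six bytes, split here into one 4-byte chunk and one 2-byte chunk.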
Q: What languages does Amazon Transcribe support?
For information on language support, please refer to this documentation page.
Q: What devices does Amazon Transcribe work with?
Amazon Transcribe is, for the most part, device agnostic. In general, it works with any device that includes an on-device microphone, such as phones, PCs, tablets, and IoT devices (e.g., car audio systems). The Amazon Transcribe API can detect the quality of the audio stream being input at the device (8 kHz vs. 16 kHz) and selects the appropriate acoustic models for converting speech to text. Developers can also call the Transcribe API from their applications to access speech-to-text conversion.
Q: Are there size restrictions on the audio content that Amazon Transcribe can process?
Amazon Transcribe service calls are limited to 4 hours (or 2 GB) of audio per API call for the batch service. The streaming service can accommodate open connections up to 4 hours long.
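A quick pre-flight check against the batch limits stated above might look like this. Treating "2 GB" as 2 x 1024^3 bytes is an assumption, as are the function names. The second helper shows the arithmetic for uncompressed 16-bit PCM audio.

```python
MAX_BATCH_SECONDS = 4 * 60 * 60   # 4 hours, as stated above
MAX_BATCH_BYTES = 2 * 1024 ** 3   # "2 GB", interpreted here as 2 GiB (assumption)


def fits_batch_limits(duration_seconds, size_bytes):
    """True if a file is within both the duration and the size limit."""
    return (duration_seconds <= MAX_BATCH_SECONDS
            and size_bytes <= MAX_BATCH_BYTES)


def pcm16_duration_seconds(num_bytes, sample_rate_hz, channels=1):
    """Duration of uncompressed 16-bit PCM audio (2 bytes per sample)."""
    return num_bytes / (sample_rate_hz * 2 * channels)


# Four hours of 16 kHz mono 16-bit PCM is 460,800,000 bytes.
four_hours = pcm16_duration_seconds(460_800_000, 16000)
```

For 16 kHz mono 16-bit audio, the duration limit is reached well before the size limit. For compressed formats the relationship depends on the bitrate.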
Q: What programming languages does Amazon Transcribe support?
The Amazon Transcribe real-time service supports the Java, Ruby, and C++ SDKs. Additional SDK support is coming. For more details, visit the Resources page.
Q: My custom vocabulary words are not being recognized! What can I do?
The speech recognition output depends on a number of factors in addition to custom vocabulary entries, so there can be no assurance that if a term is included in the custom vocabulary, it will be correctly recognized.
However, the most frequent reason is that a custom word lacks the correct pronunciation. If you haven’t provided a pronunciation for your custom word, please try to create one. If you already have provided one, double-check its correctness, or include other pronunciation variants if necessary. This can be done by creating multiple entries in the custom vocabulary file that differ in the pronunciation field.
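As an illustration of listing several pronunciation variants for one word, the sketch below writes a small tab-separated vocabulary table. The column names (Phrase, IPA, SoundsLike, DisplayAs) and their order are assumptions based on the fields mentioned in this FAQ. The second hint for "Etienne" is a made-up variant. Check the documentation for the exact file format before uploading.

```python
# Column names are an assumption based on the fields named in this FAQ.
HEADER = ("Phrase", "IPA", "SoundsLike", "DisplayAs")


def vocabulary_table(rows):
    """Render rows (dicts keyed by column name) as tab-separated text."""
    lines = ["\t".join(HEADER)]
    for row in rows:
        fields = [row.get(column, "") for column in HEADER]
        # TAB is the column separator, so it cannot appear inside a field.
        if any("\t" in field for field in fields):
            raise ValueError("fields must not contain TAB characters")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


rows = [
    {"Phrase": "Etienne", "SoundsLike": "eh-tee-en"},
    {"Phrase": "Etienne", "SoundsLike": "et-ee-en"},  # hypothetical variant
]
table = vocabulary_table(rows)
```

Both rows share the same phrase and differ only in the pronunciation field, which is the pattern the answer above describes.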
Q: Why do I see too many custom words in my output?
Custom vocabularies are optimized for a small list of targeted words; larger vocabularies may lead to over-generation of custom words, especially when they contain words that are pronounced in a similar way. If you have a large list, try reducing it to rare words and words that are actually expected to occur in your audio files. If you have a large vocabulary covering multiple use cases, split it into separate lists for the different use cases. Short words that sound similar to many other words may also lead to over-generation (too many custom words appearing in the output). It is preferable to combine these words with surrounding words and list them as hyphen-separated phrases. For example, the custom word “A.D.” could be included as part of a phrase such as “A.D.-converter”.
Q: There are two ways of giving pronunciations, IPA or SoundsLike fields in the custom vocabulary table. Which one is better?
IPA allows for more precise pronunciations. You should provide IPA pronunciations if you are able to generate IPA (e.g., from a lexicon that has IPA pronunciations or an online converter tool).
Q: I'd like to use IPA but I'm not a linguistic expert. Is there an online tool I can use?
Several standard dictionaries, such as the Oxford English Dictionary or the Cambridge Dictionary (including their online versions) provide pronunciations in IPA. There are also online converters (e.g. easypronunciation.com or tophonetics.com for English) — however, note that in most cases these tools are based on underlying dictionaries and may not generate correct IPA for some words, such as proper names. Amazon Transcribe does not endorse any third-party tools.
Q: Do I need to use different IPA standards for different accents of the same language (e.g., US English versus British English)?
You should use the IPA standard that is appropriate for the audio files you will be processing — e.g., if you are expecting to process audio from British English speakers, use the British English pronunciation standard. The set of allowed IPA symbols may differ for the different languages and dialects supported by Amazon Transcribe; please make sure that your pronunciations contain only the allowed characters. Details on the IPA character sets can be found in the documentation: https://docs.aws.amazon.com/transcribe/latest/dg/how-vocabulary.html#charsets
Q: How can I provide the pronunciation using the SoundsLike field in the custom vocabulary table?
You can break a word or phrase down into smaller pieces and provide a pronunciation for each piece using the standard orthography of the language to mimic the way that the word sounds. For example, in English you can provide pronunciation hints for the phrase Los-Angeles like this: loss-ann-gel-es. The hint for the word Etienne would look like this: eh-tee-en. You separate each part of the hint with a hyphen (-). You can use any of the allowed characters for the input language.
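The hyphen rule above is simple enough to capture in a small helper. This is only a sketch: the function name is made up, and it does not check whether each character is allowed for your input language.

```python
def sounds_like(*pieces):
    """Join pronunciation pieces into a SoundsLike hint separated by hyphens."""
    if not pieces:
        raise ValueError("at least one piece is required")
    for piece in pieces:
        # Each piece must be a single non-empty chunk with no separators.
        if not piece or "-" in piece or any(ch.isspace() for ch in piece):
            raise ValueError("invalid piece: %r" % (piece,))
    return "-".join(pieces)


los_angeles = sounds_like("loss", "ann", "gel", "es")
etienne = sounds_like("eh", "tee", "en")
```

These calls reproduce the two hints given in the answer: loss-ann-gel-es and eh-tee-en.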
Q: How do the two ways of providing acronyms (with periods, or without periods but with pronunciations) work?
If you use an acronym containing periods, the spelling pronunciation will be generated internally. If you do not use periods, please provide the pronunciation in the pronunciation field. For some acronyms, it is not obvious whether they have a spelling pronunciation or a word-like pronunciation (e.g., NATO is often pronounced ‘n eɪ t oʊ’ (nay-toh) rather than ‘ɛn eɪ ti oʊ’ (N. A. T. O.)).
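The rule above can be expressed as a small check when preparing vocabulary entries. This is a sketch only: the function name and the dictionary keys "Phrase" and "IPA" are assumptions made for this example.

```python
def acronym_entry(phrase, ipa=""):
    """Build a vocabulary entry for an acronym.

    With periods (e.g. "A.D."), no pronunciation is required because the
    spelling pronunciation is generated internally. Without periods, a
    pronunciation must be supplied.
    """
    has_periods = "." in phrase
    if not has_periods and not ipa:
        raise ValueError("acronyms without periods need a pronunciation")
    return {"Phrase": phrase, "IPA": ipa}


spelled = acronym_entry("A.D.")                 # spelled out letter by letter
word_like = acronym_entry("NATO", "n eɪ t oʊ")  # pronounced as a word
```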
Q: Where can I find examples of how to use custom pronunciations?
You can find sample input formats and examples in the documentation: https://docs.aws.amazon.com/transcribe/latest/dg/how-vocabulary.html.
Q: What happens if I use the wrong IPA? If I am uncertain, am I better off not inputting any IPA?
The system will use the pronunciation you provide; this should increase the likelihood of the word being recognized correctly, provided the pronunciation is correct and matches what was spoken. If you are not certain that your IPA is correct, run a comparison by processing your audio files once with a vocabulary that contains your IPA pronunciations and once with a vocabulary that contains only the words (and, optionally, DisplayAs forms). If you do not provide any pronunciations, the service will use an approximation, which may or may not work better than your input.
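One simple way to run the comparison suggested above is to count how often the custom word appears in each transcript and see which count is closer to the number of times it was actually spoken. The sketch below assumes you already have both transcripts as plain text and know the expected count. The function names and sample sentences are made up for illustration.

```python
import re


def count_term(transcript, term):
    """Count whole-word, case-insensitive occurrences of term."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return len(re.findall(pattern, transcript, flags=re.IGNORECASE))


def closer_to_expected(term, expected_count, transcripts):
    """Return the label of the transcript whose count is nearest expected_count.

    transcripts maps a label (e.g. "with_ipa") to the transcript text.
    Ties go to the label that appears first in the mapping.
    """
    return min(
        transcripts,
        key=lambda label: abs(count_term(transcripts[label], term) - expected_count),
    )


# Made-up transcripts of the same audio, in which "Etienne" was spoken twice.
transcripts = {
    "with_ipa": "Etienne called. Etienne left a message.",
    "words_only": "Etienne called. At the end left a message.",
}
best = closer_to_expected("Etienne", 2, transcripts)
```

A raw count is a coarse measure. It will not catch a custom word that was inserted in the wrong place, so it is worth reading through a sample of the output as well.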
Q: When using DisplayAs forms, can I display character sets unrelated to the original language being transcribed (e.g., output “Street” as “街道”)?
Yes. While phrases may only use a restricted set of characters for the specific language, UTF-8 characters apart from \t (TAB) are permitted in the DisplayAs column.
Q: Are voice inputs processed by Amazon Transcribe stored, and how are they used by AWS?
Amazon Transcribe may store and use voice inputs processed by the service solely to provide and maintain the service and to improve and develop the quality of Amazon Transcribe and other Amazon machine-learning/artificial-intelligence technologies. Use of your content is important for continuous improvement of your Amazon Transcribe customer experience, including the development and training of related technologies. We do not use any personally identifiable information that may be contained in your content to target products, services, or marketing to you or your end users. Your trust, privacy, and the security of your content are our highest priority, and we implement appropriate and sophisticated technical and physical controls, including encryption at rest and in transit, designed to prevent unauthorized access to, or disclosure of, your content and ensure that our use complies with our commitments to you. Please see https://aws.amazon.com/compliance/data-privacy-faq/ for more information. You may opt out of having your content used to improve and develop the quality of Amazon Transcribe and other Amazon machine-learning/artificial-intelligence technologies by contacting AWS Support.
Q: Can I delete voice inputs stored by Amazon Transcribe?
Yes. You can request deletion of voice inputs associated with your account by contacting AWS Support. Deleting voice inputs may degrade your Amazon Transcribe experience.
Q: Who has access to my content that is processed and stored by Amazon Transcribe?
Only authorized employees will have access to your content that is processed by Amazon Transcribe. Your trust, privacy, and the security of your content are our highest priority, and we implement appropriate and sophisticated technical and physical controls, including encryption at rest and in transit, designed to prevent unauthorized access to, or disclosure of, your content and ensure that our use complies with our commitments to you. Please see https://aws.amazon.com/compliance/data-privacy-faq/ for more information.
Q: Do I still own my content that is processed and stored by Amazon Transcribe?
You always retain ownership of your content, and we will only use your content with your consent.
Q: Is the content processed by Amazon Transcribe moved outside the AWS region where I am using Amazon Transcribe?
Any content processed by Amazon Transcribe is encrypted and stored at rest in the AWS region where you are using Amazon Transcribe. Some portion of content processed by Amazon Transcribe may be stored in another AWS region solely in connection with the continuous improvement and development of your Amazon Transcribe customer experience and other Amazon machine-learning/artificial-intelligence technologies. If you opt out of having your content used to develop the quality of Amazon Transcribe and other Amazon machine-learning/artificial-intelligence technologies by contacting AWS Support, your content will not be stored in another AWS region. You can request deletion of voice inputs associated with your account by contacting AWS Support. Your trust, privacy, and the security of your content are our highest priority and we implement appropriate and sophisticated technical and physical controls, including encryption at rest and in transit, designed to prevent unauthorized access to, or disclosure of, your content and ensure that our use complies with our commitments to you. Please see https://aws.amazon.com/compliance/data-privacy-faq/ for more information.
Q: Can I use Amazon Transcribe in connection with websites, programs or other applications that are directed or targeted to children under age 13 and subject to the Children’s Online Privacy Protection Act (COPPA)?
Yes, subject to your compliance with the Amazon Transcribe Service Terms, including your obligation to provide any required notices and obtain any required verifiable parental consent under COPPA, you may use Amazon Transcribe in connection with websites, programs, or other applications that are directed or targeted, in whole or in part, to children under age 13.
Q: How do I determine whether my website, program, or application is subject to COPPA?
For information about the requirements of COPPA and guidance for determining whether your website, program, or other application is subject to COPPA, please refer directly to the resources provided and maintained by the United States Federal Trade Commission. The FTC website also contains information on how to determine whether a service is directed or targeted, in whole or in part, to children under age 13.