Amazon Transcribe Now Generally Available

Update (August 31, 2021) – Removed outdated S3 URL in the code.

At AWS re:Invent 2017 we launched Amazon Transcribe in private preview. Today we’re excited to make Amazon Transcribe generally available for all developers. Amazon Transcribe is an automatic speech recognition service (ASR) that makes it easy for developers to add speech to text capabilities to their applications. We’ve iterated on customer feedback in the preview to make a number of enhancements to Amazon Transcribe.

New Amazon Transcribe Features in GA

To start off we’ve made the SampleRate parameter optional which means you only need to know the file type of your media and the input language. We’ve added two new features – the ability to differentiate multiple speakers in the audio to provide more intelligible transcripts (“who spoke when”), and a custom vocabulary to improve the accuracy of speech recognition for product names, industry-specific terminology, or names of individuals. To refresh our memories on how Amazon Transcribe works lets look at a quick example.

import boto3
transcribe = boto3.client("transcribe")
transcribe.start_transcription_job(
    TranscriptionJobName="TranscribeDemo",
    LanguageCode="en-US",
    MediaFormat="mp3",
    Media={"MediaFileUri": "https://s3.amazonaws.com/<Your-Bucket>/out.mp3"}
)

This will output JSON similar to this (I’ve stripped out most of the response) with individual speakers identified:

{
  "jobName": "reinvent",
  "accountId": "1234",
  "results": {
    "transcripts": [
      {
        "transcript": "Hi, everybody, i'm randall ..."
      }
    ],
    "speaker_labels": {
      "speakers": 2,
      "segments": [
        {
          "start_time": "0.000000",
          "speaker_label": "spk_0",
          "end_time": "0.010",
          "items": []
        },
        {
          "start_time": "0.010000",
          "speaker_label": "spk_1",
          "end_time": "4.990",
          "items": [
            {
              "start_time": "1.000",
              "speaker_label": "spk_1",
              "end_time": "1.190"
            },
            {
              "start_time": "1.190",
              "speaker_label": "spk_1",
              "end_time": "1.700"
            }
          ]
        }
      ]
    },
    "items": [
      {
        "start_time": "1.000",
        "end_time": "1.190",
        "alternatives": [
          {
            "confidence": "0.9971",
            "content": "Hi"
          }
        ],
        "type": "pronunciation"
      },
      {
        "alternatives": [
          {
            "content": ","
          }
        ],
        "type": "punctuation"
      },
      {
        "start_time": "1.190",
        "end_time": "1.700",
        "alternatives": [
          {
            "confidence": "1.0000",
            "content": "everybody"
          }
        ],
        "type": "pronunciation"
      }
    ]
  },
  "status": "COMPLETED"
}

Custom Vocabulary

Now if I needed to have a more complex technical discussion with a colleague I could create a custom vocabulary. A custom vocabulary is specified as an array of strings passed to the CreateVocabulary API and you can include your custom vocabulary in a transcription job by passing in the name as part of the Settings in a StartTranscriptionJob API call. An individual vocabulary can be as large as 50KB and each phrase must be less than 256 characters (use hyphens for spaces). More information on custom vocabularies can be found in the documentation. If I wanted to transcribe the recordings of my highschool AP Biology class I could create a custom vocabulary in Python like this:

import boto3
transcribe = boto3.client("transcribe")
transcribe.create_vocabulary(
	LanguageCode="en-US",
	VocabularyName="APBiology",
	Phrases=[
		"endoplasmic-reticulum",
		"organelle",
		"cisternae",
		"eukaryotic",
		"ribosomes",
		"hepatocyes",
		"cell-membrane"
	]
)

I can refer to this vocabulary in my calls to the synthesize speech API by the name APBiology, and I can update the vocabulary programatically based on any errors I may find in the transcriptions.

Available Now

Amazon Transcribe is available now in US East (N. Virginia), US West (Oregon), US East (Ohio) and Europe (Ireland). Transcribe’s free tier gives you 60 minutes of transcription for free per month for the first 12 months with a pay-as-you-go model of $0.0004 per second of transcribed audio after that, with a minimum charge of 15 seconds.

When combined with other tools and services I think transcribe opens up a entirely new opportunities for application development. I’m excited to see what technologies developers build with this new service.

– Randall

AWS News Blog

Amazon Transcribe Now Generally Available

New Amazon Transcribe Features in GA

Custom Vocabulary

Available Now

Resources

Follow