AWS Machine Learning Blog

Transcribe speech to text in real time using Amazon Transcribe with WebSocket

August 2024: This post was reviewed and updated for accuracy.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to applications. In November 2018, we added streaming transcriptions over HTTP/2 to Amazon Transcribe. This enabled users to pass a live audio stream to our service and, in return, receive text transcripts in real time. We are excited to share that we recently started supporting real-time transcriptions over the WebSocket protocol. WebSocket support makes streaming speech-to-text through Amazon Transcribe more accessible to a wider user base, especially for those who want to build browser or mobile-based applications.

In this blog post, we assume that you are aware of our streaming transcription service running over HTTP/2, and focus on showing you how to use the real-time offering over WebSocket. However, for reference on using HTTP/2, you can read our previous blog post, Amazon Transcribe now supports real-time transcriptions, and documentation on Transcribing streaming audio.

What is WebSocket?

WebSocket is a full-duplex communication protocol built over TCP. The protocol was standardized by the IETF as RFC 6455 in 2011. WebSocket is suitable for long-lived connectivity whereby both the server and the client can transmit data over the same connection at the same time. It is also practical for cross-domain usage. Voila! No need to worry about cross-origin resource sharing (CORS) as there would be when using HTTP.

Using Amazon Transcribe streaming with WebSocket

To use Amazon Transcribe’s StartStreamTranscriptionWebSocket API, you first need to authorize your IAM user to use the Amazon Transcribe Streaming WebSocket. Go to the AWS Management Console, navigate to Identity & Access Management (IAM), and attach the following inline policy to your user in the AWS IAM console. Please refer to “To embed an inline policy for a user or role” for instructions on how to add permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        "Sid": "transcribestreaming",
        "Effect": "Allow",
        "Action": "transcribe:StartStreamTranscriptionWebSocket",
        "Resource": "*"
    ]
}

Your upgrade request should be pre-signed with your AWS credentials using the AWS Signature Version 4. The request should contain the required parameters, namely sample-rate, language code, and media-encoding. You could optionally supply vocabulary-name to use a custom vocabulary. The StartStreamTranscriptionWebSocket API supports all of the languages that Amazon Transcribe streaming supports today. After your connection is upgraded to WebSocket, you can send your audio chunks as an AudioEvent of the event-stream encoding in the binary WebSocket frame. The response you get is the transcript JSON, which would also be event-stream encoded. For more details, please refer to our technical documentation on Event stream encoding.

To demonstrate how you can power your application with Amazon Transcribe in real time with WebSocket, we have provided sample code on GitHub to build and deploy a website which takes in your account credentials and lets you start streaming in one of your preferred languages. The complete sample code is available on GitHub. We’d love to see what other cool applications you can build using Amazon Transcribe streaming with WebSocket!


About the authors

Bhaskar Bagchi is an engineer in the Amazon Transcribe service team. Outside of work, Bhaskar enjoys photography and singing.

Karan Grover is an engineer in the Amazon Transcribe service team. Outside of work, Karan enjoys hiking and is a photography enthusiast.

Paul Zhao is a Product Manager at AWS Machine Learning. He manages the Amazon Transcribe service. Outside of work, Paul is a motorcycle enthusiast and avid woodworker.

Miriam Lebowitz is a Solutions Architect, and specializes in machine learning at AWS. Outside of work, she enjoys baking, traveling, and spending quality time with friends and family.


Audit History

Last reviewed and updated in August 2024 by Miriam Lebowitz | Solutions Architect