AWS Developer Tools Blog
Announcing Amazon Transcribe streaming transcription support in the AWS SDK for Ruby
Amazon Transcribe streaming transcription enables you to send an audio stream, and with a single API call, receive a stream of text in real time. We’re excited to announce support for the #start_stream_transcription API with bidirectional streaming usage in the AWS SDK for Ruby.
Before calling #start_stream_transcription
To use the Amazon Transcribe #start_stream_transcription API, you need both the http-2 gem and the aws-sdk-transcribestreamingservice gem available.
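One quick way to confirm that both gems are visible to your Ruby installation is to ask RubyGems directly. This is only a sketch: the helper name missing_gems is made up for this example, and it assumes the gems are installed with gem install rather than managed through Bundler.

```ruby
# Hypothetical helper (not part of the SDK): returns the names of gems
# that RubyGems cannot find in the local installation.
REQUIRED_GEMS = %w[http-2 aws-sdk-transcribestreamingservice].freeze

def missing_gems(names = REQUIRED_GEMS)
  names.reject { |name| Gem::Specification.find_all_by_name(name).any? }
end

missing = missing_gems
if missing.empty?
  puts 'Both gems are installed.'
else
  puts "Missing gems: #{missing.join(', ')}"
  puts "Install them with: gem install #{missing.join(' ')}"
end
```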
Although the AWS SDK for Ruby supports all Ruby versions later than 1.9.3, the #start_stream_transcription API is streamed over the HTTP2 protocol. This means that to use the API, you need Ruby version 2.1 or later.
To check your Ruby version, run ruby -v.
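If you would rather check from inside Ruby, a minimal sketch could compare RUBY_VERSION against the 2.1 minimum mentioned above. The helper name http2_capable_ruby? is hypothetical and exists only for this example.

```ruby
# Hypothetical helper: true when the given Ruby version meets the 2.1
# minimum this post states for the HTTP2-based streaming API.
MINIMUM_RUBY = Gem::Version.new('2.1')

def http2_capable_ruby?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

verdict = http2_capable_ruby? ? 'new enough for streaming' : 'too old for streaming'
puts "Ruby #{RUBY_VERSION} is #{verdict}."
```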
Currently, Amazon Transcribe supports both 16 kHz and 8 kHz audio streams (WAV, MP3, MP4, and FLAC) in 16-bit linear PCM encoding. Make sure your audio stream uses a supported sample rate and encoding before trying out the API, or you might get back empty transcripts or bad request exceptions.
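For WAV files, one way to catch an unsupported sample rate or encoding early is to read the file header before streaming. The sketch below assumes a canonical 44-byte RIFF/WAVE header with the fmt chunk first, which many but not all WAV files have. The helper names wav_format and supported_for_streaming? are hypothetical.

```ruby
SUPPORTED_SAMPLE_RATES = [8_000, 16_000].freeze

# Reads the fixed-position fields of a canonical WAV header.
# Returns nil when the bytes do not look like such a header.
def wav_format(header)
  return nil if header.nil? || header.bytesize < 36

  header = header.b # work on raw bytes regardless of string encoding
  return nil unless header[0, 4] == 'RIFF' && header[8, 4] == 'WAVE' && header[12, 4] == 'fmt '

  audio_format, channels, sample_rate = header[20, 8].unpack('vvV')
  bits_per_sample = header[34, 2].unpack('v').first
  { pcm: audio_format == 1, channels: channels,
    sample_rate: sample_rate, bits_per_sample: bits_per_sample }
end

# True when the header describes 16-bit linear PCM at 8 kHz or 16 kHz.
def supported_for_streaming?(header)
  info = wav_format(header)
  return false if info.nil?

  info[:pcm] && info[:bits_per_sample] == 16 &&
    SUPPORTED_SAMPLE_RATES.include?(info[:sample_rate])
end

# Usage ('audio.wav' is a placeholder path):
# puts supported_for_streaming?(File.binread('audio.wav', 44))
```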
You can find more on streaming transcription in the Amazon Transcribe FAQs.
#start_stream_transcription API usage pattern
Let’s walk through the key parts of making an async API call, namely the async client and the event stream handlers, and then a complete example of using the API.
Introduction to AsyncClient
Because streaming APIs run over HTTP2, the AWS SDK for Ruby introduces AsyncClient for them, in contrast to Client (which you might be familiar with) for API calls over HTTP1.1.
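Creating the async client might look like the following sketch. The region value is an assumption for illustration, and the wrapper method build_async_client is hypothetical. It requires the gem lazily so the file still loads on a machine where the gem is not installed yet.

```ruby
# Hypothetical wrapper around the SDK's async client constructor.
def build_async_client(region: 'us-west-2')
  require 'aws-sdk-transcribestreamingservice'
  Aws::TranscribeStreamingService::AsyncClient.new(region: region)
end

# async_client = build_async_client
```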
Introduction to input and output event stream handlers
For a bidirectional streaming API, you need to provide an :input_event_stream_handler for signaling audio events, and an :output_event_stream_handler registered with callbacks to process events immediately when they arrive.
You can find all of the available event streams for those handlers, and documentation about them, at Aws::TranscribeStreamingService::EventStreams.
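As a rough sketch, the two handlers for this API can be built from the AudioStream and TranscriptResultStream classes in that module. The wrapper method build_event_stream_handlers is hypothetical and only groups the two constructor calls.

```ruby
# Hypothetical wrapper: returns [input_stream, output_stream] for
# #start_stream_transcription.
def build_event_stream_handlers
  require 'aws-sdk-transcribestreamingservice'
  input_stream = Aws::TranscribeStreamingService::EventStreams::AudioStream.new
  output_stream = Aws::TranscribeStreamingService::EventStreams::TranscriptResultStream.new
  [input_stream, output_stream]
end

# input_stream, output_stream = build_event_stream_handlers
```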
Before we make the request, let’s take a closer look at those handlers. For handling events in responses, although you can still call #wait or #join! for a final sync response, you get the most benefit out of streaming APIs on HTTP2 by registering callbacks on output_stream to access events with no delay.
You can find all of the available callback methods for output_stream in the Aws::TranscribeStreamingService::EventStreams::TranscriptResultStream documentation.
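To make the callback idea concrete, here is a minimal sketch that registers a transcript callback and a bad-request callback. The method register_transcript_callbacks and its out and err parameters are hypothetical. It works with any object that responds to the same on_..._event methods as a TranscriptResultStream.

```ruby
# Hypothetical helper: prints each transcript as soon as its event arrives.
def register_transcript_callbacks(output_stream, out: $stdout, err: $stderr)
  output_stream.on_transcript_event_event do |event|
    event.transcript.results.each do |result|
      # Join the text of all alternatives, skipping empty ones.
      text = result.alternatives.map(&:transcript).compact.join
      out.puts "Transcript: #{text}" unless text.empty?
    end
  end

  output_stream.on_bad_request_exception_event do |exception|
    err.puts "Bad request: #{exception.message}"
  end

  output_stream
end
```

Because the callbacks run as events arrive, the transcript text is available without waiting for the whole response to finish.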
Then, when it comes to using input_stream, you can #signal audio events after initializing an async request.
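The order matters here: start the request first, then signal audio. A minimal sketch of that sequence follows. The method start_and_signal, the chunk_size default, and the request parameter values (en-US, pcm, 16000) are assumptions chosen for illustration. Ending the stream is left to the caller.

```ruby
# Hypothetical helper: starts the async request, then pushes audio from
# any IO-like object through the input event stream handler.
def start_and_signal(async_client, input_stream, output_stream, audio_io, chunk_size: 16_000)
  async_resp = async_client.start_stream_transcription(
    language_code: 'en-US',
    media_encoding: 'pcm',
    media_sample_rate_hertz: 16_000,
    input_event_stream_handler: input_stream,
    output_event_stream_handler: output_stream
  )

  # IO#read(n) returns nil at end of file, which ends the loop.
  while (chunk = audio_io.read(chunk_size))
    input_stream.signal_audio_event_event(audio_chunk: chunk)
  end

  async_resp
end
```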
Calling the API
For a complete example, we’re using audio from an AWS Podcast episode to show how the #start_stream_transcription API streams real-time transcripts back.
Let’s pick AWS Podcast #285, which talks about AWS Lambda support for the native Ruby runtime and more.
First, download the file and convert the audio to a 16 kHz sample rate with 16-bit linear PCM encoding, saving it as AwsPodCast285.wav.
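The post does not name a conversion tool. If you happen to have ffmpeg installed, one possible approach is sketched below. Using ffmpeg, downmixing to mono, and the helper name ffmpeg_args are all assumptions for this example.

```ruby
# Hypothetical helper: builds an ffmpeg argument list that converts the
# source to 16 kHz, mono, 16-bit little-endian PCM in a WAV container.
def ffmpeg_args(source, target = 'AwsPodCast285.wav')
  ['ffmpeg', '-i', source, '-ar', '16000', '-ac', '1', '-c:a', 'pcm_s16le', target]
end

# 'podcast.mp3' is a placeholder for the downloaded episode:
# system(*ffmpeg_args('podcast.mp3'))
```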
Now we’re set to call the API. Let’s put the call in a demo.rb file.
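A sketch of what demo.rb might contain is shown here. It is an illustration rather than a definitive listing: the region, the request parameter values, the 30,000-byte chunk size, and the one-second pause are all assumptions, and the run_demo wrapper is hypothetical. It needs AWS credentials and the gems described earlier to actually run.

```ruby
# demo.rb (sketch)
def run_demo(audio_path = 'AwsPodCast285.wav', region: 'us-west-2')
  require 'aws-sdk-transcribestreamingservice'

  async_client = Aws::TranscribeStreamingService::AsyncClient.new(region: region)
  input_stream = Aws::TranscribeStreamingService::EventStreams::AudioStream.new
  output_stream = Aws::TranscribeStreamingService::EventStreams::TranscriptResultStream.new

  # Print transcripts as soon as their events arrive.
  output_stream.on_transcript_event_event do |event|
    event.transcript.results.each do |result|
      text = result.alternatives.map(&:transcript).compact.join
      puts "Transcript: #{text}" unless text.empty?
    end
  end

  # Stop sending audio if the service rejects the request.
  output_stream.on_bad_request_exception_event do |_exception|
    input_stream.signal_end_stream
  end

  async_resp = async_client.start_stream_transcription(
    language_code: 'en-US',
    media_encoding: 'pcm',
    media_sample_rate_hertz: 16_000,
    input_event_stream_handler: input_stream,
    output_event_stream_handler: output_stream
  )

  # Signal the audio in chunks, pausing between them.
  File.open(audio_path, 'rb') do |audio_file|
    while (chunk = audio_file.read(30_000))
      input_stream.signal_audio_event_event(audio_chunk: chunk)
      sleep(1)
    end
  end
  input_stream.signal_end_stream

  async_resp.wait
ensure
  async_client.close_connection if async_client
end

# To run the demo, uncomment the next line:
# run_demo
```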
Running the code prints the transcripts as they are streamed back.
For full documentation of how to use this API, see the AWS SDK for Ruby API Reference.
Additional notes
Due to the nature of the HTTP2 protocol, requests and responses happen in parallel, and multiple streams share a single connection. Although you have full control over how fast you signal audio events on the input event stream, signaling too fast with huge audio chunks can narrow the bandwidth left for response events. To get the most from bidirectional streaming, we recommend a balanced pace when signaling events on input streams.
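One way to reason about a balanced pace is to send audio at roughly the rate it would be played back. That is an assumption on our part, not a documented rule. Under that assumption, the chunk size follows from the sample rate and sample width, as this sketch shows. Both helper names are hypothetical.

```ruby
# Bytes of raw PCM audio produced per second of playback.
def pcm_bytes_per_second(sample_rate:, bits_per_sample: 16, channels: 1)
  sample_rate * (bits_per_sample / 8) * channels
end

# Chunk size that carries `seconds` of audio, so that sending one chunk
# and then sleeping for `seconds` roughly tracks real time.
def chunk_size_for(seconds, sample_rate:)
  (pcm_bytes_per_second(sample_rate: sample_rate) * seconds).round
end

puts chunk_size_for(1, sample_rate: 16_000)    # 32000
puts chunk_size_for(0.25, sample_rate: 8_000)  # 4000
```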
As a good practice, we recommend calling #signal_end_stream on the input event stream handler after audio event signaling is completed. It sends a clear “end” stream signal to the server side. Some services might be waiting for this “end” stream signal to complete stream communication. If no further audio event is sent and no end stream is signaled, a :bad_request_exception event might also be returned.
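A simple way to make sure the end signal is always sent, even if reading the audio raises an error, is an ensure clause. The helper with_end_of_stream in this sketch is hypothetical. It accepts any object that responds to #signal_end_stream.

```ruby
# Hypothetical helper: yields the input stream for signaling, then always
# signals the end of the stream, even when the block raises.
def with_end_of_stream(input_stream)
  yield input_stream
ensure
  input_stream.signal_end_stream
end

# Usage sketch (audio_chunks is a placeholder for your audio data):
# with_end_of_stream(input_stream) do |stream|
#   audio_chunks.each { |chunk| stream.signal_audio_event_event(audio_chunk: chunk) }
# end
```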
As you might have noticed, unlike sync HTTP1.1 API calls, an async API call returns an AsyncResponse object immediately. There are two methods for syncing an AsyncResponse: #wait and #join!. The #wait method waits on the request until the stream is closed, which can take minutes or even hours (depending on input event signaling). In contrast, #join! ends the stream immediately with no delay.
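The choice between the two can be captured in a small helper. The method settle and its immediately flag are hypothetical names for this sketch. The only SDK calls involved are #wait and #join! on the response.

```ruby
# Hypothetical helper: syncs an AsyncResponse either by waiting for the
# stream to close or by ending it right away.
def settle(async_resp, immediately: false)
  immediately ? async_resp.join! : async_resp.wait
end

# resp = settle(async_resp)                     # block until the stream closes
# resp = settle(async_resp, immediately: true)  # end the stream now
```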
We also provide #close_connection and #new_connection methods for an AsyncClient. Because the connection is shared across multiple requests (streams), we recommend calling #close_connection when you have finished syncing all async responses. By default, the connection is closed after 60 seconds if no data is received and no errors have occurred. You can configure this value with :connection_timeout.
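Putting those two points together, a sketch might look like this. The 120-second timeout and the region are arbitrary example values, and both helper names are hypothetical. The second helper works with any client that responds to #close_connection and any responses that respond to #wait.

```ruby
# Hypothetical wrapper: builds an async client with a longer idle timeout.
def build_patient_client(region: 'us-west-2', connection_timeout: 120)
  require 'aws-sdk-transcribestreamingservice'
  Aws::TranscribeStreamingService::AsyncClient.new(
    region: region,
    connection_timeout: connection_timeout
  )
end

# Hypothetical helper: syncs every async response, then closes the shared
# connection even if one of the waits raises.
def sync_all_and_close(async_client, async_responses)
  async_responses.map(&:wait)
ensure
  async_client.close_connection
end
```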
Final thoughts
We walked through async API usage in this blog post and provided some best practices. Although async API usage is new and different from sync API calls in the AWS SDK for Ruby, it brings streaming benefits to many use cases. Feel free to give it a try and let us know if you have any questions.
Feedback
Please share your questions, comments, and issues with us on GitHub. You can also catch us in our Gitter channel.