Asynchronous Amazon Transcribe Streaming SDK for Python (Preview)

We are pleased to announce the first preview release of an asynchronous Amazon Transcribe streaming SDK for Python. Amazon Transcribe streaming transcription enables you to send an audio stream and receive a stream of text in real time. This initial preview release of the SDK provides simple, easy-to-use interfaces for the Amazon Transcribe streaming APIs in Python. Let’s dive into a code sample.

Example: Transcribing Audio in Real Time from a Local File

Prerequisites

To follow along with this sample code, you’ll need a recent version of Python (3.6+), an AWS account, and the following Python libraries:

python -m pip install amazon-transcribe aiofile

The first dependency is the Amazon Transcribe Streaming SDK for Python. The second is aiofile, a library that gives us an asynchronous interface to the filesystem.

Constructing a Client

We’ll begin by constructing an SDK client for Transcribe Streaming in our desired region:

from amazon_transcribe.client import TranscribeStreamingClient

# Set up our client with our chosen AWS region
client = TranscribeStreamingClient(region="us-west-2")

Next, we can initiate a transcription stream by calling the start_stream_transcription API.

# Start transcription to generate our async stream
stream = await client.start_stream_transcription(
    language_code="en-US",
    media_sample_rate_hz=16000,
    media_encoding="pcm",
)

Now that we have a handle to a transcription stream, we can begin to write audio events into it. We’ll define an asynchronous function that writes the contents of the audio file into the stream.

Note: The sample audio file is available in the GitHub repository for the SDK.

import aiofile

async def write_chunks():
    # An example file can be found at tests/integration/assets/test.wav
    async with aiofile.AIOFile('tests/integration/assets/test.wav', 'rb') as afp:
        reader = aiofile.Reader(afp, chunk_size=1024 * 16)
        async for chunk in reader:
            await stream.input_stream.send_audio_event(audio_chunk=chunk)
    await stream.input_stream.end_stream()
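
When adapting write_chunks to a live source such as a microphone, it can help to know how much audio each chunk represents. The following is a minimal sketch of that arithmetic, not part of the SDK. It assumes 16-bit mono PCM (the post itself only specifies the 16 kHz sample rate and the pcm encoding), and the helper name chunk_duration_seconds is hypothetical.

```python
def chunk_duration_seconds(chunk_size_bytes, sample_rate_hz,
                           bytes_per_sample=2, channels=1):
    """Return how many seconds of audio a chunk of raw PCM bytes holds.

    The defaults assume 16-bit (2 bytes per sample) mono audio; adjust
    them if your source differs.
    """
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return chunk_size_bytes / bytes_per_second


# The 16 KiB chunks used in write_chunks, at the 16 kHz sample rate
# passed to start_stream_transcription:
duration = chunk_duration_seconds(1024 * 16, 16000)
print(f"{duration:.3f} seconds per chunk")  # prints "0.512 seconds per chunk"
```

Under these assumptions each chunk carries roughly half a second of audio, which is a useful reference point if you want to pace sends to match a real-time source.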

With the input side of the stream handled, let’s move on to the output side. While it’s possible to read the events of the output stream by using it as an asynchronous generator, we’ll implement the provided event handler interface for this stream type, which defines callbacks to take care of specific event types.

from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent

class MyEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # This handler can be implemented to handle transcriptions as needed.
        # In this case, we're simply printing all of the returned results
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)
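
Printing every result as above can include interim results that the service later revises. If you only want finalized text, one option is to filter on a per-result partial flag. The sketch below assumes each result exposes an is_partial attribute (mirroring the IsPartial field of the service response; treat the attribute name as an assumption and check the SDK's model). FakeResult, FakeAlternative, and final_transcripts are hypothetical stand-ins so the example runs without the SDK or an AWS account.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for the SDK's result objects, used only so this
# sketch is runnable on its own.
@dataclass
class FakeAlternative:
    transcript: str


@dataclass
class FakeResult:
    is_partial: bool
    alternatives: List[FakeAlternative] = field(default_factory=list)


def final_transcripts(results):
    """Return the first alternative of every result that is not partial."""
    finals = []
    for result in results:
        # Skip interim results and results with no alternatives.
        if result.is_partial or not result.alternatives:
            continue
        finals.append(result.alternatives[0].transcript)
    return finals


results = [
    FakeResult(True, [FakeAlternative("hello wor")]),
    FakeResult(False, [FakeAlternative("Hello world.")]),
]
print(final_transcripts(results))  # prints ['Hello world.']
```

Inside handle_transcript_event, the same check could be applied to transcript_event.transcript.results before printing.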

Now that both the input and output of the stream are handled, we can instantiate our handler class and instruct asyncio to simultaneously execute our read and write handlers.

import asyncio

# Instantiate our handler and start processing events
handler = MyEventHandler(stream.output_stream)
await asyncio.gather(write_chunks(), handler.handle_events())
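
Note that await is only valid inside a coroutine (or an asyncio-aware REPL), so in a script the snippets in this post need to live in an async function that is handed to the event loop. The following is a minimal, standard-library-only sketch of that structure. The asyncio.Queue is a hypothetical stand-in for the transcription stream, and the function bodies are illustrative rather than the SDK's behavior. asyncio.run requires Python 3.7 or later; on 3.6, loop.run_until_complete serves the same purpose.

```python
import asyncio


async def write_chunks(queue, chunks):
    # Producer: stands in for sending audio events into the stream.
    for chunk in chunks:
        await queue.put(chunk)
    await queue.put(None)  # Sentinel: stands in for ending the stream.


async def handle_events(queue, received):
    # Consumer: stands in for the handler reading from the output stream.
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        received.append(chunk.upper())


async def main():
    # A small maxsize forces the producer and consumer to interleave.
    queue = asyncio.Queue(maxsize=2)
    received = []
    # Run both sides concurrently, as in the gather call above.
    await asyncio.gather(
        write_chunks(queue, ["a", "b", "c"]),
        handle_events(queue, received),
    )
    return received


result = asyncio.run(main())
print(result)  # prints ['A', 'B', 'C']
```

The design point is that neither side blocks the other: gather schedules both coroutines on the same event loop, and each yields control whenever it awaits.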

And that’s all it takes to begin sending audio and receiving transcriptions in real time from Python. For a complete sample file of the code in this example, along with other example programs, see the examples on GitHub.

Feedback

Now that we’ve seen a real example, let’s talk about some of the underlying technologies used to make this SDK possible:

The AWS Common Runtime provides Python bindings to an asynchronous HTTP/2 implementation in C, as well as many other interfaces crucial to an SDK, such as AWS SigV4 signing and credential resolution.

This SDK also makes heavy use of static typing via mypy, which allows for improved IDE autocompletion, documentation, and static type checking.

Last but not least, the provided interfaces are all asynchronous and designed to be used with asyncio, the asynchronous I/O framework included in the Python standard library.

These inclusions are all firsts for the Python SDK team, and we would greatly appreciate any feedback on the technologies, interfaces, documentation, and overall experience we’re providing with this preview SDK. While the SDK is in preview, we have the most flexibility to improve the library, and we can’t do it without your feedback. Any feedback can be provided via the issue tracker on GitHub.

Conclusion

We hope this short post and example have shown the functionality of this preview Python SDK for Amazon Transcribe Streaming, and that you’re as excited about the future of the Python SDK as we are. We look forward to hearing your feedback and seeing all the great things you build with it!