Amazon SageMaker AI Inference now supports bidirectional streaming

Posted on: Nov 25, 2025

Amazon SageMaker AI Inference now supports bidirectional streaming for real-time speech-to-text transcription, enabling continuous speech processing instead of batch input. Models can now receive audio streams and return partial transcripts simultaneously as users speak, enabling you to build voice agents that process speech with minimal latency.

As customers build AI voice agents, they need real-time speech transcription to minimize delays between user speech and agent responses. Data scientists and ML engineers lack managed infrastructure for bidirectional streaming, making it necessary to build custom WebSocket implementations and manage streaming protocols. Teams spend weeks developing and maintaining this infrastructure rather than focusing on model accuracy and agent capabilities. With bidirectional streaming on Amazon SageMaker AI Inference, you can deploy speech-to-text models by invoking your endpoint with the new Bidirectional Stream API. The client opens an HTTP2 connection to the SageMaker AI runtime, and SageMaker AI automatically creates a WebSocket connection to your container. This can process streaming audio frames and return partial transcripts as they are produced. Any container implementing a WebSocket handler following the SageMaker AI contract works automatically, with real-time speech models such as Deepgram running without modifications. This eliminates months of infrastructure development, enabling you to deploy voice agents with continuous transcription while focusing your time on improving model performance.

Bidirectional streaming is available in following AWS Regions - Canada (Central), South America (São Paulo), Africa (Cape Town), Europe (Paris), Asia Pacific (Hyderabad), Asia Pacific (Jakarta), Israel (Tel Aviv), Europe (Zurich), Asia Pacific (Tokyo), AWS GovCloud US (West), AWS GovCloud US (East), Asia Pacific (Mumbai), Middle East (Bahrain), US West (Oregon), China (Ningxia), US West (Northern California), Asia Pacific (Sydney), Europe (London), Asia Pacific (Seoul), US East (N. Virginia), Asia Pacific (Hong Kong), US East (Ohio), China (Beijing), Europe (Stockholm), Europe (Ireland), Middle East (UAE), Asia Pacific (Osaka), Asia Pacific (Melbourne), Europe (Spain), Europe (Frankfurt), Europe (Milan), Asia Pacific (Singapore).

To learn more, visit AWS News Blog here and SageMaker AI documentation here.

Amazon SageMaker AI Inference now supports bidirectional streaming

Learn

Resources

Developers

Help