Amazon Bedrock Knowledge Bases now supports streaming responses

Posted on: Dec 1, 2024

Amazon Bedrock Knowledge Bases offers fully managed, end-to-end Retrieval-Augmented Generation (RAG) workflows for building highly accurate, low-latency, secure, and customized generative AI applications that incorporate contextual information from your company's data sources. Today, we are announcing support for the RetrieveAndGenerateStream API in Amazon Bedrock Knowledge Bases. With this new streaming API, Amazon Bedrock Knowledge Bases customers receive the response as the Large Language Model (LLM) generates it, rather than waiting for the complete response.

A RAG workflow involves several steps: querying the data store, gathering relevant context, and then sending the query with that context to an LLM, which generates the response. This final generation step can take a few seconds, depending on the latency of the underlying model. To help you build latency-sensitive applications, we're now offering the RetrieveAndGenerateStream API, which returns the response as a stream while the model is still generating it. This reduces the time until the first part of the response arrives, giving users a more seamless and responsive experience when interacting with Amazon Bedrock Knowledge Bases.
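As an illustration of consuming the stream, here is a minimal sketch assuming the AWS SDK for Python (boto3), where the `bedrock-agent-runtime` client exposes this API as `retrieve_and_generate_stream` and the returned `stream` yields events keyed by type, with text chunks under `output`. The knowledge base ID, model ARN, and the helper name `stream_kb_answer` are hypothetical placeholders; check the exact request and event shapes against the SDK reference for your version.

```python
from typing import Any, Iterator


def stream_kb_answer(client: Any, knowledge_base_id: str, model_arn: str,
                     question: str) -> Iterator[str]:
    """Yield generated text chunks as they arrive (hypothetical helper).

    Assumes `client` is a boto3 "bedrock-agent-runtime" client, or any
    object with a compatible `retrieve_and_generate_stream` method.
    """
    response = client.retrieve_and_generate_stream(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,  # placeholder ID
                "modelArn": model_arn,                 # placeholder model ARN
            },
        },
    )
    # Each event is a dict keyed by its type. Text deltas are assumed to come
    # in "output" events; other event types (e.g. citations) are skipped here.
    for event in response["stream"]:
        if "output" in event:
            yield event["output"]["text"]
```

In an application you would pass `boto3.client("bedrock-agent-runtime")` as `client` and print each chunk with `print(chunk, end="", flush=True)`, so the user sees text as soon as the first chunk arrives instead of after the full answer is generated. Taking the client as a parameter also keeps the helper easy to test without AWS credentials.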

This new capability is available in all AWS Regions where Amazon Bedrock Knowledge Bases is currently supported. To learn more, visit the documentation.