Overview
This Guidance shows how to build an advanced question-answering application using the latest AI tools from AWS and its partners. The architecture includes a database service that stores both operational data and vector data embeddings. A fully managed generative AI service creates these embeddings; at query time, the application retrieves your most relevant documents based on their proximity to the query vector. This technique, known as Retrieval-Augmented Generation (RAG), enhances the accuracy and relevance of AI responses. As a result, you can provide better, faster answers to your customers' questions using your own data.
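The embed-store-retrieve pattern described above can be sketched in a few lines of pure Python. This is an illustrative toy, not the Guidance's deployment code: the `embed` function is a hypothetical stand-in for a managed embedding model, and the in-memory `store` list stands in for a database that keeps operational data and vector embeddings side by side.

```python
import math

# Toy embedding: map text to a small fixed-size vector. In the actual
# architecture, a managed generative AI service would produce these vectors.
def embed(text: str) -> list[float]:
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # normalize so dot product = cosine similarity

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Stand-in for a database storing documents alongside their embeddings.
store: list[dict] = []

def index_document(doc_id: str, text: str) -> None:
    store.append({"id": doc_id, "text": text, "embedding": embed(text)})

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored documents by proximity to the query vector.
    q = embed(query)
    ranked = sorted(store, key=lambda d: cosine(q, d["embedding"]), reverse=True)
    return [d["id"] for d in ranked[:k]]
```

In production, the similarity ranking would be done by the database's vector search index rather than a full scan, but the flow (embed the query, return the nearest documents) is the same.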
Note: See disclaimer below
How it works
This architecture diagram illustrates how to process user queries and generate accurate, contextually relevant responses. It enhances a foundation model (FM) on Amazon Bedrock using RAG; the vector search capabilities of Amazon DocumentDB, combined with LlamaIndex, enable more accurate and informed answers from a customized knowledge base.
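The query flow in the diagram can be sketched with stub functions standing in for the Amazon DocumentDB retrieval step and the Bedrock-hosted FM. The function names and prompt wording here are illustrative assumptions, not the Guidance's actual code; the point is the shape of the flow: retrieve relevant passages, augment the prompt with them, then ask the FM.

```python
from typing import Callable

def augment_prompt(question: str, passages: list[str]) -> str:
    # Ground the FM by placing retrieved passages ahead of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

def answer(
    question: str,
    retrieve_fn: Callable[[str], list[str]],   # e.g. vector search over DocumentDB
    generate_fn: Callable[[str], str],         # e.g. an FM invoked via Amazon Bedrock
) -> str:
    passages = retrieve_fn(question)
    return generate_fn(augment_prompt(question, passages))
```

In the deployed architecture, `retrieve_fn` and `generate_fn` would be backed by the database's vector index and the Bedrock model invocation (orchestrated here by LlamaIndex), but the augmentation step is what makes the response grounded in your own data.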
Deploy with confidence
Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, then deploy as-is or customize it to fit your needs.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Disclaimer