Guidance for High-Speed RAG Chatbots on AWS

Overview

Important: This Guidance requires the use of AWS Cloud9 which is no longer available to new customers. Existing customers of AWS Cloud9 can continue using and deploying this Guidance as normal.

This Guidance demonstrates how to build a high-performance Retrieval-Augmented Generation (RAG) chatbot using Amazon Aurora PostgreSQL and the pgvector open-source extension, together with AWS artificial intelligence (AI) services and open-source frameworks. The pgvector extension is configured as a vector database, allowing efficient vector search with the Hierarchical Navigable Small World (HNSW) indexing algorithm. The chatbot allows users to upload PDF files, ask questions in natural language, and receive answers based on the file content. With the scalability, availability, and cost-effectiveness of Aurora, you can operate your natural language processing chatbot globally.
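The ingestion and retrieval flow described above can be sketched as follows. This is a minimal illustration, not the Guidance's sample code: the table name, column names, chunk size, and vector dimension are assumptions chosen to match a Titan embeddings output of 1,024 dimensions.

```python
# Minimal sketch of the RAG ingestion/search flow (illustrative only).

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted PDF text into overlapping chunks for embedding."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# SQL the database layer would run against Aurora PostgreSQL with pgvector.
# vector(1024) matches the Titan embeddings dimension used in this Guidance;
# adjust it to your embedding model's output size.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS doc_chunks (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1024)
);
-- HNSW index for fast approximate nearest-neighbor search
CREATE INDEX ON doc_chunks USING hnsw (embedding vector_cosine_ops);
"""

# Retrieval: <=> is pgvector's cosine-distance operator.
SEARCH_SQL = """
SELECT content
FROM doc_chunks
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""
```

At query time, the application embeds the user's question, runs `SEARCH_SQL` to fetch the most similar chunks, and passes them to the LLM as context.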

How it works

The architecture diagram illustrates how to effectively use this solution, showing the key components and their interactions and walking through the architecture's structure and functionality step by step.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Amazon Bedrock provides easy access to multiple FMs. In this Guidance, we use an Amazon Titan embeddings model through Amazon Bedrock to generate vector embeddings for the text chunks. For the conversational chatbot, we use the Anthropic Claude 3 LLM.
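The two Bedrock calls described above could look like the sketch below. The request shapes follow the public Bedrock API for Titan Text Embeddings V2 and the Anthropic Messages format; the model IDs, prompt wording, and parameter values are assumptions to verify against the current model documentation.

```python
import json

# Illustrative request builders for the two Bedrock models in this Guidance.
EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"  # assumed model ID

def build_titan_embed_request(text: str, dimensions: int = 1024) -> str:
    """Body for Titan Text Embeddings V2, which supports 256/512/1024 dims."""
    return json.dumps({"inputText": text, "dimensions": dimensions})

def build_claude_request(question: str, context_chunks: list[str]) -> str:
    """Anthropic Messages-API body that grounds Claude 3 in retrieved chunks."""
    context = "\n\n".join(context_chunks)
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        }],
    })

# At runtime these bodies would be sent with boto3, for example:
#   client = boto3.client("bedrock-runtime")
#   resp = client.invoke_model(modelId=EMBED_MODEL_ID,
#                              body=build_titan_embed_request("hello"))
```

Because both models sit behind the same `invoke_model` API, swapping in a different embeddings model or LLM is largely a matter of changing the model ID and request body.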

Also, a CloudFormation script is provided to create all the necessary prerequisites for this Guidance to be deployed with a single click.

Read the Operational Excellence whitepaper

AWS Secrets Manager securely stores the database user credentials, preventing unauthorized access and password tampering issues. Secrets Manager also offers additional security features like automatic secret rotation and easy secret replication across AWS Regions, as well as auditing and monitoring of secret usage.
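As a sketch of how the application would consume those credentials: Secrets Manager returns the secret as a JSON string, which the application parses into connection parameters. The key names below follow the format that RDS-managed secrets commonly use, but treat them as assumptions.

```python
import json

def parse_db_secret(secret_string: str) -> dict:
    """Extract Aurora connection parameters from a SecretString (illustrative)."""
    secret = json.loads(secret_string)
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5432)),
        "user": secret["username"],
        "password": secret["password"],
        "dbname": secret.get("dbname", "postgres"),
    }

# At runtime, the string would come from Secrets Manager, for example:
#   sm = boto3.client("secretsmanager")
#   secret_string = sm.get_secret_value(SecretId="my-db-secret")["SecretString"]
```

Fetching the secret at connection time, rather than baking credentials into configuration, is what allows automatic rotation to happen without redeploying the application.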

Additionally, the Aurora storage cluster is encrypted using an AWS Key Management Service (AWS KMS) key. Together, these services reduce the risk of security breaches.

Read the Security whitepaper

Aurora, with the pgvector extension, provides vector storage and search capabilities, along with the resilient features of a relational database. The Aurora cluster stores six copies of data across three Availability Zones (AZs), providing high availability for the data. If an AZ or instance encounters a failure, Aurora automatically fails over to a replica in another AZ. Aurora also continuously backs up the data to Amazon Simple Storage Service (Amazon S3).

Read the Reliability whitepaper

Amazon Titan Text Embeddings V2 is optimized for high accuracy and well suited for semantic search use cases. When reducing from 1,024 to 512 dimensions, Titan Text Embeddings V2 retains approximately 99 percent retrieval accuracy, and vectors with 256 dimensions maintain approximately 97 percent. This means that you can save 75 percent in vector storage (going from 1,024 down to 256 dimensions) while keeping approximately 97 percent of the accuracy provided by larger vectors.
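The 75 percent storage-savings figure follows from the fact that vector storage scales linearly with dimensions (pgvector stores each element as a 4-byte float); a quick check:

```python
# Storage scales linearly with dimensions: pgvector stores each vector
# element as a 4-byte float4, so fewer dimensions means proportionally
# fewer bytes per row (index and row overhead ignored in this sketch).

BYTES_PER_DIM = 4  # float4 per vector element in pgvector

def vector_storage_bytes(num_vectors: int, dims: int) -> int:
    """Raw embedding storage for a table of vectors (overhead excluded)."""
    return num_vectors * dims * BYTES_PER_DIM

full = vector_storage_bytes(1_000_000, 1024)     # 1M vectors at 1,024 dims
reduced = vector_storage_bytes(1_000_000, 256)   # same vectors at 256 dims
savings = 1 - reduced / full                     # 0.75, the 75 percent cited above
```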

With the Amazon Titan text embeddings model available on Amazon Bedrock, it's easy to use and switch models based on the use case. Finally, the Amazon Titan text model helps to ensure the RAG process retrieves the most relevant information for the LLM, leading to more accurate answers.

Read the Performance Efficiency whitepaper

Amazon Bedrock provides a choice of two pricing plans: on-demand and provisioned throughput. The on-demand model allows you to use FMs on a pay-as-you-go basis, which is cost-efficient and gives you the agility to experiment with different models based on your needs.

Read the Cost Optimization whitepaper

In this Guidance, we use temporary resources, like an AWS Cloud9 instance for the integrated development environment (IDE), instead of dedicated Amazon Elastic Compute Cloud (Amazon EC2) instances to reduce cost. Also, we use AWS Graviton-based instance types for the Aurora database cluster, which use 60 percent less energy than comparable Amazon EC2 instances with the same performance.

Read the Sustainability whitepaper

Implementation resources

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, and deploy it as-is or customize it to fit your needs.

Go to sample code

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.