Guidance for High-Speed RAG Chatbots on AWS

Overview

Important: This Guidance requires the use of AWS Cloud9 which is no longer available to new customers. Existing customers of AWS Cloud9 can continue using and deploying this Guidance as normal.

This Guidance demonstrates how to build a high-performance Retrieval-Augmented Generation (RAG) chatbot using Amazon Aurora PostgreSQL and the pgvector open-source extension, together with AWS artificial intelligence (AI) services and open-source frameworks. The pgvector extension is configured as a vector database, allowing efficient vector search with the Hierarchical Navigable Small World (HNSW) indexing algorithm. The chatbot allows users to upload PDF files, ask questions in natural language, and receive answers based on the file content. With the scalability, availability, and cost-effectiveness of Aurora, you can operate your natural language processing chatbot globally.
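The ingestion and retrieval flow described above can be sketched as follows. This is a minimal illustration, not the Guidance's sample code: the table name, column names, chunk size, and vector dimension are assumptions chosen to match a Titan embeddings output of 1,024 dimensions.

```python
# Minimal sketch of the RAG ingestion/search flow (illustrative only).

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted PDF text into overlapping chunks for embedding."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# SQL the database layer would run against Aurora PostgreSQL with pgvector.
# vector(1024) matches the Titan embeddings dimension used in this Guidance;
# adjust it to your embedding model's output size.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS doc_chunks (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1024)
);
-- HNSW index for fast approximate nearest-neighbor search
CREATE INDEX ON doc_chunks USING hnsw (embedding vector_cosine_ops);
"""

# Retrieval: <=> is pgvector's cosine-distance operator.
SEARCH_SQL = """
SELECT content
FROM doc_chunks
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""
```

At query time, the application embeds the user's question, runs `SEARCH_SQL` to fetch the most similar chunks, and passes them to the LLM as context.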

How it works

The architecture diagram illustrates how to effectively use this solution, showing the key components and their interactions and walking through the architecture's structure and functionality step by step.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Amazon Bedrock provides easy access to multiple FMs. In this Guidance, we use an Amazon Titan embeddings model through Amazon Bedrock to generate vector embeddings for the text chunks. For the conversational chatbot, we use the Anthropic Claude 3 LLM.
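The two Bedrock calls described above could look like the sketch below. The request shapes follow the public Bedrock API for Titan Text Embeddings V2 and the Anthropic Messages format; the model IDs, prompt wording, and parameter values are assumptions to verify against the current model documentation.

```python
import json

# Illustrative request builders for the two Bedrock models in this Guidance.
EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"  # assumed model ID

def build_titan_embed_request(text: str, dimensions: int = 1024) -> str:
    """Body for Titan Text Embeddings V2, which supports 256/512/1024 dims."""
    return json.dumps({"inputText": text, "dimensions": dimensions})

def build_claude_request(question: str, context_chunks: list[str]) -> str:
    """Anthropic Messages-API body that grounds Claude 3 in retrieved chunks."""
    context = "\n\n".join(context_chunks)
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        }],
    })

# At runtime these bodies would be sent with boto3, for example:
#   client = boto3.client("bedrock-runtime")
#   resp = client.invoke_model(modelId=EMBED_MODEL_ID,
#                              body=build_titan_embed_request("hello"))
```

Because both models sit behind the same `invoke_model` API, swapping in a different embeddings model or LLM is largely a matter of changing the model ID and request body.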

Also, a CloudFormation script is provided to create all the necessary prerequisites for this Guidance to be deployed with a single click.

Read the Operational Excellence whitepaper

AWS Secrets Manager securely stores the database user credentials, preventing unauthorized access and password tampering issues. Secrets Manager also offers additional security features like automatic secret rotation and easy secret replication across AWS Regions, as well as auditing and monitoring of secret usage.
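As a sketch of how the application would consume those credentials: Secrets Manager returns the secret as a JSON string, which the application parses into connection parameters. The key names below follow the format that RDS-managed secrets commonly use, but treat them as assumptions.

```python
import json

def parse_db_secret(secret_string: str) -> dict:
    """Extract Aurora connection parameters from a SecretString (illustrative)."""
    secret = json.loads(secret_string)
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5432)),
        "user": secret["username"],
        "password": secret["password"],
        "dbname": secret.get("dbname", "postgres"),
    }

# At runtime, the string would come from Secrets Manager, for example:
#   sm = boto3.client("secretsmanager")
#   secret_string = sm.get_secret_value(SecretId="my-db-secret")["SecretString"]
```

Fetching the secret at connection time, rather than baking credentials into configuration, is what allows automatic rotation to happen without redeploying the application.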

Additionally, the Aurora storage cluster is encrypted using an AWS Key Management Service (AWS KMS) key. Together, these services reduce the risk of security breaches.

Read the Security whitepaper

Aurora, with the pgvector extension, provides vector storage and search capabilities, along with the resilient features of a relational database. The Aurora cluster stores six copies of data across three Availability Zones (AZs), providing high availability for the data. If an AZ or instance encounters a failure, Aurora automatically fails over to a replica in another AZ. Aurora also continuously backs up the data to Amazon Simple Storage Service (Amazon S3).

Read the Reliability whitepaper

Amazon Titan Text Embeddings V2 is optimized for high accuracy and well suited for semantic search use cases. When reducing from 1,024 to 512 dimensions, Titan Text Embeddings V2 retains approximately 99 percent retrieval accuracy, and vectors with 256 dimensions maintain approximately 97 percent. This means that you can save 75 percent in vector storage (going from 1,024 down to 256 dimensions) while keeping approximately 97 percent of the accuracy provided by larger vectors.
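The 75 percent storage-savings figure follows from the fact that vector storage scales linearly with dimensions (pgvector stores each element as a 4-byte float); a quick check:

```python
# Storage scales linearly with dimensions: pgvector stores each vector
# element as a 4-byte float4, so fewer dimensions means proportionally
# fewer bytes per row (index and row overhead ignored in this sketch).

BYTES_PER_DIM = 4  # float4 per vector element in pgvector

def vector_storage_bytes(num_vectors: int, dims: int) -> int:
    """Raw embedding storage for a table of vectors (overhead excluded)."""
    return num_vectors * dims * BYTES_PER_DIM

full = vector_storage_bytes(1_000_000, 1024)     # 1M vectors at 1,024 dims
reduced = vector_storage_bytes(1_000_000, 256)   # same vectors at 256 dims
savings = 1 - reduced / full                     # 0.75, the 75 percent cited above
```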

With the Amazon Titan text embeddings model available on Amazon Bedrock, it's easy to use and switch models based on the use case. Finally, the Amazon Titan text model helps to ensure the RAG process retrieves the most relevant information for the LLM, leading to more accurate answers.

Read the Performance Efficiency whitepaper

Amazon Bedrock provides a choice of two pricing plans: on-demand and provisioned throughput. The on-demand model allows you to use FMs on a pay-as-you-go basis, which is cost-efficient and gives you the agility to experiment with different models based on your needs.

Read the Cost Optimization whitepaper

In this Guidance, we use temporary resources, like an AWS Cloud9 instance for the integrated development environment (IDE), instead of dedicated Amazon Elastic Compute Cloud (Amazon EC2) instances to reduce cost. Also, we use AWS Graviton-based instance types for the Aurora database cluster, which use 60 percent less energy than comparable Amazon EC2 instances with the same performance.

Read the Sustainability whitepaper

Implementation resources

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, and deploy it as-is or customize it to fit your needs.

Go to sample code

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.