This Guidance demonstrates how to combine Retrieval Augmented Generation (RAG) with AWS services to build generative AI applications. Large language models (LLMs), a type of generative AI, are typically trained offline, so they become less relevant as new data is created after training. With this Guidance, you can use RAG to retrieve data from multiple data sources, including sources outside the LLM's training data. The retrieved data can then be supplied to the LLM to generate more accurate, human-like responses across text and voice interfaces.
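As a minimal illustration of the augmentation step described above, the sketch below assembles retrieved passages into a prompt for an LLM. The function name and prompt format are hypothetical, not part of this Guidance; in the deployed architecture, the passages would come from an Amazon Kendra query.

```python
def build_rag_prompt(question: str, passages: list[str], max_context_chars: int = 4000) -> str:
    """Combine retrieved passages with the user's question into one augmented prompt."""
    # Join retrieved passages and truncate so the prompt fits the model's context window.
    context = "\n\n".join(passages)[:max_context_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example: these passages would normally come from a retriever such as Amazon Kendra.
prompt = build_rag_prompt(
    "What is our refund policy?",
    ["Refunds are issued within 30 days of purchase.", "Store credit is offered after 30 days."],
)
```

The LLM then answers from the supplied context rather than relying only on what it learned during training.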


Architecture Diagram



Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • Amazon Kendra, Amazon Lex, Lambda, DynamoDB, and Amazon SageMaker are used throughout this Guidance to enhance your operational excellence. Amazon Kendra enhances the efficiency and precision of accessing enterprise content, delivering accurate results swiftly. Paired with Amazon Lex, it offers a fluid conversational interface, making user inquiries and intent processing more streamlined. Lambda offers agile and scalable user interaction management, and the integration of the LangChain orchestrator within Lambda promotes efficient coordination among services, simplifying operations. Also, DynamoDB reliably stores conversation history, so users do not need to repeat earlier interactions. Finally, the LLM hosted on SageMaker contributes context-relevant replies, supporting consistent operations and an improved user experience.

    These services have been integrated to ensure your operations remain streamlined, efficient, and capable of consistently delivering desired outcomes. From the precise content retrieval using Amazon Kendra, to the structured orchestration managed by LangChain, every component aims to reduce operational overhead, minimize errors, and promote a system that's continuously improving in efficiency and responsiveness.

    Read the Operational Excellence whitepaper 
  • Amazon Kendra, Amazon Lex, Lambda, CloudFront, DynamoDB, and SageMaker offer a comprehensive and integrated approach to secure user interactions and data management. Their inherent security mechanisms, from the controlled access in Lambda to the encrypted storage in DynamoDB and the secure data indexing in Amazon Kendra, support a secure, responsive, and efficient user experience. For example, Amazon Kendra securely ingests and indexes your enterprise content. Additionally, users have the option to use the Amazon Lex web UI, a protected environment, to communicate with the chatbot. When invoked by Amazon Lex, Lambda serves as a safeguarded bridge for user requests, processing queries from the Amazon Lex UI, delivered by CloudFront, and sending back answers. Lambda also establishes a secure link with Amazon Kendra to extract pertinent details and collaborates with the LLM in SageMaker to generate responses. Moreover, all conversations are securely preserved in DynamoDB, ensuring lasting data protection. Finally, within Lambda, the LangChain orchestrator ensures safe coordination among Amazon Lex, Amazon Kendra, and the LLM hosted on SageMaker.

    Read the Security whitepaper 
  • AWS CloudFormation, Amazon Kendra, Amazon Lex, Lambda, CloudFront, DynamoDB, and SageMaker work collectively to enhance the reliability of your workloads. Specifically, using CloudFormation can help you set up the system using best practices, ensuring reliable resource management, while Amazon Kendra offers stable data retrieval through dependable indexing. With Amazon Lex, you get accurately interpreted user intent, coupled with Lambda that helps to ensure scalable and continuous responses. Also, DynamoDB securely stores conversations, and the LLM hosted on SageMaker ensures uninterrupted responses to user queries.

    These services have been selected to ensure a system that is resilient, scalable, and can recover from failures efficiently. From infrastructure as code practices with CloudFormation that allow for quick recovery, to the seamless orchestration between services ensuring consistent performance, each component has been integrated to uphold the highest standards of reliability set by AWS.

    Read the Reliability whitepaper 
  • The services selected for this Guidance help improve your performance in a number of ways. First, Amazon Kendra improves the RAG process by retrieving the most relevant content. Next, Amazon Lex provides an easy way for users to ask questions and interprets their intent. Also, Lambda offers quick responses, while DynamoDB stores and fetches conversations efficiently, reducing delays. Finally, by using the right LLM in SageMaker, users get quicker answers, leading to a better experience.

    Read the Performance Efficiency whitepaper 
  • Amazon Kendra, SageMaker, Amazon Lex, CloudFront, and DynamoDB offer a number of cost-saving features. First, Amazon Kendra and SageMaker adjust resources based on actual need, ensuring you only pay for what you use; SageMaker pricing is flexible and scales automatically. Second, Amazon Lex lets you create chat interfaces without the hassle of managing infrastructure. Third, DynamoDB offers flexible payment options for better cost management. These services were selected because they offer efficient mechanisms for resource management and billing. Their dynamic scaling, pay-as-you-go models, and other tailored features, like caching in CloudFront and flexible payment options in DynamoDB, align seamlessly with the goal of cost optimization.

    Read the Cost Optimization whitepaper 
  • Each service deployed in this Guidance was selected due to its serverless and managed nature, eliminating the need for physical hardware. By using serverless services like Lambda and Amazon Lex, you can automatically adjust to demand without running physical servers, reducing energy use. Services like Amazon Kendra and SageMaker operate efficiently without manual setup or maintenance. DynamoDB saves conversations and prevents repeat computations, saving resources. Altogether, this serverless approach reduces energy consumption and helps you meet your sustainability objectives.

    Read the Sustainability whitepaper 
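Several of the pillars above rely on DynamoDB storing conversation history so users do not have to repeat themselves. The sketch below shows one plausible way a Lambda function might shape and persist a conversation turn; the table schema, key names, and helper functions are assumptions for illustration, not the Guidance's actual implementation.

```python
import time

def make_turn_item(session_id: str, user_msg: str, bot_msg: str) -> dict:
    """Shape one conversation turn as a DynamoDB item.

    Assumed schema: partition key `session_id` (string), sort key `ts`
    (millisecond timestamp as a number), so turns can be read back in order.
    """
    return {
        "session_id": {"S": session_id},
        "ts": {"N": str(int(time.time() * 1000))},
        "user_message": {"S": user_msg},
        "bot_message": {"S": bot_msg},
    }

def save_turn(table_name: str, item: dict) -> None:
    """Persist the turn; in this Guidance, the Lambda function would perform this write."""
    import boto3  # deferred import: requires AWS credentials and permissions at call time
    boto3.client("dynamodb").put_item(TableName=table_name, Item=item)
```

Reading back the most recent items for a `session_id` gives the orchestrator the prior context it needs to keep the conversation coherent.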

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

AWS Machine Learning

Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models

This post demonstrates how to implement a Retrieval Augmented Generation (RAG) workflow by combining the capabilities of Amazon Kendra with LLMs to create state-of-the-art GenAI applications providing conversational experiences over your enterprise content. 
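The flow the post describes, where Amazon Lex invokes Lambda, Lambda queries Amazon Kendra, and an LLM on SageMaker generates the answer, can be sketched as below. The retriever and generator are passed in as callables so the wiring stays visible and testable; all names are illustrative, not the post's actual code.

```python
from typing import Callable

def answer_question(
    question: str,
    retrieve: Callable[[str], list[str]],  # e.g. an Amazon Kendra query, stubbed here
    generate: Callable[[str], str],        # e.g. a SageMaker endpoint invocation, stubbed here
) -> str:
    """RAG flow: retrieve passages, build an augmented prompt, generate an answer."""
    passages = retrieve(question)
    context = "\n\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

# Stubbed usage: a real deployment would call Kendra and a SageMaker endpoint instead.
answer = answer_question(
    "Where is the HQ?",
    retrieve=lambda q: ["The headquarters is in Seattle."],
    generate=lambda p: "Seattle" if "Seattle" in p else "I don't know",
)
```

Injecting the two callables is only a sketching convenience; in the deployed architecture, the LangChain orchestrator inside Lambda plays this coordinating role.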


The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
