AWS Public Sector Blog

Well-rounded technical architecture for a RAG implementation on AWS


In the age of generative artificial intelligence (AI), data isn’t just king—it’s the entire kingdom. Our previous blog post, Anduril unleashes the power of RAG with enterprise search chatbot Alfred on AWS, highlighted how Anduril Industries revolutionized enterprise search with Alfred, their innovative chat-based assistant powered by Retrieval-Augmented Generation (RAG) architecture.

In this post, we examine the technical intricacies that make this system possible. The success of any RAG implementation fundamentally depends on the quality, accessibility, and organization of its underlying data foundation. As we’ve seen from Anduril’s experience with Alfred, building a robust data infrastructure using AWS services such as Amazon Bedrock, Amazon SageMaker AI, Amazon Kendra, and Amazon DynamoDB in AWS GovCloud (US) creates the essential backbone for effective information retrieval and generation. This deep dive explores how organizations can architect their RAG implementations to harness the full potential of their data assets while maintaining security and compliance in highly regulated environments. Additionally, we discuss aspects of the responsible AI framework that customers should consider adopting, because trust and responsible implementation remain crucial for successful AI adoption. But first, we explain the technical architecture that makes Alfred such a powerful tool for Anduril’s workforce.

AWS GovCloud (US) foundation

At the core of Alfred’s architecture is AWS GovCloud (US), a specialized cloud environment designed to handle sensitive data and meet the strict compliance requirements of government agencies. This foundation provides Federal Risk and Authorization Management Program (FedRAMP) High and Department of Defense (DoD) Cloud Computing (CC) Security Requirements Guide (SRG) Impact Level 5 compliance, United States International Traffic in Arms Regulations (ITAR) compatibility, and physical separation from commercial AWS Regions. With US person-only access controls and enhanced security monitoring, this robust foundation allows Alfred to operate securely while maintaining the agility and scalability benefits of cloud computing.

The following diagram shows the architecture for Alfred’s RAG implementation.

Figure 1. Architectural diagram of Alfred’s RAG implementation. The major components are an Amazon Simple Storage Service (Amazon S3) bucket, Amazon Bedrock, Amazon Kendra, and Amazon DynamoDB.

RAG architecture

Alfred’s RAG architecture consists of two primary components that work in tandem to deliver accurate and contextual responses. The retrieval component uses Amazon Kendra as the intelligent search service, offering natural language processing (NLP) capabilities, machine learning (ML)–powered relevance ranking, and support for multiple data sources and formats. This is complemented by Amazon DynamoDB, which provides millisecond response times for data retrieval and automatic scaling to handle varying workloads.
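To make the retrieval step concrete, the following is a minimal sketch of how a retrieve-and-enrich flow like this can be structured. The document IDs, scores, and metadata table are invented for illustration, and the two stub functions stand in for real AWS calls (noted in comments); this is not Anduril's actual implementation.

```python
# Illustrative retrieval sketch (all names and data are hypothetical).
# In a real deployment the stubs below would wrap boto3 calls such as:
#   kendra.retrieve(IndexId=..., QueryText=query)
#   dynamodb.get_item(TableName=..., Key={"doc_id": {"S": doc_id}})

def kendra_retrieve(query: str) -> list[dict]:
    """Stand-in for Amazon Kendra retrieval: returns relevance-scored passages."""
    corpus = [
        {"doc_id": "hr-001", "text": "PTO requests are submitted in the HR portal.", "score": 0.92},
        {"doc_id": "it-204", "text": "VPN access requires a hardware token.", "score": 0.31},
    ]
    return corpus  # a real call would score passages against the query

def dynamodb_metadata(doc_id: str) -> dict:
    """Stand-in for a DynamoDB GetItem lookup of document metadata."""
    table = {"hr-001": {"source": "HR wiki"}, "it-204": {"source": "IT runbook"}}
    return table.get(doc_id, {})

def retrieve_context(query: str, top_k: int = 3) -> list[dict]:
    """Rank passages by relevance score and attach per-document metadata."""
    passages = sorted(kendra_retrieve(query), key=lambda p: p["score"], reverse=True)
    passages = passages[:top_k]
    for p in passages:
        p["metadata"] = dynamodb_metadata(p["doc_id"])
    return passages
```

The key design point is the pairing: the search service ranks passages by relevance, while the low-latency key-value store supplies metadata (source, ownership, access labels) that the generation step can cite or filter on.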

The implementation includes components both with and without RAG, allowing Alfred to generate responses based on retrieved contextual information or fall back to base model knowledge when appropriate. This flexibility provides optimal responses across different types of queries and use cases.

Generation and model management

The generation component of Alfred’s architecture is built on Amazon Bedrock, which forms the backbone of its language processing capabilities. Amazon Bedrock hosts and manages the large language models (LLMs), currently using Claude 3.5 Sonnet v2 as the primary model with Llama 3.3 70B and Mixtral 8x7B on Amazon SageMaker AI as secondary models. The architecture is designed with future flexibility in mind, capable of accommodating additional models as they become available in AWS GovCloud (US).
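A primary/secondary model arrangement like this is typically wrapped in a simple failover router. The sketch below shows the pattern under stated assumptions: the model identifiers are placeholders (not exact Amazon Bedrock or SageMaker model IDs), and the simulated failure stands in for real throttling or availability errors.

```python
# Illustrative model-failover router; model identifiers are placeholders and
# the failure simulation stands in for real service errors.
PRIMARY = "claude-3-5-sonnet-v2"
SECONDARIES = ["llama-3-3-70b", "mixtral-8x7b"]

def invoke(model_id: str, prompt: str) -> str:
    # In practice this would call the hosting service, e.g.
    # bedrock_runtime.converse(modelId=..., messages=[...]) for Bedrock models
    # or a SageMaker endpoint invocation for the secondaries.
    if model_id == PRIMARY:
        raise RuntimeError("throttled")  # simulate a primary-model outage
    return f"[{model_id}] answer to: {prompt}"

def generate(prompt: str) -> str:
    """Try the primary model first, then fail over to secondaries in order."""
    for model_id in [PRIMARY, *SECONDARIES]:
        try:
            return invoke(model_id, prompt)
        except RuntimeError:
            continue
    raise RuntimeError("all models unavailable")
```

Keeping the candidate list in one place is what gives the architecture its stated flexibility: adding a newly available model is a one-line change to the routing table rather than a structural change.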

Robust data processing pipeline

A sophisticated data processing pipeline maintains Alfred’s knowledge base and promotes high-quality responses. Amazon S3 serves as the central data lake, providing scalable object storage for raw documents with versioning and lifecycle management capabilities. Amazon Transcribe enables accurate transcription of audio and video content, and AWS Lambda functions handle document ingestion, preprocessing, and text extraction.
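The preprocessing step in a pipeline like this usually includes splitting extracted text into overlapping chunks before indexing. The following is a minimal sketch of that step as it might run inside a Lambda function; the chunk and overlap sizes are illustrative defaults, not values from Alfred's pipeline.

```python
# Hypothetical ingestion-preprocessing step: split extracted text into
# overlapping chunks ready for indexing. Sizes are illustrative.
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content that spans
    a chunk boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

The overlap is the important design choice: without it, a sentence cut at a chunk boundary can become unretrievable because neither fragment matches the query well on its own.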

The pipeline’s efficiency is further enhanced by the indexing capabilities of Amazon Kendra, which provide automatic content classification, entity recognition, and semantic understanding. This comprehensive approach means that Alfred’s knowledge base remains current, accurate, and easily accessible.

Monitoring, security, and compliance

Comprehensive monitoring is provided through Amazon CloudWatch, offering real-time performance metrics, custom dashboards, and automated alerts. Security, a paramount concern, is implemented through multiple layers, including AWS Identity and Access Management (IAM) for role-based access control and AWS Key Management Service (AWS KMS) for centralized encryption key management.
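As one concrete example of the metrics side, per-request latencies can be aggregated into the statistic-set shape that CloudWatch's PutMetricData API accepts. The metric name and namespace here are hypothetical; only the dictionary shape follows the CloudWatch API.

```python
# Illustrative monitoring sketch: summarize request latencies as a
# CloudWatch-style statistic set. The metric name is hypothetical; in practice
# the datum would be published with
#   cloudwatch.put_metric_data(Namespace="Alfred", MetricData=[datum]).
def latency_datum(latencies_ms: list[float]) -> dict:
    """Aggregate raw latencies into one CloudWatch statistic-set datum."""
    return {
        "MetricName": "ResponseLatency",
        "Unit": "Milliseconds",
        "StatisticValues": {
            "SampleCount": len(latencies_ms),
            "Sum": sum(latencies_ms),
            "Minimum": min(latencies_ms),
            "Maximum": max(latencies_ms),
        },
    }
```

Publishing pre-aggregated statistic sets rather than one datum per request keeps API call volume (and cost) low while still supporting dashboards and alarms on averages and tail behavior.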

Responsible AI

The AWS approach to responsible AI represents a comprehensive framework built on eight essential pillars designed to foster ethical and trustworthy AI development. At its core, the framework emphasizes safety and fairness through sophisticated tools such as Amazon SageMaker Clarify, which helps organizations detect and mitigate potential biases in ML models. A key component of this framework is Amazon Bedrock Guardrails, which provides additional customizable safeguards on top of built-in protections, blocking up to 85 percent more harmful content and filtering more than 75 percent of hallucinated responses for RAG and summarization workloads.
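To illustrate where a guardrail sits in the request path, the toy filter below mimics the shape of a denied-topic check: inspect text before and after model invocation, and either pass it through or return a blocked response. The topic list and logic are invented for illustration; the real Amazon Bedrock Guardrails service is configured declaratively and applied at invocation time, not hand-rolled like this.

```python
# Toy stand-in for a guardrail-style denied-topic filter. The topics and
# blocked message are invented; Amazon Bedrock Guardrails itself is a managed,
# declaratively configured service, not keyword matching.
DENIED_TOPICS = {"payroll data", "export-controlled"}

def apply_guardrail(text: str) -> tuple[str, bool]:
    """Return (output_text, blocked). Blocked inputs get a safe refusal."""
    lowered = text.lower()
    if any(topic in lowered for topic in DENIED_TOPICS):
        return "Sorry, I can't help with that topic.", True
    return text, False
```

The pattern matters more than the mechanism: the same check wraps both the user's input and the model's output, so hallucinated or policy-violating generations are caught even when the input looked benign.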

Privacy and security form another crucial component, implemented through robust encryption, access controls, and the AWS shared responsibility model, facilitating the protection of sensitive information. The framework also prioritizes transparency and explainability, helping users understand AI decision-making processes through features such as chain-of-thought reasoning and trace functionality. Additionally, AWS implements strong governance measures through tools such as SageMaker Role Manager and SageMaker Model Cards, while promoting veracity through RAG and human-in-the-loop capabilities. Finally, the AWS commitment to controllability makes sure that organizations maintain oversight of their AI systems through comprehensive monitoring and auditing tools, making it a leader in responsible AI implementation.

Best practices for implementation

When implementing a RAG solution similar to Alfred, organizations should begin with a clear use case, defining specific business objectives and target users. Focus should be placed on data quality through robust validation and consistent formatting. Regular security assessments and compliance monitoring are essential, as is continuous performance tracking and optimization. 

Conclusion

The successful implementation of Alfred demonstrates how RAG architecture, built on a secure foundation such as AWS GovCloud (US), can deliver powerful AI capabilities while maintaining the highest standards of security and compliance. As AI technology continues to evolve, organizations must balance innovation with responsible deployment, making sure their solutions meet both technical and responsible AI requirements. By following the technical architecture outlined in this post and implementing robust responsible AI practices, organizations can create AI solutions that not only meet their immediate needs but also position them for future growth and adaptation in the rapidly evolving AI landscape.

Ready to explore RAG implementation in a secure government cloud environment? Try our AWS GovCloud (US) Jupyter Notebook environment today. This hands-on experience will help you understand the potential of RAG architecture while maintaining compliance with FedRAMP High security requirements.