This Guidance demonstrates how to build an application that allows users to query relational databases using natural language. The architecture uses a LangChain SQL database chain, a Streamlit frontend framework, and an Amazon SageMaker JumpStart foundation model (FM). The user's natural language query is passed to the LangChain SQL database chain, which translates it into a SQL statement using the FM. This SQL statement runs against the configured relational database, retrieving the relevant results.

These database results are passed back to the FM, which generates a natural language explanation summarizing the findings in an easy-to-understand format. The natural language explanation is then presented to the user through the Streamlit frontend, providing a user-friendly interface for submitting queries and viewing results.
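
As a rough illustration of the flow described above, the following Python sketch walks through the three steps: generate SQL from the question, run it, and summarize the rows. It is not the Guidance's implementation. It substitutes an in-memory SQLite database for the relational database and a canned `fake_llm` function for the SageMaker JumpStart FM, and it does not use LangChain. The function names, prompts, and the `artworks` table are assumptions made for this example.

```python
import sqlite3
from typing import Callable

# Hypothetical stand-in for the FM: any callable that maps a prompt to text.
LLM = Callable[[str], str]


def answer_question(question: str, conn: sqlite3.Connection, llm: LLM) -> dict:
    """Translate a question to SQL, run it, and summarize the result rows."""
    # Step 1: show the model the table definitions and ask for a SQL statement.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
        if row[0]
    )
    sql = llm(f"Schema:\n{schema}\n\nWrite one SQL query answering: {question}").strip()

    # Guard (an addition for this sketch): only run read-only statements.
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")

    # Step 2: run the generated statement against the database.
    rows = conn.execute(sql).fetchall()

    # Step 3: pass the rows back to the model for a plain-language summary.
    answer = llm(f"Question: {question}\nSQL: {sql}\nRows: {rows}\nAnswer in one sentence.")
    return {"sql": sql, "rows": rows, "answer": answer.strip()}


def fake_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without a model endpoint."""
    if prompt.startswith("Schema:"):
        return "SELECT COUNT(*) FROM artworks WHERE artist = 'Monet'"
    return "There are 2 artworks by Monet."


def demo() -> dict:
    """Build a tiny sample database and answer one question against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artworks (title TEXT, artist TEXT)")
    conn.executemany(
        "INSERT INTO artworks VALUES (?, ?)",
        [("Water Lilies", "Monet"), ("Haystacks", "Monet"), ("Starry Night", "van Gogh")],
    )
    return answer_question("How many artworks are by Monet?", conn, fake_llm)


if __name__ == "__main__":
    print(demo())
```

In a real deployment, `llm` would wrap a call to the hosted model endpoint and `conn` would be a connection to the configured database. The read-only guard is one simple way to limit what a generated statement can do, and a database user with read-only permissions would be a stronger control.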

Please note: [Disclaimer]

Architecture Diagram

Download the architecture diagram PDF 

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • The Docker images in the Amazon Elastic Container Registry (Amazon ECR) repository and the Amazon Elastic Container Service (Amazon ECS) task definitions use semantic versioning. Amazon ECS automatically deploys services to AWS Fargate based on the task definition. Both the service and the underlying task can be updated and redeployed. Amazon RDS for PostgreSQL has native functionality that lets you perform minor and major version upgrades of the database engine. Amazon CloudWatch captures logs and metrics (telemetry) emitted from the services in the architecture, including Fargate, Amazon ECS, and Amazon RDS for PostgreSQL.

    Read the Operational Excellence whitepaper 
  • Amazon ECR stores the Docker image containing the natural language query (NLQ) application. Amazon ECR image scanning helps identify software vulnerabilities in the image. The Amazon RDS instance and the Amazon ECS cluster are both restricted to a virtual private cloud (VPC). By using AWS managed services, you can prevent direct access to compute resources through Remote Desktop Protocol (RDP) and Secure Shell (SSH).

    AWS security best practices dictate that you use HTTPS rather than HTTP to secure data in transit. HTTPS requires an SSL/TLS server certificate. We recommend using AWS Certificate Manager (ACM) to manage the certificate, which is loaded onto the Application Load Balancer (ALB). The ALB uses the certificate to terminate the front-end HTTPS connection and decrypt requests from the end user. The certificate requires a registered DNS hostname, which can be managed using Amazon Route 53.

    Read the Security whitepaper 
  • VPC Classless Inter-Domain Routing (CIDR) block ranges are non-overlapping. We sized the subnet CIDR blocks to accommodate expansion of resources, such as Amazon RDS and Amazon ECS, and you deploy resources to two Availability Zones for high availability. Although the VPC and its subnet CIDR blocks are sized to accommodate reasonable expansion of resources, they are not infinitely scalable. The resources are restricted to a single AWS account and Region, so reliability may be affected by outages within the Region or by issues affecting account accessibility.

    Read the Reliability whitepaper 
  • We chose Amazon RDS for PostgreSQL because PostgreSQL is a widely used open-source database engine with large community support. We configured the database instance based on the anticipated volume of requests and the small size of the dataset used for the demonstration. We chose Amazon ECS on Fargate because Fargate is serverless and can scale to meet increased demand.

    Read the Performance Efficiency whitepaper 
  • You should size resources to meet your organizational requirements and priorities. In particular, properly sizing the Amazon RDS for PostgreSQL instance, the SageMaker JumpStart FM inference hosting instance, and the Amazon ECS vCPU and memory resources has the largest impact on cost.

    The Amazon RDS for PostgreSQL instance can use RDS storage auto scaling to continuously monitor actual storage consumption and automatically scale capacity based on utilization. You can implement optional software and hardware caching to minimize scaling requirements on Amazon RDS and the FM hosting instance. The SageMaker JumpStart FM inference endpoint can be configured for auto scaling if required. Amazon ECS on Fargate is serverless and can be configured to scale with demand.

    Read the Cost Optimization whitepaper 
  • This architecture makes full use of the resources it provisions and avoids adding resources that the core requirements of the Guidance do not need. For example, we chose Amazon RDS for PostgreSQL over Amazon Aurora, which offers powerful functionality that is unnecessary for this architecture's purpose. We also provisioned a single SageMaker JumpStart FM endpoint instance rather than multiple instances.

    Read the Sustainability whitepaper 
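
The reliability considerations above call for non-overlapping CIDR ranges sized for growth. One way to check a planned layout before deployment is with Python's standard `ipaddress` module, as in the sketch below. The VPC range and the four subnet ranges are hypothetical values chosen for illustration and are not taken from the Guidance. The usable-address calculation assumes the five addresses that AWS reserves in every subnet.

```python
import ipaddress
from itertools import combinations

# Hypothetical layout: one VPC with four subnets spread over two Availability Zones.
VPC_CIDR = "10.0.0.0/16"
SUBNET_CIDRS = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks that share at least one address."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


def subnets_outside_vpc(vpc_cidr, subnet_cidrs):
    """Return the subnet blocks that do not fit inside the VPC block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [c for c in subnet_cidrs if not ipaddress.ip_network(c).subnet_of(vpc)]


def usable_addresses(cidr):
    """Number of addresses left for resources after the AWS reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    print("Overlapping pairs:", find_overlaps(SUBNET_CIDRS))
    print("Outside the VPC:", subnets_outside_vpc(VPC_CIDR, SUBNET_CIDRS))
    for cidr in SUBNET_CIDRS:
        print(cidr, "usable addresses:", usable_addresses(cidr))
```

Comparing the usable-address count with the number of tasks and database instances you expect to run gives a quick sense of how much room a subnet leaves for expansion.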

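The optional software caching mentioned under the cost considerations above could take many forms. The sketch below shows one minimal possibility, assuming an in-process cache keyed on a normalized form of the user's question, so that repeated questions do not reach the FM endpoint or the database a second time. The helper name `make_cached_answerer` and the normalization rule are assumptions for this example, not part of the Guidance.

```python
from functools import lru_cache


def make_cached_answerer(answer_fn, maxsize=256):
    """Wrap answer_fn so repeated questions are served from an in-memory cache."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized_question):
        # Only reached on a cache miss; this is where the FM and database are called.
        return answer_fn(normalized_question)

    def ask(question):
        # Normalize case and whitespace so trivially different phrasings share an entry.
        normalized = " ".join(question.lower().split())
        return _cached(normalized)

    # Expose hit and miss counters for monitoring.
    ask.cache_info = _cached.cache_info
    return ask


if __name__ == "__main__":
    def expensive_answer(question):
        print("calling the model for:", question)
        return f"answer to: {question}"

    ask = make_cached_answerer(expensive_answer)
    ask("How many artists are there?")
    ask("  how many ARTISTS are there? ")  # served from the cache
    print(ask.cache_info())
```

A cache like this returns stale answers once the underlying data changes, so a real deployment would likely need an expiry time or an invalidation step tied to database updates.
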
Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
