AWS Database Blog

Integrate natural language processing and generative AI with relational databases

Organizations are increasingly grappling with the challenge of immediate access to data. The majority of this data is stored in relational database management system (RDBMS) applications that require specialized SQL knowledge to query. This requirement creates a bottleneck, forcing companies to rely on developer resources. A transformative solution is to harness the natural language processing (NLP) capabilities of Amazon Bedrock to allow users to interact with existing relational databases. This approach enables conversational interactions with databases, allowing end-users to access data using natural language rather than complex SQL queries. The implications of this technology are far-reaching:

  • Democratizes data access
  • Alleviates the need for SQL expertise among general users
  • Significantly reduces the burden on developer resources

In this post, we present an approach to using NLP to query an Amazon Aurora PostgreSQL-Compatible Edition database.

Solution overview

The solution presented in this post assumes that an organization has an Aurora PostgreSQL database. We create a web application framework using Flask for the user to interact with the database. JavaScript and Python code act as the interface between the web framework, Amazon Bedrock, and the database.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Luma, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities needed to build generative AI applications with security, privacy, and responsible AI. For this solution, we chose Anthropic’s Claude 3 Sonnet model for its ability to convert natural language into code. Claude 3 Sonnet strikes a strong balance between intelligence and speed, qualities that are especially critical for a real-time interface.
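To make the model invocation concrete, the following is a minimal sketch of converting a natural language question to SQL through the Bedrock Runtime API with Boto3. The helper names (`build_prompt`, `nl_to_sql`) and the prompt wording are illustrative assumptions; the sample application in this post may structure the call differently.

```python
import json

MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"


def build_prompt(question: str, schema_ddl: str) -> str:
    """Wrap the user's question with the table definitions so the model
    generates SQL against the correct schema."""
    return (
        "You are a PostgreSQL expert. Given this schema:\n"
        f"{schema_ddl}\n"
        "Return only a single SQL SELECT statement that answers:\n"
        f"{question}"
    )


def nl_to_sql(question: str, schema_ddl: str, region: str = "us-east-1") -> str:
    """Send the question to Claude 3 Sonnet via Bedrock and return the
    generated SQL text."""
    import boto3  # deferred so build_prompt can be exercised without the AWS SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": build_prompt(question, schema_ddl)}
        ],
    }
    response = client.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]
```

Passing the schema DDL in the prompt is one common approach; the sample code may instead embed the table definitions in a fixed prompt template.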

The following diagram illustrates the solution architecture:

Architecture diagram

The solution involves the following steps:

  1. A Flask web application framework allows the user to enter a natural language query against the database.
  2. JavaScript handles the user interaction through a Generate Report button.
  3. JavaScript captures the report generation action and passes the user’s input to Python.
  4. Python code screens the user’s input for possible SQL injection attempts.
  5. Amazon Bedrock processes the user’s natural language input and converts it into SQL using Anthropic’s Claude 3 Sonnet model.
  6. The SQL generated by Amazon Bedrock is parsed to help prevent restricted actions.
  7. AWS Secrets Manager is accessed to obtain database connection information and credentials.
  8. The SQL is executed against the Aurora PostgreSQL database.
  9. Results from the database are presented to the user on the webpage.
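Steps 4 and 6 can be sketched as a conservative allowlist check that accepts only a single SELECT (or WITH … SELECT) statement. The function name and keyword list below are illustrative assumptions; the sample code’s actual checks may differ.

```python
import re

# Keywords that must not appear in a generated statement for this
# read-only reporting use case (an illustrative list, not necessarily
# the sample code's exact one)
RESTRICTED = ("insert", "update", "delete", "drop", "alter",
              "truncate", "create", "grant", "revoke")


def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT (or WITH ... SELECT)
    statement containing no restricted keywords or stacked queries."""
    stmt = sql.strip().rstrip(";").strip().lower()
    if ";" in stmt:  # stacked statements such as "SELECT 1; DROP ..."
        return False
    if not (stmt.startswith("select") or stmt.startswith("with")):
        return False
    # Conservative: this also rejects SELECTs whose string literals
    # contain these words, an acceptable trade-off for a report tool.
    return not any(re.search(rf"\b{kw}\b", stmt) for kw in RESTRICTED)
```

A check like this complements, but does not replace, running the application with a database role limited to read-only privileges.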

Prerequisites

This solution requires the following AWS services and resources:

  • An Amazon Aurora PostgreSQL-Compatible Edition database cluster
  • Access to Amazon Bedrock and Anthropic’s Claude 3 Sonnet model
  • An AWS Secrets Manager secret for the database connection information and credentials
  • An Amazon EC2 instance to host the web server
  • A bastion host with network access to the database and web server

Create the tables

Complete the following steps to create the tables:

  1. Connect to your bastion host.
  2. Connect to the database.
  3. Create the customers table with the following DDL:
    CREATE TABLE dc_ai_test.customers 
    (
    	customer_id serial4 NOT NULL,
    	first_name varchar(50) NOT NULL,
    	last_name varchar(50) NOT NULL,
    	house_number varchar(50) NOT NULL,
    	street_name varchar(50) NOT NULL,
    	city varchar(50) NOT NULL,
    	zipcode int4 NOT NULL,
    	email varchar(100) NOT NULL,
    	created_date timestamptz DEFAULT CURRENT_TIMESTAMP NULL,
    	CONSTRAINT customers_pkey PRIMARY KEY (customer_id)
    );
    
  4. Create the items table with the following DDL:
    CREATE TABLE dc_ai_test.items 
    (
    	item_id serial4 NOT NULL,
    	item_name varchar(50) NOT NULL,
    	item_price numeric(5, 2) NOT NULL,
    	CONSTRAINT items_pkey PRIMARY KEY (item_id)
    );
    
  5. Create the orders table with the following DDL:
    CREATE TABLE dc_ai_test.orders 
    (
    	customer_id int4 NOT NULL,
    	item_id int4 NOT NULL,
    	quantity int4 NOT NULL,
    	order_date timestamptz DEFAULT CURRENT_TIMESTAMP NULL
    );
    
  6. Populate the customers table with the following SQL:
    INSERT 
    INTO 	dc_ai_test.customers 
    		(
    		first_name, last_name, house_number, street_name, city, zipcode, email
    		)
    WITH names AS 
    	(
    		SELECT * FROM (
    		VALUES
    		('James', 'Smith'), ('John', 'Johnson'), ('Robert', 'Williams'), ('Michael', 'Brown'),
    		('William', 'Jones'), ('David', 'Garcia'), ('Richard', 'Miller'), ('Joseph', 'Davis'),
    		('Thomas', 'Rodriguez'), ('Charles', 'Martinez'), ('Mary', 'Anderson'), ('Patricia', 'Taylor'),
    		('Jennifer', 'Thomas'), ('Linda', 'Moore'), ('Elizabeth', 'Jackson'), ('Barbara', 'Martin'),
    		('Susan', 'Lee'), ('Jessica', 'Thompson'), ('Sarah', 'White'), ('Karen', 'Lopez'),
    		('Emma', 'Clark'), ('Oliver', 'Lewis'), ('Noah', 'Walker'), ('Liam', 'Hall'),
    		('Sophia', 'Young'), ('Isabella', 'King'), ('Ava', 'Wright'), ('Mia', 'Scott'),
    		('Charlotte', 'Green'), ('Amelia', 'Baker'), ('Lucas', 'Adams'), ('Mason', 'Nelson'),
    		('Ethan', 'Hill'), ('Alexander', 'Rivera'), ('Henry', 'Campbell')
    		) AS n(first_name, last_name)
    	),
    streets AS 
    	(
    		SELECT * FROM (
    		VALUES
    		('Maple'), ('Oak'), ('Pine'), ('Cedar'), ('Elm'),
    		('Washington'), ('Main'), ('Park'), ('Lake'), ('Hill'),
    		('River'), ('Spring'), ('Market'), ('Church'), ('South')
    		) AS s(street)
    	),
    cities AS 
    	(
    		SELECT * FROM (
    		VALUES
    		('New York', 10001), ('Los Angeles', 90001), ('Chicago', 60601),
    		('Houston', 77001), ('Phoenix', 85001), ('Philadelphia', 19101),
    		('San Antonio', 78201), ('San Diego', 92101), ('Dallas', 75201),
    		('San Jose', 95101)
    		) AS c(city, zip)
    	)
    SELECT
    		first_name,
    		last_name,
    		CAST(FLOOR(random() * 9999 + 1) AS VARCHAR) as house_number,
    		street || ' ' ||
    		CASE (random() * 4)::INT
    			WHEN 0 THEN 'Street'
    			WHEN 1 THEN 'Avenue'
    			WHEN 2 THEN 'Road'
    			ELSE 'Drive'
    		END as street_name,
    		city,
    		zipcode,
    		LOWER(first_name) || '.' || LOWER(last_name) ||
    		CAST(FLOOR(random() * 999 + 1) AS VARCHAR) || '@email.com' as email
    		FROM (
    					SELECT
    					n.first_name,
    					n.last_name,
    					s.street,
    					c.city,
    					c.zip + FLOOR(random() * 99)::INT as zipcode
    					FROM names n
    					CROSS JOIN streets s
    					CROSS JOIN cities c
    					ORDER BY random()
    					LIMIT 100
    			) t
    ;
    
  7. Populate the items table with the following SQL:
    INSERT 
    INTO 	dc_ai_test.items (item_name, item_price)
    VALUES
    		('Coffee Mug', 12.99),
    		('Wireless Mouse', 24.99),
    		('Notebook', 4.99),
    		('USB Cable', 9.99),
    		('Phone Charger', 15.99),
    		('Desk Lamp', 29.99),
    		('Water Bottle', 18.99),
    		('Keyboard Wrist Rest', 14.99),
    		('Sticky Notes', 3.99),
    		('Pen Set', 7.99),
    		('Mouse Pad', 8.99),
    		('Headphone Stand', 19.99),
    		('Cable Organizer', 11.99),
    		('Screen Cleaner', 6.99),
    		('Desk Calendar', 13.99),
    		('Laptop Stand', 27.99),
    		('Desk Plant', 16.99),
    		('Book Stand', 21.99),
    		('Calculator', 17.99),
    		('Desk Organizer', 23.99)
    ;
    
  8. Populate the orders table with the following SQL:
    WITH date_series AS 
    (
    	SELECT generate_series
    	(
    		CURRENT_TIMESTAMP - INTERVAL '100 days',
    		CURRENT_TIMESTAMP,
    		'1 hour'
    	)::timestamp with time zone AS order_time
    ),
    customer_orders AS 
    (
    	SELECT
    		c.customer_id,
    		i.item_id,
    		ds.order_time,
    		-- More orders during business hours (9am-5pm)
    		CASE
    			WHEN EXTRACT(HOUR FROM ds.order_time) BETWEEN 9 AND 17
    			THEN FLOOR(RANDOM() * 3 + 1)::integer
    			ELSE FLOOR(RANDOM() * 2 + 1)::integer
    		END AS quantity
    		FROM date_series ds
    		CROSS JOIN dc_ai_test.customers c
    		CROSS JOIN dc_ai_test.items i
    		WHERE
    		-- Reduce total number of orders
    		RANDOM() < 0.01
    )
    INSERT 
    INTO 	dc_ai_test.orders 
    		(
    			customer_id, item_id, quantity, order_date
    		)
    SELECT
    		customer_id,
    		item_id,
    		quantity,
    		order_time
    FROM 	customer_orders
    ORDER BY
    		order_time
    ;
    
  9. Store the database information in AWS Secrets Manager

    Store the database connection information and credentials in AWS Secrets Manager as shown in the following screenshots. The sample code provided expects these specific values.

    Secret details

    Secret value
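    The following sketch shows how the application might retrieve this secret and run a query (steps 7 and 8 of the solution). It assumes the secret stores host, port, dbname, username, and password keys and that the psycopg2 driver is used; match these to the values you actually stored.

```python
import json


def secret_to_conn_kwargs(secret_json: str) -> dict:
    """Map a Secrets Manager secret string to psycopg2 connection
    arguments. The key names below follow a typical RDS-style secret;
    adjust them to the keys you actually stored."""
    s = json.loads(secret_json)
    return {
        "host": s["host"],
        "port": int(s.get("port", 5432)),
        "dbname": s["dbname"],
        "user": s["username"],
        "password": s["password"],
    }


def run_report(secret_name: str, sql: str, region: str = "us-east-1"):
    """Fetch credentials from Secrets Manager, run the query, and
    return the rows."""
    # Imports are deferred so secret_to_conn_kwargs can be exercised
    # without AWS access or a running database.
    import boto3
    import psycopg2

    secret = boto3.client("secretsmanager", region_name=region) \
                  .get_secret_value(SecretId=secret_name)["SecretString"]
    with psycopg2.connect(**secret_to_conn_kwargs(secret)) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
```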

    Request access to the foundation model

    New users must request access to Anthropic’s Claude 3 Sonnet model before using the model for the first time. Complete the following steps:

    1. On the Amazon Bedrock console, choose Model access in the navigation pane.
    2. Select Anthropic – Claude 3 Sonnet.
    3. Choose Request model access.
    4. In the pop-up window, review the terms and conditions for using the model, and select the check box to agree to the terms.
    5. Choose Submit request to finalize your access request.

    The approval process typically takes 1–2 business days.

    Launch and configure the web server

    The web server runs on an EC2 instance running Amazon Linux 2023. The following software must be installed:

    • Python 3
    • Flask
    • The AWS SDK for Python (Boto3)
    • A PostgreSQL driver such as psycopg2

    Configure the database security group to allow access from the web server

    Edit the security group of the database to allow connections on port 5432 from the web server:

    1. On the Amazon RDS console, locate the Aurora PostgreSQL cluster you created earlier.
    2. Choose the cluster name to open the cluster details page.
    3. In the Connectivity & security section, find the security group associated with your database cluster.
    4. Choose the security group name to open the security group details page.
    5. On the security group details page, choose the Inbound rules tab.
    6. Choose Edit inbound rules.
    7. Choose Add rule.
    8. Provide the following information:
      1. For Type, choose PostgreSQL.
      2. For Port range, enter 5432.
      3. For Source, enter the IP address or the security group of your web server. If you’re using the web server’s IP address, enter it in CIDR notation (for example, 192.168.1.10/32).
      4. If you’re using the web server’s security group, choose Security group on the dropdown menu and choose the appropriate group.
    9. Choose Save changes to apply the new inbound rule.
    10. Verify that the new inbound rule has been added to the security group.

    Connections to the Aurora PostgreSQL database on port 5432 will now be allowed from the specified web server source, enabling the web application to communicate with the database.
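    If you prefer to script this change rather than use the console, the same inbound rule can be applied with Boto3. The function names here are illustrative; the rule mirrors the console steps above (TCP port 5432 from the web server’s security group).

```python
def pg_ingress_permission(source_sg_id: str) -> dict:
    """Build the IpPermissions entry allowing PostgreSQL (port 5432)
    from the web server's security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{
            "GroupId": source_sg_id,
            "Description": "PostgreSQL from web server",
        }],
    }


def allow_web_server(db_sg_id: str, web_sg_id: str, region: str = "us-east-1"):
    """Apply the rule to the database's security group."""
    import boto3  # deferred so the helper above stays testable offline

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=db_sg_id,
        IpPermissions=[pg_ingress_permission(web_sg_id)],
    )
```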

    Remember to follow AWS security best practices, such as regularly reviewing and updating your security group rules, to maintain the ongoing security of your database and web application.

    Create an IAM role to allow the web server to access Amazon Bedrock

    The web server requires AWS Identity and Access Management (IAM) permissions to access Amazon Bedrock. Complete the following steps:

    1. On the IAM console, locate the IAM role associated with your web server.
    2. Choose the role name to open the role details page.
    3. On the Permissions tab, choose Add permissions.
    4. On the Add permissions dropdown menu, choose Create inline policy.
    5. On the Create inline policy page, choose the JSON tab.
    6. Enter the following policy document in the JSON editor:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "bedrock:InvokeModel",
              "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
              "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
            ]
          }
        ]
      }
      
    7. Choose Next.
    8. On the Review policy page, give the policy a name (for example, Web_AI_Access) and a description (optional).
    9. Choose Create policy to save the inline policy.

    The policy should now be attached to the web server’s IAM role.

    Remember to follow AWS security best practices, such as regularly reviewing and updating your IAM policies, to maintain the ongoing security of your application and data.
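    The console steps above can equivalently be scripted. The following sketch builds the same policy document and attaches it with Boto3’s IAM client, reusing the Web_AI_Access policy name from the console walkthrough.

```python
import json


def bedrock_invoke_policy(model_arn: str) -> str:
    """Return the inline policy document shown above as a JSON string,
    scoped to the given foundation model ARN."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": [model_arn],
        }],
    })


def attach_policy(role_name: str) -> None:
    """Attach the policy inline to the web server's IAM role."""
    import boto3  # deferred so bedrock_invoke_policy stays testable offline

    model_arn = ("arn:aws:bedrock:us-east-1::foundation-model/"
                 "anthropic.claude-3-sonnet-20240229-v1:0")
    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName="Web_AI_Access",
        PolicyDocument=bedrock_invoke_policy(model_arn),
    )
```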

    Configure the web server’s security group to allow access from the bastion host

    The bastion host running the web browser needs to be able to connect to the web server. Complete the following steps:

    1. On the Amazon EC2 console, choose Security groups in the navigation pane under Network & security.
    2. Locate the security group associated with your web server. You can find this by looking at the Description column or by checking the tags.
    3. Choose the web server’s security group.
    4. On the Inbound rules tab, choose Edit inbound rules.
    5. On the Edit inbound rules page, choose Add rule.
    6. Configure the new rule as follows:
      1. For Type, choose Custom TCP.
      2. For Port range, enter 5000.
      3. For Source, choose Custom and enter the security group ID of your bastion host. You can find this by looking up the bastion host’s security group in the same security groups list. The format will be similar to sg-xxxxxxxxxxxxxxxxx.
      4. Add an optional description for the rule, such as Allow Flask access from bastion host.
    7. Choose Save rules to apply the changes.
    8. Verify that the new inbound rule has been added to the security group.

    This configuration allows the bastion host to access the Flask application running on port 5000 of your web server.

    Note the following considerations:

    • Make sure that your Flask application is configured to run on port 5000. If it’s set to a different port, adjust the rule accordingly.
    • This setup assumes that you’re using a bastion host for secure access to your web server. If you’re not using a bastion host, you might need to adjust the source to your specific needs (such as your IP address or a different security group).
    • Always follow the principle of least privilege. Only open the ports that are necessary for your application to function.
    • Regularly review and audit your security group rules to make sure they align with your security requirements.

    Execute the sample code

    The sample code to display the webpage and interface with Anthropic’s Claude 3 Sonnet model is available in the following GitHub repo. Execute the sample code on the web server. The steps to accomplish this are as follows:

    1. Connect to the web server.
    2. Create a directory (folder) for the code.
    3. Enter the sample code into a file named AuroraAI.py.
    4. Start the web server with the command python3 ./AuroraAI.py.
      The output should be similar to the following screenshot.

      Screen shot of message indicating web server is running

    5. Note the second IP address displayed; you will use it in a later step.

    Test the solution

    Access the webpage as follows:

    1. Sign on to the Windows bastion server.
    2. Launch a web browser.
    3. Navigate to the second IP address collected in the previous step.

    The webpage should be similar to the following screenshot.

    Screen shot of web page displaying initial page

    You can test the solution with a sample prompt such as:

    List all the customers in Chicago that have placed orders in the last month. The report should include the customers first name, last name, item name, item price and order date sorted by customer last name, first name, order date

    This would be the equivalent of coding the following SQL:

    SELECT   
    	c.first_name,
    	c.last_name,
    	i.item_name,
    	i.item_price,
    	o.order_date
    FROM 	dc_ai_test.customers c
    JOIN    dc_ai_test.orders o
    ON      c.customer_id = o.customer_id
    JOIN    dc_ai_test.items i
    ON      o.item_id = i.item_id
    WHERE   c.city = 'Chicago'
    AND     o.order_date >= CURRENT_DATE - interval '1 month'
    ORDER BY 
    	c.last_name,
    	c.first_name,
    	o.order_date
    ;

    The following webpage should be displayed.

    Screen shot of web page displaying the results of the SQL

    Another test to demonstrate grouping could be with a prompt such as:

    Generate a report of the sales grouped by year, month and city. Only show the top 10.

    This would be the equivalent of coding the following SQL:

    SELECT     
    	DATE_PART('year', o.order_date) AS year,
    	DATE_PART('month', o.order_date) AS month,
    	c.city,
    	SUM(i.item_price * o.quantity) AS total_sales
    FROM    dc_ai_test.orders o
    JOIN    dc_ai_test.customers c
    ON      o.customer_id = c.customer_id
    JOIN  	dc_ai_test.items i
    ON      o.item_id = i.item_id
    GROUP BY   
    	year,
    	month,
    	c.city
    ORDER BY   
    	total_sales DESC
    LIMIT   10
    ;
    

    The following webpage should be displayed:

    Screen shot of web page displaying the results of the SQL

    The exact data will vary depending on when the sample data was created in your database.

    If a user attempts to change the data with a prompt such as:

    Update the items table to set the price of “Coffee Mug” to $99

    The application should reject the request, and the following webpage should be displayed:

    Screen shot of the Report Generator web page

    Cleaning up

    To avoid incurring unnecessary charges and to clean up the resources used in this solution, follow these steps:

    1. Connect to the bastion host.
    2. Connect to the database and drop the tables:
      DROP TABLE dc_ai_test.customers;
      DROP TABLE dc_ai_test.items;
      DROP TABLE dc_ai_test.orders;
      
    3. If an Aurora database was created, it can be deleted by following the instructions at Deleting Aurora DB clusters and DB instances.
    4. Delete the Secrets Manager secret:
      1. On the Secrets Manager console, choose Secrets in the navigation pane.
      2. Select the secret you created for storing the database information, and on the Actions menu, choose Delete secret.
    5. Remove access to the Amazon Bedrock foundation models as described at Add or remove access to Amazon Bedrock foundation models.
    6. Terminate the web server EC2 instance.
    7. Remove the entry in the database’s security group that was created to allow connections on port 5432 from the web server.
    8. Remove the inline policy from the IAM role:
      1. On the IAM console, choose Roles in the navigation pane.
      2. Select the role associated with your web server, and delete the inline policy you created (for example, Web_AI_Access).
      3. Confirm the deletion when prompted.

    Conclusion

    The integration of AI-driven NLP with RDBMS applications represents a significant leap forward in data accessibility and management. By taking advantage of the generative AI and NLP capabilities of Amazon Bedrock, we have demonstrated a powerful solution that bridges the gap between complex database structures and user-friendly interactions.

    The ability to interact with databases through natural language queries opens up new possibilities for data exploration, decision-making, and operational efficiency. As we move forward, it’s clear that the fusion of AI, NLP, and traditional database systems will continue to reshape how organizations interact with and derive value from their data, paving the way for more inclusive and agile data-driven cultures.

    If you have any questions or comments, post your thoughts in the comments section.


    About the Authors

    Dean Capps is a Database Consultant at AWS and has experience providing database support on legacy, open systems, and cloud platforms. Dean is currently a member of the AWS Worldwide Public Sector team, providing homogeneous and heterogeneous database migration support to customers in the US Federal space.

    Wajid Ali Mir is a Database Consultant at AWS and works as a database migration specialist, enabling customers to build highly available, cost-effective database solutions and migrate their database workloads to the AWS Cloud. Wajid works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.

    Feng Cai is a Senior Database Consultant at AWS. He has a wide background in providing database services to different clients. Feng is currently a member of the AWS Worldwide Public Sector team, providing homogeneous and heterogeneous database migration support to customers.