Query your AWS database from your serverless application

January 2023: This post was reviewed and updated for accuracy.

To leverage high availability, scalability and cost optimization, often times, Developers and Database administrators want to access their databases from a Serverless Application. An application that can automatically scale, inherently highly available and run without provisioning or managing an EC2 host is known as a Serverless Application. This can eliminate the need of launching Amazon Elastic Compute Cloud (Amazon EC2) hosts in the virtual private cloud (VPC) to use as database clients.

In this blog, we will demonstrate how to query an AWS database from a Rest API URL, even if the database is not publicly accessible. You can even run it against an Amazon Aurora Serverless database and Neptune Serverless for a serverless-to-serverless query where RDBMS database will also be serverless and serverless application is hosted using Amazon API Gateway and AWS Lambda.

For ease of creating the serverless resources and better understanding for beginners , we use AWS CloudFormation stack in this blog post. However, you may also use AWS Serverless Application Model (AWS SAM) to create them. The CloudFormation stack will create the following two resources:

The AWS Lambda function that executes your query against your backend database
The API Gateway REST API that invokes the Lambda function when you access the URL

The Python code samples present at awslabs/rds-support-tools GitHub repository follow the best practices that can easily be adapted for other database engines. For Neptune, the SPARQL example can be adapted for Gremlin.

Also, in order to avoid any code compatibility issue for AWS Lambda and testing the In-VPC database connection, we are using a temporary Amazon EC2 client host through which we will execute the scripts and deploy the resources in AWS Cloud.

Additionally, at the end of this blog, we are also sharing substantial Troubleshooting tips and best practices specifically for querying databases in an Amazon VPC and resolving the issues you might come across while implementing this set up.

Before you begin

To see whether a serverless application is suitable for your use case, review the AWS Lambda limits and the API Gateway limits. In particular, note the runtime limits as these apply to your queries. If API Gateway invokes the Lambda function synchronously, the API Gateway runtime limit applies; if asynchronously, the Lambda runtime limit applies. If you choose an alternate invocation method, the Lambda runtime limit still applies. In this example, we invoke the Lambda function synchronously.

Prerequisites

You have launched an AWS database instance in a VPC, preferably using Neptune, RDS for MySQL, or RDS for PostgreSQL
For setup purposes, you have launched a temporary Amazon EC2 client host, preferably Amazon Linux, in the same VPC and AWS Region as the database.
You know the EC2 client host subnet ID and security group ID.
The client host security group ID (not the CIDR) is permitted as a source in the inbound rule of the database security group. The port range includes the database port.
Your preferred database client software has been installed on the client host, and you have established connectivity from this host to the database. Your database user has permissions to create database objects and to insert into and query from this database.
You have Python 3.7 installed on the client host. For help installing Python 3.x on an Amazon, see one of these AWS Knowledge Center articles: Instructions for Amazon 1 Linux or Instructions for Amazon 2 Linux.Note: The AWS Knowledge Center articles talks about how to setup up python virtual environment which is not required for this setup you can only install python3 that is enough for the scope of this article.
Your AWS user has permissions to create and manage IAM roles, Lambda, API Gateway, and AWS CloudFormation stacks, and to view Amazon CloudWatch Logs.
You have full console access. You’ve also configured the AWS CLI on your client host and can run the following command without a permission error:
```
aws s3 ls
```

Steps

The following sections walk you through setting up and testing the example.

Set up the solution
Load the sample data
Test the provided Python code from the command line on your client host
Zip your Python source code along with the Python packages
Create the Lambda function and API Gateway REST API
Query the database from the URL
Next steps
Review tips and troubleshooting
Summary and final steps

Step 1: Set up the solution

On your client host, create the temporary project directory to download the scripts and the AWS CloudFormation JSON template from the GitHub repository, you can find the scripts in the serverless folder of the rds-support-tools repository.
Note : If you have not used GitHub before, see these instructions for downloading awslabs/rds-support tools.
```
mkdir ~/temp-repo 
cd ~/temp-repo
git clone https://github.com/awslabs/rds-support-tools.git
cd serverless
```
After downloading the scripts, follow the below steps:
- Ensure that you are in the serverless folder of rds-support-tools directory and able to list the files shown here by ls command .
- Rename the Python script for your database engine to serverless-query.py .
- Keep the name of the AWS CloudFormation template: serverless-query-cfn.json .
Now, create the project directory for your serverless application and copy both the files here. Use the cp command to copy the files.
```
mkdir ~/svls 
cp serverless-query-cfn.json serverless-query.py ~/svls
cd ~/svls
```
Install dos2unix, and run all files through it.
```
sudo yum install -y dos2unix
dos2unix *
```
Verify that your Python 3.7 is installed.
```
python3 -V
```

From inside the folder , pip3 install the module for your database client that gets imported in your code sample use -t. flag to download the package in the same folder.

Database	Module to install using pip3
MySQL	`pip3 install pymysql -t.`
PostgreSQL	`pip3 install psycopg2-binary -t.`
Neptune/SPARQL	`pip3 install SparqlWrapper -t.`

Run the ls command to check that the package you downloaded is in the folder you have the python code.

Set the required permissions:

chmod a+r ~/svls
chmod 744 serverless-query.py
chmod 444 serverless-query-cfn.json

Run the ls command to check that the package you downloaded is in the folder you have the python code.

Step 2: Load the sample data

Use the provided insert script for your database engine to load the sample data into the database.

The insert script is present in the serverless folder in the rds-support-tools with file name as databaseengine-insert.sql.

Step 3: Test the provided Python code from the command line on your client host

Set environment variables for a command-line test:
MySQL and PostgreSQL:

export ENDPOINT='your-database-endpoint'
export PORT='your-database-port'
export DBUSER='your-database-user'
export DBPASSWORD='your-database-user-password'
export DATABASE='your-database-name'

Neptune:

export ENDPOINT='your-database-endpoint'
export PORT='your-database-port'
export DBUSER='None' 		
export DBPASSWORD='None'
export DATABASE='None'

Run the Python script from the command line:

python3 serverless-query.py

Neptune/SPARQL expected results:

['amzn://data/hello1 , amzn://data/world1', 'amzn://data/hello2 , amzn://data/world2']

MySQL and PostgreSQL expected results:

[('hello1', 'world1'), ('hello2', 'world2')]

You can comment out this print statement. If you leave it in, the query results print to CloudWatch Logs:
```
# print(results_list)
```

Step 4: Zip your Python source code along with the Python packages

Zip the script along with the database client Python package that you installed using pip3. For more information, see Creating a Deployment Package. Since you have installed the Python packages in the same directory, zip the directory contents (not the directory itself). To include all hidden files, use the option zip -r9.

zip -r9 ~/svls/serverless-query.zip *

Step 5: Create the Lambda function and API Gateway REST API

From your client host, make an Amazon S3 bucket if you don’t have one.
```
aws s3 mb s3://your-s3-bucket-name
```

Load the zip file to your S3 bucket:

aws s3 cp serverless-query.zip s3://your-s3-bucket-name

Create the AWS CloudFormation stack using the AWS CLI.

aws cloudformation create-stack \
--stack-name serverless-query-cfn \
--template-body file://serverless-query-cfn.json \
--region your-database-region \
--capabilities CAPABILITY_NAMED_IAM \
--parameters '[
{"ParameterKey":"PEndpoint","ParameterValue":"your-database-endpoint"},
{"ParameterKey":"PPort","ParameterValue":"your-database-port"},
{"ParameterKey":"PDatabase","ParameterValue":"your-database-name"},
{"ParameterKey":"PDbUser","ParameterValue":"your-db-user-name"},
{"ParameterKey":"PDbPassword","ParameterValue":"your-db-user-password"}, 
{"ParameterKey":"PS3Bucket","ParameterValue":"your-S3-bucket-name"},
{"ParameterKey":"PSubnetIds","ParameterValue":"your-EC2-client-SubnetId"},
{"ParameterKey":"PSecurityGroupIds","ParameterValue":"your-EC2-client-SecurityGroupId"}
 ]'

Neptune users must set these variables to “None”:

   {"ParameterKey":"PDatabase","ParameterValue":"None"},
   {"ParameterKey":"PDbUser","ParameterValue":"None"},
   {"ParameterKey":"PDbPassword","ParameterValue":"None"},

From the AWS CloudFormation console dashboard, check the serverless-query-cfn stack Events tab for progress. If the stack status is ROLLBACK_COMPLETE, find the reason for the failure under the Events tab. For more information, see Troubleshooting and Tips on making changes later in this post.
When the stack status is CREATE_COMPLETE, test the Lambda function from the command line:
```
rm -f invoke-lambda.out
aws lambda invoke \
--function-name ServerlessQuery \
--region your-database-region \
--log-type Tail invoke-lambda.out
echo **Return Object** 
cat invoke-lambda.out
```
To address any issues, see CloudWatch Logs, and the Troubleshooting and Tips on making changes sections later in this post.

To view CloudWatch Logs:
1. Open the CloudWatch console.
2. In the left navigation, choose Logs.
3. Choose /aws/lambda/ServerlessQuery.
4. Choose the log stream that has the latest Last Event Time.
Unless you commented out the print(results_list) statement from the Python script, you will also see the query result in the log stream.

Step 6. Query the database from the URL

You have two options for querying your database:

Option 1: Curl the URL from the command line:

Get the api-id:

aws apigateway get-rest-apis

Then:

curl https://api-id.execute-api.your-database-region.amazonaws.com/beta/query1

Option 2: Paste the URL into a browser:

Go to the API Gateway dashboard. Cut and paste the “invoke URL” for your beta stage into a browser, and append your resource name /query1 to the end of it.

To address any issues, see CloudWatch Logs, and the Troubleshooting and Tips on making changes sections.

View the query results from browser:

Step 7: Next steps

After setting up and testing your serverless application, here are some best practices and administration tasks you can perform to help optimize your solution.

Further protect your database password and other connection variables by encrypting Lambda environment variables with AWS KMS keys. You may also store your DB secrets in Parameter Store or AWS Secret Manager to leverage features of these purpose-built AWS security services.
Increase availability by putting Lambda function in more than one subnets inside VPC.
If you plan to call your API from an external site, such as a static S3 website, you need to enable CORS (cross-origin resource sharing) for your API Gateway resource. Note that the integration response is disabled for proxy integration, which we have used here. This means that, when configuring the response headers for CORS, you have to rely on the backend to return the CORS headers.
To ease management of your serverless AWS CloudFormation templates, try the AWS Serverless Application Model (AWS SAM).

Step 8: Review tips and troubleshooting

As you follow the steps in this blog post, use the following guidance to help address any issues you might encounter.

Tips on running longer queries

The example uses default timeout settings to run a short query. To run longer queries, you can do any of the following:

Increase timeout parameters for both API Gateway and Lambda up to their respective runtime limits. For synchronous invocation, you might need to set the Lambda timeout slightly lower than the API Gateway timeout to allow the REST API to process the request within API Gateway time limit.
Invoke Lambda asynchronously so that only the Lambda limit applies.
Push work to the backend. For example, use a materialized view or custom batch job to process the query locally and automatically, such that the summary result can be queried from a Lambda function within the runtime limit.
Make sure that your data is properly indexed and the query is tuned. Use partitioning and filtering to query the smallest possible dataset. Cache results if it’s appropriate for your application.
Limit query output by using the LIMIT clause or an equivalent.

Best Practices to follow while executing Python code and Lambda function

The Python code samples use best practices when working with Lambda, API Gateway, and specifically the two relational databases and the graph database in the examples: MySQL, PostgreSQL, and Neptune (SPARQL), respectively. These best practices are as follows:

A common best practice when using Lambda with DynamoDB is to open the database connection in the Lambda execution environment so that it can be reused in subsequent handler calls. Using a global connection variable inside the handler ensures that any new connections stay open for subsequent calls. However, in the case of MySQL and PostgreSQL, there is risk of session leakage across the open connection that can result in locking issues. A second issue is that too many open connections can cause MySQL and PostgreSQL to hit connection limits. Lastly, with Aurora Serverless, leaving sessions open prevents the database from sleeping. So even though our simple example is a read-only query, we open and close the connection in each handler call to avoid these potential issues. Neptune users: This guidance does not apply to the SPARQL example code because it does not make a persistent database connection.
The return object inside the handler is required by API Gateway. Use it in place of sys.exit() to exit gracefully. Otherwise you might see 502 errors reporting a “Malformed Lambda Proxy Response.”
If you encounter an unexpected amount of cold start and latency in Lambda, follow this blog to fine-tune the performance.
It is always recommended to connect your function to private subnets to access the resources inside VPC and internet at the same time. For more information, see this documentation. Connecting a function to a public subnet doesn’t grant it internet access or a public IP address.
Also, refer to the best practices for working with AWS Lambda functions.

Tips on making changes

It’s best to avoid deleting and re-creating individual AWS components that have interdependencies. The following tips should help you make any changes you need.

To change AWS CloudFormation stack parameters, update the stack. See the example script update-stack.cli in GitHub.
To change the Python code, re-zip and update the Lambda function. See the example script update-lambda.cli in GitHub.
You can also delete the entire AWS CloudFormation stack and start over. But note that the stack that’s used in this example contains a VPC configuration that creates an elastic network interface. The network interface takes time to clean up, so the delete can take a while.

Troubleshooting

Here are some common issues you might encounter, and a few suggested solutions for those issues.

Issue: Your Python code runs from your command line, but it hangs or times out when you invoke the Lambda function or API Gateway REST API.

Make sure that the database, client host, and AWS CloudFormation stack are all created in the same VPC and AWS Region.
Check the database security group. Make sure that the client host security group ID, not its CIDR, is permitted as a source in inbound rules.
Your Lambda function might be failing and retrying. See CloudWatch Logs for errors and CloudWatch metrics for resource issues.
If you are testing a query other than the simple example that was provided, see the Tips on running longer queries . Also try the simple query that was provided.

Issue: You’re getting the error “Unable to import module”:

Check CloudWatch Logs for additional information.
Verify that the source code file is named serverless-query.py, the zip file is serverless-query.zip, and the handler name in your AWS CloudFormation template is serverless-query.handler.
Verify that all permissions were set as instructed.
Unzip serverless-query.zip, and verify that serverless-query.py is in the root directory.
Rezip files exactly as instructed.
In the create-stack command, verify that the S3 bucket name is correct and of the form your-bucket-name (not prefixed with S3://). Verify that the latest version of serverless-query.zip is in the bucket. Check permissions on the bucket.
Check for typos in the create-stack command parameters.

Issue: The AWS CloudFormation stack is taking a long time to delete. The AWS CloudFormation Events log shows a status reason of “CloudFormation is waiting for Network Interfaces associated with the Lambda Function to be cleaned up”.

This is expected behavior. For more information, refer to this article.

Issue: Calling the REST API throws the error {"message":"Missing Authentication Token"}.

This error can appear if API Gateway URL contains the non-existing or a wrong resource. Consider adding /query1 at the end of Invoke URL of API Gateway. For more information, see this article.

Issue: Calling the REST API throws the error “SyntaxError: JSON.parse: unexpected character at line 1 column 2 of the JSON data.”.

Try from a browser other than Mozilla Firefox.

For all other error messages: See CloudWatch Logs for additional information.

Summary and final steps

Serverless applications provide benefits like high availability, flexibility, and ease of management. This post provides an example of how you can set up a serverless application and query an AWS database from a URL, even if the database is not publicly accessible. Through this blog post, we covered the steps for implementing the set up, testing, and troubleshooting the solution along with best practices.

Your last step is to delete the EC2 client host if it’s no longer needed. Because, after all, that was the whole point. :)

About the Authors

Deana Holmer is a senior database engineer in Premium Support team at Amazon Web Services.

Nishant Dhiman is a Solutions Architect at AWS with an extensive background in Serverless, Security and Mobile platform offerings. He is a voracious reader and a passionate technologist. He loves to interact with customers and always relishes giving talks or presenting on public forums. Outside of work, he likes to keep himself engaged with podcasts, calligraphy and music.

Asad Aazam is an Associate Solutions Architect at AWS with expertise in AWS Database technologies such as Aurora, RDS and DynamoDB, With good knowledge on relational databases he is helping homogeneous and heterogeneous database migrations and optimization in AWS cloud. When not working he likes to go on bike rides, travel and enjoy the beauty of nature.