AWS Database Blog

Query your AWS database from your serverless application

AWS database customers want to save money, scale seamlessly, and enjoy high availability by not having to launch Amazon Elastic Compute Cloud (Amazon EC2) hosts in their virtual private cloud (VPC) to use as database clients. An application that you run without provisioning or managing an EC2 host is known as a serverless application.

In this post, I show you how to query an AWS database from a URL, even if the database is not publicly accessible. You can even run it against an Amazon Aurora Serverless database for a serverless-to-serverless query! I provide a selection of Python scripts that work for Amazon Neptune, Amazon Relational Database Service (Amazon RDS) for MySQL, or Amazon RDS for PostgreSQL, respectively. I also walk you through setting up and creating the following two components:

  • The AWS Lambda function that executes your query against your backend database
  • The API Gateway REST API that invokes the Lambda function when you access the URL

I provide an AWS CloudFormation template that makes creating these components a snap. And I include substantial Tips and Troubleshooting sections specifically for querying databases in an Amazon VPC that are not strictly key-value stores.

The Python code samples apply best practices that can easily be adapted for other database engines. For Neptune, the SPARQL example can be adapted for Gremlin.

You can download the Python code and the AWS CloudFormation template from the awslabs/rds-support-tools GitHub repository.

This post is intended for AWS database users who have no experience with AWS Lambda, Amazon API Gateway, or AWS CloudFormation, and limited experience with Python. The time to complete the steps is about 30 minutes.

Before you begin

To see whether a serverless application is right for you, review the AWS Lambda limits and the API Gateway limits. In particular, note the runtime limits as these apply to your queries. If API Gateway invokes the Lambda function synchronously, the API Gateway runtime limit applies; if asynchronously, the Lambda runtime limit applies. If you choose an alternate invocation method, the Lambda runtime limit still applies. In this example, we invoke the Lambda function synchronously.

Prerequisites

  • You have launched an AWS database instance in a VPC, preferably using Neptune, RDS for MySQL, or RDS for PostgreSQL
  • For setup purposes, you have launched a temporary Amazon EC2 client host, preferably Amazon Linux, in the same VPC and AWS Region as the database.
  • You know the EC2 client host subnet ID and security group ID.
  • The client host security group ID (not the CIDR) is permitted as a source in the inbound rule of the database security group. The port range includes the database port.
  • Your preferred database client software has been installed on the client host, and you have established connectivity from this host to the database. Your database user has permissions to create database objects and to insert into and query from this database.
  • You have Python 3.6 installed on the client host. For help installing Python 3.x on an Amazon Linux host using virtualenv, see the five-minute AWS Knowledge Center video How do I create an isolated Python 3.4 environment—but install Python 3.6 instead of Python 3.4.
  • Your AWS user has permissions to create and manage IAM roles, Lambda, API Gateway, and AWS CloudFormation stacks, and to view Amazon CloudWatch Logs.
  • You have full console access. You’ve also configured the AWS CLI on your client host and can run the following command without a permission error:
    aws s3 ls
  • If you have not used GitHub before, see these instructions for downloading awslabs/rds-support tools.

Steps

The following sections walk you through setting up and testing the example.

  1. Set up the solution
  2. Load the sample data
  3. Test the provided Python code from the command line on your client host
  4. Zip your Python source code along with the Python packages
  5. Create the Lambda function and API Gateway REST API
  6. Query the database from the URL
  7. Next steps
  8. Review tips and troubleshooting
  9. Summary and final steps

Step 1: Set up the solution

  1. On your client host, create the project directory for your database; for example:
    mkdir ~/svls 
    cd ~/svls 
  2. Download the scripts and the AWS CloudFormation JSON template from the GitHub repository to your client host.
  3. Rename the Python script for your database engine to serverless-query.py. Keep the name of the AWS CloudFormation template: serverless-query-cfn.json.
  4. Install dos2unix, and run all files through it.
    sudo yum install -y dos2unix
    dos2unix *
  5. Verify that your Python 3.6 environment is activated.
    which python
    ~/venv/python36/bin/python
  6. From inside your Python 3.6 virtualenv environment, pip3 install the module for your database client that gets imported in your code sample.
    Database Module to install using pip3
    MySQL pip3 install pymysql
    PostgreSQL pip3 install psycopg2-binary
    Neptune/SPARQL pip3 install SparqlWrapper
  7. Set the required permissions:
    chmod a+r ~/svls
    chmod 744 serverless-query.py
    chmod 444 serverless-query-cfn.json
  8. Locate the packages you installed using pip3, and make sure that they have global read permissions. If you are using virtualenv, they should be in one of the following directories:
    ls $VIRTUAL_ENV/lib64/python3.6/site-packages
    sudo chmod a+r -R $VIRTUAL_ENV/lib64/python3.6/site-packages
    
    #or
    
    ls $VIRTUAL_ENV/lib/python3.6/site-packages
    sudo chmod a+r -R $VIRTUAL_ENV/lib/python3.6/site-packages

Step 2: Load the sample data

Use the provided insert script for your database engine to load the sample data into the database.

Step 3: Test the provided Python code from the command line on your client host

  1. Set environment variables for a command-line test:
    MySQL and PostgreSQL:

    export ENDPOINT='your-database-endpoint'
    export PORT='your-database-port'
    export DBUSER='your-database-user'
    export DBPASSWORD='your-database-user-password'
    export DATABASE='your-database-name'
    

    Neptune:

    export ENDPOINT='your-database-endpoint'
    export PORT='your-database-port'
    export DBUSER='None' 		
    export DBPASSWORD='None'
    export DATABASE='None' 
  2. Run the Python script from the command line:
    ./serverless-query.py

    Neptune/SPARQL expected results:

    ['amzn://data/hello1 , amzn://data/world1', 'amzn://data/hello2 , amzn://data/world2']

    MySQL and PostgreSQL expected results:

    [('hello1', 'world1'), ('hello2', 'world2')]
  3. You can comment out this print statement. If you leave it in, the query results print to CloudWatch Logs:
    # print(results_list)

Step 4: Zip your Python source code along with the Python packages

  1. Zip the script along with the database client Python package that you installed using pip3. For more information, see Creating a Deployment Package.Go to the packages directory where your Python packages were installed, and zip the directory contents (not the directory itself). Place the zip file in your project folder. To include all hidden files, use the option zip -r9.If you used virtualenv, your client packages are in one of the following directories:
    cd $VIRTUAL_ENV/lib64/python3.6/site-packages
    zip -r9 ~/svls/serverless-query.zip *
    
    # or
    
    cd $VIRTUAL_ENV/lib/python3.6/site-packages
    zip -r9 ~/svls/serverless-query.zip *
  2. From your project directory, add the serverless-query.py script to the zip file:
    cd ~/svls
    zip -g serverless-query.zip serverless-query.py

Step 5: Create the Lambda function and API Gateway REST API

  1. From your client host, make an Amazon S3 bucket if you don’t have one.
    aws s3 mb s3://your-s3-bucket-name
  2. Load the zip file to your S3 bucket:
    aws s3 cp serverless-query.zip s3://your-s3-bucket-name
  3. Create the AWS CloudFormation stack using the AWS CLI.
    aws cloudformation create-stack \
    --stack-name serverless-query-cfn \
    --template-body file://serverless-query-cfn.json \
    --region your-database-region \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters '[
       {"ParameterKey":"PEndpoint","ParameterValue":"your-database-endpoint"},
       {"ParameterKey":"PPort","ParameterValue":"your-database-port"},
       {"ParameterKey":"PDatabase","ParameterValue":"your-database-name"},
       {"ParameterKey":"PDbUser","ParameterValue":"your-db-user-name"},
       {"ParameterKey":"PDbPassword","ParameterValue":"your-db-user-password"}, 
       {"ParameterKey":"PS3Bucket","ParameterValue":"your-S3-bucket-name"},
       {"ParameterKey":"PSubnetIds","ParameterValue":"your-EC2-client-SubnetId"},
       {"ParameterKey":"PSecurityGroupIds","ParameterValue":"your-EC2-client-SecurityGroupId"}
     ]'

    Neptune users must set these variables to “None”:

       {"ParameterKey":"PDatabase","ParameterValue":"None"},
       {"ParameterKey":"PDbUser","ParameterValue":"None"},
       {"ParameterKey":"PDbPassword","ParameterValue":"None"}, 
  4. From the AWS CloudFormation console dashboard, check the serverless-query-cfn stack Events tab for progress. If the stack status is ROLLBACK_COMPLETE, find the reason for the failure under the Events tab. For more information, see Troubleshooting and Tips on making changes later in this post.
  5. When the stack status is CREATE_COMPLETE, test the Lambda function from the command line:
    rm -f invoke-lambda.out
    aws lambda invoke \
    --function-name ServerlessQuery \
    --region your-database-region \
    --log-type Tail invoke-lambda.out
    echo **Return Object** 
    cat invoke-lambda.out

    To address any issues, see CloudWatch Logs, and the Troubleshooting and Tips on making changes sections later in this post.

    To view CloudWatch Logs:

    1. Open the CloudWatch console.
    2. In the left navigation, choose Logs.
    3. Choose /aws/lambda/ServerlessQuery.
    4. Choose the log stream that has the latest Last Event Time.

    Unless you commented out the print(results_list) statement from the Python script, you will also see the query result in the log stream.

Step 6. Query the database from the URL

You have two options for querying your database:

Option 1: Curl the URL from the command line:

Get the api-id:

aws apigateway get-rest-apis

Then:

curl https://api-id.execute-api.your-database-region.amazonaws.com/beta/query1

Option 2: Paste the URL into a browser:

Go to the API Gateway dashboard. Cut and paste the “invoke URL” for your beta stage into a browser, and append your resource name /query1 to the end of it.

To address any issues, see CloudWatch Logs, and the Troubleshooting and Tips on making changes sections.

View the query results from browser:

Step 7: Next steps

After setting up and testing your serverless application, here are a few best practices and administration tasks you can perform to help optimize your solution.

  • Further protect your database password and other connection variables by encrypting Lambda environment variables with AWS KMS keys.
  • Increase availability by adding a second subnet to the Lambda function VPC configuration.
  • If you plan to call your API from an external site, such as a static S3 website, you need to enable CORS (cross-origin resource sharing) for your API Gateway resource. Note that the integration response is disabled for proxy integration, which we have used here. This means that, when configuring the response headers for CORS, you have to rely on the backend to return the Access-Control-Allow-Origin headers.
  • To ease management of your serverless AWS CloudFormation templates, try the AWS Serverless Application Model (AWS SAM).

Step 8: Review tips and troubleshooting

As you complete the steps in this example, use the following guidance to help address any issues you might encounter.

Tips on running longer queries

The example uses default timeout settings to run a short query. To run longer queries, you can do any of the following:

  • Increase timeout parameters for both API Gateway and Lambda up to their respective runtime limits. For synchronous invocation, you might need to set the Lambda timeout slightly lower than the API Gateway timeout to allow the REST API to finish cleanly.
  • Invoke Lambda asynchronously so that only the Lambda limit applies.
  • Create the Lambda function, and then choose an alternate invocation method. The Lambda runtime limit still applies.
  • Push work to the backend. For example, use a materialized view or custom batch job to process the query locally and automatically, such that the summary result can be queried from a Lambda function within the runtime limit.
  • Make sure that your data is properly indexed and the query is tuned. Use partitioning and filtering to query the smallest possible dataset. Cache results if it’s appropriate for your application.
  • Limit query output by using the LIMIT clause or an equivalent.

Tips on running the Python code

The Python code samples use best practices when working with Lambda, API Gateway, and specifically the two relational databases and the graph database in the examples: MySQL, PostgreSQL, and Neptune (SPARQL), respectively. These best practices are as follows.

  • Initialize variables, and perform other bootstrap operations, from outside the Lambda handler function. This allows the bootstrapping to run only once, from the Lambda execution environment (formerly “container”). As long as the execution environment is active, only the operations inside the handler function execute.
  • A common best practice in Lambda is to open the database connection in the Lambda execution environment so it in can be reused with each handler call. If the database connection drops at some point you reconnect from inside the handler. Using a global connection variable inside the handler ensures that any new connections stay open for subsequent calls. But in the case of MySQL and PostgreSQL there is risk of session leakage across the open connection that can result in locking issues. So even though our simple example is a read-only query, we open and close the connection in each handler call to avoid this potential issue. Neptune users: This does not apply to the SPARQL example code because it does not make a persistent database connection.
  • The return object inside the handler is required by API Gateway. Use it in place of sys.exit() to exit gracefully. Otherwise you might see 502 errors reporting a “Malformed Lambda Proxy Response.”

Tips on latencies

If you’re new to using AWS Lambda, you might have questions about Lambda-related latencies. These latencies are explained here.

  • The first Lambda invocation might take longer than expected because it includes bootstrap steps (cold start) that are not performed in subsequent calls (warm start).
  • After a period of inactivity, Lambda execution environments are terminated. This action can result in a cold start performance penalty the next time the Lambda function is called. This performance penalty manifests as a latency spike.
  • Here is an example of warm start versus cold start elapsed times for the same query:
warm start cold start
time ./invoke-lambda.cli
real    0m0.389s
time ./invoke-lambda.cli
real    0m9.428s

Tips on making changes

It’s best to avoid deleting and re-creating individual AWS components that have interdependencies. The following tips should help you make any changes you need.

  • To change AWS CloudFormation stack parameters, update the stack. See the example script update-stack.cli in GitHub.
  • To change the Python code, re-zip and update the Lambda function. See the example script update-lambda.cli in GitHub.
  • You can also delete the entire AWS CloudFormation stack and start over. But note that the stack that’s used in this example contains a VPC configuration that creates an elastic network interface. The network interface takes time to clean up, so the delete can take a while.

Tips on using and scaling a VPC-enabled Lambda function

VPC-enabled Lambda functions pose special challenges, especially with scaling. The following tips guide you in addressing these challenges.

  • Interfaces that are created for VPC-enabled Lambda functions are assigned a private IP address in the subnet. Therefore, the Lambda function does not have internet access even if the subnet itself is public. If the Lambda function requires internet access, create a route to a network address translation (NAT) gateway, rather than to an internet gateway, in the route table that the subnet is associated with.
  • Make sure that the subnets have enough IPs to handle the expected concurrency on the Lambda function.
  • Because there are limits on the rate at which you can request the network interfaces, make sure that the Lambda function has sufficient network interface capacity. Monitor the memory usage of the Lambda function, and reduce its unused memory allocation as this increases network interface sharing.
  • If your database is publicly accessible, you probably don’t need to put your Lambda function in a VPC at all.

More tips

See the following resources for additional guidance as you’re working through the steps in this example.

Troubleshooting

Here are some common issues you might encounter, and a few suggested solutions for those issues.

Issue: Your Python code runs from your command line, but it hangs or times out when you invoke the Lambda function or API Gateway REST API.

  • Make sure that the database, client host, and AWS CloudFormation stack are all created in the same VPC and AWS Region.
  • Check the database security group. Make sure that the client host security group ID, not its CIDR, is permitted as a source in inbound rules.
  • Your Lambda function might be failing and retrying. See CloudWatch Logs for errors and CloudWatch metrics for resource issues.
  • If you are testing a query other than the simple example that was provided, see the Tips on running longer queries Also try the simple query that was provided.

Issue: The simple query finishes, but it takes a long time to run.

  • See the Tips on latencies section, and try invoking the Lambda function twice to compare cold and warm starts.

Issue: You’re getting the error “Unable to import module”:

  • Check CloudWatch Logs for additional information.
  • Verify that the source code file is named serverless-query.py, the zip file is serverless-query.zip, and the handler name in your AWS CloudFormation template is serverless-query.handler.
  • Verify that all permissions were set as instructed.
  • Unzip serverless-query.zip, and verify that serverless-query.py is in the root directory.
  • Rezip files exactly as instructed.
  • In your create-stack command, verify that the S3 bucket name is correct and of the form your-bucket-name (not prefixed with S3://). Verify that the latest version of serverless-query.zip is in the bucket. Check permissions on the bucket.
  • Check for typos in your create-stack command parameters.

Issue: The AWS CloudFormation stack is taking a long time to delete. The AWS CloudFormation Events log shows a status reason of “CloudFormation is waiting for Network Interfaces associated with the Lambda Function to be cleaned up.

Issue: Calling the REST API throws the error {"message":"Missing Authentication Token"}.

  • Append the resource name “\query1” to the URL.

Issue: Calling the REST API throws the error “SyntaxError: JSON.parse: unexpected character at line 1 column 2 of the JSON data.

  • Try from a browser other than Mozilla Firefox.

For all other error messages: See CloudWatch Logs for additional information.

Summary and final steps

Serverless applications provide benefits like high availability, flexibility, and ease of management. This post provides an example of how you can set up a serverless application and query an AWS database from a URL, even if the database is not publicly accessible. The steps in the example walked you through setting up, testing, and troubleshooting the solution.

Your last step is to delete the EC2 client host if it’s no longer needed. Because, after all, that was the whole point. :)


About the Author

Deana Holmer is a senior database engineer in Premium Support team at Amazon Web Services.