Use the Data API to interact with an Amazon Aurora Serverless MySQL database
Amazon Aurora Serverless is an on-demand, automatically scaling configuration for Amazon Aurora (MySQL-compatible edition). The database automatically starts up, shuts down, and scales capacity based on your application’s needs. It enables you to run your database in the cloud without managing any database instances. It’s a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.
Recently, AWS announced the general availability of the Data API. This feature enables you to access a MySQL-compatible version of an Amazon Aurora Serverless database with a simple API endpoint and without the hassle of managing persistent database connections in your applications. If you use AWS Lambda, the Data API provides a secure way to access the database without the additional overhead for Lambda functions launched in a VPC. Also, the Data API can use database credentials stored in AWS Secrets Manager, eliminating the need to pass credentials in API calls.
In this post, I discuss how to provision an Aurora Serverless MySQL cluster using infrastructure as code (AWS CloudFormation). I provide code examples that show how the Data API can be used to issue SQL commands against the Aurora Serverless database for various use cases. If you’re looking for in-depth an example, see the full end-to-end sample Serverless application on GitHub.
Updates – Sept 23rd, 2019
- You can now enable the Data API on cluster create. Simply expand the Connectivity options and click Data API:
- In addition to Aurora MySQL Serverless, the Data API is now also supported by Aurora PostgreSQL Serverless database
As illustrated in the following figure, you can use the Data API across various types of AWS cloud-native or on-premises applications, including EC2 instances (VMs), containerized, and serverless applications:
You can use the AWS CLI and the various AWS SDKs to programmatically issue SQL commands against an Amazon Aurora Serverless database via the Data API endpoint. Programmatic commands offer advantages in the context of short-lived event-driven applications (for example, AWS Lambda) that now do not have to deal with the hassle of managing persistent database connections. Also, having a central endpoint for DB connections can reduce the load on the database when compared to a large number of individual Lambda functions managing connections of their own.
Provisioning the Amazon Aurora Serverless cluster
Start by provisioning an Amazon Aurora Serverless database. You use AWS CloudFormation to provision the database cluster. The provided template (see the following code example) includes an Aurora Serverless cluster, a Secrets Manager to generate and store database credentials, and a subnet group as resources. For a full reference to the template, see the sample code.
The template takes input parameters that are referenced by template resources, such as the following:
- Environment type (envType)
- Database cluster name (DBClusterName)
- Database name (DatabaseName)
- Database master user name (DBMasterUserName)
- A list of subnets (DBSubnetList)
I configure Amazon Aurora to run in serverless mode (
EngineMode: serverless) and to automatically pause the database (
AutoPause: true) if there’s no database activity for 15 consecutive minutes (
SecondsUntilAutoPause: 900). I also specify a minimum capacity (
MinCapacity) of one ACU (Aurora Capacity Unit) and a maximum capacity (
MaxCapacity) of four ACU.
Each ACU is a combination of processing and memory capacity. Based on these settings, Aurora Serverless automatically creates scaling rules, setting thresholds for CPU utilization, connections, and available memory. Database storage automatically scales from 10 GiB to 64 TiB, the same as storage in a standard Aurora DB cluster. For more information, see How Aurora Serverless Works.
Aurora Serverless cluster template – code example #1
You can use AWS CloudFormation dynamic references to auto-populate Aurora cluster properties. For example,
MasterUserPassword auto-populate from the Secrets Manager attributes username and password, respectively. For more information, see the previous and following code examples.
The username references a template’s input parameter called
DBMasterUserName. The password, however, is generated dynamically and never revealed to anyone. Consequently, you never expose the database password in application or configuration code. Instead, it’s resolved at runtime when the application code retrieves the corresponding secret from Secrets Manager.
Aurora Serverless cluster template – code example #2 (resources section, continued …)
Use the provided list of subnets as input to the template to create a subnet group resource (DBSubnetGroup) for the Aurora Serverless. For more information, see property
DBSubnetGroupName in the preceding example #1 and example #3, which follows.
Aurora Serverless cluster template – code example #3 (resources section, continued …)
After the template deploys, the stack provisions the Aurora Serverless cluster.
Using the Data API to interact with the Aurora Serverless database
Follow these steps to interact with the Aurora Serverless database.
Enabling the Data API endpoint
You must enable the Data API for the corresponding Aurora Serverless cluster to use it.
In the RDS console, select your instance and choose Network & Security, select the Data API check box, and apply the changes to the cluster immediately. For more information, see Modifying an Aurora Serverless DB cluster or the Data API public launch blog post.
You can also enable the Data API using the AWS CLI, with the following command:
Make sure that you’re using the latest version of the AWS CLI.
Using the Query Editor for Amazon Aurora Serverless
By enabling the Data API, you can also now use the Query Editor for Amazon Aurora Serverless to issue SQL statements against the Aurora Serverless database, as shown in the following diagram. Make sure that your IAM user and role have permissions to access the Query Editor. For more information, see Using the Query Editor for Aurora Serverless.
Using the AWS SDKs to invoke the Data API
You can use the AWS CLI or the various SDKs to invoke the Data API. I walk you through several examples using the AWS SDK for Python (Boto3).
You can obtain complete versions of the code examples discussed in this post from aws-aurora-serverless-data-api-sam GitHub repo.
Start by importing the boto3 library and creating an
RDSDataService client to interact with the Data API (see
rds_client following object). Access the Data API functionality using the client object.
Then, specify the database name, database ARN, and the Secrets Manager secret ARN for your Aurora Serverless cluster as module-scope variables. In the following code example, update the values for your own environment:
In practice, you can export these values via the AWS CloudFormation stack that provisioned the Aurora Serverless cluster and import them automatically using scripts such as this one. For instance, look at the output parameters from the sample CloudFormation template.
RDSDataService client provides a function called
execute_statement() that enables the issuance of SQL statements against our Aurora Serverless database. This function takes the parameters discussed previously. Create a wrapper function, so you don’t have to pass these parameters for every invocation. For more information, see the following wrapper function
This function takes a SQL statement as an input parameter, executes the statement against the Aurora Serverless database, and returns a raw response object. I use this function in several examples that follow.
Example 1 – Issuing DDL commands to create the database and a table
You can now use the wrapper
execute_statement() function to execute DDL commands. For instance, create a database table.
First, create a file with the following content and name it
You use this file to create a table called package to store the name and version of software packages (for example, MySQL 5.7.21).
Next, create a MySQL database object and the package table by calling function
execute_statement(), as in the following code example:
To create multiple tables, you can use an array to iterate through multiple DDL .sql files and use function
execute_statement(), again. For more information, see the aws-aurora-serverless-data-api-sam GitHub repo.
Example 2 – Creating a simple query
Now use the function
execute_statement() to run a simple query and print the raw results returned in the standard output:
Example 3 – Creating a parameterized query
RDSDataService client also supports parameterized queries by allowing you to use placeholder parameters in SQL statements. Escaped input values permit the resolution of these parameters at runtime. Parameterized queries are useful to prevent SQL injection attacks.
For instance, the following SQL query uses a parameter called
package_name. It’s a placeholder for a package name that resolves at runtime when the query is executed using the value from variable
Now refactor the function
execute_statement() to take an extra parameter (
sql_parameters) to support parameterized SQL statements.
Example 4 – Formatting query results
As you’ve probably noticed already, the function
execute_statement() returns raw results that likely need parsing and formatting before use. The following code example uses a couple of small Python functions (
formatField()) to format a list of returned records by processing individual records and fields.
Example 5 – Creating a parameterized SQL insert
You can also parameterize other SQL statements, including insert statements. For example, the following code example uses the
execute_statement() function to insert tuple (“package-2”, “version-1”) into the package table.
Example 6 – Batching SQL inserts
The Data API also supports batching by executing a SQL statement multiple times against a set of specified parameters using a single API call. Batching can lead to significant performance gains, as the overall network time to process multiple SQL statements is drastically reduced (for example, inserting hundreds of rows in a table).
RDSDataService’s batch_execute_statement() function makes these gains possible. This function takes similar parameters as the
execute_statement() function but uses a two-dimensional array for the list of parameterized values.
RDSDataService’s batch_execute_statement() function to simplify its list of parameters and make it easier for callers to invoke the function. The following wrapper
batch_execute_statement() function takes a SQL statement and a two-dimensional array (set of lists) as parameters, invokes the original
batch_execute_statement() function, and returns a raw response object:
You can now invoke your
batch_execute_statement() wrapper function to insert multiple rows using a single API call.
For instance, batch insert 10 packages in your packages table. First, populate the
sql_parameters_sets 2-dimensional array with package names and versions. Then call
batch_execute_statement() passing the SQL insert statement and the array as parameters, as shown in the following code example:
Example 7 – Handling exceptions
Now, create a specific exception type to raise and catch database-related errors in your code. Start by creating a simple class
DataAccessLayerException, as shown in the following code example:
Your new exception type can now be used to raise and catch database-related errors.
For instance, the following code example shows how the function
add_package() catches any errors that can potentially happen when issuing a SQL statement against the database. It wraps and re-raises the original exception as a
DataAccessLayerException exception using Python 3’s “raise from” statement:
You can now use a try/except block to handle
DataAccessLayerException exceptions when calling function
add_package(), as shown in the following code example:
Example 8 – Committing or rolling back transactions
The Data API supports transactions. Your code can start a transaction, execute SQL commands within the context of that transaction, and then commit the transaction. If an exception occurs during this process, the transaction can be rolled back entirely.
Now, try updating the functions
batch_execute_statement() to support transactions. These functions now take an optional
transaction_id parameter. If the caller doesn’t provide the
transaction_id value, the functions execute as normal, and changes resulting from the call commit automatically. Otherwise, an explicit call to
rollback_transaction() is required.
Refactored to support transactions, the
function execute_statement() renders as follows:
Refactored to support transactions, function
batch_execute_statement() renders as follows:
Use the updated version of function
batch_execute_statement() to batch insert data within a transactional context. If no errors occur, the transaction commits. Otherwise, it rolls back.
In the code example, the
transaction_id passes to function
batch_execute_statement() using the transaction object returned by function
In this post, I discussed how to provision an Aurora Serverless MySQL cluster using infrastructure as code (AWS CloudFormation). I provided code examples that showed how the Data API can be used to issue SQL commands against the Aurora Serverless database for various use cases. If you’re looking for in-depth examples, see the full end-to-end sample Serverless application on GitHub.
The Data API is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland) Regions. There is no charge for the API, but you pay the usual price for data transfer out of AWS.
About the Author
Marcilio Mendonca is a Sr. Global Consultant in the Professional Services Team at Amazon Web Services. He has helped AWS customers design, build, and deploy best-in-class, cloud-native AWS applications using VMs, containers, and serverless architectures. Before joining AWS, Marcilio was a Software Development Engineer at Amazon. Marcilio also holds a Ph.D. in Computer Science. In his spare time, he enjoys playing drums, riding his motorcycle, and spending quality time with family and friends.