AWS Database Blog

Introducing the Data API for Amazon Aurora Serverless v2 and Amazon Aurora provisioned clusters

Traditionally, applications that communicate with relational databases use drivers that provide a persistent connection between the application and the database server. With the advent of serverless applications, including those created using AWS Lambda, persistent connections from the application are no longer practical because serverless applications are, by nature, stateless. What’s more, traditional database drivers often come with a set of complex tuning parameters that may be implemented incorrectly by developers who are less familiar with those drivers.

In November 2018, AWS announced the launch of the Data API for Amazon Aurora Serverless. The Data API is an intuitive, secure HTTPS API for running SQL queries against a relational database that enables you to accelerate modern application development. The Data API appeals to customers of all sizes, from startups to enterprises, seeking to minimize the time-consuming network and application configuration tasks needed to securely connect to an Amazon Aurora database. The Data API eliminates the use of drivers and improves application scalability by automatically pooling and sharing database connections (connection pooling) rather than requiring you to manage connections. You can call the Data API via an AWS SDK or the AWS Command Line Interface (AWS CLI).

This feature enabled developers to quickly and securely access Aurora Serverless v1 clusters via a stateless HTTP API. You can use a familiar API interface without needing to know the intricate details of a given database driver. What’s more, the Data API handles connection pooling between the Data API and the database. This helps database applications scale by reusing connections without the developer needing to configure or manage a connection pool.

In this post, we discuss improvements made to the Data API and its support for Aurora Serverless v2 and Aurora provisioned clusters.

Aurora Serverless v2 and the Data API

The initial release of the Data API worked specifically with Aurora Serverless v1. Aurora Serverless v1 was introduced to enable you to automatically scale your database compute resources in response to the workload. In November 2022, AWS announced the release of the next iteration of Aurora Serverless: Aurora Serverless v2.

Several improvements have been made to Aurora Serverless v2, including more granular scaling, scaling in response to memory pressure, and a dynamically resized buffer cache. Perhaps the most important improvement to Aurora Serverless v2 is the fact that it’s implemented at the instance layer rather than the cluster layer. With Aurora Serverless v1, the cluster had to be created specifically as an Aurora Serverless v1 cluster, which only supported a single instance. Aurora Serverless v2 is implemented as an instance type, similar to the Intel or Graviton instance types currently used to create provisioned instances. This means that in a given Aurora cluster, you can have both provisioned instances and Aurora Serverless v2 instances. Because Aurora Serverless v2 is simply a new instance type, this means that the features available to provisioned instances are now also available for serverless workloads.

Overview of the Data API for Aurora Serverless v2 and Aurora provisioned clusters

AWS has rebuilt the Data API for Aurora Serverless v2 and Aurora provisioned to operate at the scale and high availability levels required by our biggest customers. The following are some of the improvements:

  • Because the Data API now works with both Aurora Serverless v2 and provisioned instances, database failover is supported to provide high availability.
  • We have removed the 1,000 requests per second limit. The only factor that limits requests per second with the Data API for Aurora Serverless v2 and Aurora provisioned is the size of the database instance and therefore the available resources.
  • Although the Data API for Aurora Serverless v2 and Aurora provisioned has been initially launched on Amazon Aurora PostgreSQL-Compatible Edition, support for Amazon Aurora MySQL-Compatible Edition will soon follow.
  • The Data API has always been designed to be straightforward to use. In fact, only five API operations make up the entire interface: ExecuteStatement, BatchExecuteStatement, BeginTransaction, CommitTransaction, and RollbackTransaction. To maintain compatibility with applications previously developed using the Data API, we have kept the same API signatures.
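The transaction operations compose naturally: you begin a transaction, run statements against it, and then commit or roll back. The following sketch illustrates the pattern; the helper name, the client object, and the ARN arguments are placeholders of our own choosing, not part of the Data API itself:

```python
def run_in_transaction(client, cluster_arn, secret_arn, database, statements):
    """Run several SQL statements atomically via the Data API.

    `client` is expected to behave like boto3.client("rds-data"); the
    function name and arguments here are illustrative placeholders.
    """
    # BeginTransaction returns an ID that ties later statements together
    tx_id = client.begin_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn, database=database
    )["transactionId"]
    try:
        for sql in statements:
            client.execute_statement(
                resourceArn=cluster_arn, secretArn=secret_arn,
                database=database, sql=sql, transactionId=tx_id,
            )
    except Exception:
        # Undo everything if any statement fails
        client.rollback_transaction(
            resourceArn=cluster_arn, secretArn=secret_arn, transactionId=tx_id
        )
        raise
    # Make the changes durable
    return client.commit_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn, transactionId=tx_id
    )
```

With a real boto3 rds-data client, `statements` would simply be the SQL strings you want applied atomically.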

In the following sections, we discuss additional improvements to the Data API and design considerations. Then we demonstrate how to migrate Aurora Serverless v1 clusters to Aurora Serverless v2, configure the Data API, and use it to run queries.

AWS CloudTrail logging

AWS CloudTrail records events within an AWS account, including management and data events. Management events capture changes to infrastructure. For example, the creation of an Amazon Simple Storage Service (Amazon S3) bucket will create a management event. However, changes to data in an S3 bucket are recorded as data events.

The Data API for Aurora Serverless v2 and Aurora provisioned records CloudTrail events as data events, whereas the Data API for Aurora Serverless v1 captures them as management events. Data event logging is disabled by default; if capturing data events made through the Data API is important to your organization, you can enable it for a nominal cost. For pricing details, refer to AWS CloudTrail pricing.

Design considerations

When designing an application to use the Data API for Aurora Serverless v2 and Aurora provisioned, there is one key limit to keep in mind: the Data API returns at most 64 KB of data per row in the result set. Make sure your queries keep each returned row under 64 KB, for example by selecting only the columns you need or truncating large text values. For more information, refer to Troubleshooting Data API issues.
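One illustrative way to guard against this limit on the client side is to estimate each row's size before relying on it. The helper below is a sketch under stated assumptions: the function name is invented, and it approximates sizes by the UTF-8 length of each value's string form, which may differ from the service's exact accounting:

```python
ROW_LIMIT_BYTES = 64 * 1024  # the Data API's per-row result size limit

def row_within_limit(values, limit_bytes=ROW_LIMIT_BYTES):
    """Roughly estimate whether a row of column values fits in 64 KB.

    Sizes are approximated by encoding each value's string form as
    UTF-8; the service's own accounting may differ slightly.
    """
    size = sum(len(str(v).encode("utf-8")) for v in values)
    return size <= limit_bytes
```

In practice, it's usually simpler to keep rows small in the SQL itself, for example by selecting only the needed columns or capping large text columns with an expression such as left(col, n).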

Migrate Aurora Serverless v1 clusters

Now that we’ve covered what’s new with the Data API for Aurora Serverless v2 and Aurora provisioned, let’s walk through how to set it up.

The first step for Aurora Serverless v1 customers moving to Aurora Serverless v2 is to run the AWS CLI modify-db-cluster command. This call creates a snapshot of the existing Aurora Serverless v1 cluster and performs an in-place migration from Aurora Serverless v1 to an Aurora provisioned cluster with a single node. The following AWS CLI command upgrades an Aurora Serverless v1 cluster named asv1a to a provisioned cluster with one node of the db.r5.large class:

aws rds modify-db-cluster --db-cluster-identifier asv1a --engine-mode provisioned --allow-engine-mode-change --db-cluster-instance-class db.r5.large

Note that depending on your particular workload, you may require a different instance size and type than db.r5.large.

Because the Data API formerly worked only with Aurora Serverless v1, some workloads better suited to a provisioned instance had to run on a serverless instance. With this update, once the modify-db-cluster command completes, no further modification of the cluster is required. However, if the workload is best suited to a serverless configuration, you can change the cluster's single node from db.r5.large to db.serverless in two steps.

The first step is to set the scaling configuration of the cluster. This can be accomplished with the following command:

aws rds modify-db-cluster --db-cluster-identifier asv1a --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=128

This command specifies that Aurora Serverless v2 nodes within this cluster can scale from a minimum of 0.5 ACUs to a maximum of 128 ACUs. Depending on your requirements, you may set different values, and you can change them later if the application's needs change.

The next step is to modify the existing instance in the cluster from db.r5.large to db.serverless. This can be accomplished with the following command:

aws rds modify-db-instance --db-instance-identifier asv1a-instance-1 --db-instance-class db.serverless --apply-immediately

Note that these steps will result in some downtime. For instructions on performing this upgrade with minimal downtime, refer to Upgrade from Amazon Aurora Serverless v1 to v2 with minimal downtime.

Configure the Data API

Whether you have just upgraded an Aurora Serverless v1 cluster or you have an existing cluster composed of Aurora Serverless v2 or provisioned instances, there are only two steps to configure the Data API.

The first step is to get the ARN of the cluster to which you would like to connect the Data API. To get the ARN of the cluster in this example, make the following AWS CLI call:

aws rds describe-db-clusters --db-cluster-identifier asv1a --query 'DBClusters[0].DBClusterArn'

The next step is to enable the HTTP endpoint for the cluster, specifying the ARN from the previous step:

aws rds enable-http-endpoint --resource-arn <ARN of existing cluster>

Run queries using the Data API

With the Data API successfully connected to the cluster, the next step is to query some data. This can be accomplished with a variety of AWS SDKs; for this post, the example is written in Python.

Before writing any code, there is one more piece of information required to run queries via the Data API: the ARN of an AWS Secrets Manager secret.

To create the Secrets Manager secret and get its ARN, make the following call:

aws secretsmanager create-secret --name appsecret --description "Data API Secret" --secret-string "{\"username\":\"<username>\",\"password\":\"<password>\"}" --query ARN

Be sure that the user credentials in the preceding code exist in the database and have appropriate permissions.
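If you are creating the secret from code rather than the CLI, building the secret string with a JSON library avoids the quote escaping shown above. This is a small sketch; the helper name is our own invention:

```python
import json

def secret_string(username, password):
    # Serialize the credentials as JSON rather than hand-escaping quotes
    return json.dumps({"username": username, "password": password})
```

The returned string can then be passed as the SecretString argument of the Secrets Manager create_secret call in boto3.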

After you have acquired the cluster ARN and secret ARN, you can write code using the AWS SDK like the following Python Boto3 example:

import boto3

rds_data = boto3.client("rds-data")
cluster_arn = "<cluster ARN>"
secret_arn = "<secret ARN>"

# Drop the database if it exists
rds_data.execute_statement(resourceArn=cluster_arn, secretArn=secret_arn,
    database="postgres", sql="drop database if exists testdb with (force);")

# Create the database
rds_data.execute_statement(resourceArn=cluster_arn, secretArn=secret_arn,
    database="postgres", sql="create database testdb;")

# Create a table
rds_data.execute_statement(resourceArn=cluster_arn, secretArn=secret_arn,
    database="testdb", sql="create table table01 (row_id int, rowval varchar(50));")

# Insert rows, passing the row ID as a bound parameter
# rather than formatting it into the SQL string
for i in range(10):
    rds_data.execute_statement(resourceArn=cluster_arn, secretArn=secret_arn,
        database="testdb",
        sql="insert into table01 (row_id, rowval) values (:row_id, 'Hello World!');",
        parameters=[{"name": "row_id", "value": {"longValue": i}}])

# Select the rows from the table
response = rds_data.execute_statement(resourceArn=cluster_arn, secretArn=secret_arn,
    database="testdb", sql="select * from table01;")
for row in response["records"]:
    print(row)
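Each value in the returned records list is a type-tagged dictionary, for example {"longValue": 0} or {"stringValue": "Hello World!"}. A minimal decoder for turning those into plain Python values might look like the following sketch (the function names are illustrative, and arrayValue fields are omitted for brevity):

```python
def decode_cell(cell):
    """Convert one Data API field dictionary into a Python value."""
    if cell.get("isNull"):
        return None
    for key in ("stringValue", "longValue", "doubleValue",
                "booleanValue", "blobValue"):
        if key in cell:
            return cell[key]
    raise ValueError(f"unrecognized Data API field: {cell}")

def decode_records(records):
    """Decode the 'records' list from an execute_statement response."""
    return [[decode_cell(cell) for cell in row] for row in records]
```

With the table created earlier, each decoded row would come back as a two-element list holding the integer row ID and the text value.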

Integrate with AWS AppSync to create GraphQL APIs

You can use the Data API with AWS AppSync to create GraphQL APIs that connect to your Aurora databases. AWS AppSync is a managed service that makes it straightforward to connect your web and mobile applications to data on AWS. You can now connect your Aurora Serverless v2 and Aurora provisioned databases to AWS AppSync directly, and use AWS AppSync JavaScript resolvers to run your SQL statements on your database. Web and mobile applications can then interact with your database via your AWS AppSync API without launching any additional resources. To learn more about AWS AppSync and building application APIs for your Aurora databases, refer to the AWS AppSync documentation.

Summary

In this post, we examined how the Data API has been expanded to support Aurora Serverless v2 as well as provisioned instances. This enables you to take full advantage of the Aurora platform with the Data API. Additionally, request limits have been removed, allowing for larger and more complex workloads. Get started with the Data API today!


About the Author

Steve Abraham is a Principal Solutions Architect for Amazon Web Services. He works with our customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.