AWS Database Blog

10 Amazon Aurora tips for application developers – Part 1

Amazon Aurora is a MySQL- and PostgreSQL-compatible, enterprise-class database built from the ground up for cloud-native applications. Application developers write their code using standard tooling and libraries, such as JDBC drivers for Java applications and Node.js packages for JavaScript applications. Developers often treat Aurora like a typical RDBMS engine and fail to take advantage of the out-of-the-box features and capabilities that Aurora offers. In addition to these capabilities, Aurora natively integrates with other AWS services to support building feature-rich, secure, and performant applications.

In this two-part series, we share tips on how, as an application developer, you can enhance the security, performance, reliability, and availability of your applications using Aurora features and other AWS services. In Part 1, you learn how applications can utilize read replicas, fast failovers, cloning, global databases, and Amazon Aurora Serverless v2. In Part 2 of this series, we share tips that will show you how to enhance the performance, security, and availability of your apps.

Prerequisites

You can try out the hands-on steps using Amazon Aurora PostgreSQL-Compatible Edition or Amazon Aurora MySQL-Compatible Edition. For this post, we demonstrate the steps with Amazon Aurora PostgreSQL-Compatible Edition.

You must complete the following steps before starting:

  1. Have an AWS account with permissions to create resources.
  2. Have an Aurora PostgreSQL DB cluster (v13.7 or higher) to try out the various features discussed in this post. You may use an existing cluster or create a new cluster. As a best practice, always have more than one DB instance in a cluster, but for this post, you may create a cluster with a single instance.
  3. Set up a bastion host (an Amazon Elastic Compute Cloud (Amazon EC2) instance) using the Amazon Linux 2 AMI or related AMI:
    1. Install the AWS Command Line Interface (AWS CLI) and PostgreSQL client (or MySQL client).
    2. Launch a small EC2 instance in the same VPC as your database. Make sure you can connect to it from your machine.
    3. Set up network routes, so it can connect to the Aurora endpoints.
    4. Install the PostgreSQL client tools using the following commands:
       sudo amazon-linux-extras install postgresql13 -y
       sudo yum install postgresql-contrib-13.7 -y
    5. For trying out the Python code, install the psycopg2 package:
      pip install psycopg2-binary

Tip #1: Split the SQL read and write transactions for performance

An Aurora database cluster consists of one primary instance that serves both read and write requests. You may also create up to 15 read replicas. The read replicas serve only read queries; any attempt to run UPDATE, DELETE, or INSERT statements against a reader results in a SQL error in the application. Aurora exposes the cluster endpoint for running write SQL against the primary instance, and the reader endpoint for running SELECT queries against the readers. If more than one reader instance is created, Aurora uses DNS round robin to distribute the connections. Note that this connection distribution mechanism is not the same as load balancing. The following diagram illustrates this architecture.

Reader & Writer Endpoint
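You can observe the round robin behavior with a short script that resolves the reader endpoint repeatedly. The following is a minimal sketch; it assumes the AURORA_READER_ENDPOINT environment variable is set and that the cluster has at least two readers.

import os
import socket
import time

READER_EP = os.environ['AURORA_READER_ENDPOINT']

# Resolve the reader endpoint several times; with more than one
# reader, DNS round robin rotates the instance IP that is returned.
# The endpoint's DNS record has a short TTL (a few seconds), so we
# pause between lookups to avoid getting a cached answer.
for _ in range(5):
    addr_info = socket.getaddrinfo(READER_EP, 5432, proto=socket.IPPROTO_TCP)
    print(sorted({info[4][0] for info in addr_info}))
    time.sleep(5)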

As a best practice, split the transactions in your application into read and write transactions, and route the SQL to the appropriate DB instance using the appropriate endpoint. With this approach, you get better performance from the database because the SQL load is spread across multiple DB instances. Note that readers are eventually consistent due to replication lag (usually under 100 milliseconds after the primary has written an update), so if a transaction needs read-after-write consistency, use the cluster endpoint for that transaction. If there is a single instance in a DB cluster, both endpoints point to the same instance. If there is more than one instance in the cluster, invoking a write transaction against the reader endpoint leads to an application error.

In the following Python example code, the read traffic (SQL selects) is run against the reader endpoint, and the write traffic is run against the cluster endpoint. To try out this code in your command shell or terminal:

  • Create or use an existing database; for testing you can use any table in the database.
  • Adjust the SELECT and INSERT statements in the code to match your database schema.
  • Set the environment variables and then run the code from the prompt:
# Read and write transactions split
import os

import psycopg2

# Set these environment variables (PG_USER, PG_PASSWORD, PG_DATABASE,
# AURORA_CLUSTER_ENDPOINT, AURORA_READER_ENDPOINT)
USER = os.environ.get('PG_USER')
PASSWORD = os.environ.get('PG_PASSWORD')
DATABASE = os.environ.get('PG_DATABASE')

# Aurora PostgreSQL cluster endpoint (writer)
WRITE_EP = os.environ.get('AURORA_CLUSTER_ENDPOINT')
# Aurora PostgreSQL reader endpoint
READ_EP = os.environ.get('AURORA_READER_ENDPOINT')

def get_all_orders(customer_id):
    # Read-only query: route it to the READER endpoint
    sql = "SELECT * FROM orders WHERE customer_id = %s"

    conn = psycopg2.connect(
        host=READ_EP,
        database=DATABASE,
        user=USER,
        password=PASSWORD
    )
    try:
        # Create a cursor and run the query with a bound parameter
        cur = conn.cursor()
        cur.execute(sql, (customer_id,))
        rows = cur.fetchall()
        # Process the data ...
        return rows
    finally:
        conn.close()

def add_order(order_details):
    # Write query: route it to the CLUSTER (writer) endpoint
    sql = "INSERT INTO orders VALUES (...)"  # adjust to match your schema

    conn = psycopg2.connect(
        host=WRITE_EP,
        database=DATABASE,
        user=USER,
        password=PASSWORD
    )
    try:
        # Code for inserting an order ...
        conn.commit()
    finally:
        conn.close()

Let’s try out the Aurora endpoints:

  1. On the Amazon RDS console, navigate to your database.
  2. Select the DB cluster and on the Actions menu, choose Add reader.
    Add a Reader DB instance
  3. Provide a name for the reader and leave all other settings at default.
  4. Copy the cluster endpoint to the clipboard.
    Get the cluster endpoint
  5. SSH to your bastion host and connect to the cluster endpoint using psql:
psql -h <<Paste your cluster endpoint>> -U <<DB User name>>

=> \conninfo
  6. Note the IP address for the DB instance, then quit:
=> \q
  7. Now copy the reader endpoint to the clipboard.
  8. SSH to your bastion host and connect to the reader endpoint using psql:
psql -h <<Paste your Reader endpoint>> -U <<DB User name>>

=> \conninfo
  9. Note the IP address for the DB instance, then quit:
=> \q
  10. Compare the IP addresses for the two endpoints; they will be different because the connections landed on different instances.
  11. Try running a SQL INSERT on the reader instance; it will fail.
  12. To clean up, delete the reader instance when you’re done, because a running instance will cost you money.

In addition, Aurora offers a policy-driven auto scaling mechanism for read replicas. Aurora Auto Scaling lets you automatically increase or decrease the number of reader instances in the cluster based on the average load across the readers or a custom Amazon CloudWatch metric. The creation of the replica takes a few minutes regardless of the database size. This is due to Aurora’s compute-storage decoupling—because the storage volume is shared by the database instances, there is no need to copy the data to the reader’s local storage.
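As an illustration, the following boto3 sketch registers the reader count of a cluster as a scalable target and attaches a target tracking policy on average reader CPU utilization. The cluster name, policy name, capacity bounds, and 60% target are placeholder values; adapt them to your environment.

import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the cluster's reader count as a scalable target
# ('my-aurora-cluster' is a placeholder identifier)
autoscaling.register_scalable_target(
    ServiceNamespace='rds',
    ResourceId='cluster:my-aurora-cluster',
    ScalableDimension='rds:cluster:ReadReplicaCount',
    MinCapacity=1,
    MaxCapacity=5
)

# Target tracking policy: add or remove readers to keep the
# average reader CPU utilization near 60%
autoscaling.put_scaling_policy(
    PolicyName='reader-cpu-target-tracking',
    ServiceNamespace='rds',
    ResourceId='cluster:my-aurora-cluster',
    ScalableDimension='rds:cluster:ReadReplicaCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 60.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'RDSReaderAverageCPUUtilization'
        }
    }
)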

Tip #2: Use newer JDBC drivers for faster recovery

The read replicas in an Aurora database cluster are targets for failover. A failover can be initiated manually or may happen automatically if the primary instance fails. After a failover, one of the read replicas becomes the primary instance in the cluster. In a manually initiated failover, the original writer instance becomes a reader node; in an automated failover, one of the reader nodes is promoted to writer, and the old writer node is replaced and added back to the cluster as a reader. Aurora automatically updates the cluster endpoint to point to the new primary instance's IP address. If the application caches the DNS resolution of the cluster endpoint for a long period of time, it may experience errors due to a stale IP mapping for the endpoint. The following diagram illustrates this architecture.

Applications written in Java can utilize the AWS JDBC Driver for MySQL, a drop-in replacement for the MySQL Connector/J driver. It has a built-in mechanism to make sure that the cluster endpoint is always mapped to the writer instance in the cluster. As a result, a Java application that uses the AWS JDBC Driver for MySQL recovers from failover faster than one that uses the MySQL Connector/J driver. Similarly, for Java apps connecting to Amazon Aurora PostgreSQL-Compatible Edition, you may use the AWS JDBC Driver for PostgreSQL. If you have a constraint that prevents replacing the JDBC driver, you may instead adjust the DNS TTL (time to live) setting for your Java applications for faster recovery.

The DNS TTL is a setting that tells the DNS resolver how long to cache a query before requesting a new one. As a best practice, set the DNS TTL to under 30 seconds. This will make sure that your application can reconnect to the new primary instance in a reasonable time frame. It is recommended that you test your application’s failover recovery behavior by running continuous load against the app and carrying out a forced failover.

To set the DNS TTL globally for all Java apps running on the host, you may edit the file $JAVA_HOME/jre/lib/security/java.security to set the property:

networkaddress.cache.ttl=15

You may also set the DNS TTL in the Java code:

java.security.Security.setProperty("networkaddress.cache.ttl", "15");
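To test the failover recovery behavior end to end, you can run a probe like the following while you force a failover and watch how long the errors last. This is a minimal Python sketch using psycopg2 (rather than Java), with arbitrary probe and retry intervals; the environment variables are the same as in the earlier example.

import os
import time

import psycopg2

def connect():
    conn = psycopg2.connect(
        host=os.environ['AURORA_CLUSTER_ENDPOINT'],
        database=os.environ['PG_DATABASE'],
        user=os.environ['PG_USER'],
        password=os.environ['PG_PASSWORD'],
        connect_timeout=3
    )
    conn.autocommit = True  # probe queries don't need transactions
    return conn

conn = connect()
while True:
    try:
        with conn.cursor() as cur:
            cur.execute('SELECT 1')  # trivial probe query
        print('ok', time.strftime('%X'))
    except psycopg2.Error as err:
        # Connection lost (for example, during a failover): reconnect
        print('error:', err)
        try:
            conn.close()
        except psycopg2.Error:
            pass
        time.sleep(5)
        try:
            conn = connect()
        except psycopg2.OperationalError as err2:
            print('reconnect failed:', err2)
    time.sleep(1)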

Although the techniques discussed here will help your Java apps recover faster, we recommend using Amazon RDS Proxy, which we cover in Part 2 of this series.

Tip #3: Create clones for testing with production data

The Aurora fast cloning feature lets you create a new cluster that uses the same Aurora cluster volume and has the same data as the original. Because the newly created cluster shares the storage layer with the original cluster, there's no need for a snapshot and restore, and as a result cloning is extremely fast regardless of the database size. You may create up to 15 clones of a cluster in the same Region. Aurora uses a copy-on-write protocol to manage storage for the cloned cluster: reads and writes carried out against one cluster don't change the data visible to the other clusters.

For example, let’s say that at the time of clone creation there are 10 rows in a database table. After the creation of the clone, an insert is invoked to add a new row in the original cluster. At this point a SQL SELECT count(*) FROM .. query against the original cluster will show 11 rows, whereas the same query against the cloned cluster will show 10 rows. The new row won’t be visible in the cloned cluster.

The Aurora storage architecture spreads the data in storage nodes across three Availability Zones. This makes it highly unlikely for the clusters sharing the storage volume to experience storage latency due to activity on other clusters. The following diagram illustrates this architecture.

Application developers often test their code against the production data. When a traditional database or Amazon Relational Database Service (Amazon RDS) database is in use, the only choice is to create a new cluster and copy the data from the original database to the test database. This process of standing up a test database requires resources and time. With the Aurora cloning feature, you can have the test database instance available within minutes. If production and development are using two different AWS accounts, then you may use the Aurora cross-account cloning feature that allows dev teams to create a clone of a production Aurora database cluster. This lets developers test their apps against the production data without impacting the production cluster.

In addition, the cloning feature can be used for database engine upgrades. You can create a clone of an existing database, upgrade the cloned cluster to a later version, and test your application against it. If the testing is successful, you're ready to upgrade the original database.
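Clones can also be created programmatically: under the hood, a clone is a point-in-time restore with the copy-on-write restore type. The following boto3 sketch uses placeholder cluster and instance identifiers.

import boto3

rds = boto3.client('rds')

# Create a clone of an existing cluster ('my-aurora-cluster' and
# 'my-aurora-clone' are placeholder identifiers)
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier='my-aurora-cluster',
    DBClusterIdentifier='my-aurora-clone',
    RestoreType='copy-on-write',
    UseLatestRestorableTime=True
)

# The clone is created without DB instances; add one so you can
# connect to the cloned cluster
rds.create_db_instance(
    DBInstanceIdentifier='my-aurora-clone-instance-1',
    DBClusterIdentifier='my-aurora-clone',
    DBInstanceClass='db.r6g.large',
    Engine='aurora-postgresql'
)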

To try out cloning, complete the following steps:

  1. On the Amazon RDS console, choose Databases in the navigation pane.
  2. Select your Aurora cluster.
  3. On the Actions menu, choose Create clone.
  4. After you create the clone, SSH to your bastion host and connect to the clone cluster endpoint:
psql -h <<Paste Clone cluster endpoint>> -U <<DB User name>>
-- Create a test table (use different name if needed)
=> CREATE TABLE test(id int);
=> INSERT INTO test VALUES(1);
=> SELECT * FROM test;
=> \q
  5. Now connect to the original cluster using psql:
psql -h <<Paste your original cluster endpoint>> -U <<DB User name>>

=> SELECT * FROM test;

You will not find the test table, because changes made on the clone are not visible on the original cluster.
When you’re done experimenting with the clone, you must delete the cloned cluster to avoid incurring charges for the running cluster.

  6. On the Databases page, select the instance in the clone cluster and on the Actions menu, choose Delete.
  7. Deselect the option to create a final snapshot.

Tip #4: Use a global database for disaster recovery and better performance

Amazon Aurora Global Database lets you create a global database that consists of a single primary cluster and up to five secondary clusters in different Regions. Log records from the primary cluster are replicated at the storage level, which puts no overhead on the writer node. This physical replication mechanism is highly performant compared to logical replication. SQL writes can be carried out only against the writer node in the primary cluster, but reads may be carried out against any endpoint on any cluster.

As an example, consider a global application deployed across three Regions that is using an Aurora global database, as shown in the following figure. Each of the application instances will send the read transactions to the local reader endpoint and write transactions to the writer endpoint in the primary cluster.

As a best practice, consider using a global database for applications that demand a low Recovery Time Objective (RTO) and Recovery Point Objective (RPO). The replication lag for the secondary clusters is typically under 1 second. If a Region becomes unavailable, a secondary cluster in a different Region may be promoted to become the primary. This cross-Region failover mechanism makes it possible to achieve an RTO on the order of single-digit minutes. A manually initiated failover is referred to as a managed planned failover. On very rare occasions, your Aurora global database might experience an unexpected outage in its primary Region; to recover from such unplanned outages, you promote a secondary cluster in a different Region to become the primary.
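A managed planned failover can be initiated from the console or through the API. As a sketch, the following boto3 call promotes a secondary cluster; the global database identifier and the target cluster ARN are placeholders.

import boto3

rds = boto3.client('rds')

# Promote the secondary cluster to primary as a managed planned
# failover (the target must be the ARN of an existing secondary
# cluster in the global database)
rds.failover_global_cluster(
    GlobalClusterIdentifier='my-global-db',
    TargetDbClusterIdentifier=(
        'arn:aws:rds:us-west-2:123456789012:cluster:my-secondary-cluster'
    )
)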

In addition, copies of the application deployed in the secondary Regions may utilize local reads from the co-located cluster, leading to better application performance from the read perspective. With this active-active setup, your applications in the secondary Region will send writes to the writer node in the primary cluster.

Amazon Aurora MySQL-Compatible Edition supports write forwarding, which lets you carry out writes against the secondary cluster. These write operations against the secondary cluster are automatically directed to the writer node in the primary cluster. In addition, you can control the read consistency level on the secondary cluster. By default, the read consistency on secondary clusters is Eventual, but it may be set to Session or Global. If the read consistency is set to Global, then reads against the secondary show the latest committed data available on the primary.
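As an illustration, the following sketch sets the session consistency level before issuing a forwarded write. It assumes an Aurora MySQL secondary cluster with write forwarding enabled, the pymysql package, and placeholder connection settings and table names.

import pymysql

# Connect to the SECONDARY cluster endpoint (placeholder values)
conn = pymysql.connect(
    host='my-secondary.cluster-xyz.us-west-2.rds.amazonaws.com',
    user='admin',
    password='secret',
    database='appdb'
)

with conn.cursor() as cur:
    # Choose the read consistency for this session: EVENTUAL,
    # SESSION, or GLOBAL. SESSION gives read-after-write
    # consistency for writes made in this session.
    cur.execute("SET aurora_replica_read_consistency = 'SESSION'")

    # This INSERT is forwarded to the writer in the primary Region
    # ('orders' is a placeholder table)
    cur.execute("INSERT INTO orders (status) VALUES ('new')")
    conn.commit()

    # Because consistency is SESSION, this read sees the row above
    cur.execute("SELECT COUNT(*) FROM orders")
    print(cur.fetchone())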

Occasionally, there's a need to migrate an Aurora cluster from one Region to another, and Aurora Global Database offers a quick solution for these migrations. You add a secondary Region to the cluster, and when the secondary cluster has caught up with the primary, you stop writes on the primary, detach the secondary cluster from the global database, and then delete the global database and the original primary cluster.
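The detach step corresponds to the remove_from_global_cluster API. The following is a minimal boto3 sketch with placeholder identifiers.

import boto3

rds = boto3.client('rds')

# Detach the secondary cluster so it becomes a standalone cluster
# with full read/write capability (DbClusterIdentifier takes the
# cluster's ARN; identifiers are placeholders)
rds.remove_from_global_cluster(
    GlobalClusterIdentifier='my-global-db',
    DbClusterIdentifier=(
        'arn:aws:rds:us-west-2:123456789012:cluster:my-secondary-cluster'
    )
)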

Let’s try out Aurora Global Database:

  1. On the Amazon RDS console, choose Subnet groups in the navigation pane.
  2. Choose Create DB subnet group and create a subnet group.
  3. Select a VPC for creating the DB cluster:
    1. Provide a name.
    2. Select at least three Availability Zones.
    3. Select the subnets in the three Availability Zones.
  4. On the Databases page, select the cluster and on the Actions menu, choose Add AWS Region.
  5. Add the secondary Region to the existing cluster:
    1. Provide the global DB identifier.
    2. Select the secondary Region.
    3. Select an instance of appropriate size (for example, db.t3.medium).
    4. For the connectivity group:
      • Select the VPC.
      • Select the subnet group you created earlier.
      • Select an appropriate security group (deselect default).
    5. Choose Add region.
  6. In the secondary Region, set up a bastion host (Amazon EC2) instance to connect to the cluster.
  7. Install the PostgreSQL client:
    sudo amazon-linux-extras install postgresql13
  8. SSH to the bastion host in the secondary Region and check out the data using SQL.
  9. SSH to the bastion host in the primary Region and insert some data into test tables.
  10. SSH to the bastion host in the secondary Region and check out the data using SQL.
  11. Clean up the secondary cluster to avoid charges:
    1. Remove the cluster from the global database.
    2. Select the DB instance in the secondary cluster and delete it.
    3. Deselect the option to create a final snapshot.
    4. Delete the subnet group in the secondary Region.
    5. Remove the primary cluster from the global database.
    6. Delete the global database.

Tip #5: Get consistent application performance using Aurora Serverless v2

Aurora offers a special database instance type that scales automatically based on resource pressure on the database instance. In a provisioned Aurora database cluster, the memory, vCPU, and network bandwidth are determined by the instance type. An Aurora Serverless v2 instance is a drop-in replacement for a provisioned database instance. To use Aurora Serverless v2 instances, you may select Serverless v2 as the instance type in a new cluster or add Aurora Serverless v2 instances to an existing cluster. The Aurora Serverless v2 capacity range is set as a minimum and maximum at the cluster level. Aurora continuously monitors the instance resources, such as memory, CPU, and network; these measurements are collectively called the load. Aurora Serverless v2 scales up or down based on the load on the instance. In addition to scaling the resources on the instance, Aurora Serverless v2 dynamically adjusts the maximum number of connections on the instance based on its current capacity.

You can utilize Aurora Serverless v2 instances for applications that have spiky usage—meaning that the load on database mostly stays low, but there are usage spikes that happen on an infrequent basis, for example hourly, daily, or even weekly. In that scenario, a provisioned instance sized for peak usage will mostly stay idle and may not be cost effective. Aurora Serverless v2 addresses this concern because it matches the demand curve to increase the cost efficiency of these spiky workloads.

The Aurora Serverless v2 allocated resources will stay close to the minimum specified capacity, and they will scale up (and down) in alignment with the resource usage on the instance. You pay only for the capacity that you use and as a result save on costs while maintaining a consistent user experience for your application.

The capacity range is set at the cluster level in terms of Aurora Capacity Units (ACUs). One ACU provides approximately 2 GiB of memory, along with corresponding vCPU and network bandwidth. You may create a cluster with all Aurora Serverless v2 instances or a mix of provisioned and serverless instances. The following diagram illustrates this architecture.
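The ACU range can also be set through the API. The following boto3 sketch applies a 2 to 8 ACU range to an existing cluster; the cluster identifier is a placeholder.

import boto3

rds = boto3.client('rds')

# Set the Serverless v2 capacity range on an existing cluster
# ('my-aurora-cluster' is a placeholder identifier)
rds.modify_db_cluster(
    DBClusterIdentifier='my-aurora-cluster',
    ServerlessV2ScalingConfiguration={
        'MinCapacity': 2.0,
        'MaxCapacity': 8.0
    },
    ApplyImmediately=True
)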

In this hands-on exercise, we convert the existing provisioned cluster to an Aurora Serverless v2 cluster (mixed configuration) by adding a read replica and then removing the provisioned instance. You can use this mechanism for converting an existing provisioned Aurora cluster to an Aurora Serverless v2 cluster. Then we run a load test against the cluster to observe the Aurora Serverless v2 instance scaling characteristics.

  1. On the Amazon RDS console, choose Databases in the navigation pane.
  2. Select your DB cluster and on the Actions menu, choose Add reader.
  3. Provide the DB instance identifier.
  4. In the instance configuration, choose the Serverless v2 instance.
  5. Set the capacity range of minimum ACUs to 2 and maximum ACUs to 8.
  6. Choose Add reader.
  7. After you create the reader instance, delete the provisioned instance.

To understand the scaling behavior, we run the load against the cluster and observe the CloudWatch ServerlessDatabaseCapacity metric.

  1. SSH to the bastion host and run the load against the cluster using the pgbench utility:
# Create a temp database for the load test

psql -h <cluster-endpoint> -U <user name> -c "create database pgbench;"

# Initialize the pgbench tables in the temp database

pgbench -i -h <cluster-endpoint> -U <user name> pgbench

# Run the load test for 2 minutes with 100 connections

pgbench -c 100 -T 120 -P 2 -b simple-update -h <cluster-endpoint> -U <user name> pgbench

# Once the test has completed, you may drop the temp database

psql -h <cluster-endpoint> -U <user name> -c "drop database pgbench;"

In this pgbench run, we create 100 connections to the cluster endpoint and run the built-in simple-update benchmark against it for 2 minutes. This load causes the Aurora Serverless v2 instance in the cluster to scale up.

  2. On the CloudWatch console, navigate to the ServerlessDatabaseCapacity metric.

You’ll see a scale-up of instance capacity when the load test was run and a gradual capacity scale-down starting at the end of the load test. A programmatic way to retrieve the same metric is sketched after the next step.

  3. To clean up, if you created a cluster for carrying out the exercises, delete it.
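If you prefer to inspect the scaling behavior programmatically rather than on the console, the following boto3 sketch retrieves the ServerlessDatabaseCapacity metric for the last hour at one-minute resolution; the instance identifier is a placeholder.

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

now = datetime.now(timezone.utc)

# Fetch the Serverless v2 capacity (in ACUs) for the last hour
# ('my-serverless-reader' is a placeholder instance identifier)
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/RDS',
    MetricName='ServerlessDatabaseCapacity',
    Dimensions=[
        {'Name': 'DBInstanceIdentifier', 'Value': 'my-serverless-reader'}
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=60,
    Statistics=['Average']
)

for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])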

Conclusion

In this post, we shared five tips for making effective use of Aurora features. Segregate the read and write operations against the Aurora DB cluster to enhance performance and optimize the use of DB resources. For faster Java app recovery from failover, consider using the AWS JDBC drivers, or use Amazon RDS Proxy (which we discuss in Part 2 of this series). With the Aurora fast cloning feature, app developers can easily test their applications against production data. Applications may be deployed across multiple Regions using Aurora Global Database, which provides disaster recovery, localized reads, and a Regional migration capability. For workloads with spiky performance characteristics, applications can utilize Aurora Serverless v2 instances. Review your applications to see how you can take advantage of these capabilities.

In Part 2 of this series, we share five tips on how to enhance the performance, security, and availability of your apps by utilizing AWS services that directly integrate with or complement Aurora databases.


About the Author

Rajeev Sakhuja is a Solution Architect based out of New York City. He enjoys partnering with customers to solve complex business problems using AWS services. In his free time, he likes to hike, and create video courses on application architecture & emerging technologies; check him out on Udemy.