Implementing run-time write synchronization in Amazon Aurora Global Database

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud, that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases. Global Database is an Aurora feature designed for globally distributed applications, allowing a single Amazon Aurora database to span multiple AWS Regions. A Global Database contains one primary Region that supports read/write operations, and up to five secondary Regions that are read-only. It replicates your data asynchronously without impact on database performance, and enables fast local reads with low latency in each Region. It also provides managed failover and switchover functionality for planned region migrations or disaster recovery from regional service impairments.

Amazon Aurora supports advanced capabilities for global replication:

In the PostgreSQL-compatible edition, you can use managed recovery point objective (managed RPO) configuration to ensure that at least one secondary cluster stays within the target RPO window.
The MySQL-compatible edition supports configurable read consistency modes as part of its write forwarding feature. The feature facilitates read workloads that must see the most recent data, and can’t tolerate the cross-Region replication lag. It can also forward write statements from read replicas to the primary instance.

Note that the write forwarding feature is supported in multi-region Global Database setups as well as single-Region setups without Global Database. This post focuses exclusively on multi-Region scenarios. See the Local write forwarding blog post for more information about single-region use cases.

Use cases for run-time replication tracking in Aurora Global Database

Some use cases may require additional measures for quickly verifying the status of global replication at run time. Your global application might be performing writes in the primary Region, and you might need to verify that these writes made it to some or all of the secondary Regions before unblocking the rest of the application flow.

For example, you might have a global suite of microservices exposing APIs for “write” and “read” operations. Your developers can build workflows by calling those APIs in various order. You want to be able to scale “read” APIs through secondary Regions, but you don’t want developers to worry about global replication lag breaking causality assumptions between calls. Your APIs are stateless, so you can’t simply “pin” a particular developer session to a single Region.

In this situation, you could:

Consider the managed RPO feature. However, RPO is a cluster-level parameter designed for configuring the recovery point objective of the entire data set. It would not be ideal for checking replication status at run time, and the feature is also not supported in the MySQL-compatible edition of Aurora.
Use the GLOBAL consistency mode for any API calls handled in the secondary Regions. This could be implemented with minor code changes, but would also add hundreds of milliseconds of latency to every read, whether or not it was preceded by a write. At the time of this writing, the read-after-write consistency modes are available only in the MySQL-compatible edition of Aurora.
Implement a time-based wait in the write path, so that each “write” API waits long enough for its effects to be replicated. This is an engine-agnostic approach that could work in both MySQL and PostgreSQL editions of Aurora, but it would also present significant challenges around correctness (how long to wait?) and performance (avoid waiting too long).

In this post, I present a novel solution for use cases that require replication verification in the write path, using the Aurora Global Database metadata and a couple of stored procedures. The solution provides your database clients with the ability to verify whether the most recent changes (writes) made in the current primary Region have been received by secondary Regions. Verification can be done at run time, and without enabling additional Aurora features or modifying the cluster-wide configuration.

The premise of the solution is compatible with both Aurora MySQL and Aurora PostgreSQL. This blog post uses Aurora MySQL for the stored procedure code and demonstrations, but you can apply the same logic to implement the solution in Aurora PostgreSQL.

Note that the solution provides a way to verify replication progress, but it doesn’t guarantee successful replication. It doesn’t change the RPO posture of your global clusters, and doesn’t provide a path to “zero RPO” or “zero data loss”.

Solution overview

Aurora Global Database replication is based on a transaction log that records all data changes made in the cluster. The log stream is generated in the primary Region, and is replicated to all secondary Regions. The log is linearly ordered by a Log Sequence Number (LSN), and the secondary Regions apply changes in the same order as they were made in the primary Region. Thanks to the LSN-ordering of changes, you can use the LSN as a source of truth for tracking the “replication position” of each secondary Region relative to the “write position” of the primary Region. Tracking and comparing those LSN positions is the basis of the write synchronization solution.

Solution implementation is made possible by the Global Database LSN metadata available in both MySQL-compatible and PostgreSQL-compatible editions of Amazon Aurora:

For Aurora MySQL, the metadata can be found in the aurora_global_db_status system view supported in versions 3.04.0 and above.
For Aurora PostgreSQL, the same information is provided by the aurora_global_db_status function supported in all versions.

We use stored procedures to encode logic that checks the current LSN of the primary Region and waits until the desired condition is met in the secondary Region(s). There are two stored procedures to cover two distinct usage scenarios:

Wait for a given number (quorum) of secondary Regions to reach the primary’s LSN. This is relevant when you want to ensure that the data was replicated globally at least “N” times.
Wait for a named secondary Region to reach the primary’s LSN. This is useful if your global topology contains multiple secondary Regions, and you’d like to verify that the most recent changes made it to a specific secondary Region (not just “any” Region).

Once the condition is met, the procedure returns, and the SQL session can continue. The code supports simple timeout functionality, input validation that can be easily adjusted, and basic checks for erroneous/impossible conditions.

The following diagram shows the end-to-end flow of a client session using the solution:

The client performs writes in the primary Region. These writes must be replicated to the secondary Regions before the rest of the application flow can run successfully.
The client calls the stored procedure, and the client session waits until the procedure returns.
The stored procedure logic verifies that the desired condition was met (for example, a particular secondary Region caught up to the point after the writes were made).
The procedure returns control to the client session. If the procedure executed without errors, the client can assume that the changes made it to the secondary Region(s).

Walkthrough

This section describes the process of implementing the solution on Amazon Aurora MySQL-compatible edition. You can follow the same overall logic in Aurora PostgreSQL by adjusting the code for compatibility with PostgreSQL stored program syntax.

Prerequisites

To implement the solution, you will need:

An AWS account.
An Aurora MySQL Global Database cluster spanning at least two Regions: one primary Region, and at least one secondary Region. Use Aurora MySQL version 3.04.0 or newer.
A client environment with MySQL client software. This demonstration uses an Amazon Elastic Compute Cloud (EC2) instance with a mysql command-line client.
Connectivity between the client environment and the Aurora instances on the MySQL.
Basic understanding of MySQL stored program language.

Implementation

As noted earlier, the solution uses two stored procedures to support two use case scenarios:

Waiting for a quorum of Regions, implemented in procedure aurora_global_wait_quorum.
Waiting for a specific Region identified by name, implemented in aurora_global_wait_region.

You can download the procedure code using the following links:

In the next two sections, let’s walk through the key parts of the logic for each stored procedure. If you’re well familiar with MySQL stored program language, you can skip these sections and read the code files instead.

Procedure aurora_global_wait_quorum

This procedure is designed to wait for a given number of secondary Regions to reach the LSN of the primary Region. Let’s take a look at the code:

CREATE PROCEDURE aurora_global_wait_quorum(wait_quorum_size INT, wait_interval_ms INT, wait_timeout_s INT, debug BOOL)

The procedure takes four input parameters:

wait_quorum_size: The number of secondary Regions to wait for.
wait_interval_ms: The number of milliseconds between metadata checks. In other words, when the procedure runs a loop to check replication metadata, this is the sleep time between loop iterations.
wait_timeout_s: Time limit for the metadata check loop. The procedure will return an error if it fails to reach the desired condition within the timeout.
debug: Whether or not the procedure should return informational messages throughout execution.

Next, the code performs basic sanity checks on the input parameters:

-- Ensure the procedure can only be run on the writer instance in the primary region.
IF @@innodb_read_only = 1 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "This procedure can only run on the writer instance";
END IF;

-- Input check: ensure we're waiting for at least one secondary region
IF wait_quorum_size < 1 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "Minimum quorum size is 1";
END IF;

-- Input check: prevent callers from checking replication status too frequently
IF wait_interval_ms < 100 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "Minimum wait interval is 100 milliseconds";
END IF;
   
-- Input check: ensure a reasonable minimum for the wait timeout
IF wait_timeout_s < 1 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "Minimum wait timeout is 1 second";
END IF;

The logic begins with checking the current LSN of the primary Region, which is the write point that you want your secondary Regions to reach. You can identify the primary Region by the “zero” value in its replication lag timestamp field:

SELECT aws_region, highest_lsn_written 
FROM information_schema.aurora_global_db_status 
WHERE last_lag_calculation_timestamp = FROM_UNIXTIME(0) 
INTO v_primary_name, v_primary_lsn;

In the next step, the code checks the total number of secondary Regions in the global topology and throws an error if the requested quorum size (number of Regions to wait for) is larger than the actual number of secondary Regions in the cluster:

SELECT COUNT(*) FROM information_schema.aurora_global_db_status 
WHERE aws_region <> v_primary_name INTO v_secondary_count_all;

IF v_secondary_count_all < wait_quorum_size THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "Requested quorum size is higher than the total number of secondary regions.";
END IF;

For the main part of the implementation, global replication status is checked in a loop, at the configured interval. The loop ends successfully when the quorum is met, or it will throw an error if it fails to meet the quorum within the timeout:

-- Set the start timestamp for the timeout timer
SELECT NOW(3) INTO v_wait_start_timestamp;

-- Repeat as long as the number of caught-up secondary regions is lesser than the quorum size
WHILE v_secondary_count_quorum < wait_quorum_size DO

    -- Exit if the wait timeout was exceeded
    IF TIMESTAMPDIFF(SECOND, v_wait_start_timestamp, NOW(3)) > wait_timeout_s THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "Quorum wait timeout exceeded";
    END IF;

    -- Sleep for "wait_interval_ms" milliseconds before checking replication status
    SELECT SLEEP(wait_interval_ms/1000) INTO v_wait_sleep_output;
    SET v_wait_sleep_count = v_wait_sleep_count + 1;

    -- Get the number of caught-up secondary regions
    SELECT count(*) 
    FROM information_schema.aurora_global_db_status
    WHERE highest_lsn_written >= v_primary_lsn AND aws_region <> v_primary_name
    INTO v_secondary_count_quorum;

END WHILE;

Implementation notes:

The procedure returns a “Query OK” response for a successful execution, indicating that the wait condition was met. The output will be empty unless you enabled the debug messages.
The procedure uses SIGNAL statements with an “unhandled exception” error code to indicate error conditions. This makes it easy for applications to detect errors, compared to other mechanisms such as warning codes or SELECT “Something” AS ErrorMessage. If the procedure returns an error (or anything other than “Query OK”), you should assume that the LSN wait was not successful.
When using the procedure in tightly controlled environments, you can make the implementation more efficient by removing unnecessary checks. For example, you could remove the check comparing the requested quorum size with the actual number of secondary Regions. This will save one metadata query, but you will experience a loop timeout whenever you request a quorum larger than the number of secondary Regions.
You might notice that the loop executes a sleep function first, before checking the global replication status. This improves efficiency when the procedure is called immediately after the most recent write statement. The global replication will need a brief moment to catch up, so we might as well sleep first. You could change the order of operations to “check first, sleep later” if your use case typically involves a delay between the most recent write statement and the procedure execution.

Procedure aurora_global_wait_region

This procedure is designed to wait for a specific Region to reach the LSN of the primary Region. The code is quite similar to what was discussed above, so let’s focus on the key differences.

CREATE PROCEDURE aurora_global_wait_region(wait_region_name VARCHAR(20), wait_interval_ms INT, wait_timeout_s INT, debug BOOL)

The procedure takes three input parameters. The wait_region_name parameter is the name of the Region to wait for, and the other two parameters were discussed in the preceding section. Use the API notation for the Region name, for example “us-east-1” or “eu-west-1”.

Just like previously, the logic starts by learning the primary Region name and its current LSN. If the requested Region (one to wait for) is the same as the primary Region, the code returns immediately since there’s no need to wait.

SELECT aws_region, highest_lsn_written 
FROM information_schema.aurora_global_db_status 
WHERE last_lag_calculation_timestamp = FROM_UNIXTIME(0) 
INTO v_primary_name, v_primary_lsn;

-- Return immediately if the region we wait for is the same as the primary region
IF v_primary_name = wait_region_name THEN
    LEAVE procedure_block;
END IF;

The loop itself looks similar, but instead of counting the number of Regions, it checks the LSN for a specific Region:

-- Repeat until the secondary region LSN catches up to the primary region LSN
WHILE v_secondary_lsn IS NULL OR v_secondary_lsn < v_primary_lsn DO

    -- Exit if the wait timeout was exceeded
    IF TIMESTAMPDIFF(SECOND, v_wait_start_timestamp, NOW(3)) > wait_timeout_s THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "Quorum wait timeout exceeded";
    END IF;

    -- Sleep for "wait_interval_ms" milliseconds before checking replication status
    SELECT SLEEP(wait_interval_ms/1000) INTO v_wait_sleep_output;
    SET v_wait_sleep_count = v_wait_sleep_count + 1;

    -- Get the LSN for the secondary region we're waiting for
    SELECT highest_lsn_written 
    FROM information_schema.aurora_global_db_status 
    WHERE aws_region = wait_region_name 
    INTO v_secondary_lsn;

    -- Check if the LSN information was returned. If not, it means the region doesn't exist in the topology
    IF v_secondary_lsn IS NULL THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = "Requested secondary region not found in global topology";
    END IF;

END WHILE;

Implementation notes:

If the requested Region (one to wait for) is the same as the primary Region, it’s not considered to be an error condition, but the code simply returns immediately. If this implementation choice doesn’t work for your use case, you can replace the LEAVE statement with a SIGNAL statement to throw an error.
You might notice that the procedure code isn’t validating the name of the requested Region until it enters the loop. This is a performance optimization to avoid running a separate validation query for every procedure execution. If the requested Region isn’t part of the global topology, the check query within the loop will return an empty result, and the error will be thrown at that time.

Deployment and usage

In order to deploy the stored procedures, connect to the Aurora writer instance in the primary Region, and run the CREATE PROCEDURE statements provided in the code files. Note that stored procedures need to be created in a specific schema, so you must either prefix the procedure name with a schema name (such as, “CREATE PROCEDURE schema_name.aurora_global_wait_region”) or navigate to a schema before creating the procedure (such as, “USE schema_name” in a mysql command line client).

You can invoke the stored procedures using the CALL statement. Let’s review a few examples below. The samples were executed on a global cluster with a primary Region of “us-west-2” and one secondary in “us-east-1”.

Example 1: Wait for at least one Region to catch up to the primary Region’s LSN. Check replication metadata every 100ms with a timeout of 10s, debug output enabled.

mysql> CALL aurora_global_wait_quorum(1, 100, 10, true);
+----------------------------------------------+
| Message                                      |
+----------------------------------------------+
| Primary region is us-west-2, at LSN 29833307 |
+----------------------------------------------+
1 row in set (0.20 sec)

+----------------------------------------------------------------+
| Message                                                        |
+----------------------------------------------------------------+
| Total number of secondary regions: 1, requested quorum size: 1 |
+----------------------------------------------------------------+
1 row in set (0.20 sec)

+--------------------------------------------------------------------------------------------------+
| Message                                                                                          |
+--------------------------------------------------------------------------------------------------+
| Waiting for 1 secondary region(s) to reach LSN 29833307, checking every 100ms, wait timeout: 10s |
+--------------------------------------------------------------------------------------------------+
1 row in set (0.20 sec)

+----------------------------------------+
| Message                                |
+----------------------------------------+
| Quorum reached after 2 wait iterations |
+----------------------------------------+
1 row in set (0.20 sec)

Query OK, 0 rows affected (0.20 sec)

The last output line indicates a successful execution with a total duration of 0.2 seconds.

Example 2: Same operation, but with debug output disabled. For brevity, debug outputs will be disabled for subsequent examples:

mysql> CALL aurora_global_wait_quorum(1, 100, 10, false);
Query OK, 1 row affected (0.30 sec)

Example 3: Try waiting for at least 3 Regions. This won’t be possible, since there’s only one secondary Region in this topology:

mysql> CALL aurora_global_wait_quorum(3, 100, 10, false);
ERROR 1644 (45000): Requested quorum size is higher than the total number of secondary regions.

Example 4: Wait for a specific secondary Region:

mysql> CALL aurora_global_wait_region('us-east-1', 100, 10, false);
Query OK, 1 row affected (0.30 sec)

Example 5: Wait for a Region that’s the current primary. There’s no error, but you’ll notice that the procedure execution time is effectively zero since it didn’t have to wait:

mysql> CALL aurora_global_wait_region('us-west-2', 100, 10, false);
Query OK, 1 row affected (0.00 sec)

Example 6: Try a Region that’s not present in the global topology. Note that the code doesn’t make a distinction between a Region name that’s not a member of the topology, and a Region name that’s invalid (for example, “no-such-region-1”):

mysql> CALL aurora_global_wait_region('eu-west-1', 100, 10, false);
ERROR 1644 (45000): Requested secondary region not found in global topology

mysql> CALL aurora_global_wait_region('no-such-region-1', 100, 10, false);
ERROR 1644 (45000): Requested secondary region not found in global topology

Additional usage notes

Note that the wait point (LSN) is established at the beginning of the stored procedure, which is after any prior write statements. The procedure doesn’t “look back” at the LSN of the most recent write statement. Transaction log records and Log Sequence Numbers are totally ordered, so this approach is valid from durability perspective: if the “current” LSN made it across Regions, then any prior changes also made it.
The procedures operate at the LSN level, not the transaction level. Calling a stored procedure within an uncommitted transaction will not force (or wait for) the transaction to appear in secondary Region(s), because the transaction could still be rolled back. However, calling the procedure immediately after committing a transaction does ensure that the transaction made it through. To avoid confusion, do not call the stored procedures within uncommitted transactions.
Calling the stored procedures has no impact on the Multi-Version Concurrency Control (MVCC) or ACID properties of the database. For example, if there’s a read-only transaction open in a secondary Region, and that transaction already established its read view, then calling the stored procedures won’t suddenly make new data to become visible within that read view.
For best results, use the stored procedures in an on-demand fashion where a synchronous wait for global replication is specifically needed. Although the stored procedures only involve a small number of metadata SQL statements, and those statements are relatively lightweight, the resource burden of running those statements can add up if there are thousands of procedure executions each second.

When choosing the sleep interval and timeout settings, consider your use case requirements as well as the typical replication lag observed in the cluster:

The sleep interval setting determines the responsiveness, but also the efficiency of the solution. The lower the interval, the more frequently replication status will be checked. When optimizing for efficiency, you can start with an interval that allows the procedures to return within 2 iterations.
The timeout interval should be used as a safety mechanism to prevent sessions from blocking indefinitely. Choose a timeout setting higher than the typical lag observed in the cluster, but not excessively high to prevent applications from becoming “stuck” waiting for synchronization. Aurora Global Database’s replication lag is typically under one second, so a timeout of 10 seconds should be sufficient in most scenarios. You can always choose to re-try, but you should be prepared to handle it as an error condition if synchronization can’t be achieved despite re-trying.

Using the procedures in secondary Regions

The implementation presented above prevents the stored procedures from being executed on read-only instances (replicas), which includes any instances in the secondary Regions of your Global Databases. Let’s explain this design choice.

The solution works by waiting for secondary Regions to reach the primary’s LSN position. An accurate view of the primary’s LSN is therefore a key requirement. The underlying metadata can be read on any instance in the cluster, so you could modify the logic to allow procedure execution anywhere in the global topology. However, reading the metadata is a passive operation, that is, it doesn’t actively query the global topology but instead uses information exchanged via the replication mechanisms. With that, information about the primary’s LSN position is only current when queried on the writer instance in the primary Region itself. If queried anywhere else, the primary’s LSN information is behind current due to the replication lag. As a result, using the procedures on a replica in a secondary Region would cause the database session to wait for the most recent LSN that the given replica is aware of, not for the primary’s actual current LSN.

If you’re looking to implement wait mechanisms in secondary Regions, use the Aurora Global Database write forwarding feature instead. The write forwarding feature includes configurable read-after-write (RAW) consistency modes, which are also based on LSN waits, but they are far more intuitive in the context of usage in secondary Regions. At the time of this writing, the read-after-write consistency modes are available only in the MySQL-compatible edition of Aurora.

Availability and failure mode considerations

When using cross-Region synchronization for reads or writes, give careful consideration to the failure modes of the global cluster, and the resulting effects on application behavior and availability. This applies to any synchronization mechanism, from the write-path synchronization solution described in this blog post to Aurora MySQL’s built-in read-after-write consistency feature. Aurora Global Database replication is asynchronous by nature, and the individual regional clusters (members of the global topology) are decoupled in terms of availability. However, applications that rely on synchronization solutions might be tempted to make assumptions that won’t always hold true. For example, if you use write-path synchronization to wait for 2 secondary Regions, you might be tempted to assume that there will always be at least 2 healthy secondary Regions in the global cluster. If an application is implemented under such an assumption, it will keep failing/retrying and thus won’t be able to gracefully handle availability events affecting secondary Regions.

When implementing cross-Region synchronization, always take availability aspects into account and ensure that your applications can handle error conditions gracefully and deterministically. As a best practice, only use synchronization when absolutely necessary and test your failure modes extensively, including the worst-case scenarios (for example, removing secondary Regions from the topology).

Cleaning up

If you no longer have a need for the stored procedures in your Aurora cluster, use DROP PROCEDURE statements to remove them. Remember to delete any test resources created to follow the instructions in this blog post (Aurora clusters and instances, client instances).

Conclusion

In this post, we presented a lightweight, run-time approach for verifying write replication in Amazon Aurora Global Database clusters. Equipped with this solution, you can adopt Aurora Global Database for workloads or even specific queries/transactions that were previously challenging to implement due to cross-Region write synchronization requirements.

Learn more about Aurora Global Database in the Amazon Aurora User Guide.

About the author

Szymon Komendera is a Database Solutions Architect at AWS, with 20 years of experience in databases, software development, and application availability. He spent the majority of his 9-year tenure at AWS as a member of Aurora MySQL and RDS MySQL engineering teams. He’s a Support person at heart, with a passion for helping and teaching.