How can I resolve common issues when using read replicas in Amazon Aurora?

Last updated: 2021-02-01

I have an Amazon Aurora MySQL DB instance, and I am experiencing issues when working with read replicas. How can I troubleshoot common issues when using read replicas with Amazon Aurora?

Short description

Amazon Aurora MySQL supports Read Replicas that share a common underlying storage volume with the writer DB instance in the same AWS Region. Because the volume is shared, updates made by the writer DB instance are visible to all replica instances in the DB cluster. You can also create cross-Region MySQL Read Replicas, which are implemented using the MySQL binlog-based replication engine.

It's a best practice to use Aurora replicas to scale read operations. Directing read traffic to the replicas reduces the read workload on the writer. Replicas also increase availability, because a replica can take over as the writer if the writer instance becomes unavailable.

Resolution

How do I promote an Aurora read replica?

Manual failover - To promote a read replica instance to be the writer instance, perform a manual failover by following these steps:

  1. Sign in to the Amazon Relational Database Service (Amazon RDS) console.
  2. From the navigation panel, choose Databases.
  3. Choose the writer instance for your Aurora DB cluster.
  4. Choose Actions, and then choose Failover.
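
The console steps above can also be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and its RDS failover_db_cluster operation. The helper names and the cluster and instance identifiers are hypothetical placeholders, not values from this article.

```python
def build_failover_request(cluster_id, target_instance_id=None):
    """Build keyword arguments for the RDS FailoverDBCluster operation."""
    params = {"DBClusterIdentifier": cluster_id}
    if target_instance_id:
        # Optional: name the reader that should become the new writer.
        params["TargetDBInstanceIdentifier"] = target_instance_id
    return params


def fail_over(rds_client, cluster_id, target_instance_id=None):
    """Request a manual failover through a boto3 RDS client."""
    request = build_failover_request(cluster_id, target_instance_id)
    return rds_client.failover_db_cluster(**request)


# Example usage (assumes boto3 is installed and credentials are configured;
# "my-aurora-cluster" and "my-reader-1" are hypothetical identifiers):
#
#   import boto3
#   fail_over(boto3.client("rds"), "my-aurora-cluster", "my-reader-1")
```

If you omit the target instance, Aurora chooses which reader to promote.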

Automatic failover - Aurora automatically fails over to a read replica instance if the writer instance becomes unavailable. This can happen for a number of reasons, including resource contention and maintenance activity. If you have multiple readers, you can assign a promotion priority tier to each instance in your cluster. When the writer instance fails, Aurora promotes the replica with the highest priority (the lowest tier number) as the new writer.
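
To see which reader your tier assignments favor, you can model the selection locally. The following is an illustrative sketch only, assuming tiers range from 0 to 15 and that a lower number means a higher priority. The function name and instance names are hypothetical, and the way Aurora breaks ties within a tier is not modeled.

```python
def expected_promotion_candidate(readers):
    """Return the reader with the highest promotion priority.

    `readers` maps an instance identifier to its promotion tier (0 to 15).
    A lower tier number means a higher priority.
    """
    for name, tier in readers.items():
        if not 0 <= tier <= 15:
            raise ValueError(f"{name}: tier {tier} is outside the range 0-15")
    # Ties within a tier are not modeled; min() returns the first one seen.
    return min(readers, key=readers.get)


readers = {"reader-a": 2, "reader-b": 0, "reader-c": 1}  # hypothetical names
print(expected_promotion_candidate(readers))  # prints "reader-b"
```

With boto3, the tier itself is set through the PromotionTier parameter of the RDS modify_db_instance operation.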

You can also promote a cross-Region Aurora replica as a standalone DB cluster. The cross-Region replication stops after you initiate the promotion process. The newly promoted cluster functions as an independent DB cluster, and handles both read and write operations.
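
The promotion can be requested programmatically as well. The following is a minimal sketch, assuming boto3 and its RDS promote_read_replica_db_cluster operation. The function name, Region, and cluster identifier are hypothetical placeholders.

```python
def promote_replica_cluster(rds_client, replica_cluster_id):
    """Promote a cross-Region replica cluster to a standalone DB cluster."""
    # The client should be created in the Region that hosts the replica cluster.
    return rds_client.promote_read_replica_db_cluster(
        DBClusterIdentifier=replica_cluster_id
    )


# Example usage (assumes boto3 is installed and credentials are configured;
# the Region and identifier below are hypothetical):
#
#   import boto3
#   rds = boto3.client("rds", region_name="us-west-2")
#   promote_replica_cluster(rds, "my-replica-cluster")
```

Because replication stops once promotion starts, confirm that the replica has caught up before you run a call like this.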

How can I measure replication lag?

Because all Aurora DB instances in a single DB cluster share a common data volume, there is minimal replication lag. Usually, lag times are in the tens of milliseconds. However, you might observe slightly increased lag on the readers in a few specific circumstances.

Note: Cross-Region replicas use logical replication, and are influenced by the change/apply rate and delays in network communication between the specific Regions selected. Cross-Region replicas using Aurora databases have a typical lag of under a second.

You can measure replication lag using the following Amazon CloudWatch metrics:

  • Use AuroraReplicaLag to measure replica lag between the writer and reader node in milliseconds (same Region).
  • Use AuroraBinlogReplicaLag to measure replica lag in seconds between Aurora DB clusters that replicate using binary logs.
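
These metrics can be retrieved outside the console too. The following is a minimal sketch, assuming boto3 and the CloudWatch get_metric_statistics operation, with metrics published in the AWS/RDS namespace under the DBInstanceIdentifier dimension. The function names and the instance identifier are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def replica_lag_query(instance_id, minutes=60, now=None):
    """Build get_metric_statistics arguments for AuroraReplicaLag."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "AuroraReplicaLag",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average", "Maximum"],
    }


def worst_lag_ms(datapoints):
    """Return the largest 'Maximum' value in the datapoints, or None if empty."""
    return max((point["Maximum"] for point in datapoints), default=None)


# Example usage (assumes boto3 is installed; "my-reader-1" is hypothetical):
#
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   response = cloudwatch.get_metric_statistics(**replica_lag_query("my-reader-1"))
#   print(worst_lag_ms(response["Datapoints"]))
```

To query the binlog-based metric instead, change MetricName to AuroraBinlogReplicaLag.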

How can I improve replication performance?

Follow these recommendations to reduce replica lag:

  • If the reader instance is smaller than the writer instance, the volume of changes might be more than the reader can keep up with. It's a best practice to make all instances in a cluster the same size so that the reader instances aren't overloaded.
  • If there is a heavy workload on the writer, you might notice temporary read replica lag. The lag decreases after the reader instance catches up with the writer instance.
  • If there are any long-running transactions in progress, you might observe replica lag on the readers. To avoid replica lag, run your transactions in smaller batches and commit more frequently.
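
The last recommendation, smaller batches with more frequent commits, can be sketched as follows. This example uses Python's built-in sqlite3 module purely as a runnable stand-in for a MySQL connection. The table, column names, and batch size are hypothetical, and a MySQL driver would typically use %s placeholders instead of ?.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows, committing after each batch; return the number of commits."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", batch)
        conn.commit()  # keep each transaction short
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")  # stand-in for an Aurora MySQL connection
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"event-{i}") for i in range(2500)]
commits = insert_in_batches(conn, rows, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(commits, count)  # prints "3 2500"
```

The right batch size depends on your workload, so treat the value of 1000 as a starting point to tune rather than a recommendation.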

For more information on troubleshooting high replica lag using native binlog-based MySQL replication, see Overview of backing up and restoring an Aurora DB cluster.

Can I use a global transaction identifier (GTID)?

A global transaction identifier (GTID) is a unique string that is associated with a transaction when it is committed. A GTID is unique across all servers, and changes are applied on the target based on the GTID value. For more information, refer to the MySQL documentation for GTID concepts.
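
As a concrete illustration, a MySQL GTID in its classic form is written as source_id:transaction_id, where source_id is the UUID of the originating server. The following sketch parses that form and shows, in simplified terms, how a target can use the GTID to avoid applying the same transaction twice. The function names and the in-memory set are hypothetical and are not how MySQL stores its executed GTIDs.

```python
import uuid


def parse_gtid(gtid):
    """Split 'source_uuid:transaction_id' into a (UUID, int) pair."""
    source_id, _, transaction_id = gtid.partition(":")
    return uuid.UUID(source_id), int(transaction_id)


def apply_once(gtid, executed):
    """Return True if the transaction is new and is now recorded as applied."""
    key = parse_gtid(gtid)
    if key in executed:
        return False  # already applied on this target, so skip it
    executed.add(key)
    return True


executed = set()
gtid = "3E11FA47-71CA-11E1-9E33-C80AA9429562:23"  # example value
print(apply_once(gtid, executed))  # prints "True"
print(apply_once(gtid, executed))  # prints "False"
```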

Aurora doesn't use native binlog replication to replicate data to read replica instances. Therefore, you can't use GTID to replicate data between instances in the same cluster. However, you can set up GTID-based replication in certain scenarios. For more information on using GTID-based replication in Aurora MySQL, see the AWS Blog on GTID.

Note: You can set up GTID-based replication between an Amazon RDS for MySQL DB instance and an Aurora cluster, and between Aurora clusters (by treating the source as an external master). Make sure to enable binary logging on the source before you start the replication process.

