AWS Database Blog

Best practices for using a MySQL read replica to upgrade an Amazon RDS for MySQL database

The following common technique is used to perform maintenance on a MySQL database:

  1. Create a read replica of the database.
  2. Perform maintenance on the read replica.
  3. Promote the read replica to replace the original database.

This post walks you through some best practices for performing a major version upgrade on an Amazon RDS for MySQL database using this technique, and other optional steps you can take.

There are a number of edge cases that must be called out prior to upgrading from 5.7 to 8.0. For more information about these edge cases, see Prechecks for upgrades from MySQL 5.7 to 8.0.

This post focuses on the best practices for planning the upgrade and cutting over to the new DB instance.

Solution overview

The technique described in this post enables you to test the upgrade on a non-critical DB instance prior to taking any downtime on your production DB instance. You can fully test and validate that the upgrade is successful during the planning phase of the maintenance. It’s important to set up the new environment completely ahead of time. The idea is to minimize the steps to perform during the maintenance outage window to reduce downtime.

For our use case, let’s assume our production environment has the following:

  • A Multi-AZ deployment
  • Backups enabled
  • One or more read replicas for read scaling

The following diagram illustrates this architecture.

The following diagram illustrates this architecture.

The steps that follow don’t require strict ordering except where specified; the basic goal is to set up the new DB instance prior to any cutover. We present them in the following order:

  1. Create a new read replica.
  2. Convert the target DB instance into a Multi-AZ deployment and enable backups.
  3. Upgrade the target instance.
  4. (Optional) Create additional read replicas.
  5. (Optional) Create custom parameter groups.
  6. (Optional) Address replica lag by enabling multi-threaded replication.
  7. Perform the cutover.

Creating a new read replica

This new read replica will ultimately be the DB instance that becomes the new production instance. We refer to this new DB instance the target DB instance, as in the following diagram.

We refer to this new DB instance the target DB instance, as in the following diagram.

Converting the target DB instance into a Multi-AZ deployment and enabling backups

By default, a new read replica is created as a single-AZ deployment with backups disabled. Because the target DB instance ultimately becomes the production DB instance, it’s a best practice to configure a Multi-AZ deployment and enable backups now.

You must restart the read replica to enable binary logging, so we can take this outage now when there is no impact.

Similarly, converting to a Multi-AZ deployment requires creating a backup of the target DB instance, then restoring this backup to a secondary DB instance running in a different Availability Zone. When converting to a Multi-AZ deployment, there might be an impact on write latency on the target DB instance during the short period when the new secondary copy is synchronizing with the primary copy of the target DB instance. Again, taking this step ahead of time avoids performance impact later.

The following diagram illustrates this architecture.

The following diagram illustrates this architecture. 

Upgrading the target instance

In the rare event that there is an incompatibility when upgrading to MySQL version 8.0, testing the upgrade process on a read replica protects the original production DB instance. There is no commitment to roll forward until you’re absolutely sure that the upgrade is completed successfully without issues. You can test the upgrade multiple times.

Alternatively, you can restore a snapshot of the original production DB instance to a scratch DB instance to test the upgrade process on the scratch DB instance. This extra validation is simple and easy, so is strongly recommended.

The following diagram illustrates this architecture.

The following diagram illustrates this architecture.

Creating additional read replicas

As an optional step, if you require read replicas for read scaling or for any other reason, now is a good time to create any read replicas that are required in your final configuration.

The following diagram illustrates the updated architecture.

The following diagram illustrates the updated architecture.

Creating custom parameter groups

Optionally, if you require any custom parameter groups, now is a good time to apply these to the target DB instance as well as any read replicas.

Addressing replica lag by enabling multi-threaded replication

MySQL binlog replication by default is single threaded on the read replica. As such, a highly parallel concurrent workload on the source DB instance can result in replica lag on the read replica.

When planning for the outage to cut over to the target DB instance, you must stop the application’s workload (an outage), wait for replica lag to reach zero, and then proceed with the cutover. To minimize the impact, consider multi-threaded replication on any read replicas, and certainly on the target DB instance. Multi-threaded replication was added in MySQL version 5.7, and can greatly reduce any replica lag.

Three parameters enable multi-threaded replication. These settings work for most configurations:

  • slave_parallel_workers=8
  • slave_parallel_type=LOGICAL_CLOCK
  • slave_preserve_commit_order=1

For illustration, assume there are two replicas with a common source database. The orange line in the following metrics diagram is using default parameters, and the blue line is using the preceding parameter setting. In this case, there is zero replica lag for the test workload. Keep in mind that workloads vary, and you still need to manage replica lag.

The orange line in the following metrics diagram is using default parameters, and the blue line is using the preceding parameter setting.

Your target environment is now ready. You can take an outage and promote the target DB instance to replace the original production DB instance.

Typically, everything to this point can be accomplished days in advance so that you have some run time with the new environment, and you can minimize the changes during the maintenance window to the bare minimum.

Performing the cutover

Now that the target environment is ready, you can plan for the maintenance window to cut over to the target instance.

  1. Begin the outage by stopping traffic from the application.
  2. Use the AWS Command Line Interface (AWS CLI) or the AWS Management Console to promote the target read replica.

This step disconnects the read replica from the original production DB instance and makes the target a standalone DB instance (see the following diagram).

This step disconnects the read replica from the original production DB instance and makes the target a standalone DB instance

  1. As an optional step, if you want to maintain the same DNS endpoint for your database, you can swap database endpoints between the original production DB instance and the target DB instance:
    1. Call the modify-db-instance CLI command to rename the original production DB instance.
    2. Call the modify-db-instance CLI command again to rename the target DB instance to assume the original database identifier.
  2. When the renaming is complete, you can end the outage and restart the application, which resolves to the new production instance.

Conclusion

It may seem like a lot of steps, but the main advice in this post is that you can perform almost all the maintenance steps necessary to upgrade your DB instance ahead of time. Therefore, you can reduce the number of steps during the maintenance outage to a bare minimum.


About the Author

Phil Intihar is a Principal Database Engineer at Amazon Web Services.