Recover from a disaster with delayed replication in Amazon RDS for MySQL

July 2023: This post was reviewed for accuracy.

Amazon RDS for MySQL now supports a delayed replication, which allows you to set a time period that a replica database lags behind a source database. In a standard MySQL replication configuration, there is minimal delay between the source and the replica. Now you have the option to introduce an intentional delay.

A delay is very helpful when you need to recover from a human error. For example, if someone accidentally drops a table from your primary database, that accident doesn’t have to be saved in a replica. Now, you can stop the replication just before the point at which the table was dropped and promote the replica to become a standalone instance. In this blog post, I walk you through how to use delayed replication to save yourself from just such a scenario.

The following diagram illustrates how you can use a replica with a delay set to 3600 (one hour) to recover from a human error. First, stop replication; second, discover the error in the logs; next, replay the transactions up to the mistake; and last, promote the replica to the master. Solution overview

Prerequisites

Before you get started checking out delayed replication, you need an Amazon RDS for MySQL source database instance running MySQL 5.6.40 or 5.7.22 or later. You also need a MySQL client to connect to your instances and the proper security groups that allow you access to your database.

Make sure you’re retaining your binary logs for a sufficient amount of time. For more information about binary logs, see MySQL Binary Logs. The following example command shows setting the retention value to 24 hours.

call mysql.rds_set_configuration('binlog retention hours', 24);

Setting up the scenario

Use your own existing Amazon RDS for MySQL database with an existing read replica, or create a new database. For this blog post, I use an existing RDS for MySQL database and create a new read replica.

Create a database

If you don’t already have an Amazon RDS for MySQL instance, create one. Be sure to configure the database with a security group that allows access from your client machine. If you already have a MySQL database that you want to work with, you can skip this step.

Create an RDS for MySQL database by using the AWS Management Console, AWS CLI, AWS SDK, or an AWS CloudFormation template. If you need help creating a MySQL instance, follow the steps in Create and Connect to a MySQL Database with Amazon RDS. The following screenshot shows one database instance already set up and ready to use. Screenshot of a database instance already setup and ready to use

Connect to the database

After your master database instance is created and available, connect to it. If you’re using an Amazon EC2 Linux machine, you might want to set up some environment variables to save some extra typing, as shown in the following commands.

export REGION="us-west-2"
export DBUSER="master"
export RDSMASTER="blogger-mm.us-west-2.rds.amazonaws.com"
export DBNAME="mydb"

Connect to MySQL using the MySQL command line tool and variables, as follows.

mysql -h ${RDSMASTER%%:*} -P 3306 --user=$DBUSER --password=$DBPASSWORD

After you’re connected, create some test tables and some data. As a demonstration, I added a few tables to a schema called myschema. I added the following code.

SELECT table_name, table_rows, table_type, engine FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'myschema';

+---------------+------------+------------+--------+
| table_name    | table_rows | table_type | engine |
+---------------+------------+------------+--------+
| mytesttable   |   24534685 | BASE TABLE | InnoDB |
| mytesttable01 |      10000 | BASE TABLE | InnoDB |
| mytesttable02 |       9928 | BASE TABLE | InnoDB |
| mytesttable11 |       9928 | BASE TABLE | InnoDB |
| mytesttable12 |       9928 | BASE TABLE | InnoDB |
| mytesttable13 |       9928 | BASE TABLE | InnoDB |
| mytesttable14 |       9928 | BASE TABLE | InnoDB |
| mytesttable15 |       9676 | BASE TABLE | InnoDB |
| mytesttable16 |       9676 | BASE TABLE | InnoDB |
| mytesttable17 |      10000 | BASE TABLE | InnoDB |
| mytesttable18 |      10000 | BASE TABLE | InnoDB |
| mytesttable19 |      10000 | BASE TABLE | InnoDB |
+---------------+------------+------------+--------+
12 rows in set (0.00 sec)

Create the read replica

Add a read replica for disaster recovery. Use the AWS CLI or the AWS Management Console to add your read replica. The replica can be in the same Availability Zone, a different Availability Zone, or a different AWS Region. Here is how I create a read replica.

aws rds create-db-instance-read-replica --db-instance-identifier blogger-mm-rr \
    --source-db-instance-identifier blogger-mm --db-instance-class db.m4.large \
    --port 3306 --auto-minor-version-upgrade --multi-az --publicly-accessible

After your replica is shown as available, configure a replica delay for it behind your master instance.

Configure the replica delay

You configure replication delay on the source instance either before or after a read replica is created. With delayed replication, you specify the minimum amount of time, in seconds, to delay replication from the source to the read-replica. To configure the replication delay, connect to the replica and call the stored procedure mysql.rds_set_source_delay with a parameter value between 0 and 86400. The maximum value is 1 day (86400 seconds). We recommend that you set the value to at least 3600 (1 hour) to allow enough time to diagnose and recover from an erroneous SQL statement. Here is how I set this:

call mysql.rds_set_source_delay (3600);

+-----------------------------------+
| Message                           |
+-----------------------------------+
| source delay is set successfully. |
+-----------------------------------+
1 row in set (0.02 sec)

Recovering from a dropped table

In my example, 3 hours after creating the replica, I dropped the table mytesttable11 when I actually meant to drop mytesttable01. Because the replica is delayed by one hour, I have time to figure out what happened and keep the mistake from propagating to the replica. I realized my mistake immediately, so I took the following recovery steps.

Connect to the read replica blogger-mm-rr and called the stored procedure rds_stop_replication.
```
call mysql.rds_stop_replication;

+---------------------------+
| Message                   |
+---------------------------+
| Slave is down or disabled |
+---------------------------+
1 row in set (1.01 sec)
```
If you can stop replication when you first realize an issue, then you can find the logs with the mistake and take action to prevent them from getting replayed on the replica.
Find the binlog location of the error using mysqlbinlog utility. For more information about mysqlbinlog with RDS, see Log Access Concepts.
Set your replication delay to zero so that the replication from this point happens as quickly as possible. To do this, run rds_set_source_delay(0);

After you have found the log and the location that contains your change, make a note of the log name and the location. My error occurred just after location 120 in file mysql-bin-changelog.000263. To start replication until it reaches your location in your binary log file, use the procedure call.rds_start_replication_until. Warning – Any changes that were made after the location specified in the recovery have to be manually reapplied or are permanently lost. Here is how I did this:

call mysql.rds_start_replication_until('mysql-bin-changelog.000263', 120);

+---------------------------------------------------------------------------------------------+
| Message                                                                                     |
+---------------------------------------------------------------------------------------------+
| Slave started until MASTER_LOG_FILE = "mysql-bin-changelog.000263'" and MASTER_LOG_POS = 120 |
+---------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Wait for the RDS Event “Replication has been stopped since the replica reached the stop point specified by the rds_start_replication_until stored procedure. Use rds_start_replication to resume replication.” Connect to the replica and verify that all changes have been applied by checking if the Relay_Master_Log_File and Exec_Master_Log_Pos are at least the positions that you passed. Here is how I checked this:
```
show slave status;
```
Promote the replica after it has all the changes prior to the dropped table. Here is how I promoted:
```
aws rds promote-read-replica --db-instance-identifier blogger-mm-rr
```

Remove the old master instance. Here is a command for this:

aws rds delete-db-instance --db-instance-identifier blogger-mm --skip-final-snapshot

Rename the former read replica after the old master is gone by using a command like this:

aws rds modify-db-instance \
    --db-instance-identifier blogger-mm-rr --multi-az \
    --new-db-instance-identifier blogger-mm --apply-immediately

Connect to the new master database and resume normal operations.

Flexibility with other scenarios

In addition to setting the replication delay on an existing RDS for MySQL read replica, you might want to set the delay value on the source database instance before you even create a new read replica. Do you have a non-RDS source database? You can also configure a delayed replica in RDS with a non-RDS source. Likewise, you can configure an RDS source for replication with a delayed non-RDS replica.

Conclusion

The ability to use a delayed read replica with Amazon RDS for MySQL opens up new options for disaster recovery. As you can see, a replica delay is very helpful if you experience a situation such as someone accidentally dropping a needed table from your database. You can stop the replication, recover the database up to the point of the accident, promote the replica to a standalone instance, and then resume your application.

About the Author

Wendy Neu has worked as a Data Architect with Amazon since January 2015. Prior to joining Amazon, she worked as a consultant in Cincinnati, OH, helping customers integrate and manage their data from different unrelated data sources.