AWS Database Blog

Implement a rollback strategy after an Amazon Aurora MySQL blue/green deployment switchover

Amazon Aurora, a high-performance, fully managed relational database service offered by Amazon Web Services (AWS), provides users with a Blue/Green Deployments feature that enables you to make database updates safer, simpler, and faster. Blue/Green Deployments create a fully managed staging environment using logical replication, that allows you to deploy and test production changes, keeping your current production database safer. The blue environment represents the database currently managing the production workload, while the green environment operates with necessary updates or changes. Blue/Green Deployments minimizes risks and reduce downtime associated with the database updates or changes, such as major or minor engine version upgrades and system updates. You can efficiently switch over your staging environment to become the new production environment without any changes to your application end point.

Despite careful planning and rigorous testing in the green environment, unforeseen issues might occasionally arise after the blue/green deployment switchover. For instance, application compatibility problems with the new production environment could arise, which might have gone unnoticed during the testing phase. Additionally, application performance degradation could occur due to the increased workload or resource demands in the new production environment, which might not have been accurately simulated during testing. In such cases, having a rollback plan becomes crucial.

In this post, we discuss the steps to perform a blue/green deployment switchover and how to set up and perform a rollback strategy post switchover for Amazon Aurora MySQL-Compatible Edition.

Solution overview

Following a successful Aurora MySQL blue/green deployment creation and switchover, the green environment becomes the new production environment. The names and endpoints from the current production environment are assigned to the newly promoted production environment, with no changes needed to your application. After the switchover, the old production environment is no longer synchronized with the newly promoted production environment. The DB cluster and DB instances in the old production environment are renamed by appending -oldn to the current name, where n is a number and this old environment can be used as a rollback strategy. In this approach, we manually setup logical replication from the new production environment to the old environment following the switchover to achieve a rollback plan in case of an unexpected issue.

The following diagram illustrates the rollback strategy after the Aurora MySQL blue/green deployment switchover.

To implement this solution, the high-level steps are:

  1. Prepare the green environment (staging) for rollback.
  2. Perform the Aurora MySQL blue/green deployment switchover.
  3. Delete the Aurora MySQL blue/green deployment. At this stage, deleting a blue/green deployment doesn’t affect any of your database environment.
  4. Configure logical replication from the production environment to the rollback environment.
  5. In case of any issues in the production environment, switch to the rollback environment.

In this post, we use Aurora MySQL-Compatible Edition version 2 (MySQL 5.7) as the blue environment and Aurora MySQL-Compatible Edition version 3 (MySQL 8.0) as the green environment.

Prerequisites

You need the following components to implement this solution:

Limitations

These are limitations associated with this solution:

  • MySQL doesn’t officially support replication from source database instance running higher major version to target database instance running lower major version; therefore, replication might encounter problems or difficulties (for more details, see Replication Compatibility Between MySQL Versions). Because you want to have a rollback strategy for a short period of time while you settle into the new environment, during this time, make sure you don’t use new database features that are only available in the new database version. This can cause MySQL replication errors and prevent the possibility of rolling back to the old version.
  • Managing schema changes in a MySQL replication environment necessitates careful planning. Managing schema changes in a MySQL replication environment necessitates careful planning. Some schema changes, such as ALTER TABLE, CREATE TABLE, or DROP TABLE statements, may not function with this rollback solution.

Prepare the green environment for rollback

Before the blue/green deployment switchover, configure the green environment for rollback:

Configuring Aurora MySQL binary logging

Before the Aurora MySQL blue/green deployment switchover, ensure that binary logging is active and capturing binary logs in the green environment. These binary logs are crucial for continuous replication. By default, binary logs are disabled on Aurora MySQL and don’t need to be enabled unless data is being replicated out of the Aurora cluster. To enable binary logs, set the binlog_format parameter to ROW or MIXED in the custom DB cluster parameter group attached to the source DB cluster. Since binlog_format is a static parameter, the writer DB instance of your cluster must be rebooted for the change to take effect. Therefore, it’s recommended to enable the binlog_format parameter in the green environment before the blue/green deployment switchover to avoid any reboot or outage after the switchover. For more information about MySQL binary logging, see Configuring Aurora MySQL binary logging.

To confirm that binary logging is active on the green Aurora DB cluster, connect to your instance and run the following command:

mysql> show variables like 'log_bin';
+----------------+------------+
| Variable_name  | Value      |
+----------------+------------+
| log_bin        | ON         |
+----------------+------------+

In this post, we set the binlog_format parameter to ROW:

mysql> show variables like 'binlog_format';
+----------------+------------+
| Variable_name  | Value      |
+----------------+------------+
| binlog_format  | ROW        |
+----------------+------------+

Configuring Aurora MySQL binary log retention

When using logical replication as a rollback approach, it’s essential to ensure that binary logs on the green environment are retained for a sufficient duration. To set the binary log retention time, use the mysql.rds_set_configuration procedure and specify the configuration parameter binlog retention hours, along with the desired number of hours to retain binary logs on the DB cluster. The maximum value for Aurora MySQL 3.x and Aurora MySQL version 2.11 and later is 90 days, while the default value of binlog retention hours is NULL, signifying that binary logs are not retained. The binary log retention period should be sufficient to handle the maximum replication delay between the production and rollback environment. This is the time to complete the blue/green switchover and configure replication in the rollback environment. For more information about Aurora MySQL binary log retention, see configuring binary log retention.

In this post, we set the binlog retention hours to 24 on the green environment, which should provide ample time to perform a blue/green switchover and configure replication back to the rollback environment.

 mysql> CALL mysql.rds_set_configuration('binlog retention hours', 24);

        Query OK, 0 rows affected (0.01 sec)

Validate the binary log retention after the change:

mysql> CALL mysql.rds_show_configuration\G
*************************** 1. row ***************************
       name: binlog retention hours
      value: 24
description: binlog retention hours specifies the duration in hours before binary logs are automatically delete
1 row in set (0.00 sec)

Perform the Aurora MySQL blue/green deployment switchover

A switchover involves promoting the DB cluster and its associated DB instances from the green environment to become the new production DB cluster. Prior to the switchover, production traffic is directed to the cluster in the blue (current production) environment. After the successful switchover, production traffic is then routed to the newly promoted DB cluster, which was previously the green (staging) environment.

  1. Perform the Aurora MySQL blue/green deployment switchover. For information about the steps, see Switching a blue/green deployment and for the best practices for switchover, see Switchover best practices.

    When the switchover is complete, you will see “Old Blue” and “New Blue” next to the database in the AWS Management Console for Aurora. The new blue environment is your newly promoted production environment.

    After the switchover, the old blue DB cluster only allows read operations until you reboot.

    After the switchover, the Aurora MySQL DB cluster and DB instances in the old production environment are retained. Standard costs apply to these resources.

  2. After the switchover, capture the binary log file name and position from the new blue environment. For more details, see get binary log coordinates .The following AWS Command Line Interface (AWS CLI) command helps find the current binary log file name and position:
    aws rds describe-events --output json --source-type db-instance --source-identifier database-1-instance-1
    
    {
        "Events": [
    ...
            {
                "SourceIdentifier": "database-1-instance-1",
                "SourceType": "db-instance",
                "Message": "Binary log coordinates in green environment after switchover:
                 file mysql-bin-changelog.000007 and position 2638",
                "EventCategories": [],
                "Date": "2024-03-10T01:33:41.911Z",
    ...
           }
        ]
    }
  3. Delete the blue/green deployment.
    Note: When you delete a blue/green deployment before switching it over, Amazon RDS optionally deletes the DB cluster in the green environment. Deleting a blue/green deployment doesn’t affect the blue environment. For more information, see deleting a blue/green deployment.

    After you delete the blue/green deployment, new and old production DB clusters are in available status, as shown in the following image.

  4. Set the old production DB cluster to read-only mode to avoid any write operations that could cause problems with replication from the new production environment to the old environment. To prevent write operations on the DB cluster, enable read_only database mode by changing the read_only value in DB cluster parameter group from 0 to 1.
    Because it is a dynamic parameter, the modifications take effect without the need for a reboot.

  5. Connect to the old production DB cluster and validate the DB instance mode after the change is applied:
    mysql> show global variables like 'read_only';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | read_only     | ON    |
    +---------------+-------+

Note: After deletion of the Aurora blue/green deployment, the terms “blue” or “green” are no longer relevant. For this post, we refer to the “new production” or “new blue” environment as the production environment, and the “old production” or “old blue” environment as the rollback environment.

Configure a replication user on the production DB cluster

  1. In the production database, create a replication database user. In this post, we create a user named repl_user:
    mysql> CREATE USER 'repl_user'@'<IP_address>' IDENTIFIED BY '<password>';
  2. The user requires the REPLICATION CLIENT and REPLICATION SLAVE Grant these privileges to the user:
    mysql> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user'@'<IP_address>';

Configure binary log replication from production to rollback DB cluster

  1. Connect to the rollback environment DB cluster endpoint and run a SQL command to configure manual MySQL replication by using rds_set_external_master. Use the MySQL binlog file name and position that you collected earlier:
    mysql> call mysql.rds_set_external_master ('database-1-instance-1-old1.xxxxxxx.xxxx.rds.amazonaws.com', 3306, 'repl_user', '<Password>', '<Binlog file>', '<Binlog position>', 0);
  2. Start the MySQL replication process from the rollback environment by using rds_start_replication:
    mysql> CALL mysql.rds_start_replication;
    +-------------------------+
    | Message |
    +-------------------------+
    | Slave running normally. |
    +-------------------------+
    1 row in set (1.03 sec)

    Starting with Aurora MySQL version 3, data control language (DCL) statements such as CREATE USER, GRANT, and REVOKE are no longer replicated with the binary log replication. If you plan to run any DCL statements while replication is ongoing, you will need to run DCL statements on both the source and target databases.

  3. Validate the MySQL replication status by using show slave status on the rollback environment, and make sure the replication process is up and running without any errors:
    mysql> pager egrep "Slave_IO_Running|Slave_SQL_Running|Error"
    PAGER set to 'egrep "Slave_IO_Running|Slave_SQL_Running|Error"'
    
    mysql> show slave status\G
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                       Last_Error: 
                    Last_IO_Error: 
                   Last_SQL_Error: 
          Slave_SQL_Running_State: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
    
    mysql> nopager
    PAGER set to stdout

Rollback steps

In the event of a rollback scenario from the production to the rollback environment, follow these steps:

  1. Monitor the replication lag to make sure the rollback environment can keep up with the production environment without significant lag. Use the following command on the rollback DB cluster to check the status.
    mysql> SHOW SLAVE STATUS\G
    *************************** 1. row ***************************
    
                  Master_Log_File: mysql-bin-changelog.000010
              Read_Master_Log_Pos: 2836
                   Relay_Log_File: relay-bin.000123
                    Relay_Log_Pos: 1569
            Relay_Master_Log_File: mysql-bin-changelog.000010
    ...
                  Master_SSL_Cert:
                Master_SSL_Cipher:
                   Master_SSL_Key:
            Seconds_Behind_Master: 5
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error:
    
    ...
  2. After you find that there is no significant replication lag, stop all application connections on the production DB.
  3. To prevent write operations on the production DB cluster, you can enable read_only database mode by changing the read_only value in the DB cluster parameter group from 0 to 1.
  4. After enabling the read_only database mode, verify that the binary log position remains unchanged by running SHOW MASTER STATUS\G on the production environment. This confirms that no additional data changes are occurring in the database.
  5. Validate that all binary log files and events have been replicated from the production to the rollback DB cluster. Use the following command on the rollback DB cluster to check the status:
    mysql> SHOW SLAVE STATUS\G
    *************************** 1. row ***************************
    ...
    
            Relay_Master_Log_File: mysql-bin-changelog.000010
    ...
    
              Exec_Master_Log_Pos: 2836
    ...
    
            Seconds_Behind_Master: 0
    ...

    Validate Seconds_Behind_Master from the above output; it should be 0. For more information, see seconds_Behind_Master. Furthermore, use show master status command on the production DB cluster to ensure that the production and rollback DB clusters are in sync.

    mysql> SHOW MASTER STATUS\G
    *************************** 1. row ***************************
                File: mysql-bin-changelog.000010
            Position: 2836
    ...
    
    1 row in set (0.00 sec)

    Compare the Relay_Master_Log_File and Exec_Master_Log_Pos from the rollback environment with the File and Position from the production environment. These values should be consistent before you stop the replication process.

  6. Use the following command on the rollback DB cluster to stop the replication process:
    mysql> call mysql.rds_stop_replication;
  7. To remove the MySQL replication configuration information, use rds_reset_external_master on the rollback DB cluster:
    mysql> call mysql.rds_reset_external_master;
  8. Turn off the read only database mode for the rollback DB cluster by changing the read_only parameter in the custom database parameter group setting from 1 to 0.
  9. Update your application configuration to use the rollback DB cluster endpoints and start the application. Optionally, terminate all connections on the production cluster to ensure they reconnect to the rollback cluster. The rollback DB cluster environment now becomes your production environment.

Clean up

To avoid incurring future charges, consider deleting Aurora MySQL DB cluster resources that are no longer in use or will not be used in the future.

Summary

In this post, we provided a step-by-step procedure for implementing a rollback strategy following a switchover in an Aurora MySQL blue/green deployment. Given the intricacies of MySQL replications, thorough testing in non-production environments is strongly advised before deploying in a production environment. The solution provides a rollback plan in case any issues arise in a production environment after blue/green deployment switchover.

We invite you to leave your feedback in the comments section.


About the authors

Daxeshkumar Patel is a Database Consultant on the Professional Services team at Amazon Web Services. He works with customers and partners in their journey to the AWS Cloud with a focus on database migration and modernization programs.

Kamal Singh is a Senior Database Consultant at Amazon Web Services. He works with customers and partners in their journey to the AWS Cloud with a focus on database migration and modernization programs.

Bhavesh RathodBhavesh Rathod is a Principal Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist to help Amazon customers migrate their on-premises database environment to AWS Cloud database solutions.