AWS Database Blog
Achieve faster switchover for Amazon RDS Blue/Green Deployments with large number of connections
In this post, we show you a recent improvement for Amazon RDS Blue/Green Deployment switchovers to reduce your overall downtime when you have a large number of connections to your database.
Blue/Green Deployments enforce safety measures to make sure that the switchover from your blue environment to the green environment maintains data consistency.
For reference, a safe switchover takes the following steps:
1. Run guardrail checks to verify that the blue and green environments are ready for switchover.
2. Stop new write operations on the primary DB instance in both environments.
3. Drop connections to the DB instances in both environments and don’t allow new connections.
4. Wait for replication to catch up in the green environment so that the green environment is in sync with the blue environment.
5. Rename the DB instances in both environments.
6. Allow connections to databases in both environments.
7. Allow write operations on the primary DB instance in the new production environment.
One of these steps is to clean up connections (step 3) in the blue environment so that user applications are prompted to re-establish their connections to the new production cluster. This step runs after writes are blocked (step 2) on both blue and green, which places it in the path of write downtime, as noted in switchover actions.
In Amazon Aurora MySQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for MySQL, this process could previously take up to 60 seconds for every 1,000 connections on a DB instance. With the recent updates, it has been reduced to a few seconds or less, even for 15,000 connections or more, by using MySQL offline_mode.
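To put the earlier behavior in perspective, the upper bound quoted above scales linearly with the connection count. The following is a quick calculation, assuming the worst case of 60 seconds per 1,000 connections; the function name is ours and is for illustration only.

```shell
# Worst-case cleanup time before the improvement:
# up to 60 seconds for every 1,000 connections on a DB instance.
old_cleanup_seconds() {
  connections=$1
  echo $(( connections * 60 / 1000 ))
}

old_cleanup_seconds 15000   # prints 900 (seconds), that is, up to 15 minutes
```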
Offline mode
OFFLINE_MODE is an engine feature that allows quicker cleanup of existing connections. As noted in the MySQL documentation:
“In offline mode, the MySQL instance disconnects client users unless they have relevant privileges, and doesn’t allow them to initiate new connections. Clients that are refused access receive an ER_SERVER_OFFLINE_MODE error.”
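For illustration, offline mode is controlled in MySQL through the offline_mode global system variable. The following sketch only prints the statements you could run on a self-managed MySQL test server with an account that has sufficient administrative privileges. During a Blue/Green switchover this is handled for you, so you don't run it yourself. The function name is hypothetical.

```shell
# Print SQL that turns offline mode on, checks its value, and turns it off again.
# Assumption: intended for a self-managed test server, not for an RDS switchover.
offline_mode_sql() {
  cat <<'SQL'
SET GLOBAL offline_mode = ON;
SELECT @@GLOBAL.offline_mode;
SET GLOBAL offline_mode = OFF;
SQL
}
```

On a test server, you could pipe the output to the client, for example `offline_mode_sql | mysql -u admin -p`, where `admin` is a placeholder user name.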
Switchover tests
To demonstrate this improvement, we create an Aurora MySQL cluster using version 8.0.mysql_aurora.3.04.1 with a single DB instance of the db.r6i.4xlarge instance class. We attach a custom parameter group and set the max_connections parameter to 16000. We then test the switchover and measure the total write downtime.
First, to effectively measure write downtime, we prepare a simple heartbeat table to record write timestamps every second during switchover.
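A heartbeat table for this purpose can be very small. The following is a minimal sketch, not necessarily the schema used in our test: the database name `hb`, the table name `heartbeat`, and the function name are assumptions chosen for illustration.

```shell
# Print the DDL for a hypothetical heartbeat table.
# Each row records when a write succeeded, with millisecond precision.
heartbeat_schema_sql() {
  cat <<'SQL'
CREATE DATABASE IF NOT EXISTS hb;
CREATE TABLE IF NOT EXISTS hb.heartbeat (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3)
) ENGINE=InnoDB;
SQL
}
```

You could apply it with `heartbeat_schema_sql | mysql -h "$DB_HOST" -u "$DB_USER" -p`, where `DB_HOST` is your cluster endpoint and `DB_USER` is your database user.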
We also prepare a simple sysbench-based dataset to generate a small workload:
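A minimal sketch of this step follows, assuming sysbench 1.0 or later with its bundled oltp_read_write script. The database name `sbtest`, the table count and size, the function name, and the `DB_HOST`, `DB_USER`, and `DB_PASSWORD` environment variables are illustrative assumptions, not the values from our test.

```shell
# Create a small sysbench dataset.
# Assumptions: the sbtest database already exists; 4 tables of 100,000 rows
# are arbitrary values that keep the dataset small.
prepare_dataset() {
  sysbench oltp_read_write \
    --mysql-host="$DB_HOST" \
    --mysql-user="$DB_USER" \
    --mysql-password="$DB_PASSWORD" \
    --mysql-db=sbtest \
    --tables=4 \
    --table-size=100000 \
    prepare
}
```

Call `prepare_dataset` once the environment variables point to the writer endpoint of the blue cluster.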
Once the dataset is ready, we create a total of 15,360 connections by spawning 30 background sysbench processes with 512 connections each:
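One way to spawn the processes is sketched below. The process and thread counts come from the text above; the connection variables, the dataset options, the low `--rate` value, and the function names are assumptions for illustration.

```shell
PROCESSES=30              # background sysbench processes
THREADS_PER_PROCESS=512   # each thread holds one connection

# Total number of client connections the workload opens.
total_connections() {
  echo $(( PROCESSES * THREADS_PER_PROCESS ))
}

# Start every sysbench process in the background.
# --time=0 runs until the process is stopped; --rate=1 (an assumed value)
# keeps the generated workload small.
start_workload() {
  i=1
  while [ "$i" -le "$PROCESSES" ]; do
    sysbench oltp_read_write \
      --mysql-host="$DB_HOST" \
      --mysql-user="$DB_USER" \
      --mysql-password="$DB_PASSWORD" \
      --mysql-db=sbtest \
      --tables=4 \
      --table-size=100000 \
      --threads="$THREADS_PER_PROCESS" \
      --time=0 \
      --rate=1 \
      run > /dev/null 2>&1 &
    i=$(( i + 1 ))
  done
}

total_connections   # prints 15360
```

Calling `start_workload` launches the processes. The client host may need a higher open-file limit to hold this many connections.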
We monitor when the connections build up on the cluster using a simple for loop from another session:
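A loop along these lines could look like the following sketch. It polls the Threads_connected status variable once per second. The function name and the `DB_HOST` and `DB_USER` variables are assumptions, and credentials are assumed to come from a client option file such as `~/.my.cnf`.

```shell
# Print the current number of connected threads once per second.
# $1 is the number of samples to take.
watch_connections() {
  samples=$1
  for n in $(seq 1 "$samples"); do
    mysql -h "$DB_HOST" -u "$DB_USER" -N \
      -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'"
    sleep 1
  done
}
```

For example, `watch_connections 120` samples the counter for about two minutes; stop it earlier once the count settles near the expected total.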
After our connections have built up, we stop the monitoring process and start a continuous write to the heartbeat table:
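The heartbeat writer can be sketched as follows, assuming a hypothetical `hb.heartbeat` table with a `ts` timestamp column and the `DB_HOST` and `DB_USER` variables pointing at the cluster endpoint. It prints a UTC timestamp with the result of each write, so that failed writes during the switchover are visible in the output.

```shell
# Attempt one heartbeat write and report the result with a UTC timestamp.
heartbeat_once() {
  now=$(date -u)
  # --connect-timeout=1 keeps the loop moving while connections are refused.
  if err=$(mysql -h "$DB_HOST" -u "$DB_USER" --connect-timeout=1 \
        -e "INSERT INTO hb.heartbeat (ts) VALUES (NOW(3))" 2>&1); then
    echo "$now OK"
  else
    echo "$now FAILED: $err"
  fi
}

# Write a heartbeat every second until interrupted (Ctrl+C).
write_heartbeats() {
  while true; do
    heartbeat_once
    sleep 1
  done
}
```

Run `write_heartbeats` in its own session and leave it running through the switchover.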
Now that our workload is running and heartbeat monitoring is in place, the next step is to trigger the switchover using the AWS CLI:
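The AWS CLI command for this is `aws rds switchover-blue-green-deployment`. In the sketch below, the wrapper function name is ours, and the deployment identifier is a value you supply; the 300-second timeout matches the CLI default and is shown only to make the option visible.

```shell
# Trigger the switchover for a Blue/Green Deployment.
# $1: Blue/Green Deployment identifier (from your own deployment)
# $2: optional switchover timeout in seconds (300 if omitted)
switchover() {
  aws rds switchover-blue-green-deployment \
    --blue-green-deployment-identifier "$1" \
    --switchover-timeout "${2:-300}"
}
```

Call it as `switchover "$BGD_ID"`, where `BGD_ID` is a hypothetical variable holding the identifier returned when the deployment was created.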
When the switchover is complete, we can stop the heartbeat writing process and inspect the output from our monitoring:
The output tells us the following:
- At Tue Nov 21 04:49:22 UTC 2023, blue becomes read-only
- At Tue Nov 21 04:49:45 UTC 2023, the cluster DNS endpoint points to the green environment
- At Tue Nov 21 04:50:03 UTC 2023, green, which is now the production cluster, becomes writable
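The elapsed time between these three events can be worked out directly from the timestamps. The following sketch uses a small helper (our own, for illustration) that converts HH:MM:SS to seconds since midnight; all three events fall on the same day.

```shell
# Convert an HH:MM:SS string to seconds since midnight.
to_seconds() {
  old_ifs=$IFS
  IFS=:
  set -- $1
  IFS=$old_ifs
  # Strip one leading zero so values such as "08" are not read as octal.
  echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

read_only=$(to_seconds 04:49:22)    # blue becomes read-only
dns_switch=$(to_seconds 04:49:45)   # cluster endpoint points to green
writable=$(to_seconds 04:50:03)     # green becomes writable

echo "read-only to DNS switch: $(( dns_switch - read_only ))s"   # 23s
echo "DNS switch to writable:  $(( writable - dns_switch ))s"    # 18s
echo "read-only to writable:   $(( writable - read_only ))s"     # 41s
```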
In our heartbeat table, the test shows a total write downtime of around 42 seconds:
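One way to read the downtime from such a table is to look for the largest gap between consecutive heartbeat rows. The following sketch prints a query for that, assuming a hypothetical `hb.heartbeat` table with `id` and `ts` columns and an engine version that supports window functions (MySQL 8.0 compatible). It is not necessarily the query used in our test.

```shell
# Print a query that returns the largest gap between consecutive heartbeats.
heartbeat_gap_sql() {
  cat <<'SQL'
SELECT prev_ts, ts, TIMESTAMPDIFF(SECOND, prev_ts, ts) AS gap_seconds
FROM (
  SELECT ts, LAG(ts) OVER (ORDER BY id) AS prev_ts
  FROM hb.heartbeat
) AS t
ORDER BY gap_seconds DESC
LIMIT 1;
SQL
}
```

With one write per second, the row returned marks the last successful write before the switchover and the first successful write after it.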
To get an idea of how long it actually took to clean up connections on blue during the switchover, we can look at the DB instance events. In our test, the event information shows it took less than a minute to clean up all 15,000+ connections:
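DB instance events can be listed with `aws rds describe-events`. In the sketch below, the wrapper function name is ours and the DB instance identifier is a value you supply; `--duration` is the look-back window in minutes.

```shell
# List recent events for a DB instance.
# $1: DB instance identifier
# $2: optional look-back window in minutes (60 if omitted)
instance_events() {
  aws rds describe-events \
    --source-type db-instance \
    --source-identifier "$1" \
    --duration "${2:-60}"
}
```

For example, `instance_events my-blue-instance 30` lists events from the last 30 minutes for a hypothetical instance named `my-blue-instance`.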
Conclusion
This post provided a quick demonstration of how OFFLINE_MODE helps reduce your overall write downtime during Blue/Green Deployment switchovers, such as those used for major version upgrades. This feature is available for Blue/Green Deployments using Amazon Aurora MySQL 2.x and above and Amazon RDS for MySQL 5.7 and above. For more information, refer to Best practices for Blue/Green Deployments and Switchover best practices.
About the Author
Jervin Real is a Senior Database Engineer at Amazon Web Services helping Amazon RDS for MySQL and MariaDB customers towards efficiency.