Deploy schema changes in an Amazon Aurora MySQL database with minimal downtime
Modifying the schema of a SQL database can be time-consuming, resource-intensive, and error-prone. It can also require long periods of downtime that negatively affects the end-user experience. In this post, I walk you through performing schema changes using Instant DDL and Amazon Relational Database Service (Amazon RDS) Blue/Green Deployments for Amazon Aurora MySQL-Compatible Edition with minimal downtime on production workloads.
Aurora MySQL version 3 is compatible with the MySQL 8.0 feature called Instant DDL. This feature significantly speeds up supported schema changes like adding a column, setting or dropping column default values, renaming a table, and more. Instant operations only modify metadata in the data dictionary. An exclusive metadata lock on the table may be taken briefly during the execution phase of the operation but table data is unaffected, making operations instantaneous.
For schema changes that cannot be done using the INSTANT algorithm, you can use Amazon RDS Blue/Green Deployments that lets you perform schema changes on staging environment that are isolated from the production system. With this feature, you can create a staging environment (Green) that is a topological copy of your current production environment (Blue). The Blue and Green environments stay in sync using binlog replication.
The following diagram illustrates RDS Blue/Green deployment architecture.
In the following sections, we demonstrate how to apply schema changes for an Aurora MySQL cluster using instant DDL and RDS Blue/Green Deployments.
To implement this solution, you need a database in an Aurora cluster for which you want to perform schema changes. You can also create a new Aurora MySQL cluster and then load the employees dataset to follow along with this post.
Instant DDL with Aurora MySQL version 3
You can use the instant DDL feature by using the ALGORITHM=INSTANT clause of the ALTER TABLE statement. In the example below, we are going to connect to the Aurora MySQL cluster and ADD COLUMN to the employees table and verify the behavior. Make sure to test on your development and non-production environments before you perform ALTER TABLE operations in your production environment.
- Connect to the Aurora MySQL cluster as an existing user. In this post we use an admin account to connect to the cluster
aurora-dbusing MySQL utility installed on AWS Cloud9 instance.
For example :
- You can run the following command by replacing the placeholders and adding a column to the end of the table instantly.
- In the following example, we add the column email to the end of employees table which has approximately 300k rows.
Instant DDL is only available for a limited subset schema change operations. See the full breakdown in the MySQL documentation.
For changes such as the following, you may consider RDS Blue/Green Deployments.
- Adding and modifying indexes
- Partitioning a table
- Optimizing and rebuilding a table to reclaim storage space
- Other operations that require a substantial reorganization of the table
Let’s look at some code examples of various schema changes that can be done using RDS Blue/Green Deployments.
Create a RDS Blue/Green Deployment for your Aurora MySQL cluster
The Aurora MySQL cluster must have binary logging enabled for replication from the blue environment to the green environment. While any binlog format works, we recommend ROW to reduce the risk of replication inconsistencies. You can enable binary logs by setting the
binlog_format to ROW through a custom DB cluster parameter group. Note that this parameter is a static parameter and the nodes in the cluster will require a manual reboot to reflect the parameter change.
We can use an AWS CLI command to verify the status of the parameter group with the nodes in the cluster. In the following example, the
in-sync value for
DBClusterParameterGroupStatus indicates that the change has been applied:
binlog_format is modified, you can now create the Blue/Green Deployment for Aurora MySQL cluster through the AWS CLI, Amazon RDS API, or console. For more information, refer to Creating the Blue/Green Deployments.
For this post, we use an AWS Cloud9 instance to access the AWS CLI and Aurora MySQL cluster. See the following example:
To check the source and target clusters of the Blue/Green Deployment, run the following command:
You can retrieve the Blue/Green Deployment identifier by running the following AWS CLI command:
Connect to the Green Aurora MySQL cluster
Connect to the clone Aurora MySQL database in the Green environment as an existing user. The native user credentials remain the same as Blue. However, IAM DB authentication credentials are not mirrored to Green cluster. If you are using RDS IAM authentication, set up IAM credentials for the Green Aurora MySQL cluster separately.
In this example, we use the admin account that has create and alter table privileges for employees database to connect to the cluster
aurora-db-green-ikfake. We strongly recommend that you do not use the master user directly. Instead, adhere to the best practice of using a database user created with the minimal privileges required for this operation.
See the following example:
Let’s check the tables that are available in our
Apply schema changes in the Green environment
Rebuilding a table
Rebuilding a table helps optimize the underlying storage and reclaim the unused space lost due to fragmentation. This can not only reduce storage costs, but may also improve query performance.
In the following example, we are going to rebuild a table to reclaim the space lost due to fragmentation. We use the Blue cluster and the
salaries table in the
employees database that we loaded as a prerequisite.
There are approximately 2.8 million rows in the salaries table and the space occupied by this table is approximately 95 MB.
Let’s check the fragmented space for the table after updating statistics by running the ANALYZE TABLE command on
We see the DATAFREE is 4 MB by running the following SQL command.
As the rebuild operation locks the table making it inaccessible to the user, we can run the rebuild operation on the
salaries table in the Green environment, isolating it from the Blue environment.
After logging into the Green environment, run the following ALTER TABLE command to start the rebuild to reflect the actual size of the table. The duration of the rebuild will depend on the size of the table. The larger the table size, the longer the ALTER TABLE runs.
Let’s check the size of the table by running the following code. We see the table size increased to 109 MB.
Now, delete 500,000 records from the salaries table in the Blue environment to demonstrate how deletion of records can cause fragmented space.
We follow the same steps as before by logging into the Green environment. Verify the fragmented space.
After deletion of 500,000 records in Blue cluster, we see the DATAFREE increased to 12 MB from 2MB and table size decreased from 109 MB to 99 MB in Green cluster.
To reclaim the free space shown in DATAFREE, we need to rebuild the table again, using the following ALTER TABLE command.
After running the following SQL commands, we see that the space has been reclaimed.
Salaries table size is now 89 MB and DATAFREE is 3 MB.
Reorganize a partition
Most tables require database maintenance, such as reorganizing the partitions, which may lock the partition until the ALTER operation is complete. Therefore, we’re going to reorganize a partition for table titles on the Green environment and then verify replication by inserting a few records on the Blue environment.
If you’re following along with this post using the
employees database, to prepare the Blue environment, we drop the
titles table and recreate it with partitions on the Blue environment and load the data into that table again. See the following code:
Now let’s reorganize the first partition to expand on the range of years on the Green environment by running the following SQL command:
For this example, we update a record in the
p01 partition in the
titles table on the Blue environment to see if it’s replicating to the Green environment without errors even though it’s been reorganized:
Check for replication status using
show replica status\G; and run the following SQL command to verify if the data is replicated to Green environment:
Switch over a Blue/Green deployment
Once we complete the schema changes on the Green environment, we are now ready to switch over. We recommend you switch your databases in the green environment to read only by setting read_only database parameter to 1 in the DB cluster parameter group to avoid replication conflicts or unintended data in Green environment.
A switchover promotes the DB cluster in the Green environment to be the new production DB cluster and as part of this process, RDS renames the DB cluster and DB instances in the Green environment to match the corresponding DB cluster and DB instances in the Blue environment.
You can specify a switchover timeout , your maximum tolerable downtime period between 30 seconds and 3,600 seconds (1 hour). If RDS cannot complete the switchover within the specified timeout, then it is cancelled and production traffic continues to flow to the Blue Aurora MySQL cluster. The default timeout period is 300 seconds (5 minutes).
When you start the switchover process, Amazon RDS runs some basic checks to test the readiness of the Blue and Green environment known as switchover guardrails. These are checks that prevent switchover operation if the environments aren’t ready for it.
We strongly recommend that you adhere to best practices before planning to switch over. When you’re ready, run the following AWS CLI command, and specify the time limit for the switchover in seconds (the default is 300):
To verify if your switchover is progressing, you can check using the following code:
When the switchover is complete, we can observe that the DB cluster and instances in the Green environment are renamed to match the DB cluster and instances in the Blue environment so that application changes aren’t required. It is your responsibility to have applications detect this DNS name change after switchover to Green environment and route production traffic.
See the following code:
After switching over, Blue/Green Deployments do not delete your old production environment. RDS renames the DB cluster and DB instances in the Blue environment by appending
-oldn to the current name, where n is a number. You may access it for additional validations and performance/regression testing, if needed.
It is your responsibility to delete the old production environment when you no longer need it. Standard billing charges apply on old production instances until you delete them.
In addition to the limitations for Blue/Green Deployments, we need to account for binlog replication limitations when considering schema changes:
- If you’re performing maintenance on partitioned tables on the Green environment, binlog replication continues to operate as long as you are using the same partitioning schema, otherwise they have the same structure.
- Schema changes such as data type changes and adding columns to the Green environment are supported but subject to conditions that are limitations from the binlog replication. For example, you can replicate from a CHAR(10) column to another CHAR(10), or from a CHAR(10) column to a CHAR(25) column without problems.
- Renaming of tables or columns in the Green environment will break the binlog replication between the clusters.
Now let’s look at a use case where a schema change is not possible using the RDS Blue/Green deployment feature where we encounter a limitation with binlog replication.
Add columns to the middle of the table
In this scenario, I created another Blue/Green deployment for a new Aurora cluster with
employees dataset. I am running the following ALTER TABLE statement on the
employees table on the Green Aurora MySQL cluster which adds a new column
phone_number that comes between columns common to both versions of the
employees table. See the following code:
Now I insert a record into the
employees table on the Blue environment
To review replication status on the Green environment, I run
show replica status\G SQL command on the Green Aurora MySQL cluster that shows the following error:
We are unable to switchover due to the replication error we encountered. Hence, it’s essential for you to plan schema changes with the tools that best fit the use case.
To clean up after switchover, delete the database instances in the Blue environment. If you created a new Aurora MySQL cluster to follow along with this post, delete the Blue/Green Deployment object and Green environment as well. For instructions, see Deleting Aurora DB clusters and DB instances.
In this post, we covered how you can implement schema changes using INSTANT DDL and Amazon RDS Blue/Green Deployments in Amazon Aurora MySQL-Compatible Edition with minimal impact. You can apply the same approach to Amazon RDS for MySQL and RDS for MariaDB instances. Get started with RDS Blue/Green Deployments via the Amazon RDS Console.
We welcome your feedback; leave your comments or questions in the comments section.
About the author
Adarsha Kuthuru is a Database Specialist Solutions Architect at Amazon Web Services. She works with customers to design scalable, highly available and secure solutions in the AWS Cloud. Outside of work, you can find her painting, reading or hiking in Pacific North West.