Migrating 50,000 MySQL Databases to Amazon Aurora with Futuralis

By Vaibhav Bhardwaj, Principal Solution Architect – Futuralis
By Hamdy Eed, Sr. Partner Solution Architect – AWS

Futuralis

Large-scale database migrations are difficult, have multiple phases, and are time-consuming. Customers want a simpler way to achieve large-scale migration with little to no downtime.

In this post, we’ll show you how Futuralis helped their customer to migrate more than 50,000 MySQL databases hosted on an Amazon Elastic Compute Cloud (Amazon EC2) server to Amazon Aurora relational database using Amazon Web Services (AWS) and native MySQL tools.

Futuralis is an AWS Advanced Tier Services Partner and global technology consulting firm raising standards in AWS Cloud services and modern application development.

In the process of understanding what’s important to their customers, Futuralis implements a cloud strategy to reduce costs, improve security, and increase business efficiency.

Use Case Introduction

The customer had MySQL version 5.5 hosted on Amazon EC2, and this MySQL server hosted more than 50,000 databases with 1.5 million tables and hundreds of millions of records. The tables used both InnoDB and MyISAM storage engines.

The customer had organically grown its client base in the last three years, adding many MySQL databases on the same EC2 server.

This resulted in many pain points:

Slow startup time for MySQL—rebooting EC2 server for patch management, updates, and backups caused hours of downtime as MySQL takes long to start.
As the system became more transactional, MySQL struggled with higher values of maximum number of open files allowed or maximum number of connections.
With a growing number of databases, MySQL required bigger heap size and thus required occasional vertical scaling.
Managing consistent backups.
Performance issues—for example, queries were experiencing a high level of latency.
Slow business analytics, such as the same EC2 instance used for transactions and analytics. Running big analytical queries on the same instance caused the EC2 server to slow down.
Too many actions required of the operations team like replication, backups, and restore.
MySQL 5.5 reached end of life.

To address these challenges, the customer decided to migrate all of the databases hosted on Amazon EC2 server to Amazon Aurora in order to benefit from high availability, cost effectiveness, better performance, and decreased manual management.

As this was the primary database server for the end customer and many other applications, the customer needed to have a minimal downtime and set a criterion for one-hour maximum downtime.

Schema Conversion and Validation with SCT

Amazon Aurora doesn’t offer support for MySQL v5.5, so AWS Schema Conversion Tool (SCT) was used to make an assessment if MySQL can be upgraded to higher levels like 5.6 and 5.7.

SCT provided a report that showed Futuralis would need to change a MyISAM table to upgrade MySQL 5.5 to MySQL 5.6 on Aurora.

After validating that schema for MySQL v5.5 is compatible with MySQL v5.6, Futuralis headed to AWS Database Migration Service (AWS DMS) to test how long it takes to do this migration.

Initial Proof of Concept

The customer couldn’t afford a downtime, and no experimentation was allowed on the production server. Futuralis created an Amazon Machine Image (AMI) of the production server and launched a new EC2 server from it in an isolated account.

The newly-created EC2 instance had 4vCPU and 16GB memory allocated to it. Futuralis tried various sizes of DMS instances and settled on the below configuration:

Full large object (64KB chunk size)
Validations enabled
Full load (no change data capture)
AWS DMS instance size: dms.r5.4xlarge
Control table in target using schema
Target: Amazon Aurora for MySQL v5.6

The job took around five days to migrate all of the tables and more than a week to validate the records. Interestingly, out of 1.5 million tables only a few hundred tables triggered an error. Upon closer examination, Futuralis found those tables had binary large object (BLOB) columns with large data.

The team took a list of databases that had BLOB columns in their tables and created a new AWS DMS job, excluding databases that had tables with BLOB columns in it. This time, the migration completed without errors and took under four days for migration and seven days for validation.

The migrated instance (Aurora cluster) was given to the customer to validate if it worked with the non-production application. This gave the customer confidence the migration was feasible, and Amazon Aurora for MySQL was compatible for their applications.

Now comes the most critical part: downtime. This approach required around 11 days for migration and unpredictable time for change data capture, as it depended on how many transactions hit the source database during migration.

Futuralis worked with the customer to segment the 50,000+ databases into active and dormant databases. The idea was to make a different strategy to move active and dormant databases. After careful examination, the team found that out of 50,000 databases, only 4,000 were active—meaning they were getting updated in a two-hour window—while the rest were all dormant.

Next, Futuralis was supposed to migrate 4,000 odd databases to Amazon Aurora with minimal downtime. They again tried AWS DMS but found that it took around six hours to migrate these active databases, which was not acceptable to the customer. This forced the team to move to other approach: MySQL-native mysqldump plus MySQL import.

MySQL Native Tools

Both mysqldump and MySQL import run on a single thread. That means to scale the process of taking mysqldump and import it, Futuralis would need parallel working threads.

They came up with a quick script that could performs below steps:

Read a list of active databases supplied as input.
Ask the user for number of threads for parallel mysqldump to run.
Export active databases in parallel threads using mysqldump command (one SQL file per database).
Upload SQL output files created using mysqldump on Amazon Simple Storage Service (Amazon S3).
Trigger an AWS Lambda function on PUT of new objects on S3. This Lambda stores the metadata of the SQL output file in Amazon DynamoDB that maintains the state of a database (imported to Aurora or not)

Futuralis had one more EC2 server running (consumer server) that had a script to check:

Check Aurora for active connections.
If active connections were less than 200, get list of inactive databases from DynamoDB.
Start importing databases in Aurora using MySQL import by downloading the SQL file from Amazon S3.

With four threads for exporting the databases and 16 threads for MySQL import, Futuralis was able to finish all of the migration of the active 4,000 databases in 45 minutes.

The following architecture diagram shows the end-to-end migration flow.

Figure 1 –Migration flow deployment architecture.

To monitor the progress of migration and make sure no element of the architecture throttles, the following metrics were monitored in Amazon CloudWatch:

Amazon EC2
- CPU usage
- Memory usage
- Disk IOPS
Amazon Aurora
- CPU
- Memory
- Number of DB connections
- DDL/DML per second
- Inserts and updates per second
- Slow running queries

Automation

The solution was automated using a combination of AWS CloudFormation templates and shell scripts. CloudFormation was used to create AWS resources such as Lambda functions, EC2 instances, and DynamoDB tables.

Shell scripts were used to handle parallel processing for mysqldump and store database files on Amazon S3, store metadata in DynamoDB, retrieve database dumps from S3, and restore to Aurora database in parallel.

Lessons Learned

AWS DMS is an excellent tool to validate if the source database can be migrated to Amazon Aurora.
AWS DMS comes in handy when migrating databases from production server, and also when validating the records once migrated.
The tables/records that fail to migrate are also shown by AWS DMS.
If the source database has many big BLOB columns, then AWS DMS struggles to migrate these records and MySQL-native tools like mysqldump and mysqlimport.
mysqldump and mysqlimport commands are single threaded and export/import only one database at a time.
AWS services like Lambda with S3 triggers can help achieve an asynchronous multi-threaded processing environment quickly.
Amazon Aurora provided better performance than managing a MySQL server on an EC2 server. For example, during database peak usage a self-managed MySQL on Amazon EC2 needed 3,000 disk IOPS to provide the required performance. Meanwhile, using same workload on Aurora only required less than 1,000 IOPS.
Aurora offers read replicas that can be used for read-only queries and business analytics applications. This helps reduce load on the primary database instance.

Conclusion

AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tools (SCT) are excellent services to migrate production workloads to Amazon Aurora relational database without downtime. They also offer change data capture functionality for ongoing replication.

Under some circumstances, however, AWS DMS alone may not be able to migrate databases at scale. In that case, MySQL-native services can be scaled better by using other AWS services like Amazon EC2, Amazon S3, AWS Lambda, and Amazon DynamoDB.

That doesn’t mean one service is better than the other; rather, it simply means a customer should explore all of the options available before making the choice. Mostly, as Futuralis learned in its customer use case, all of these AWS services complement each other very well.

.

.

Futuralis – AWS Partner Spotlight

Futuralis is an AWS Partner and global technology consulting firm raising standards in AWS Cloud services and modern application development.

Contact Futuralis | Partner Overview

*Already worked with Futuralis? Rate the Partner

*To review an AWS Partner, you must be a customer that has worked with them directly on a project.