AWS Database Blog

Intuit story: Automate migration from on-premises MySQL to Amazon Aurora

When it comes to the public cloud, Intuit is all in. In fact, we’re adopting AWS at a rapid pace. We sold our legacy data center and we’re moving our core customer-facing applications — QuickBooks, TurboTax, and Mint — into AWS. We plan to be fully migrated within the next few years.

Databases are core to many of our applications at Intuit. The database team has been working out which architecture to standardize on and what run books and tools to build in order to migrate and then operate in the cloud. We realized that the fastest way to resolve our questions would be to take one of our existing on-premises applications and run it through an actual migration to Amazon Aurora.

In this blog post we want to share with you the following:

  • Key rationale we used to decide an architecture and why we chose Aurora
  • How we approached the migration with minimum downtime, the lessons learned, and how we addressed issues
  • The tools we developed along the way to automate some of the processes, especially for the SSL environment, so that we and others could rinse and repeat

The application we chose to migrate is the Test Execution Platform (TEP), which runs on MySQL. It’s a centralized testing platform that has hooks into many of our offerings and was a perfect candidate to migrate to Amazon Aurora. Other applications that are tied to it are being migrated to Aurora as well.    

Which database architecture to use?

Before the migration to AWS, we ran our data platform in Intuit Hosted Platform (IHP) data centers, as shown in the following diagram. We have high availability within a data center and also between two geographical regions. We leverage MySQL binary log (binlog) based replication to achieve high availability.

 Intuit data platform before AWS migration

As we evaluated our options for AWS, we debated between these two approaches:

  • Lift and shift – meaning we would run MySQL servers on Amazon EC2 instances ourselves (self-managed)
  • Shift to a cloud-native platform that is fully managed

We want a cloud solution to do better than what we currently have.We want to keep database maintenance effort low, reduce overall infrastructure cost, create more granular and reliable backup and recovery processes, and decrease downtime. We decided to go with a fully managed database for the reasons mentioned. We also want to free up our database team to work side-by-side with our application teams on the next wave of innovations.

Amazon RDS for MySQL or Amazon Aurora MySQL compatible?

Next, we had to decide whether to go with Amazon RDS for MySQL or Amazon Aurora MySQL compatible. Both are fully managed services by AWS, with pros and cons for each solution. Aurora has many advanced capabilities over RDS MySQL, including shorter replication lags, automatic storage growth, higher throughput, and a Serverless deployment model. However, RDS MySQL supports GTID replication while Aurora does not support it yet. We decided to go with Aurora MySQL and were able to work around the lack of GTID support by using the AWS Database Migration Service (DMS), where that was needed.

Intuit data platform as we are migrating to AWS

Proof of Concept

Our ultimate goal was to migrate the TEP database from our own data centers to Aurora MySQL 5.6 compatible. For our proof of concept, we first replicated our on-premises environment to MySQL servers running on Amazon EC2 to lessen the impact of migration attempts and activities on our production database. We then carried out the migration between a MySQL server running on Amazon EC2 to Aurora. After validating that replication worked as expected between Amazon EC2 and Aurora, we have configured replication from on-premises to Aurora directly.  During our migration, we made sure that SSL was enabled for replication.

Lessons Learned

We encountered a couple of issues during our migration process, and want to share with you how we addressed them.

OpenSSL library version compatibility
When working with a public/private encryption key infrastructure and digital certificates, you should use caution, as any issue while creating/importing certificates can cause loss of connectivity between replication partners, and thus downtime for applications.

We ran into an issue where we could not replicate using SSL from our on-premises database, running MySQL 5.6.24, to Aurora MySQL 5.6 compatible. After a few unsuccessful attempts to eliminate possible causes, we reached out to AWS Support. Providing a packet trace to AWS Support helped us nail down the problem. It was caused by incompatible OpenSSL versions between Aurora and our on-premises database server. We were using OpenSSL 1.0.2j-fips, which has proven to be incompatible with non-fips versions 1.0.2k or newer, which are the versions Aurora supports. OpenSSL 1.0.2k is also part of MySQL 5.6.36. To reduce the risk of other compatibility issues, we decided to upgrade our on-premises database servers to MySQL 5.6.36.

Specifying a CN name while creating a CA certificate
While unrelated to Aurora, when creating a certificate, we made the mistake of selecting the default value for Common Name (CN) instead of specifying the appropriate server and client names. This generated an abstract error, which took us a while to decipher. Make sure you provide a valid name rather than accepting the default CN name.

Binlog retention is set to NULL by default
Unlike other MySQL-based database engines, Aurora does not rely on binlogs for cluster high availability. In the default configuration, binlogs are not enabled, and if you want binlog replication, you have to enable and configure it explicitly. You should set an appropriate binlog retention value (such as 6 days in our case), by using the following stored procedure:

CALL mysql.rds_set_configuration(‘binlog retention hours’, 144)

Security considerations

We wanted to make sure we had end-to-end security. Below is what we have done for each of the security components.

When configuring Amazon Aurora to use SSL to encrypt connections, we recommend using the region-specific Intermediary Certificates published by AWS, rather than the certificate bundle that works for all regions. This provides an additional layer of separation, and can further improve security when working with multiple regions.

Monitoring and management

To monitor Aurora-specific metrics, we leverage Amazon CloudWatch and native Aurora monitoring features, such as Enhanced Monitoring and Performance Insights. At Intuit, we also use many internally developed and third-party tools. Our goal is to plug into these tools to automate Aurora provisioning, monitoring, logging, and policy enforcement.

Migration cookbook and automation

Since we struggled to find ready-made scripts and tools to automate SSL-based migration from on-premises to Aurora, we had to develop them ourselves. The good news is that we have made these scripts available on GitHub. Following is a high-level walkthrough of the migration steps that we have taken. Where applicable, we have also highlighted the scripts we have created to automate the steps. The details of what to do in each step is described in this document on GitHub.

Part I: Setting up the replication slave on AWS

  1. Verify that Network Access Control Lists (NACLs) are open between Aurora and on-premises hosts for port 3306.
  2. Create a new Aurora instance. To automate this step, we created an AWS CloudFormation template.

Part II: Configuring the replication master on-premises, and enabling replication

  1. Generate encryption certificates. The scripts can be found here.
  2. Create a replication user “repl_aurora” with SSL enabled.
  3. Verify permission and ownership on the certificate.
  4. Enable SSL on the on-premises MySQL database via my.cnf.
  5. Restart mysqld process if there is a change to the certificate and/or its location.
  6. Verify that SSL is enabled by using the “SHOW VARIABLES” SQL statement.
  7. Verify that you can connect to the on-premises database as the replication user you created in step 2.
  8. Verify that you can connect from an Amazon EC2 host or Bastion host to the on-premises MySQL database.
  9. Verify that you have connectivity to the Aurora DB cluster from on-premises.
  10. Get the binlog position from the on-premises MySQL database by using “SHOW MASTER STATUS\G”
  11. Back up the on-premises database. The Aurora replication documentation provides additional details on how to ensure consistency between the backup and binlog position.
  12. Restore the database backup on the Aurora DB cluster.
  13. Verify that restored data on the Aurora DB cluster is accurate by using the mysqlshow or mysqldbcompare
  14. Import the certificates that you generated for the on-premises MySQL database to the Aurora DB cluster.
  15. Create replication channel between the on-premises database server as master and Aurora as a slave.
  16. Verify that the replication process is up and running by monitoring the status on the Aurora DB cluster by using the “SHOW SLAVE STATUS” SQL statement.
  17. Decide if you have to rotate/replace the encryption certificate.

Part III: Setting up replication from Aurora to the on-premises database for rollback purposes

  1. Create a replication user “repl_onprem” on the Aurora DB cluster.
  2. Copy the correct RDS intermediary certificate to the on-premises MySQL server and restart the MySQL server.
  3. Verify connectivity to the Aurora DB cluster by using the “repl_onprem” user from on-premise.
  4. Create the replication channel between Aurora as the master and the on-premises MySQL server as the slave.
  5. Verify that the replication process is up and running by monitoring the status on the on-premises database by using the “SHOW SLAVE STATUS” SQL statement.

Onward and upward

After going through a thorough evaluation and migration process, we have decided to standardize on Amazon Aurora as our recommended database at Intuit.

Many internal Intuit teams started leveraging the cookbook and scripts we developed during the TEP migration for their migration to Aurora. With a managed solution like Aurora, the benefits go beyond just cost savings. Our engineers and SRE teams spend anywhere from 60 to 80 percent less time on maintenance, upgrades, and recovery processes, freeing them to spend more time developing the next wave of innovations. Furthermore, Aurora MySQL compatible supports SSL encryption for data in transit and encryption of data at rest. This means one less data security issue to worry about as a DBA. We will never go to self-managed Amazon EC2 database servers.

One of my personal favorites is Aurora storage, which grows automatically up to 64 TB as the dataset grows. Hence, we don’t ever have to worry about guessing storage size, IOPS and managing storage. Knowing the benefits of adopting Aurora MySQL compatible, wouldn’t you rather automate all of this and enjoy your evenings on a beach? We all live to work…shouldn’t this be the other way around?

If you have any questions, thoughts, or suggestions, please leave a comment below. You can also visit the AWS forums for Amazon RDS and AWS KMS. If you want more information about cryptography on AWS, we have documentation available on Cryptography Basics, and a whitepaper on AWS Key Management Service Cryptographic Details. I would like to extend my thanks to Govi Vanakuru from Amazon AWS Professional Services team, who worked hand-in-hand with me during our proof of concept.

 


About the Author

Clement Nappoly is a database administrator at Intuit with 27 years of IT experience. He has been working @ Intuit for the past 12 years, whose role has transformed into SRE role in the past 5 years.  Clement is part of  Intuit Platform organization that develops and harvests the best practices in cloud migration and datastore deployments for all of Intuit.  He is working toward migrating the various databases he supports @ Intuit, to AWS.  He also provides guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.