Migrating to Amazon Aurora: The View from the Other Side
Today I have a new guest post, this one written by AWS customer Safe Software. They were an early adopter of Amazon Aurora and have a great story to tell.
In addition to leveraging the power of AWS and Aurora for our clients, we also evaluate new technologies from the perspective of improving our internal processes. When Aurora was released in beta, we immediately thought of migrating our own systems to it. That decision has proven to be worthwhile. The move to Aurora has increased our productivity while cutting our annual systems costs by 40%.
Now that the migration is behind us, I’d like to share some tips and insights with those considering taking the leap. While I’d like to include the migration details, there is not much to say, as it only took the click of a button. Instead I will share what we tried first, how we prepared, and how we optimized our systems further once they were operating in Aurora.
Why We Needed The Cloud
To ensure high quality of our spatial data transformation technology, FME, we run a grueling automated test suite. The platform supports 365+ data formats and limitless transformations, making the automated daily testing demanding: 15,000 tests x 4 operating systems x 3 products, running 24/7.
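To put that volume in numbers, a back-of-the-envelope calculation (assuming every product runs the full suite on every operating system) gives the daily test load:

```python
# Daily test volume implied by the figures above.
tests, operating_systems, products = 15_000, 4, 3
daily_runs = tests * operating_systems * products
print(daily_runs)  # → 180000 test runs per day
```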
The problem: we couldn’t keep provisioning hardware fast enough to keep the system happy, never mind keeping up with our expectation of a 1-second response time.
Our internal production database runs on a high-traffic system with 140+ tables containing ~100 million rows of data. It is the primary operational repository for our build and test systems, as well as our internal intranet and reporting framework, supporting server farms of upwards of 150 machines. Over 100 users rely on this system.
What We Tried Before Aurora
We initially tried moving everything to MySQL on RDS, but found that we needed to run a read replica on a sizable instance to manage the load. Even so, we were pushing against the connection ceiling for most queries and struggled to meet the needed response times. This setup also immediately doubled our costs.
We’d spent so much time getting good at MySQL that the idea of having to relearn all of that in a new system was painful. Having something you treat like an appliance is much better.
Fail-Safe Preparations and Migration
We heard Aurora mirrors the MySQL experience, so we figured it was worth trying. To ensure we had nothing to lose we decided to keep the production system running in its existing form, while we tested Aurora.
One of the benefits of moving to a higher-performance system is that it is a good opportunity to re-assess a system that dates back years. During this migration we did some housekeeping. We looked at indexes, table structures, and many other relational aspects of the database. We were able to combine multiple schemas into just two, simplifying our logic.
The actual move into Aurora was trivial. From the AWS console, we chose to migrate with the click of a button. It happened in the background, and we ended up with an Aurora copy of what we had been running in MySQL. It was that simple!
Managing the cutover was the biggest scheduling concern: we had to avoid impacting operations while still capturing a current snapshot of the data. We were wary that things could get out of sync, and that by the time the migration finished, the copied data might be stale. We knew it would take a few hours to migrate the production system while it was still operating, and during that time data could change.
We chose to do the migration overnight on the live system while it was still running. We followed up with our own FME product to capture changes that had taken place in volatile tables during the migration (about 2-3% of our data), and port them over.
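The catch-up step can be sketched roughly as follows. This is an illustrative example only, not Safe Software's actual FME workflow: the table and column names (`test_runs`, `updated_at`) are hypothetical, and `sqlite3` stands in for the old MySQL source and the new Aurora target. The assumption is that volatile tables carry a last-modified timestamp, so rows changed after the snapshot time can be replayed.

```python
import sqlite3

def catch_up(source, target, table, snapshot_time):
    """Replay rows modified after snapshot_time from source into target."""
    rows = source.execute(
        f"SELECT id, status, updated_at FROM {table} WHERE updated_at > ?",
        (snapshot_time,),
    ).fetchall()
    # Upsert: overwrite rows that changed, insert rows created mid-migration.
    target.executemany(
        f"INSERT OR REPLACE INTO {table} (id, status, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return len(rows)

# Demo: the target holds the snapshot; the source kept changing afterwards.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(
        "CREATE TABLE test_runs (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
source.executemany(
    "INSERT INTO test_runs VALUES (?, ?, ?)",
    [(1, "pass", "2015-01-01T00:00"),   # unchanged since snapshot
     (2, "fail", "2015-01-01T03:00"),   # changed during migration
     (3, "pass", "2015-01-01T04:00")],  # created during migration
)
target.execute("INSERT INTO test_runs VALUES (1, 'pass', '2015-01-01T00:00')")
target.execute("INSERT INTO test_runs VALUES (2, 'pass', '2015-01-01T00:00')")
target.commit()

replayed = catch_up(source, target, "test_runs", "2015-01-01T02:00")
print(replayed)  # → 2 rows ported over
```

In our case the equivalent of this replay touched only the volatile tables, which is why just 2-3% of the data needed porting.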
Our build and release team was able to manage the migration ourselves, and only involved the IT department to configure identity and access management and then change the DNS on our network once we’d verified that everything was a go.
We had checked a few examples for sanity first, but as early adopters it was still something of a leap in the dark. We knew, though, that we could simply roll back to the old system if needed.
Optimizing The Experience Post-Migration
We thoroughly tested all of our processes afterward. There was some head-scratching after the first couple of days of monitoring; we experienced patches of heavy CPU load and slow-downs in Aurora during operations that had previously been unremarkable in MySQL.
We tracked these down to a set of inefficient queries using deeply nested SELECTs which were not readily modifiable. We resolved these issues with some simple schema changes, and pre-canning some of the more complex relationships using our own product, FME. Bottom Line: Schema design is still important.
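The kind of rewrite involved can be sketched like this. The schema and queries are hypothetical (our real tables and queries are more involved), and `sqlite3` stands in for Aurora: the same aggregate is computed first with a correlated nested subquery, then "pre-canned" into a summary table that the main query joins against.

```python
import sqlite3

# Illustrative schema: builds with per-build test results.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE builds (id INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE test_results (build_id INTEGER, passed INTEGER);
INSERT INTO builds VALUES (1, 'FME Desktop'), (2, 'FME Server');
INSERT INTO test_results VALUES (1, 1), (1, 0), (1, 1), (2, 1);
""")

# Nested form: one correlated subquery evaluated per build row.
nested = db.execute("""
    SELECT b.id,
           (SELECT COUNT(*) FROM test_results r
             WHERE r.build_id = b.id AND r.passed = 1) AS passes
      FROM builds b ORDER BY b.id
""").fetchall()

# Pre-canned form: materialize the aggregate once, then join against it.
db.executescript("""
CREATE TABLE build_pass_counts AS
    SELECT build_id, COUNT(*) AS passes
      FROM test_results WHERE passed = 1 GROUP BY build_id;
""")
precanned = db.execute("""
    SELECT b.id, COALESCE(c.passes, 0)
      FROM builds b LEFT JOIN build_pass_counts c ON c.build_id = b.id
     ORDER BY b.id
""").fetchall()

print(nested)     # → [(1, 2), (2, 1)]
print(precanned)  # → [(1, 2), (2, 1)], same answer without per-row nesting
```

The trade-off, of course, is that a pre-canned table must be refreshed when the underlying data changes, which in our case FME handles as part of the build workflow.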
We experienced no other issues during the migration or since, and tuning these queries and indexes was ultimately beneficial. In operation we now have enterprise scale with the familiar interfaces we are used to.
For almost all operations, Aurora has proven faster, and it gives us more scalability. We’re running a relatively modest setup now, knowing that we can expand, and it’s saving us about $8,000 per year (60% cheaper). In fact, we could double our performance using Aurora and it would still be less than we paid last year. We save another $2,000 by reserving the instance for annual operations.
Operation management is pretty critical stuff, so it’s a relief not to worry about backups or database failures. The managed database saves us a lot of headaches. To achieve this performance ourselves would have required a huge investment in both hardware and personnel.
With Aurora, we can create our FME product builds better, faster, and the test results come through quickly, which ultimately means we can provide a higher quality product.
— Iain McCarthy, Product Release Manager at Safe Software Inc.