Microsoft Workloads on AWS
How Cloudsoft helped ASP modernize MS SQL Server workloads to Amazon Aurora to save costs and reduce operational overhead
By Aled Sage – VP Engineering at Cloudsoft
Prasad Rao – Sr. Partner Solutions Architect at AWS
ASP, an industry leading event website business, needed to modernize its Microsoft SQL Server-based Content Management System (CMS). As the company continued to evolve and grow, so did its data. ASP’s Showoff application, which provides goal-driven website design to host 400+ websites for 170+ clients across the globe, had eventually grown too large for its propriety SQL Server database.
ASP’s growth was expected to continue at 30% per year, but its databases couldn’t scale to meet that growth.
ASP had previously migrated its Microsoft SQL Server workloads from a datacenter to Amazon Relational Database Service (Amazon RDS). However, the company wanted to continue modernizing to continue saving costs, which included expensive SQL Server licensing costs. ASP also wanted to reduce its operational overhead of maintaining its database servers. To address these concerns, the ASP team worked with APN Consulting Partner Cloudsoft to modernize its Showoff CMS onto a purpose-built, cloud-native database.
In this blog, we explain how Cloudsoft helped ASP reduce costs by 65% on SQL Server licensing and simplify its operations by modernizing from Amazon RDS for SQL Server to Amazon Aurora. We walk through ASP’s pre-migration architecture, migration steps, and results of the database modernization.
Pre-migration architecture
The following diagram shows ASP’s previous SQL Server architecture on Amazon RDS.
The databases were sharded across four Amazon RDS for SQL Server instances, with each instance configured for high availability across two Availability Zones. ASP used multiple SQL Server instances for its hundreds of databases. This is because each instance can handle up to 50 databases per RDS SQL Server instance.
However, using multiple SQL Server instances led to higher licensing costs. It also increased operational overhead for engineers: it was complex to manage tasks like troubleshooting, maintaining runbooks, handling upgrades, and configuring disaster recovery across all four SQL Server instances. Moreover, this also increased time for onboarding new customers. Adding new customer websites required involvement from the engineering team. This is because the team had to manually choose a SQL Server instance based on capacity to configure each application that ran on the database. There were also occasional performance issues for some usage patterns, consuming high CPU on the database and impacting response times.
ASP’s migration
Cloudsoft and ASP began planning for the migration by conducting a proof of concept. This involved trying to move a copy of the data from SQL Server into Aurora, reconfiguring the application to use the MySQL database driver, and running application tests against the Aurora database. By working through some of the initial incompatibility errors between SQL Server and Mysql, it helped the team identify application changes and database changes. Ultimately, working through errors improved ASP’s understanding of the effort it would take to migrate.
During the assessment and testing phases, we found that MySQL 5.7 was the best choice for ASP due to compatibility requirements with the application server’s technology stack. ASP chose Amazon Aurora because Aurora is up to five times faster than standard MySQL databases, and provides the security, availability, and reliability of commercial databases at 1/10th the cost.
Schema and code changes
Before we could move the databases, we converted the database schema to migrate our data smoothly across databases. We used the AWS Database Migration Service (DMS) to create the initial schema in Aurora, converting from SQL Server to MySQL.
For the changes that DMS did not handle, we used hand-crafted scripts and the Hibernate ORM to auto-update the schema. We manually set default values, created views and changed character encodings and column types to ensure compatibility. We also added scripts, foreign keys, and secondary indexes after the data was migrated. This was vital because referential integrity checks were disabled when migrating data table-by-table, and the data import was faster when secondary indexes were disabled.
To make ASP’s application compatible with the new database, we manually made about 11,000 minor code changes over the course of 3-4 months. This included handling case-sensitive table names, rewriting stored procedures as SQL queries, handling data type conversions from Microsoft SQL to MySQL, and updating SQL queries to make them MySQL-compatible.
Planning database capacity
To choose the instance sizes for the Aurora production cluster, we followed one of AWS’ Well- Architected Framework design principles: stop guessing capacity. Since we knew that we could always decrease the database size in the future, we chose a larger instance size pre-migration.
Next, we tested the application in staging with a realistic workload, paying particular attention to the slow query log and the metrics shown in Amazon CloudWatch. We began testing with an Aurora instance size that was the sum of the four Amazon RDS for SQL Server instances (four times the size of each individual Amazon RDS for SQL Server instance). The application ran well on Aurora, which meant that we could focus on our next test.
Then, we led a comparison with larger and smaller database sizes. We tested r5, m5, and c5 instance types for projected traffic and substantial data growth. After testing, ASP determined that the larger r5 instance size provided the best performance for customers based on expected traffic spikes.
The new Aurora instance performed excellently both in testing and post-migration stages. It handled peak production load, with sufficient headroom for expected traffic growth. This peak load was over 3000 database queries per second, averaging 140 million queries per day. Additionally, previous intermittent performance issues were resolved. Finally, website traffic had less relative impact on the total load because one Aurora cluster could handle 400+ websites.
Migrating data across databases
We migrated the data from SQL Server to Aurora using AWS Database Migration Service (DMS). We did this early in the migration process to test against a copy of production data. Then, we repeated this test multiple times to gain confidence for our go-live cutover.
We could put the database in read-only mode for the required length of migration time. Because of this, we only used DMS for the “full load” (or, when we copied the data from a certain set of databases). Using continuous replication would have added unnecessary complexity in this particular scenario. If the usecase would have been to have the database available for both read and write then continuous replication (though complex) would have been implemented.
Through our testing, we established that we needed provisioned IOPS SSD during the data migration (otherwise we would exhaust our burst credits). After the migration, we reset our storage type back to General Purpose SSD.
Testing to ensure migration success
As mentioned, we executed extensive testing before the migration. We tested the data migration and cutover process repeatedly while the application was in the staging environment. Along with testing the existing automated tests and standard QA scripts, the whole team did exploratory testing on operational runbooks and updated them accordingly. We repeated this test multiple times before our migration.
When we encountered errors (such as an incompatible SQL function), we added the incompatibility to a list that we shared with developers and testers. All occurrences of the incompatibilities were fixed and tested. We then searched our codebase for the incompatible pattern to ensure it was fixed.
We agreed that our post-migration strategy was to fail-forward if our customer reported any issues that didn’t involve data loss. That is, we agreed to fix all issues in the new Aurora deployment instead of failing back to our old database. Thankfully, we didn’t encounter any significant issues post-migration.
Following Aurora best practices to achieve operational excellence
We updated our monitoring and alerting to follow Aurora best practices. First, we updated CloudWatch metrics and alarms to track system health and performance. Then, we updated subscriptions to Amazon RDS Events to provide notifications across changes, errors, Amazon RDS performance insights, and new usage of MySQL Workbench. This would help us manually investigate any new updates to our databases.
Most of our disaster recovery processes with Amazon Aurora were similar to those with Amazon RDS for SQL Server, which made our migration more fault-tolerant. With both Aurora and RDS, we could use snapshots and point-in-time-recovery. In addition, Aurora also supported Backtrack in case we needed to execute an emergency point-in-time rollback of our existing database. This was important because some processes for database backup and restore were impacted by the migration. This included native export/import of specific databases when we copied them into the staging environment.
Operationally, we updated our runbooks and playbooks as we learned new tools like MySQL Workbench. Additionally, we ensured operational excellency by using CloudFormation to provision and manage the Aurora configuration. Importantly, onboarding new customers no longer required that the engineering team manage the allocation of customers to RDS instances.
Executing the final cutover
For the final cutover, we conducted a read-only zero-downtime maintenance window of 12 hours over a weekend. This gave sufficient buffer for us to be confident in our migration. The migration would not only be completed on time, but would also have a lower impact on customers.
For the cutover, we executed the following high-level steps:
- Put the sites into read-only mode
- Execute the following steps for each batch of customer websites (divided into four batches):
- Using DMS, migrate the data
- Run SQL scripts to update the schema and set default values
- Start ORM (Hibernate), to auto-update the schema
- Rollout the frontend tier with the new application code (using CloudFormation), pointing to the new Aurora cluster instance
- Run tests against the frontend
- Roll out new application code for the admin & API components
- Run tests against the admin / API
- Re-enable write-access
After we migrated all of our databases and ensured that our applications pointed to the new Aurora cluster, we wanted to stop our Amazon RDS instances. This is so the instances would no longer incur running costs. We kept the instances for a week in case of inconsistencies or customer issues on Aurora. Then, we terminated the instances once we felt confident in our new databases.
Post-migration AWS architecture
The following diagram illustrates our new architecture (again, slightly simplified):
Post-migration results
By migrating from Amazon RDS for SQL Server to Amazon Aurora, ASP was able to optimize cost, reduce operational overhead, improve scalability, and increase application performance. The database CPU no longer peaked at 80-90% on a regular basis. Instead, peaks were closer to 50%. This gave ASP headroom for further expected growth and helped the company continue meeting its objective of running leading performant websites in the events industry.
“Migration from SQL Server to Aurora was a great move for us, and Cloudsoft’s help was invaluable. The 65% cost savings on SQL Server Licensing are obviously very welcome, as is the simplified configuration of using a single Aurora cluster. Its high availability and high performance help us to deliver the excellent quality of service our customers expect from ASP” – Dan Pacitti, CTO at ASP.
Heterogeneous database migrations are not always easy, but with Cloudsoft’s help and guidance, ASP was able to evaluate, plan, implement, test, and go-live. ASP continues to work with Cloudsoft to further modernize its applications in AWS.
By modernizing to cloud-native databases, you can gain the scalability, efficiency and cost savings of the cloud. If your organization is using Microsoft SQL Server or other old-guard technology with high licensing costs, engage with Cloudsoft to modernize your workloads.
AWS can help you assess how your company can get the most out of cloud. Join the millions of AWS customers that trust us to migrate and modernize their most important applications in the cloud. To learn more on modernizing Windows Server or SQL Server, visit Windows on AWS. Contact us to start your migration journey today.