Database Migration—What Do You Need to Know Before You Start?
Ilia Gilderman is software development manager in Amazon Web Services
Congratulations! You have convinced your boss or the CIO to move your database to the cloud. Or you are the boss, CIO, or both, and you finally decided to jump on the bandwagon. What you’re trying to do is move your application to the new environment, platform, or technology (aka application modernization), because usually people don’t move databases for fun.
Database migration is a complex, multiphase process, which usually includes assessment, database schema conversion (if you are changing engines), script conversion, data migration, functional testing, performance tuning, and many other steps.
Tools like the AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT), native engine tools, and others can help to automate some phases of the database migration process. The key to success, though, is to make your database migration project predictable. You don’t want any surprises when you are 10 terabytes deep in the data migration phase.
So you are eager to start. But do you have all the information required to finish the project successfully?
This blog post discusses common migration problems that can occur when you move your database to the cloud. In the last section, we will discuss some more detailed information related to an AWS migration.
Plan your talent
Application modernization can be challenging, because there are many moving parts and you need to understand all of them. Even if you are an expert with your source database engine, if you switch engines you need to be an expert in the target database engine also. You are going to move your data from one database to another, which will involve crossing the network, so somebody needs to know about servers, ports, and firewall rules. As previously mentioned, database migration is usually a part of a bigger application modernization project, which means the database migration (even for homogeneous migrations) will involve some changes to application code. A common scenario we’ve all heard of is a 15-year-old, 300,000 line COBOL application, developed by dozens (or even hundreds) of developers, and none of them works in the company anymore. But even if you are lucky and you have code written in a modern language by three developers still proudly working for you, somebody will need to assess what changes to the application code the database migration will require. Finally, depending on the cloud provider, you will need somebody on your team who is familiar with the particular provider ecosystem you have. To make a long story short, you will need your best people on this project.
Plan your time
Database migration projects usually include refactoring application and database code, and also the schema, which is a time-consuming, iterative process. The refactoring process can take anywhere from a few weeks to several months, depending on the complexity of your application and database. If you have hard dates (for example, if you need to vacate your data center), be realistic about what you can achieve. Even if everything goes well, data migration might be slower than anticipated. Be aware that the planning can often take more time than the actual migration!
Understand your database
Understanding your source database is crucial to the success of the migration project.
Again, we can go to the extreme and think about a 15-year-old database that can still support the first version of your product, despite the fact that support for this version was discontinued years ago. We can think about the database that has thousands of tables and dozens of schemas and still has these stored procedures which you wrote when you joined the company more than a decade ago…
But let’s say you are lucky and it’s not that bad. Can you answer one simple question: What is the size of the database you are trying to migrate? You might be surprised—most customers can’t answer this question. The size of the database is important when determining the correct approach to use for your migration project. It also helps to estimate how long the data copy will take. What is even more important is how many schemas and tables you are going to migrate. Knowing the layout of your database can help you define a migration project and allows parallel execution of your migration project. For extra credit, you can try to find out which tables participate in which transactions. If you know this, you can execute the migration of multiple disjointed table sets in parallel.
Another important question to ask is: How many very large tables exist in your database? “Very large” is subjective, but for migration purposes, tables larger than 200 gigabytes and with hundreds of millions of rows might end up being the long tail for your data migration.
Okay, suppose you’ve done some initial digging and now you have data about the size, schema, and tables of the database to migrate. The next question you should ask is: Do we have some “fancy” engine-specific data types in our schemas? Having those usually presents a challenge, especially if you want to switch database engines. Even if you are planning a homogenous migration, many tools won’t support all the data types involved. As usual, checking the documentation is a good start. Later, when you migrate your data, verify all your data looks correct on the other end before you shut down the power at your data center.
Now let’s talk about LOBs – Large Objects. If you don’t have them in you database, lucky you, but keep reading to understand how lucky you are. The bad news is migrating LOBs can be slow. Move time can be longer if you don’t know what the maximum size of the LOBs in your schema are. It can be very painful to move even small numbers of them. Another challenge you might face is that you have LOBs in a table without primary keys or any other unique constraints (perhaps your DBA read some cutting-edge blog 10 years ago that claimed this was a good idea). These tables will cause you some pain if you want changes to them to be part of your change data capture (CDC) stream.
Before we talk how not to overload your source database, it’s important to know that working with database roles, users, and permissions can be a huge time-waster during a migration. Most migration tools require some elevated access to the source database catalog to retrieve metadata, and to read logs and replicate changes. Knowing your database roles and permission model, or at least having the name of the person who knows it and can grant access can help.
Finally, even if you get all the types, schemas, and LOBs right, it’s possible the migration will be slow. In some cases, you won’t be able to afford this slowness and it will cause your migration project to fail. A few factors can affect how long the wait will be for the last row to show up on the other end. Some of them are easy to tune and take into consideration; others can be harder.
The first is how hot or busy your source database is. This is important to understand because most migration tools will put some additional load on your source database. If your traffic is very high, it might be unrealistic to plan a live migration and you should consider alternatives. Usually, compacted databases migrate faster and with fewer issues, so you should find out when your database was last compacted. If the answer is never, think about doing it.
Phew—there are a lot of questions to answer, and a lot of people to work with to get the answers. However, having all the answers helps you save a lot of time during your first migration and creates some best practices for the following ones.
Understand your network
The sad truth is that a database migration requires extensive network knowledge from you or your co-workers. It becomes worse if you are migrating a database that is part of your corporate fleet. Most likely, network access to these databases is managed by several teams (security, networking, and so on). You might need to talk to all of them to safely open a port in the firewall or change routing so the replication server can access the database. You’ll also need information about the network setup your organization has with its cloud provider.
Finally, there is this question: Does the connection between your source and target database have enough bandwidth for the migration? Don’t forget, your organization can have an awesome 10 GB direct pipe to your cloud provider, but that doesn’t mean it’s all yours. Make sure that when you start to move data you know exactly how much of this pipe the migration is going to use, and that you are not taking down mission-critical workloads by kicking the tires migrating your test database.
Understand your requirements
I know you want to see that giant server that runs your database or databases go away for good, and sooner rather than later. But do you know what are you replacing it with? Do you know all the key parameters of your new environment? Setting some of these is a new decision to make. Some you’re inheriting from your current environment, or your app dictates them. The simplest questions are: Can your app afford downtime, and how long can this downtime be? In most fleets, some apps can take 20 to 30 minutes of downtime – some can even have a maintenance window of a few hours. Hint: It is better to have these apps as your first project. You won’t be dealing with CDC (change data capture), and you can use dump and restore or other native tools to migrate your database. Another important question that will affect your approach is this: Do you need your source database to remain available after the migration is done? If so, how do you switch back if you need to?
Even if you are doing a lift-and-shift migration, you have decisions to make about your new platform. You have even more decisions if you are switching engines. For instance, do you know why you chose one engine over another? I hope that it’s not because your lead developer told you, “Dude, this is cutting-edge, everybody is using it” and that you had a more structured conversation about why this specific engine is best for your app, your team, and your timeline.
Another important decision to make is whether all the data needs to migrate. For example, you might have an audit log table that is 97 percent of your database. Yes, you need the logs for compliance reasons, but could they be archived later instead of being part of the live migration? Another example: Do you have some dead tables in your database that supported a feature not offered on the new platform?
Even if all the data needs to move, this is a perfect time to examine alternatives for some of your data. For example, can you make use of NoSQL stores or data warehouses on the other end? In general, it’s always good to keep it simple—that is, move the data first, then transform, restructure, or rearchitect it. That said, it might be that refactoring your database during the migration is the only opportunity to do so. In any case, it doesn’t hurt to plan and analyze your data model, even if you are going to change it after migration is complete.
Finally, after all your preparations, thousands of things can go wrong during and after the migration. Having a contingency plan is always a good idea. Can you flip back? Can you run your app on the old and the new database simultaneously? Can you afford to have some data unavailable for a period of time? Answers to these and some other questions help you shape a better contingency plan.
Create your target schema
There are also questions about the target database schema. Even for homogeneous migrations, it’s better to apply the schema in parts. You need to have your tables and primary keys before you can move data. But for performance and other reasons, you might want to create secondary indexes and apply constraints later. You will also likely want to keep your triggers disabled for the initial load of the data. The point is that somebody needs to prepare your schema to be applied in parts. If you are switching engines, the AWS Schema Conversion Tool mentioned earlier can help with some of this. Also, unless there is a pressing need, it’s better to postpone all schema transformations until the end of the migration.
Migrate to AWS
Migration to every cloud has its own specific steps and requires expertise with the specific cloud ecosystem. You need to understand details of networking, permissions, roles, accounts, and so on. Not only do you need to know how the cloud works in general, but you also have to be very familiar with your specific corporate cloud implementation. This section of the post will cover some of the specifics of migrating to AWS and what help is available, and help you ask the right questions about the benefits and limitations of your new environment.
As I said before, AWS has created a number of tools and support materials to help you to be successful with your migration projects. The first and very important tool is the documentation. Please, please read the documentation for DMS and the documentation for SCT. Of course, you know how to set up supplemental logging and you remember most of the Oracle data types and how they map to the rest of the database engines. But please spend some time reading the documentation. It will save you a lot of time and frustration during the migration. The DMS and SCT forums are another place to find useful information before you start the project.
For many use cases, using Amazon RDS for your target database is a natural choice; you get the benefits of a managed service. Make sure you are familiar with those—things like backups, OS and database patches, security, high availability, scalability, elasticity, integration with the AWS ecosystem, and so on. Using RDS frees you from spending time on these activities.
But there are also limitations. Make sure you are familiar with those—things like storage limits, lack of admin privileges on your databases, and so on. You need to evaluate carefully whether you really do require access to your database host. Of course, your DBA will tell you that the world is going to end the moment SSH access is lost to the database box, but will it really? Some apps need this access, but in most cases the sun will shine the day after your DBA loses database admin privileges, as it does for other customers.
Amazon RDS also offers a variety of managed high availability (HA) features. This is a good point to review your HA requirements. Yes, your current setup includes a few MySQL slaves, but why do you have them? Maybe you can do better and more importantly do with less pain, or maybe this is why your database guy wanted to have the host access? It’s important to understand that some of these decisions can be made after the migration. For example, if you are migrating to a MySQL database, it is perfectly fine to migrate to Amazon RDS Single-AZ MySQL (actually faster) and then convert it to an Amazon RDS Multi-AZ instance, or go further and use Amazon Aurora.
Finally, AWS offers a wide list of migration partners that can help you to make an assessment, scope the work, and even execute your migration projects.
Database migration projects can be hard, especially your first one, but the benefits of migrating your database to the cloud are significantly greater than the challenges migrations can present. You can make these challenges predictable and less painful through diligent preparation and collecting the necessary information before you start. Thousands of AWS customers have already migrated with or without our help. Some of them took some shortcuts and hit every possible problem during their first migration; others spent some time preparing and flew through their migrations with zero pain.
The checklist below will help you to ask the right questions before you start.
Thanks for reading! Have a successful database migration!
Database migration checklist
1. What is the size of your database?
2. How many schemas and tables do you have?
3. How many really big tables do you have (200 gigabytes or 200 million rows in size)?
4. What do the transaction boundaries look like?
5. Do you have engine-specific)data types that won’t be migrated by your migration tool?
6. Do you have LOBs in your tables, and how large are they?
7. Do all your tables with LOBs have primary keys?
8. How hot (busy) is your source database?
9. What kind of users, roles, and permissions do you have on the source database?
10. When was the last time you vacuumed, or compacted, your database?
11. How can your database be accessed (firewalls, tunnels, VPNs)?
12. Do you know what VPC you want to use in AWS?
13. Do you know what VPC security group you can use?
14. Do you have enough bandwidth to move all your data?
15. Can you afford downtime? How much?
16. Do you need the source database to stay alive after the migration? For how long?
17. Do you know why you preferred one target database engine over another?
18. What are your high availability (HA) requirements?
19. Does all the data need to move?
20. Does it need to move to the same place?
21. Do you understand the benefits offered by Amazon RDS?
22. Do you understand any Amazon RDS limitations which might affect you?
23. What happens to your application after the migration?
24. What is your contingency plan if things go wrong?