AWS Database Blog
Picking Your First Database to Migrate to Amazon RDS or Amazon Aurora with PostgreSQL-compatibility
David Rader is the vice president of engineering at OpenSCG.
OpenSCG is an AWS Consulting Partner and a leading expert in PostgreSQL, helping customers migrate to, operate, and optimize PostgreSQL and other data platforms on-premises and in the cloud.
In a previous post, we covered the overall approach you can take to migrate your current database to PostgreSQL. When you want to nudge your organization to start migrating from Oracle (or Sybase or Microsoft SQL Server) to PostgreSQL or a PostgreSQL-compatible engine on AWS, you need to choose a candidate for your “first migration project.”
This post helps guide you, as you look across your portfolio of databases (and applications), to think about the various factors that make migrations easy or difficult. Following this guidance can help ensure that your first migration delivers enough benefit to enable your organization to take on a larger migration effort.
Choosing your first database
To determine which database to migrate first, you should evaluate several technical and non-technical dimensions, including the following:
- Risk vs. reward
- Database size and complexity
- Applications and integrations
- Organizational readiness
- Business acceptance
These factors determine the level of effort, time, and organizational willpower that is required to complete a successful migration.
(TL;DR) For the impatient among you, the best case is to pick a database that has the following conditions: the database is significant but not mission critical, you have good test plans and test cases, your team knows the code and business logic, the development team is nimble, and the business owner is a good partner. The further you move off this ideal, the more likely it is that your first migration will become overly difficult, won’t impress the organization enough to drive a change, or will outright fail.
If you work at a smaller SaaS provider whose product is a single big database, you might have only one database to choose from. But you can still understand what makes the migration process easier. And if you must demonstrate the feasibility of moving one of your enterprise’s largest and most complex databases, you might need to do a “proof of concept” before your first migration for production use.
Let’s look at each of these dimensions and how they impact the choice of a first database to migrate to PostgreSQL.
Risk vs. reward—the Goldilocks objective
You may think that you should first migrate a small database with low traffic and very little business risk, since there’s very little downside if the conversion fails. A successful migration of such a small database may demonstrate to you and your technical team that a migration is possible. However, the result typically doesn’t generate enough excitement or demonstrate enough benefit from lower database license costs (if it’s so small, you could probably run the free version of a commercial database). There’s also no benefit from personnel savings, since your DBAs don’t spend much time supporting it anyway. Your migration “success” faces a good chance of being ignored. So, cross out “really small and simple” as the first choice.
But the opposite extreme—your most mission-critical, customer-facing, revenue-generating database—is a bad choice too. This database is probably one of your most complicated and time-consuming to migrate technically. In addition, the planning, testing, and organizational buy-in that’s necessary to actually go live with a migrated version of this critical database means that the project will likely take multiple quarters, have a high chance of delays, and possibly be stopped because the risks are too great. If you fail to migrate such a high-profile database first, you might have a difficult time getting approval to try again until the memory of the pain has dissipated.
You’re looking for a medium-complexity database that has some importance to the business so that your organization pays attention to the migration results. But it should be small enough that you can complete the migration within one quarter—or at most two—and demonstrate the benefits from the migration. Ahh… just right.
Database size and complexity
The first dimension that DBAs and most technology managers think about is the size and complexity of the database. We are talking about a database migration after all! Earlier, we said that we are looking for a medium-complexity database—how do we determine that?
We measure the size of the database based on the following:
- The number of schema objects (tables, indexes, foreign keys, and constraints)
- The number of code objects (stored procedures, triggers, views, functions, and custom types)
- The number of rows and total disk space
- The volume of activity on the database, with a focus more on changes (inserts, updates, and deletes) than on reads
You can migrate most of the schema objects automatically using a tool such as the AWS Schema Conversion Tool (AWS SCT). Or, for an Oracle to PostgreSQL migration, you can use the open-source Ora2Pg project. However, you still must review and validate the migrated objects. For optimal performance on larger databases, you also need to tune some of the mappings and index types.
Some code objects can be migrated automatically from Oracle PL/SQL to PostgreSQL’s PL/pgSQL and then reviewed. But in more complex databases, many objects (40–60 percent) must be manually edited or rewritten. This might be because specific functions or packages that are used in the PL/SQL code don’t have an exact match in PostgreSQL, or because the code contains complex SQL syntax that the tool can’t rewrite to PL/pgSQL. You might also have procedures that require manual performance tuning—whether by rewriting the query so that the PostgreSQL optimizer can generate an efficient execution plan or by using the pg_hint_plan extension to guide query execution.
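One rough way to gauge how much manual rewriting a code base might need is to scan the PL/SQL source for constructs that have no direct PL/pgSQL equivalent. The following is a minimal sketch in Python; the construct list is an illustrative subset, not an exhaustive rule set:

```python
import re

# Constructs that typically force a manual rewrite when moving from
# Oracle PL/SQL to PostgreSQL PL/pgSQL (illustrative subset only).
MANUAL_REWRITE_PATTERNS = {
    "autonomous transaction": re.compile(r"PRAGMA\s+AUTONOMOUS_TRANSACTION", re.I),
    "hierarchical query": re.compile(r"\bCONNECT\s+BY\b", re.I),
    "Oracle built-in package": re.compile(r"\bDBMS_\w+\.", re.I),
    "outer-join (+) syntax": re.compile(r"\(\s*\+\s*\)"),
}

def triage(plsql_source: str) -> list[str]:
    """Return the constructs found that likely need hand-editing."""
    return [name for name, pat in MANUAL_REWRITE_PATTERNS.items()
            if pat.search(plsql_source)]

sample = """
CREATE OR REPLACE PROCEDURE log_event(p_msg VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  DBMS_OUTPUT.PUT_LINE(p_msg);
  COMMIT;
END;
"""

print(triage(sample))  # flags the autonomous transaction and the DBMS_ call
```

Running a script like this across your stored procedures gives an early estimate of which databases fall into the "many objects need manual rewrites" category before you commit to one as your first migration.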
Migrating data is easy for standard types and slowly changing data. Custom types require more mapping, and sometimes new views or code added to the Oracle database that convert them to simpler types that migrate easily to PostgreSQL. You can use AWS Database Migration Service (AWS DMS) to perform the initial data load and then replicate ongoing updates on the source database to keep the PostgreSQL target synchronized. For large data sizes moving from an on-premises database to AWS, migrating the data over the network with AWS DMS may take too long, and you may need to use AWS Snowball to move a large portion of your historical dataset.
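A quick back-of-the-envelope calculation helps decide whether a network transfer is practical. The sketch below assumes a sustained usable bandwidth figure that you would replace with your own measurement:

```python
def transfer_days(data_tb: float, usable_mbps: float) -> float:
    """Days needed to move data_tb terabytes at a sustained
    usable_mbps megabits per second."""
    bits = data_tb * 1e12 * 8              # terabytes -> bits
    seconds = bits / (usable_mbps * 1e6)   # bits / (bits per second)
    return seconds / 86400                 # seconds -> days

# A 500 GB database over a sustained 100 Mbps is a comfortable network job.
print(round(transfer_days(0.5, 100), 1))   # ~0.5 days
# But 50 TB at the same rate takes well over a month, which is
# where shipping the bulk of the data with AWS Snowball makes sense.
print(round(transfer_days(50, 100), 1))    # ~46.3 days
```

The crossover point depends on your bandwidth, your change rate during the transfer, and your cutover window, but arithmetic like this quickly rules network-only migration in or out.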
In general, for your first migration, choosing a database that contains dozens or low hundreds of tables is good. Choosing a database with tens of thousands of tables is bad. A database that has dozens of functions and stored procedures is good, but one with thousands of functions is bad. Having tens to hundreds of millions of rows is good—tens of billions of rows is bad. And having tens to hundreds of gigabytes is good—multiple terabytes is bad.
When I say, “bad,” I mean, “will take longer and be more complex, and so it’s not a great choice for your first migration project.”
Applications and integrations
As part of your database migration, you have to modify the applications that use your database. These could be user-facing mobile, web, or desktop applications. They also include any data-integration feeds in and out, web services, REST APIs, batch processing jobs, ETL (extract, transform, and load) processes, reporting applications, and the administration consoles for each of these apps. If you follow a service-oriented or microservices architecture approach, you might have only a single application that reads and writes to each database. If you have a sprawling enterprise application suite that uses a shared database as the primary integration mechanism, you might have hundreds of separate applications that access the database. When you choose your first database to migrate to PostgreSQL, make sure that you have a manageable number of applications to modify.
PostgreSQL has support for most development languages, including Java, C#, Python, PHP, C/C++, JavaScript/Node.js, Go, and more. So the specific technologies that the applications use are not critical. But you should ensure that you or your application development partners have the full source code, working development environments, and the ability to deploy and test the applications that you modify as part of the migration. Of course, knowledge of the application code base and business logic also makes the migration easier and less risky. So a database that’s used by an actively maintained Java application is a better first choice than one that’s primarily used by legacy C code that hasn’t been touched in 10 years.
For out-of-date legacy technologies (e.g., COBOL) or proprietary application technologies (e.g., Oracle Forms), you might decide to rewrite the application, and not simply migrate the database interaction. This is an important modernization effort to lower costs and ensure that the application can be maintained by staff for the next 10 years. However, that becomes a complete application modernization rather than simply a database migration. This adds cost and risk that you probably don’t want for your very first PostgreSQL migration.
The specific application architecture also impacts how easily the app can be migrated. AWS SCT can help convert some of the SQL in the application code, but every change must be reviewed and retested. If the application uses a layered approach with a separated data access layer or an object-relational mapping (ORM) with externalized SQL strings, you only need to edit those specific code modules. But if the application has SQL queries scattered throughout the code base, your team may be searching and changing hundreds of separate files—increasing the effort and risk of missing necessary changes. This is usually not something you know or can control when choosing your first database to migrate. But if you have a choice, choose the database that is used by well-architected applications.
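A layered approach might look like the following sketch, where all dialect-specific SQL lives in one module instead of being scattered through the code base. The table, column, and query names here are hypothetical, chosen only to show the pattern:

```python
# A tiny data access layer: every SQL statement lives here, keyed by
# dialect, so a migration edits this one module instead of hundreds
# of files scattered across the application.
QUERIES = {
    "oracle": {
        "recent_orders": (
            "SELECT order_id, NVL(note, 'n/a') FROM orders "
            "WHERE order_date > SYSDATE - :days"
        ),
    },
    "postgresql": {
        "recent_orders": (
            "SELECT order_id, COALESCE(note, 'n/a') FROM orders "
            "WHERE order_date > now() - make_interval(days => %(days)s)"
        ),
    },
}

def get_query(dialect: str, name: str) -> str:
    """Look up the SQL text for the active database dialect."""
    return QUERIES[dialect][name]

# Application code asks for a query by name and never embeds SQL itself.
print(get_query("postgresql", "recent_orders"))
```

With this structure, converting the application means reviewing one dictionary of statements rather than hunting for embedded SQL strings, which is exactly why well-architected applications make better first migration candidates.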
Packaged applications
If you are using a packaged software application whose vendor does not certify on PostgreSQL, migrating is probably a non-starter. Often when the vendor supports multiple commercial databases, the application is designed in a cross-database way, uses highly standards-compliant JDBC or ODBC calls and SQL queries, and minimizes the amount of code in the database. It may be tempting to port the application yourself, and I have worked with companies that chose to “crack open” the database code of an application. Usually they could make the application work, but then they were running an unsupported, modified version of the application. Upgrades, even small point releases, became a testing nightmare, and major versions were out of the question. Not to mention that you might be violating the terms of your license with the application vendor.
If you really want to move your commercial ERP/CRM/accounting application from Oracle to PostgreSQL, you might have to get your vendor to add PostgreSQL to their supported database list or migrate to a new business application. This is not an issue for applications that you control and have the source code for.
Testing
Testing is a large part of a migration project—in fact, testing is often the largest effort involved. You must test every function, stored procedure, application-calling interface, report, data load, data feed, batch job, and ad hoc support team query. You also have to verify not only that the application behaved correctly after updating the database, but also that the data in the database is correct. And every combination of data values and corner cases must be exercised and checked in every report and feed.
A difficult aspect of database migrations is that a subtle difference in the data—for example, a field that used to have decimal values but is now being rounded off to an integer value—may not appear in the user interface or a report right away. Without proper testing, the differences may not be noticed for several months after the switchover, leading to massive headaches for data cleanup and reconciliation.
Based on the testing requirements, choosing a database and applications that have strong unit testing, automated test cases, and a robust, full regression test plan will make your first migration project much easier. When you migrate a database that doesn’t have strong unit tests, you need to build a set of comparison tests that validate the migrated database against the original (along with database functions, data feeds, reports, etc.), which adds to the cost and time required for the migration.
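A comparison test can be as simple as selecting the same rows from both systems, in the same order, and diffing them. The sketch below uses two in-memory SQLite databases as stand-ins for the source and migrated databases (the table and column names are made up for the example), and it catches exactly the decimal-to-integer rounding drift described above:

```python
import sqlite3

source = sqlite3.connect(":memory:")   # stand-in for the original database
target = sqlite3.connect(":memory:")   # stand-in for the migrated database

for db in (source, target):
    db.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")

source.executemany("INSERT INTO invoices VALUES (?, ?)",
                   [(1, 10.25), (2, 99.99)])
# The migrated column was mistakenly rounded to whole units.
target.executemany("INSERT INTO invoices VALUES (?, ?)",
                   [(1, 10.0), (2, 100.0)])

def compare(table: str) -> list:
    """Return (source_row, target_row) pairs that differ, ordered by key."""
    q = f"SELECT * FROM {table} ORDER BY id"
    src = source.execute(q).fetchall()
    tgt = target.execute(q).fetchall()
    return [(s, t) for s, t in zip(src, tgt) if s != t]

mismatches = compare("invoices")
print(mismatches)  # both rows differ because of the rounding
```

In a real migration you would run checks like this against the Oracle source and the PostgreSQL target using their respective drivers, typically combining row counts, per-column checksums, and sampled row-by-row diffs rather than pulling every row.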
Organizational readiness
Migrating to a new database can be a big change, especially for DBAs and application developers who may have experience with only a single database technology. Much of the SQL syntax is similar, as are many commands and processes, and PostgreSQL’s PL/pgSQL stored procedure language is similar to Oracle’s PL/SQL. But there are subtle differences, old habits must be changed, and new tools must be learned. To learn more about migrating your PL/SQL code to PL/pgSQL, see this blog post. Picking a database that is supported by teams with experience in multiple databases, or who are willing and eager to learn new technologies, can make the first migration project much more successful.
Of course, if you have a good candidate database to migrate but the team needs help learning a new technology, you can jump-start your efforts by training your team and using a partner that has the right skills.
Business acceptance
Migrating from Oracle to PostgreSQL has a definite business benefit, but it’s usually “behind the scenes”—not new application features. Specifically, you save license fees (sometimes very large license fees), and you gain a lot of flexibility from the open source PostgreSQL community and extensions. With Amazon Aurora with PostgreSQL compatibility, you can benefit from improved cloud scalability and performance. The financial ROI alone is typically a 12-month payback or faster. But the migration effort consumes a large amount of your resources—for planning, analysis, migration, and testing—without delivering any new features or capabilities for your business owners.
The first database you choose to migrate needs business partners who understand the benefits and can accept a slightly slower pace of feature delivery during the migration effort. If the database and associated applications you select are changing rapidly, with very large investments in new features, you must manage your business partners’ expectations and probably engage a partner to complete the migration without significantly slowing down feature development.
Examples of good candidate databases for migration
Typical application types that are good candidates for migration include online web application databases, internal customer service or business process databases, reporting data marts, online transaction processing (OLTP), and mixed workload operational data stores (ODS). For your first migration, line-of-business, non-mission-critical applications usually offer a good combination of complexity, control, ability to change, and cost savings.
Summary
As you can see, choosing your first database to migrate is important to the success of your PostgreSQL migration efforts. Consider the factors in this post, and plan your migration for maximum technical and business benefit and a short enough timeline to keep your efforts on track.