Challenges When Migrating from Oracle to PostgreSQL—and How to Overcome Them
David Rader is the vice president of engineering at OpenSCG.
OpenSCG is an AWS Consulting Partner and a leading expert in PostgreSQL, helping customers migrate to, operate, and optimize PostgreSQL and other data platforms on premises and in the cloud.
In previous posts, we looked at the overall approach for a successful database migration and how to pick your first database to migrate. In this post, let’s look at a few migration challenges and tips to overcome them.
When organizations look to migrate from Oracle to the open source PostgreSQL database, they usually focus on the advanced features, high performance, rock-solid data integrity, flexible open source licensing, and easy availability from public cloud providers, including AWS. But the road for migration is not always smooth, and knowing how to avoid the potholes can help ensure that your migration is successful.
The challenges I write about in this post have impacts that range from convenience and ease of development going forward, to decreasing performance of the migrated database, to potentially making a migration impossible. Since it’s the scariest one, let’s start with the challenge that might just make migrating an existing application impossible.
If you want to move your commercial ERP/CRM/accounting application from Oracle to PostgreSQL, you might have to get your vendor to add PostgreSQL to their supported database list, or else migrate to a new business application. But this is not an issue for applications that you control and have the source code for.
Data types and schema conversion
Now that we’ve covered the biggest challenge, let’s go back to one of the easiest. PostgreSQL has a comprehensive set of built-in data types and support for custom extension types. The core Oracle data types can be mapped easily to PostgreSQL types. Most of the mapping and conversion of tables can be automated using the AWS Schema Conversion Tool (AWS SCT) or another migration tool.
Some PostgreSQL data types are much easier to work with than their corresponding Oracle types. For example, the
Text type can store up to 1 GB of text and can be handled in SQL just like the
varchar fields. They don’t require special large object functions like character large objects (CLOBs) do.
However, there are some important differences to note. The
Numeric field in PostgreSQL can be used to map any
Number data types. But when it’s used for joins (such as for a foreign key), it is less performant than using an
Timestamp with time zone field is slightly different from the Oracle
Timestamp with time zone field—it is actually equivalent to Oracle’s
Timestamp with local time zone. These small differences can cause either performance issues or subtle application bugs that require thorough testing.
After you convert your schema and tweak the tables to best meet your specific application, it’s time to move the data. For smaller (100 GB or less) databases, this process is fairly straightforward. Using AWS Data Migration Service (AWS DMS) or a tool such as HVR, you can create data migration jobs that run on Amazon EC2 instances, connect to your local Oracle database, and pipe the data to an Amazon RDS for PostgreSQL instance. You will want to verify the data in the target database. Then run the migration a couple of times in dev, test, and finally, your production environment, and resolve any issues that arise.
When you are comfortable with your migration process, run through the process again in production. Migrate the bulk of existing data and catch up to the currently changing transactions using an AWS DMS replication instance. Then keep the migration service running to ensure that the target database remains in sync with changes in the live system.
Larger data volumes
For larger data volumes—more than 1 TB, for example—a purely online migration of data might be too time consuming or take too much of your available bandwidth. In this situation, you should use an export, reload, and sync approach. In this approach, you export your largest tables. Then, you either compress and push them to Amazon S3 or use AWS Snowball to transport them to an AWS Region and load to Amazon S3. After the data is in Amazon S3, you can bulk load the data files to PostgreSQL.
Using date-based partitions makes it easier to choose and isolate the data that is moved offline. Unchanging historical data that has time stamp–based records can be exported up to a known point in time, with any data after that point migrated using AWS DMS. Or, you can use an Oracle log sequence number (LSN) or application-specific sequential ID as the cutoff point for the export and migration job.
A migration is a great time to clean up certain parts of your architecture and application. For example, if you store files (PDFs, images, etc.) in your database, it’s a great time to break them out into their own reliable Amazon S3 storage bucket. Reduce the size of the database and the time needed for backup and restore operations, and improve your application flexibility by being more flexible in how you work with files.
If you have static historical data, you might be able to either purge it completely if it’s never used by your application, or move it to a separate archive data store. This data store can be either in a lower performance (and less expensive) PostgreSQL database or perhaps in an Amazon S3-backed Amazon Athena or Apache Spark system that is suitable for infrequent queries on cold data.
And, if you’ve mixed online transaction processing (OLTP) and analytics-style data access, moving from a one-tool-for-everything Oracle setup to using a separate warehouse for reporting and analytics can improve both your application responsiveness and your analytics capabilities. There are options to create a dedicated Postgres-XL–based warehouse or use Amazon Redshift as a powerful managed warehouse.
In all, migrating your data takes planning and practice, but is a very solvable challenge.
The most intensive effort during an Oracle-to-PostgreSQL migration is usually porting the code to work with PostgreSQL. This applies to the stored procedures, packages, and functions within the database and the application code that reads and writes to the database.
PostgreSQL is similar to Oracle in many ways. The main procedural programming language, PL/pgSQL, is similar enough to PL/SQL that most database administrators (DBAs) and developers can easily learn the syntax. Automated tools such as the AWS Schema Conversion Tool (AWS SCT) or the open source Ora2Pg typically can automatically convert more than 70 percent of the database code to work correctly.
The automated conversion includes converting Oracle-specific functions to ANSI standard functions (for example, moving from
coalesce()), changing legacy syntax such as the plus sign (+) used for outer joins to standard outer join syntax, and adding aliases for subselects that Oracle considers optional, but that PostgreSQL requires. In these cases, the converted code is compatible with both Oracle and PostgreSQL. It could actually be merged into your current app code base and deployed on your existing database before you fully migrate.
The automated tools can also convert code that uses database-specific syntax, such as sequence handling with Oracle’s
sequence.nextval vs. PostgreSQL’s
nextval(sequence), or the slightly different syntax for executing dynamic SQL within a stored procedure. These changes are not compatible with Oracle and so are only used on your target PostgreSQL system.
The remaining code must be manually converted, either because the SQL and PL/SQL were too complex for the tool to parse and convert perfectly, or because there is not an exact one-for-one conversion that can be automatically applied. Many such cases are trivial for experienced developers and DBAs to convert when they understand both the PostgreSQL approach and the application logic involved.
Niche features: autonomous transactions, collect all, and BFILEs
There are some Oracle features that are either not supported by PostgreSQL or handled through an extension or workaround. For example, PostgreSQL does not directly support autonomous transactions that enable one stored procedure’s changes to be committed inside a larger transaction that is rolled back. The common workaround is to use a DBLink “remote” connection to the same database. You execute the function you want committed as a “remote” call that is treated as a separate connection—and a separate transaction.
Similarly, core PostgreSQL does not support externally organized tables or accessing files on the database file system directly. However, both of these are possible using a Foreign Data Wrapper (FDW).
Some common Oracle packages, such as DBMS_OUTPUT, are supported by the open source orafce and AWS compatibility extensions that ease the conversion cost. Others, such as UTL_FILE, UTL_HTTP, and the SMTP package, could be supported via extensions. But you should review whether this is the right architectural solution, or if your application architecture should be updated as part of the conversion. You need to verify whether the extension you’re considering is supported in the Amazon RDS-managed environment, or you might not be able to use Amazon RDS.
As noted previously, PostgreSQL has broad support in programming languages, and there are drivers available for every mainstream development language, and many others as well. Depending on your application architecture and database connection layer, you might need only a small amount of change, or you might have significant porting changes.
For example, it’s fairly easy to convert a Java-based application that uses generic JDBC classes (not Oracle-specific classes) and always calls stored procedures with no SQL in the code, either dynamically built or hardcoded. If you use an object-relational mapping (ORM) such as Hibernate or JCA, it could be trivially easy to switch the dialect from Oracle to PostgreSQL. There are still little differences. For example, if you use partitioning, the required PostgreSQL triggers change the record count that is returned from an insert and fools Hibernate. So you need to add annotations to the Hibernate queries that update partitioned tables.
For these types of simple cases, you can even use AWS SCT to scan your Java or .NET code to find SQL statements and convert the SQL to PostgreSQL-compatible SQL—similar to the conversion done for PL/SQL.
However, the application code conversion is much more involved if you use embedded SQL such as Oracle’s Pro*C, link to Oracle-specific libraries such as OCI or the Oracle JDBC classes, or dynamically build SQL based on application conditions. PostgreSQL has support for C/C++ with either embedded SQL (ECPG) or the libpq library. But these are not drop-in compatible with the Oracle equivalents.
Similarly, changing application code that dynamically builds SQL requires a strong understanding of the application logic and robust testing to ensure that the functionality works as expected. We have worked with customers to automate some aspects of these migrations, but it still requires focus from the application maintenance team.
Transaction control and exception handling
Every application must ensure proper transaction management and error handling—that’s how we keep the corner cases, run-time failures, and unexpected user input from creating bad data. PostgreSQL has robust transaction handling, supporting full ACID semantics and different isolation levels. PostgreSQL also handles run-time errors gracefully and gives reliable error codes and messages to calling code—PL/pgSQL or applications. But there are a couple of differences in the way PostgreSQL deals with these internally versus how Oracle behaves that require changes to application code or design.
First, PostgreSQL does not allow transaction control inside of PL/pgSQL—you cannot commit or roll back a transaction inside a stored procedure. The application that calls the stored procedure must perform the transaction management—starting and committing or rolling back. The stored procedure executes within that calling transaction context. Obviously, if your existing database code has transaction management in procedures, it must be modified.
Second, when a run-time exception has occurred in a transaction, you must roll back that transaction before you can execute any statement on the connection. You often see this when you find the following error message in your application log:
ERROR: current transaction is aborted, commands ignored until end of transaction block.
This message indicates that an error occurred, the error was ignored, and another statement (
EXECUTE, anything…) was executed Then the second statement failed because the transaction is already in an error state (aborted). When you see this message, carefully review your database calls and exception handling. Ensure that anywhere an error could occur (any database call), you check errors or have an exception handler set up and
ROLLBACK to a savepoint, or close the connection) before trying another database operation.
Third, for application logic and to address the above error, you must have exception handling. In PL/pgSQL, using a
BEGIN…EXCEPTION…END block lets your code catch an error that occurs. This block automatically creates a savepoint before the block and rolls back to that new savepoint when the exception occurs. You can then determine what logic to execute based on whether there was an error. However, exception blocks, because they create a savepoint, are expensive. If you don’t need to catch an error, or if you are planning to simply raise the error back to the calling application, don’t have the exception block at all. Let the original error flow up to the application.
Similarly, Java, embedded SQL, and other languages have mechanisms for catching exceptions. Review the application to ensure that proper error handling is in place around database calls. If the application currently catches and ignores the exception, you must modify it to roll back the transaction before a new database call can be issued. If the application expects to keep the partial transaction changes before the exception, you might have to add savepoints to your application code and roll back to a savepoint to continue after an exception.
Note that just like the exception block in PL/pgSQL, adding savepoints does affect performance. So use them where needed, but not in every database call. For example, suppose that you are saving a header record along with child records, and you will roll back the entire transaction if you get an exception. In this case, you don’t need to create a savepoint after inserting the header record because if the child records fail, you roll back the insert.
Finally, you must map the error codes and exception types that your application expects from Oracle to PostgreSQL. Some error codes (such as the 100 no record found code) are the same on both, but others are different. Depending on your programming language, if you are catching Oracle-specific JDBC exceptions, you must replace those specific exceptions with either generic cross-database exceptions or switch to PostgreSQL-specific exceptions.
Making sure that your applications handle transactions and errors correctly is a critical part of migrating and usually requires changes to the database and application code.
Over the years, your DBAs and developers have built up a lot of experience in your current technologies. This experience includes using the various advanced features, tuning the performance, and keeping the system stable and running. Migrating to a new technology requires learning new skills, developing new ways of approaching problems, and giving up old ways of doing things. Don’t underestimate the amount of time required for a highly experienced DBA to become comfortable supporting a mission-critical system on a new platform.
You also must realize that, although Oracle and PostgreSQL are both relational databases and support most of the same SQL syntax for creating tables and querying data, the internals are different. So they behave differently in some situations.
For example, PostgreSQL’s Multiversion Concurrency Control (MVCC) is very different from Oracle’s rollback segments, even though they both provide the foundation for ACID transactions. Developers who are used to designing applications for one, or DBAs who are used to optimizing performance for one, might hit some nasty speed bumps if they use the same techniques on the other.
Training your team on PostgreSQL early in the migration process pays for itself. They will be more comfortable executing the migration and will be able to deliver a working application and database more quickly. Plus, they might be able to avoid some of those nasty speed bumps if they understand the differences before they start converting their code.
A key requirement for a successful migration is to not disrupt your business. This means you have to ensure that your data is migrated correctly, the application behaves correctly, and the performance provides a good user experience. It also means reducing the amount of downtime when your application is not available for users or customers.
These steps help reduce the time required for the production cutover, from archiving old data, to using AWS DMS to migrate and keep ongoing transactions replicated, to ensuring that your code is thoroughly tested. By scripting the deployment steps and rehearsing them several times in your test and staging environments, you can ensure that your team is well prepared for the production switch. And using AWS resources means you can start and stop those test environments without incurring the full expense of a large database server installation.
Synchronize the target DB
If you migrate all the data using a combination of offline and replication approaches (like AWS DMS), you can have a fully populated, ready-to-use copy of your migrated database before you start your cutover process. You won’t need to go through the scary exercise of calculating how fast you can convert your 1 TB database, or provide ongoing updates to management about the number of tables migrated during a Saturday maintenance window.
Tips to speed your migration
In this post, I discussed some of the issues you might encounter as you migrate from Oracle to PostgreSQL, along with some tips for overcoming the challenges.
The following strategies can help reduce the overall time it takes to complete your migration:
Training: Your lead architects, DBAs, and developers must understand the differences between Oracle and PostgreSQL and the right way to migrate your applications. A one-week training class helps avoid long delays as they struggle to work around unexpected surprises late in your project.
Testing: Testing your applications is a huge part of the total effort in a migration. Having automated tests that can run at the application or data access layer can help reduce that test-and-repeat cycle time dramatically.
Two-step: If you have an application that requires a lot of SQL modification or code changes for compatibility, reduce the duration you maintain two branches of code by moving as much as possible to ANSI standard code that works on both Oracle and PostgreSQL. Then deploy these changes in your existing production system. This also reduces the changes that must be tested in the final conversion.
Script: Use automated tools such as AWS SCT. Work with a partner that can build automated conversion scripts for your main code cleanups rather than relying on brute force to modify thousands of code files.