AWS Database Blog
Handle empty strings when migrating from Oracle to PostgreSQL
An Oracle-to-PostgreSQL migration in the AWS Cloud can be a multistage process with different technologies and skills involved, starting from the assessment stage to the cutover stage. AWS offers services that make PostgreSQL database deployments easy to set up, manage, and scale for the cloud in a cost-efficient manner. These services are Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition.
During the schema or code conversion phase, we can use AWS Schema Conversion Tool (AWS SCT) to convert the majority of the code. For code conversion exceptions raised by AWS SCT, we can refer to migration patterns and the Oracle to PostgreSQL migration playbook.
Converting code from Oracle to a PostgreSQL-compatible engine may involve multiple corner cases that are important for your overall migration success. Handling empty strings (”) with all NULL-compatible operators or expressions while migrating from Oracle to PostgreSQL is vital to achieve overall functional acceptance on the converted code and have a holistic impact.
Oracle databases treat empty strings and NULL as the same. However, in PostgreSQL, empty strings and NULL are different. If you’re running workloads on Oracle, you might implement procedural code with business logic, assuming that empty strings are treated as NULL when performing any comparison or NULL check. In this post, we cover different variants of handling empty strings in PostgreSQL and viable solutions during migration. We show examples to understand the overall components, impacts, and considerations of migration.
NULL vs. empty strings
Oracle databases treat NULL and empty strings as the same; you can use empty strings and NULL interchangeably in any operation. The following conditional statements in Oracle are all evaluated to NULL:
However, in PostgreSQL, NULL means the value is unknown or not given, and an empty string is a string of 0 length. Therefore, NULL and empty strings are interpreted, represented, and operated differently. A PostgreSQL database treats empty strings as an identifiable value that is processed similar to other available characters:
The following table summarizes these differences.
Expression | Oracle Result | PostgreSQL Result |
NULL = NULL | NULL | NULL |
” = NULL | NULL | NULL |
” = ” | NULL: Empty strings are treated as NULL and therefore not comparable to anything, not even to itself | TRUE: Empty strings are comparable because they’re strings of 0 length in PostgreSQL |
Note the following differences between NULL and empty strings in PostgreSQL:
- NULL can be assigned to any type, as opposed to empty strings, which aren’t compatible with date or numerical fields. See the following example code:
- NULL is an unknown value. It doesn’t have a value, as opposed to an empty string, which is a value but an empty one.
- Finding the size or length of NULL results in NULL. An empty string is a string of 0 length.
- NULL = NULL results in NULL, as opposed to ”
=
”, which evaluates to TRUE. - NULL is not comparable to anything, not even to itself, whereas empty strings can be compared.
One of the biggest differences between Oracle and PostgreSQL is how NULL and empty strings operate with regards to composite unique indexes and constraints. PostgreSQL allows multiple rows with NULL in unique indexes. In contrast, Oracle restricts multiple NULL instances and empty strings in a composite unique constraint or unique index. When encountering an empty string, both Oracle and PostgreSQL throw an error for multiple entries.
The following is an example using Oracle :
Because Oracle treats empty strings as NULL, multiple entries of empty strings to unique constraints or unique indexes are violated:
The following is the same example using PostgreSQL:
PostgreSQL databases treat empty strings and NULL as different. All empty strings are equal, and therefore multiple entries of empty strings to a unique constraint or unique index are violated:
Handling empty strings in PostgreSQL
From the preceding discussion, it’s evident that when migrating from Oracle to PostgreSQL, empty string handling is important and crucial.
Let’s create a sample table and insert data to it. You can use the following SQL statements to create a table and insert data in both Oracle and PostgreSQL:
In Oracle, because empty strings are treated as NULL, the preceding insert statements #2 and #3 will store NULL for column tname
in the table. However, in PostgreSQL, the table will store NULL for the #2 and an empty string for the #3 insert statements.
You can see the difference in Oracle and PostgreSQL if rows are selected with the IS NULL or IS NOT NULL operator.
The following is the Oracle code:
The following is the PostgreSQL code:
If you want to select all rows that have NULL or empty strings in PostgreSQL, you must explicitly add the empty string condition in the query’s WHERE clause:
Optionally, you can use NULL handling functions such as COALESCE or NULLIF in PostgreSQL to deal with scenarios containing empty strings.
COALESCE function
The COALESCE function returns the first of its arguments that is not NULL. NULL is returned only if all arguments are NULL. You can adjust the WHERE tname IS NULL
filter condition to WHERE coalesce(tname, '') = ''
to get all rows whose tname
is either empty or not known.
However, changing WHERE clauses in SQL queries to support the functionality might affect the run plan of query. The database might not optimize the plan because the particular filter is changed, and any index created on such column wouldn’t be used. There are chances that your query could be slower. In those cases, you must analyze the query run plan, revisit the indexes created, and create partial indexes if required on the table to tune the query. Query tuning is beyond the scope of this post.
The following code is an example of the COALESCE function:
NULLIF function
NULLIF returns NULL if both the arguments are same. It returns the 1st argument when both the arguments are different. We use it to compare the input expression with the empty string and return NULL:
We transform native PostgreSQL function with NULLIF for cases where we might encounter empty strings (”) as input. For example, if you have a condition like “tname IS NULL”
where “tname”
can contain NULL, then you can change to “nullif(tname, '') IS NULL”
in PostgreSQL.
With all such cases of NULL and empty strings, let’s dive deep into other database expressions or functions that require proper attention for porting Oracle procedural code to PostgreSQL.
NULL handling functions and challenges with empty strings
NULL handling functions in both Oracle and PostgreSQL work the same in any comparison or expression. However, because an empty string is another way of representing NULL in Oracle, no difference is observed when NULL handling functions are operated with NULL or empty strings. But in a PostgreSQL database, NULL handling functions in conjunction with empty strings evaluate to FALSE. This creates a challenge when migrating from Oracle to PostgreSQL to simulate the same behavior for NULL handling functions when dealing with empty strings.
The following table summarizes the overall behavior between Oracle and PostgreSQL regarding empty strings.
Input Expression | Oracle Condition Evaluation | PostgreSQL Condition Evaluation |
” IS NULL | True | False |
” IS NOT NULL | False | True |
NVL(”, ‘A’) | A | Not available in PostgreSQL. You can use the Orafce extension NVL function. You must use NULLIF with NVL function to parse empty string as NULL. |
COALESCE(”, ‘A’) | A | ” (empty string) |
DECODE(”, NULL, ‘A’, ‘B’) | A | Not available in PostgreSQL. You can use the Orafce extension DECODE function. You must use NULLIF with the DECODE function to parse empty strings as NULL. Another approach is to convert DECODE to CASE. |
CASE WHEN ” IS NULL | True | False |
CASE ” WHEN NULL | False | False |
Now that we have a better understanding of the differences between Oracle and PostgreSQL, we can walk through some of the workarounds and best practices to use during the conversion phase.
Handling IS NULL and IS NOT NULL
If you’re using conditional expressions like IS NULL and IS NOT NULL multiple times as part of procedural code in Oracle, then you must validate that they will work in PostgreSQL with respect to empty strings. Use the NULLIF function in PostgreSQL for situations where a column or variable value might become an empty string during an expression evaluation.
During code migration from your Oracle to PostgreSQL database, adding NULLIF is required only for character varying data types. This is not compatible for other data types like NUMERIC or DATE.
Handling NVL and COALESCE
PostgreSQL databases don’t have a built-in NVL function, and instead support the COALESCE function, which is ANSII compliant for NVL. The COALESCE function works perfectly fine for data types other than strings. This is because strings can accept empty strings. For situations that might accept empty strings as input, you must handle them carefully with the PostgreSQL COALESCE function.
The following is the Oracle code:
The following is the PostgreSQL code:
The following example shows how the functionality of COALESCE differs with empty strings as input.
The following is the Oracle code:
The following is the PostgreSQL code:
Although the behavior of COALESCE in PostgreSQL conforms to the standard, it’s not always compatible based on input or variable values while migrating from Oracle. Let’s understand the workaround for COALESCE for empty string handling. The following is the PostgreSQL code:
Orafce NVL function
Orafce is an extension supported in Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition. When you create this extension in PostgreSQL, it brings modules containing some useful functions that can help with porting your Oracle application to PostgreSQL. Orafce provides the NVL function, which you can use while migrating code from Oracle without code changes.
The following example shows how Orafce NVL works for NULL input but doesn’t treat empty strings as NULL as the Oracle database does:
Therefore, if you’re using the Orafce NVL function as part of your code migration or development, you must make sure you handle empty strings for character varying or text data types, as illustrated in the following example:
Mock NVL function as wrapper for character varying or text arguments
Adding the NULLIF function to each character varying or text data input in COALESCE or Orafce NVL could make the code migration process cumbersome and time-consuming. To mitigate this risk, a workaround is to create a wrapper function in PostgreSQL that parses empty strings to NULL and at the same time is performant.
The following shows a wrapper code for the NVL function with two arguments for the text type:
Let’s test the NVL wrapper function we created and see if it can handle NULL as well empty strings:
Because the empty string is possibly an input argument for the character or text data type in Oracle, you can create the preceding wrapper function to mimic similar behavior of the Oracle NVL function and expedite the code migration process.
The NVL wrapper function we created is designed for character varying or text data types only, because of empty string behavior in PostgreSQL. For other data types, you can choose between COALESCE or the Orafce NVL function.
Handling DECODE and CASE expressions
Expressions like CASE and DECODE can be part of many dynamic queries or conditional expressions, and you need to make sure it adheres to similar behavior for NULL and ” as input. See the following Oracle code:
You get the following output:
The PostgreSQL engine offers additional functions and extensions to mitigate empty string conditional evaluation and provides the same behavior as the Oracle engine:
You get the following output:
No function like DECODE exists in PostgreSQL. You need to convert to CASE expressions manually. You can also use the Orafce DECODE function.
Both DECODE and CASE expressions treat NULL and empty strings as the same in Oracle and need proper handling when converting them to PostgreSQL.
With that understanding, let’s explore various options and best practices to migrate CASE or DECODE into PostgreSQL without breaking the functionality.
Option 1: Convert DECODE as a CASE expression
You can convert all DECODE functions to traditional CASE expressions in PostgreSQL while migrating code from Oracle. But you must always remember the correct implementation of CASE when empty strings might appear as input to the CASE expression.
The following is the Oracle code:
The following is the right approach in PostgreSQL to write a searched CASE expression:
Option 2: Use the Orafce DECODE function in PostgreSQL
Let’s test the DECODE function provided by the Orafce extension with different types of input parameters and learn how to correctly handle empty strings.
When the input is a non-NULL string, Oracle DECODE and Orafce DECODE provide the same output.
The following is the Oracle code:
The following is the PostgreSQL code:
When the input is NULL, Oracle DECODE and Orafce DECODE provide the same output.
The following is the Oracle code:
The following is the PostgreSQL code:
When the input is an empty string, Oracle DECODE and Orafce DECODE provide different output.
The following is the Oracle code:
The following is the PostgreSQL code:
In this case, you must either convert to CASE, as described earlier, or use NULLIF with DECODE:
Conclusion
Handling empty strings and their evaluation with different operators, expressions, or functions like IS NULL, NVL, CASE, and DECODE should be an important consideration when migrating your database from Oracle to PostgreSQL. In this post, we have discussed the importance of handling empty strings and viable solutions during migration from Oracle to PostgreSQL. Our examples cover different variants of handling empty strings to understand the overall components, impacts, and considerations of migration to PostgreSQL.
If you have any questions or suggestions about this post, leave a comment.
About the Authors
Sashikanta Pattanayak works as a Lead Consultant with the Professional services team at AWS. He works with customers to build scalable, highly available and secure solutions in the AWS cloud. He specializes in homogeneous and heterogeneous database migrations.
Deepak Mahto was a Consultant with the AWS Proserve Team in India. He worked as Database Migration Lead, helping and enabling customers to migrate from commercial engines to Amazon RDS. His passion is automation and has designed and implemented multiple database or migration related tools.
Vinay Paladi is a Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping and enabling customers to build high-available, cost-effective database solutions and migrate their commercial engines to AWS cloud. He is passionate about building innovative solutions to accelerate database journey to cloud.