Migration tips for developers converting Oracle and SQL Server code to PostgreSQL
PostgreSQL is one of the most popular open-source relational database systems. It is considered to be one of the top database choices when customers migrate from commercial databases such as Oracle and Microsoft SQL Server. AWS provides two managed PostgreSQL options: Amazon RDS and Amazon Aurora.
In addition to providing managed PostgreSQL services, AWS also provides tools and resources to help with migration. AWS Schema Conversion Tool (SCT) is a free AWS tool that helps you convert your existing schemas and supports several source and target databases. AWS also provides AWS Database Migration Service (DMS), which helps transfer and continuously replicate data between heterogeneous and homogeneous databases. Finally, AWS provides migration playbooks that document a large number of functional mappings between commercial databases and open-source databases such as PostgreSQL.
This post provides tips and best practices for converting code from PL/SQL to PL/pgSQL, which can help you achieve better performance and cleaner code conversions to PostgreSQL. This post is targeted at developers working on database migrations and assumes that readers have a basic knowledge of databases and PL/SQL.
This section describes some of the factors that influence PostgreSQL performance when migrating from commercial or legacy databases such as SQL Server or Oracle. Most databases have similar objects, but choosing the right object type changes the behavior of the system. This section explains how to achieve better performance with stored procedures, functions, and SQL statements.
To avoid rework, correctly map the data types in the source system to the target database before starting the project. The following table summarizes some common data type mappings from Oracle and SQL Server to PostgreSQL.
| Oracle | PostgreSQL | SQL Server | Notes |
| --- | --- | --- | --- |
| Number | Smallint | Tinyint / Smallint | Generally for lookup tables whose values are limited. |
| Number | Integer / Bigint | Integer / Bigint | |
| Number | Double Precision / Float / Numeric | Double Precision / Float / Numeric | For the financial domain in which an application must store high-precision values, configure the column as numeric/decimal. Otherwise, double precision or float may be sufficient. |
| Timestamp(6) | Timestamp without time zone | | |
| | Bytea | Binary, Image, VarBinary | |
Why number to smallint/integer/bigint and not numeric?
To get the best performance from your database, it is important to use optimal data types.
If your table column must hold a maximum of a four-digit number, a 2-byte (smallint) column is sufficient; there is no need to define a 4-byte (integer/real), 8-byte (bigint/double precision), or variable-length (numeric) data type.
Numeric is a complex type that can hold up to 131,072 digits before the decimal point, and it is the recommended type for storing monetary amounts and other quantities for which exactness is required. However, calculations on numeric values are very slow compared to the integer and floating-point types, because its operators are implemented in software rather than hardware.
The following table gives an example of how the size of a table grows for a single column when you compare numeric size with smallint/int/bigint for non-precision columns, excluding indexes.
| Table name | Size | External size | Value inserted |
| --- | --- | --- | --- |
| numericsize | 16 KB | 8192 bytes | insert into numericsize values (1234678); |
| smallintsize | 8192 bytes | 0 bytes | insert into smallintsize values (1234); |
| intsize | 8192 bytes | 0 bytes | insert into intsize values (123457); |
| bigintsize | 8192 bytes | 0 bytes | insert into bigintsize values (123486); |
The following table uses the same information as the previous table, but includes indexes. For this table, size refers to the total size of the table, and external size is the size of related objects such as indexes.
| Table name | Size | External size |
| --- | --- | --- |
| numericsize | 32 KB | 24 KB |
| smallintsize | 24 KB | 16 KB |
| intsize | 24 KB | 16 KB |
| bigintsize | 24 KB | 16 KB |
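Numbers like those above can be reproduced with PostgreSQL's built-in size functions. The following is a minimal sketch; the table names match the tables above, and `pg_relation_size` and `pg_total_relation_size` are standard PostgreSQL functions.

```sql
-- Create single-column test tables (names match the tables above)
CREATE TABLE numericsize  (c numeric);
CREATE TABLE smallintsize (c smallint);

INSERT INTO numericsize  VALUES (1234678);
INSERT INTO smallintsize VALUES (1234);

-- Compare heap-only size with total size (heap + indexes + TOAST)
SELECT relname,
       pg_size_pretty(pg_relation_size(oid))       AS table_size,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('numericsize', 'smallintsize');
```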
AWS SCT maps number to the numeric data type for tables without knowing the actual data size. The tool has an option to configure and map columns to the right data type during conversion.
Procedures and functions
PostgreSQL 10 and older versions do not support procedures; all procedures and functions from Oracle and SQL Server are mapped to functions in PostgreSQL. Procedures are supported starting with PostgreSQL 11 and are similar to Oracle's.
PostgreSQL supports three volatility categories of functions: Volatile, Stable, and Immutable. You must use the appropriate category based on the functionality when migrating, and marking the function with the correct category can be an important performance tweak.
The Volatile category indicates that the function value can change even within a single table scan, so no optimizations can be made. Relatively few database functions are volatile; one example is timeofday(). Any function that has side effects must be classified as volatile, even if its result is predictable, to prevent calls from being optimized away; one example is setval(). If the volatility category is not provided during function creation, all new functions are marked as volatile by default.
Below is a sample function to show the time taken to execute a Volatile function.
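The original sample code did not survive extraction; the following is a minimal sketch of a Volatile lookup function. The `dept` table definition is an illustrative assumption; `getDeptname()` is the function name this post uses later.

```sql
-- Hypothetical lookup table used by the volatility examples
CREATE TABLE dept (deptno int PRIMARY KEY, dname varchar(50));
INSERT INTO dept VALUES (1, 'Sales'), (2, 'HR');

-- Volatile variant (VOLATILE is also the default when no category is given)
CREATE OR REPLACE FUNCTION getDeptname(p_deptno int)
RETURNS varchar AS
$$
BEGIN
    RETURN (SELECT dname FROM dept WHERE deptno = p_deptno);
END;
$$ LANGUAGE plpgsql VOLATILE;

-- Time the function over many calls with a constant argument
EXPLAIN ANALYZE SELECT getDeptname(1) FROM generate_series(1, 100000);
```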
The Stable category indicates that the function cannot modify the database, and that within a single table scan it consistently returns the same result for the same argument values, but its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups or parameter variables, such as the current time zone. The current_timestamp family of functions qualifies as stable, because their values do not change within a transaction.
Below is a sample function to show the time taken to execute a Stable function.
Execute the function below to see the cost of the function.
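The original code block is missing; a sketch of the same kind of lookup function declared STABLE follows. A distinct function name is used here so the variants can coexist, and the code assumes a `dept(deptno, dname)` table exists.

```sql
-- STABLE: the planner may avoid re-evaluating the call within one statement
CREATE OR REPLACE FUNCTION getDeptname_stable(p_deptno int)
RETURNS varchar AS
$$
BEGIN
    RETURN (SELECT dname FROM dept WHERE deptno = p_deptno);
END;
$$ LANGUAGE plpgsql STABLE;

EXPLAIN ANALYZE SELECT getDeptname_stable(1) FROM generate_series(1, 100000);
```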
Immutable type indicates that the function cannot modify the database and always returns the same result when given the same argument values. This means it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.
Below is a sample function to show the time taken to execute an Immutable function.
Execute the function below to see the cost of the function.
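The original code block is missing; a sketch of the IMMUTABLE variant follows, again with a distinct name and assuming a `dept(deptno, dname)` table. Note that marking a function that reads a table as IMMUTABLE is only safe if that table's data never changes.

```sql
-- IMMUTABLE: the optimizer can pre-evaluate calls with constant arguments.
-- Only safe here if the dept table's contents never change.
CREATE OR REPLACE FUNCTION getDeptname_immutable(p_deptno int)
RETURNS varchar AS
$$
BEGIN
    RETURN (SELECT dname FROM dept WHERE deptno = p_deptno);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

EXPLAIN ANALYZE SELECT getDeptname_immutable(1) FROM generate_series(1, 100000);
```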
All of these functions return the same value.
Though all of the above functions deliver the same value, you may need to use any of these three function types, depending on the functionality, to achieve better performance.
The test run of each of these functions shows that the functions contain the same functionality, but the Immutable variant takes the minimum amount of time. This is because this category allows the optimizer to pre-evaluate the function during query calls with constant arguments.
Function calls in views and queries
Many applications use views and queries that contain function calls. As discussed in the previous section, in PostgreSQL, this can be a costly operation, especially if the function volatility category is not set correctly. In addition to this, the function call itself adds to the query cost.
Choose the appropriate volatility for your function based on what the function does. If your function truly is Stable, declaring it as such instead of using the default of Volatile could give you some performance advantages.
The following example code is a query with a Volatile function call. getDeptname() is marked as volatile. The total runtime for the query is 2 seconds and 886 milliseconds.
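The query itself is missing from the extracted post; a sketch of what such a query looks like follows, assuming a hypothetical `emp` table with `empno`, `ename`, and `deptno` columns and the volatile `getDeptname()` lookup function.

```sql
-- getDeptname() is VOLATILE, so it must be re-evaluated for every row
EXPLAIN ANALYZE
SELECT empno, ename, getDeptname(deptno) AS dname
FROM emp;
```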
The following example code is the same query with a Stable function call.
The following example code replaces the function call with the function's underlying logic.
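A sketch of inlining the lookup as a join, using the same hypothetical `emp` and `dept` tables, which avoids per-row function-call overhead entirely:

```sql
-- Replace the function call with the equivalent join
EXPLAIN ANALYZE
SELECT e.empno, e.ename, d.dname
FROM emp e
LEFT JOIN dept d ON d.deptno = e.deptno;
```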
PostgreSQL provides the functionality to trap and raise errors using RAISE statements. This is useful functionality, but it comes at a cost. RAISE statements raise errors and exceptions during a PL/pgSQL function's operation. By default, any error occurrence inside a PL/pgSQL function causes the function to abort execution and roll back the changes. To recover from errors, PL/pgSQL can trap the errors using an EXCEPTION clause. For this functionality, PostgreSQL has to save the state of the transaction before entering the block of code with exception handling. This is an expensive operation, so it adds overhead.
To avoid this overhead, it is recommended to either catch exceptions on the application side, or to make sure the required validation is in place so that the function never causes an exception.
The following code example demonstrates the performance impact of having an exception in a function call.
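The example code is missing from the extracted post; the following sketch contrasts two variants of the same update, one with and one without an exception block. Table, column, and function names are illustrative assumptions.

```sql
-- Variant with an exception block: PostgreSQL must save a subtransaction
-- state before entering the block, adding overhead on every call
CREATE OR REPLACE FUNCTION upd_salary_exc(p_empno int, p_sal numeric)
RETURNS void AS
$$
BEGIN
    UPDATE emp SET sal = p_sal WHERE empno = p_empno;
EXCEPTION
    WHEN OTHERS THEN
        RAISE NOTICE 'update failed for %', p_empno;
END;
$$ LANGUAGE plpgsql;

-- Variant without the exception block avoids that overhead
CREATE OR REPLACE FUNCTION upd_salary(p_empno int, p_sal numeric)
RETURNS void AS
$$
BEGIN
    UPDATE emp SET sal = p_sal WHERE empno = p_empno;
END;
$$ LANGUAGE plpgsql;
```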
If you can’t validate without an exception, the exception is clearly required. In the preceding example, you can check the diagnostics to see whether a change occurred. It is a good practice to avoid exception handling if possible.
Counter not required for fetch operation
Many applications get the total count and loop through the cursor to fetch the records. Because the fetch operation returns null when there are no records, it is better to use the fetch status rather than declaring two extra variables and looping through the count. Avoiding the extra variables and incremental checks reduces the number of statements to execute and achieves better performance. See the following code as an example.
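The example itself is missing; the following sketch loops over a cursor using PL/pgSQL's FOUND flag, which FETCH sets, instead of a separate counter. The `emp` table is an illustrative assumption.

```sql
DO $$
DECLARE
    cur CURSOR FOR SELECT empno FROM emp;
    v_empno int;
BEGIN
    OPEN cur;
    LOOP
        FETCH cur INTO v_empno;
        EXIT WHEN NOT FOUND;   -- stop when the fetch returns no row
        RAISE NOTICE 'processing %', v_empno;
    END LOOP;
    CLOSE cur;
END;
$$;
```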
Check with EXISTS rather than count
In legacy applications, SQL queries are often written to find the count of matching records and then apply the required business logic. If the table has billions of records, getting the record count can be costly.
The code sample below demonstrates how to check the count of rows and then update the data.
This code can also be rewritten to check for one column rather than an entire row, which can be more cost-effective and performant. See the sample code below.
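The sample code did not survive extraction; a sketch of the EXISTS pattern follows, assuming the hypothetical `emp` table. EXISTS stops at the first matching row instead of counting all of them.

```sql
DO $$
BEGIN
    -- Cheaper than count(*): EXISTS stops at the first matching row
    IF EXISTS (SELECT 1 FROM emp WHERE deptno = 1) THEN
        UPDATE emp SET sal = sal * 1.1 WHERE deptno = 1;
    END IF;
END;
$$;
```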
The total runtime of the query is 104 milliseconds.
Record count after DML statements
In most legacy applications, the record count indicates whether a data manipulation statement changed anything. In PostgreSQL, this information is tracked for each statement and can be retrieved instead of counting the values after the operation. Use diagnostics to retrieve the number of rows affected, as shown in the code sample below.
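The sample code is missing; a sketch using PL/pgSQL's GET DIAGNOSTICS (a standard facility) follows, with the `emp` table again assumed for illustration.

```sql
DO $$
DECLARE
    v_rows int;
BEGIN
    UPDATE emp SET sal = sal * 1.1 WHERE deptno = 1;
    GET DIAGNOSTICS v_rows = ROW_COUNT;   -- rows affected by the last statement
    RAISE NOTICE '% row(s) updated', v_rows;
END;
$$;
```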
Pattern match and search
It’s common practice to use the wildcard characters % and _ with the LIKE expression (or ILIKE for case-insensitive searches) while retrieving data from tables. If the wildcard character is at the start of the given pattern, the query planner can’t use an index even if one exists. In this case, a sequential scan is used, which is a time-consuming operation. To get better performance with millions of records and let the query planner use the available indexes, place the wildcard character either in the middle or at the end of the predicate rather than at the beginning.
In addition to the LIKE expression, you can also use the pg_trgm module/extension for pattern matching. The pg_trgm module provides functions and operators that you can use to determine the similarity of alphanumeric text. It also provides index operator classes that support fast searching for similar strings. For more information, see the pg_trgm documentation on the PostgreSQL website.
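A minimal sketch of pg_trgm in action follows; `pg_trgm`, `gin_trgm_ops`, and `similarity()` are the extension's real names, while the `emp` table and index name are illustrative assumptions. A trigram index can serve LIKE/ILIKE patterns even with a leading wildcard.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index supports pattern matching even with a leading wildcard
CREATE INDEX idx_emp_ename_trgm ON emp USING gin (ename gin_trgm_ops);

SELECT * FROM emp WHERE ename ILIKE '%son%';

-- Similarity score between 0 and 1
SELECT similarity('postgres', 'postgre');
```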
Conversion mapping between Oracle, SQL Server, and PostgreSQL
This section provides database-specific comparisons for writing SQL statements across Oracle, SQL Server, and PostgreSQL.
Default FROM clause
In Oracle, the FROM clause is mandatory, in which case you would use code such as Select 1 from Dual;. In PostgreSQL and SQL Server, the FROM clause is optional, so you can simply write Select 1;.
Generating a series of values
You can generate a series of values from the start to the end number.
In Oracle, you don’t need a starting number, but can give an end number. See the following code as an example.
In SQL Server, use the following code.
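The code blocks for this comparison are missing; the following sketch shows the PostgreSQL form, with the commonly used Oracle and SQL Server equivalents in comments.

```sql
-- PostgreSQL: both the start and end numbers are required
SELECT generate_series(1, 10);

-- Oracle equivalent (end number only):
--   SELECT level FROM dual CONNECT BY level <= 10;

-- SQL Server equivalent (recursive CTE):
--   WITH t AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM t WHERE n < 10)
--   SELECT n FROM t;
```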
Join with (+) operator
In Oracle, for a left join, use the following code.
For a right join, use the following code.
For more information, see SQL for Beginners (Part 5): Joins on the Oracle database site.
There is no “(+)” operator in PostgreSQL or SQL Server for performing a left or right join. Instead, use the explicit join syntax shown in the following two queries.
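The two queries did not survive extraction; a sketch of the ANSI equivalents follows, using the hypothetical `emp` and `dept` tables.

```sql
-- Oracle left join:  SELECT ... FROM emp e, dept d WHERE e.deptno = d.deptno(+);
-- ANSI equivalent in PostgreSQL and SQL Server:
SELECT e.ename, d.dname
FROM emp e
LEFT JOIN dept d ON e.deptno = d.deptno;

-- Oracle right join: SELECT ... FROM emp e, dept d WHERE e.deptno(+) = d.deptno;
SELECT e.ename, d.dname
FROM emp e
RIGHT JOIN dept d ON e.deptno = d.deptno;
```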
Type as a parameter to functions
In SQL Server, you can pass multiple records with a table Type data type. To implement the same in PostgreSQL, you can pass the records as a json or text parameter in JSON format, or as an array. The following example code uses the text data type in JSON format with multiple records. You can insert it into a temporary table and process it further with the following code.
The following code shows how multiple records can be passed in the varchar data type in Oracle.
The following code shows how multiple records can be passed in table type in SQL Server for the same functionality given above in Oracle.
The following code shows how multiple records can be passed in as text type in PostgreSQL for the same functionality given above in Oracle and SQL Server.
In PostgreSQL, pivoting functionality is not enabled by default and requires an extension. The tablefunc extension enables the crosstab function, which you use to create pivot tables, similar to SQL Server and Oracle. The following is the pivoting functionality code in Oracle, SQL Server, and PostgreSQL.
Implement the pivoting functionality in Oracle with the following code.
Implement the pivoting functionality in SQL Server with the following code.
Create the extension for PostgreSQL with the following code.
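The PostgreSQL code did not survive extraction; a sketch using the real `tablefunc` extension and `crosstab` function follows. The `sales` table and its columns are illustrative assumptions.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot quarterly sales per department; the inner query must return
-- (row name, category, value) ordered by the row name
SELECT *
FROM crosstab(
       'SELECT deptno, quarter, sum(amount)
        FROM sales
        GROUP BY deptno, quarter
        ORDER BY 1, 2'
     ) AS ct (deptno int, q1 numeric, q2 numeric, q3 numeric, q4 numeric);
```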
Unpivoting to an array
There is no Unpivot function available in PostgreSQL. When converting from SQL Server or Oracle to PostgreSQL, the unpivot is mapped to an array. See the following code for an example.
Implement the unpivoting functionality in Oracle with the following sample code.
Implement the unpivoting functionality in SQL Server with the following sample code.
Implement the unpivoting functionality in PostgreSQL with the following sample code.
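The PostgreSQL sample is missing; a sketch of unpivoting with arrays and the standard `unnest()` function follows. The `sales_pivoted` table and its quarterly columns are illustrative assumptions.

```sql
-- Turn quarterly columns back into rows; the two unnest() calls
-- expand in lockstep, pairing each label with its value
SELECT deptno,
       unnest(ARRAY['q1', 'q2', 'q3', 'q4']) AS quarter,
       unnest(ARRAY[q1, q2, q3, q4])         AS amount
FROM sales_pivoted;
```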
Returning multiple result sets from a function
It is straightforward for SQL Server to return multiple result sets with multiple rows. You can accomplish the same in PostgreSQL and Oracle with cursors, as shown in the samples below.
Return multiple result sets from a procedure in Oracle with the following code.
Return multiple result sets from a procedure in SQL Server with the following code. No extra parameters are required in SQL Server.
To execute the procedure in SQL Server, enter the following code.
Return multiple result sets from a procedure in PostgreSQL with the following code.
To execute the procedure in PostgreSQL, enter the following code.
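The PostgreSQL sample did not survive extraction; a sketch using refcursor parameters (a documented PostgreSQL pattern) follows, assuming the hypothetical `emp` and `dept` tables.

```sql
CREATE OR REPLACE PROCEDURE get_emp_dept(
    INOUT c_emp  refcursor,
    INOUT c_dept refcursor)
AS $$
BEGIN
    OPEN c_emp  FOR SELECT * FROM emp;
    OPEN c_dept FOR SELECT * FROM dept;
END;
$$ LANGUAGE plpgsql;

-- Call inside a transaction, then fetch from each named cursor
BEGIN;
CALL get_emp_dept('c_emp', 'c_dept');
FETCH ALL FROM "c_emp";
FETCH ALL FROM "c_dept";
COMMIT;
```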
Inline queries with alias
PostgreSQL semantics may refer to inline views as
Subquery. Oracle supports omitting aliases for the inner statement. In PostgreSQL and SQL Server, the use of aliases is mandatory. The following code examples use B as an alias.
The following code is a sample inline query in Oracle.
SQL Server and PostgreSQL
The same sample inline query written in Oracle requires an alias name in SQL Server and PostgreSQL.
After migrating data from either Oracle or SQL Server to PostgreSQL, the retrieval order of the data may vary. The reason could be the order of insertion, the data type of the column and its values, or the collation. To get the correct order of the data, identify the business need and apply an Order by clause to the query.
dblink and foreign data wrappers
dblink is the functionality used to communicate across homogeneous and heterogeneous databases. As of this writing, Amazon RDS and Aurora PostgreSQL don’t offer heterogeneous support, but they do support communication across PostgreSQL databases.
Communicating across homogeneous databases
PostgreSQL supports cross-database communication with dblink and foreign data wrappers (FDWs). This section discusses how to use both.
Using foreign data wrappers
PostgreSQL supports FDWs, which you can use to access data stored in external sources. Amazon RDS and Aurora PostgreSQL support only PostgreSQL FDW currently. To configure the PostgreSQL FDW, complete the following steps.
- Create the extension with the following code.
- Create the server and link to external databases with the following code.
- Create the user mapping to access the tables from an external database with the following code.
Create user mapping for every user who would like to communicate via FDW.
- Import all the external tables into a local schema so that you can access the data from the external tables just like regular tables. Here is the sample code to import the tables from an external database and schema.
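The code for the steps above is missing from the extracted post; the following sketch uses the real `postgres_fdw` syntax, with server, user, host, and schema names as illustrative placeholders.

```sql
-- 1. Create the extension
CREATE EXTENSION postgres_fdw;

-- 2. Define the remote server (host/port/dbname are placeholders)
CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', port '5432', dbname 'remotedb');

-- 3. Map a local user to remote credentials
CREATE USER MAPPING FOR app_user
    SERVER remote_pg
    OPTIONS (user 'remote_user', password 'secret');

-- 4. Import the remote tables into a local schema
CREATE SCHEMA local_schema;
IMPORT FOREIGN SCHEMA public FROM SERVER remote_pg INTO local_schema;
```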
Communicating across heterogeneous databases
PostgreSQL doesn’t natively support heterogeneous cross-database communication, and Amazon Aurora PostgreSQL has limitations here. However, you can implement dblink on the source environment (for example, Oracle or SQL Server) pointing to the target (PostgreSQL), and either pull or push the data.
For more information, see Cross-Database Querying in Compose PostgreSQL.
Creating a view for a foreign database table with dblink
dblink is a PostgreSQL contrib extension that allows you to perform short ad hoc queries in other databases. With the dblink option, the user must provide and store the password in clear text, which is visible to users. This option is not recommended unless you have no other choice.
Option 1: Provide target database access details in the SQL statement itself
In this option, the host connection and database credentials must be provided in every statement, so multiple places must change whenever the host or connection details change.
Option 2: Separate out access details and use connection object
In this option, host and connection details are defined in one place, and you use the connection name to make cross-database connections.
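The code for the two options is missing; the following sketch uses the real `dblink`, `dblink_connect`, and `dblink_disconnect` functions, with connection details and the remote query as illustrative placeholders.

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

-- Option 1: connection details inline in every statement
SELECT *
FROM dblink('host=remote-host dbname=remotedb user=remote_user password=secret',
            'SELECT empno, ename FROM emp')
     AS t(empno int, ename text);

-- Option 2: open a named connection once and reuse it
SELECT dblink_connect('myconn',
       'host=remote-host dbname=remotedb user=remote_user password=secret');

SELECT *
FROM dblink('myconn', 'SELECT empno, ename FROM emp')
     AS t(empno int, ename text);

SELECT dblink_disconnect('myconn');
```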
Function call with dblink
The following code is a function from a foreign PostgreSQL database that returns an integer.
The following code is a function from a foreign PostgreSQL database that returns a table type.
Finding the maximum and minimum value of a set of numbers
You may need the maximum and minimum values of a set of numbers when migrating to PostgreSQL. PostgreSQL includes functions to find these values, as demonstrated in the following code.
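The demonstration code is missing; PostgreSQL's built-in greatest() and least() functions cover this case directly.

```sql
-- greatest()/least() return the largest/smallest of their arguments
SELECT greatest(10, 20, 35, 4);  -- 35
SELECT least(10, 20, 35, 4);     -- 4
```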
Considering self-join for updates
Updates work differently in PostgreSQL compared to SQL Server if you use the same source table (the table that is getting updated) in the from clause of the select statement. In PostgreSQL, the second reference in the from clause is independent of the first reference, unlike SQL Server, and the changes are applied to the entire table.
The following code example updates salaries for employees from Department 1.
This statement works as intended in SQL Server, but after migration, the same SQL updates the entire table rather than a single department. PostgreSQL works differently: it assumes that the two employee references are independent of each other, unlike SQL Server.
To update a single department, convert the DML to the following code.
If using Oracle, convert the DML to the following code.
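The converted DML did not survive extraction; the following sketch shows the PostgreSQL form, with the Oracle equivalent in a comment. The `employee` table and its columns are illustrative assumptions.

```sql
-- PostgreSQL: tie the FROM-clause reference back to the row being updated,
-- otherwise the WHERE clause filters only the second (independent) reference
-- and every row in employee is updated
UPDATE employee e
SET    sal = e.sal * 1.1
FROM   employee e2
WHERE  e2.empno = e.empno   -- self-join on the primary key
AND    e2.deptno = 1;

-- Oracle: a plain update needs no FROM clause at all
-- UPDATE employee SET sal = sal * 1.1 WHERE deptno = 1;
```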
This post shared tips and best practices for developers working on migrations from commercial databases to PostgreSQL. It highlighted many decisions you must make during the migration and how they can impact your database performance. Keeping these performance aspects in mind during the conversion can help you avoid issues later in the migration.
If you have any questions or comments about this post, please share your thoughts in the comment section.
About the Author
Viswanatha Shastry Medipalli is a Consultant with the AWS ProServe team in India. His background spans a wide depth and breadth of expertise and experience in SQL database migrations. He has architected and designed many successful database solutions addressing challenging business requirements. He has provided solutions using Oracle, SQL Server, and PostgreSQL for reporting, business intelligence, applications, and development support. He also has good knowledge of automation and orchestration. His focus area is homogeneous and heterogeneous migrations of on-premises databases to Amazon RDS and Aurora PostgreSQL.