AWS Database Blog

Migration tips for developers converting Oracle and SQL Server code to PostgreSQL

PostgreSQL is one of the most popular open-source relational database systems. It is considered to be one of the top database choices when customers migrate from commercial databases such as Oracle and Microsoft SQL Server. AWS provides two managed PostgreSQL options: Amazon RDS and Amazon Aurora.

In addition to providing managed PostgreSQL services, AWS also provides tools and resources to help with migration. AWS Schema Conversion Tool (SCT) is a free AWS tool that helps you convert your existing schemas and supports several source and target databases. AWS also provides AWS Database Migration Service (DMS), which helps transfer and continuously replicate data between heterogeneous and homogeneous databases. In addition, AWS provides migration playbooks that document a large number of functional mappings between commercial databases and open-source databases such as PostgreSQL.

This post provides tips and best practices for converting code from PL/SQL to PL/pgSQL, which can help you achieve better performance and cleaner code conversions to PostgreSQL. This post is targeted at developers working on database migrations and assumes that readers have a basic knowledge of databases and PL/SQL.

Performance considerations

This section covers some of the factors that influence PostgreSQL performance when migrating from commercial or legacy databases such as SQL Server or Oracle. Most databases have similar objects, but choosing the right object changes the behavior of the system. This section explains how to achieve better performance with stored procedures, functions, and SQL statements.

Data types

To avoid rework, map the data types of the source system to the correct target data types before starting the project. The following table summarizes some common data type mappings from Oracle and SQL Server to PostgreSQL.

Oracle                           PostgreSQL                            SQL Server                            Notes
Number                           Smallint                              Tinyint / Smallint                    Generally for lookup tables whose values are limited.
Number                           Integer / Bigint                      Integer / Bigint
Number                           Double Precision / Float / Numeric    Double Precision / Float / Numeric    For the financial domain in which you want an application to store high-precision values, you can configure it as numeric/decimal. Otherwise, double precision or float may be sufficient.
Varchar / Char(n) / Varchar(n)   Varchar / Text / Character varying    Nchar / Nvarchar / Ntext
Timestamp(6)                     Timestamp without time zone           DateTime2(p) / DateTime
Clob                             Text
Blob / Raw                       Bytea                                 Binary / Image / VarBinary
Boolean                          Boolean                               Bit
XML                              XML                                   XML

Why number to smallint/integer/bigint and not numeric?

To get the best performance from your database, it is important to use optimal data types.

If your table column must hold at most a four-digit number, a 2-byte (smallint) column is sufficient, rather than a 4-byte (integer/real), 8-byte (bigint/double precision), or variable-length (numeric) data type.

Numeric is a complex type that can hold up to 131,072 digits before the decimal point and is recommended for storing monetary amounts and other quantities for which exactness is required. However, calculations on numeric values are very slow compared to the integer or floating-point types, because its operators are slower.
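
As a rough illustration, you can compare aggregation over the same values with both types; exact timings vary by environment, but the numeric variant is typically noticeably slower:

Explain Analyze Select Sum(n::bigint) From generate_series(1, 1000000) n;   -- integer arithmetic
Explain Analyze Select Sum(n::numeric) From generate_series(1, 1000000) n;  -- numeric arithmetic, typically slower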

The following table gives an example of how the size of a single-column table grows when you compare numeric with smallint/int/bigint for non-precision columns, excluding indexes.

Table name Size External size Value inserted
numericsize 16 KB 8192 bytes Insert Into numericsize Values (1234678)
smallintsize 8192 bytes 0 bytes Insert Into smallintsize Values (1234)
intsize 8192 bytes 0 bytes Insert Into intsize Values (123457)
bigintsize 8192 bytes 0 bytes Insert Into bigintsize Values (123486)

The following table uses the same information as the previous table, but includes indexes. For this table, size refers to the total size of the table, and external size is the size of related objects such as indexes.

Table name Size External size
numericsize 32 KB 24 KB
smallintsize 24 KB 16 KB
intsize 24 KB 16 KB
bigintsize 24 KB 16 KB
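
You can reproduce these measurements yourself with the catalog size functions. The following is a minimal sketch, assuming single-column test tables like those above; pg_total_relation_size includes related objects such as indexes and TOAST tables, while pg_relation_size covers only the main table:

Create Table numericsize (col1 Numeric);
Insert Into numericsize Values (1234678);

Select pg_size_pretty(pg_relation_size('numericsize'));        -- main table only
Select pg_size_pretty(pg_total_relation_size('numericsize'));  -- table plus indexes, TOAST, and other related objects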

AWS SCT maps number to the numeric data type for tables without knowing the actual data size. The tool has an option to configure the mapping to the right data type during conversion.

Procedures and functions

PostgreSQL 10 and older versions do not support stored procedures, so all procedures and functions from Oracle and SQL Server are mapped to functions in PostgreSQL. Procedures are supported starting from PostgreSQL 11 and are similar to Oracle's.
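
For example, on PostgreSQL 11 or later, a minimal procedure (shown here against a hypothetical emp table) is created and invoked as follows:

Create Procedure update_salary(eid Int)
Language plpgsql
As $$
Begin
    Update emp Set salary = salary * 1.10 Where empid = eid;
End;
$$;

Call update_salary(3);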

PostgreSQL supports three volatility categories of functions: Volatile, Stable, and Immutable. Use the appropriate category based on the function's behavior when migrating. Marking the function with the correct volatility category can be an important performance tweak.

Volatile

The Volatile type indicates that the function value can change even within a single table scan, so no optimizations can be made. Relatively few database functions are volatile; some examples are random(), currval(), and timeofday(). Any function that has side effects must be classified as volatile, even if its result is predictable, to prevent calls from being optimized away; one example is setval(). If no volatility category is provided during function creation, new functions are marked as Volatile by default.

Below is a sample function to show the time taken to execute the Volatile function.

Create Or Replace Function add_ten_v(num int) Returns integer AS $$
Begin
  Perform pg_sleep(0.01);
  Return num + 10;
End
$$ Language 'plpgsql' Volatile;

Execute the function as follows to see its cost.

lab=> Explain Analyze Select add_ten_v(10) From generate_series(1,100,1);
                                                       Query Plan
-----------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..260.00 rows=1000 width=4) (actual time=10.200..1015.461 rows=100 loops=1)
 Planning time: 0.030 ms
 Execution time: 1015.501 ms
(3 rows)
Time: 1016.313 ms

Stable

The Stable type indicates that the function cannot modify the database. It also indicates that within a single table scan, it consistently returns the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups or parameter variables, such as the current time zone. The current_timestamp family of functions qualifies as stable, because their values do not change within a transaction.

Below is a sample function to show the time taken to execute the Stable function.

Create Or Replace Function add_ten_s(num int) Returns integer AS $$
Begin
  Perform pg_sleep(0.01);
  Return num + 10;
End
$$ Language 'plpgsql' Stable;

Execute the function as follows to see its cost.

lab=> Explain Analyze Select add_ten_s(10) From generate_series(1,100,1);
                                                       Query Plan
-------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..260.00 rows=1000 width=4) (actual time=10.153..1013.814 rows=100 loops=1)
 Planning time: 0.031 ms
 Execution time: 1013.846 ms
(3 rows)
Time: 1014.507 ms

Immutable

The Immutable type indicates that the function cannot modify the database and always returns the same result when given the same argument values. This means it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.

Below is the sample function to show the time taken to execute the Immutable function.

Create Or Replace Function add_ten_i(num int) Returns integer AS $$
Begin
  Perform pg_sleep(0.01);
  Return num + 10;
End
$$ Language 'plpgsql' Immutable;

Execute the function as follows to see its cost.

lab=> Explain Analyze Select Add_Ten_I(10) From Generate_Series(1,100,1);
                                                     Query Plan
--------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.009..0.016 rows=100 loops=1)
 Planning time: 10.185 ms
 Execution time: 0.030 ms
(3 rows)
Time: 10.681 ms

All of these functions return the following value:

lab=> Select Add_Ten_V(10), Add_Ten_S(10), Add_Ten_I(10);
 add_ten_v | add_ten_s | add_ten_i
-----------+-----------+-----------
        20 |        20 |        20
(1 row)

Though all of the above functions return the same value, use the appropriate one of these three categories, depending on the function's behavior, to achieve better performance.

The test run of each of these functions shows that they contain the same functionality, but the Immutable variant takes the least time. This is because this category allows the optimizer to pre-evaluate the function when it is called with constant arguments.

Function calls in views and queries

Many applications use views and queries that contain function calls. As discussed in the previous section, in PostgreSQL, this can be a costly operation, especially if the function volatility category is not set correctly. In addition to this, the function call itself adds to the query cost.

Choose the appropriate volatility for your function based on what the function does. If your function truly is Immutable or Stable, setting it instead of using the default of Volatile could give you some performance advantages.
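
The following queries call a function named Getdeptname(), whose definition is not shown in this post. A minimal sketch, assuming a Dept table with Deptid and Deptname columns, could look like the following (shown without a volatility keyword, so it defaults to Volatile; re-create it with Stable for the second test below):

Create Or Replace Function Getdeptname(p_deptid Int) Returns Varchar As $$
Begin
	Return (Select Deptname From Dept Where Deptid = p_deptid);
End
$$ Language plpgsql;  -- defaults to Volatile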

The following example code is a query with the Volatile function call.

Explain Analyze
Select Empid,
		Empname,
		Getdeptname(Deptid),
		Salary,
		Doj,
		Address
From Emp
Where Deptid=2

The function getDeptname() is marked as volatile. The total runtime for the query is 2 seconds and 886 milliseconds.

The following example code is a query with the Stable function call.

Explain Analyze 
Select Empid,
		Empname,
		Getdeptnames(Deptid),
		Salary,
		Doj,
		Address
From Emp
Where Deptid=2

The function getDeptnames() is marked as stable. The total runtime for the query is 2 seconds and 644 milliseconds.

The following example code replaces the function call with functionality.

Explain Analyze 
Select Empid,
		Empname,
		Deptname,
		Salary,
		Doj,
		Address
From Emp E
Join Dept D On D.Deptid = E.Deptid
Where E.Deptid=2

The function logic is moved to the query successfully. The total runtime for the query is 933 milliseconds.

Optimizing exceptions

PostgreSQL provides the functionality to trap and raise errors using the Exception and Raise statements. This is a useful functionality, but it comes at a cost.

Raise statements raise errors and exceptions during a PL/pgSQL function’s operation. By default, any error occurrence inside a PL/pgSQL function causes the function to abort the execution and roll back the changes. To recover from errors, PL/pgSQL can trap the errors using the Exception clause. For this functionality, PostgreSQL has to save the state of the transaction before entering the block of code with exception handling. This is an expensive operation, so it adds an overhead cost.

To avoid this overhead, it is recommended to either catch exceptions on the application side, or make sure that the required validation is in place so that the function never causes an exception.

The following code example demonstrates the performance impact of having an exception in a function call.

Create Or Replace Function empsal (eid int)
Returns Integer AS $total$
Declare
	Total Integer;
Begin
   Update Emp Set Salary = Salary * 0.20 Where Empid = Eid;
   Return 1;
End;
$total$ Language plpgsql;


Create Or Replace Function Empsalexcep (Eid Int)
Returns Integer AS $Total$
Declare
	Total Integer;
Begin
   Update Emp Set Salary = Salary * 0.20 Where Empid = Eid;
   RETURN 1;
   Exception
   	When Others Then
		Raise Notice 'Salary Update Failed ';
END;
$Total$ Language plpgsql;


Select * From Empsal(3);      -- 78 ms, without exception handling
Select * From Empsalexcep(3); -- 84 ms, with exception handling

If you can't validate the data without an exception handler, the exception is clearly required. In the preceding example, however, you could check the diagnostics to confirm whether the update changed anything. It is a good practice to avoid exception handling if possible.

Counter not required for fetch operation

Many applications get the total count and loop through the cursor to fetch the records. Because the fetch operation reports when there are no more records, it is better to use the fetch status rather than declaring two extra variables and looping through the count. Reducing the number of statements to execute achieves better performance. See the following code as an example.

Select Count(1) Into Count_Value
    From Tab1
    Where Tab1.A = Value;
Counter := 0;
Open Dvscriptcursor For Select Id From Tab1;
While (Counter < Count_Value)
Loop
	Fetch Dvscriptcursor Into Var_Id;

	……..
…….
	Counter := Counter + 1;
End Loop;

You can also rewrite this code as shown below. This avoids declaring the two extra variables and uses the cursor itself to iterate, with the cursor status to exit the loop.

Open Dvscriptcursor For Select Id From Tab1;
Loop
	Fetch Dvscriptcursor Into Var_Id;
	Exit When Not Found;
	……..
…….
…….
End Loop;
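
If the loop only processes each row of a query, PL/pgSQL's FOR loop is simpler still; it opens, fetches from, and closes the cursor implicitly:

For Var_Id In Select Id From Tab1
Loop
	-- process each Var_Id here
End Loop;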

Check with EXISTS rather than count

In legacy applications, SQL queries are often written to find the count of matching records and then apply the required business logic. If the table has billions of records, getting the record count can be costly.

The code sample below demonstrates how to check the count of rows and then update the data.

Create Or Replace Function Empsal (Eid Int)
Returns Integer As $Total$
Declare
	Total Integer;
Begin
  If (Select Count(*) From Emp Where Empid = Eid) > 0 Then  -- Wrong Usage
   		Update Emp Set Salary = Salary * 0.20 Where Empid = Eid  ;
	End If;
   Return 1;
End;
$Total$ Language plpgsql;

The total runtime of the query is 163 milliseconds.

This code can also be rewritten to check for the existence of a single row rather than counting all matching rows, which can be more efficient. See the sample code below.

Create Or Replace Function Empsal (Eid Int)
Returns Integer AS $Total$
Declare
	Total Integer;
Begin
	If Exists (Select 1 From Emp Where Empid = Eid) Then  -- Right Usage
   		Update Emp Set Salary = Salary * 0.20 Where Empid = Eid  ;
	End If;
   RETURN 1;
END;
$Total$ Language plpgsql;

The total runtime of the query is 104 milliseconds.

Record count after DML statements

In most legacy applications, the record count indicates whether a data manipulation statement changed anything. In PostgreSQL, this information is available after the statement runs, so you can retrieve it instead of counting the rows after the operation. Use GET DIAGNOSTICS to retrieve the number of rows affected, as shown in the code sample below.

Create Or Replace Function Empsal (Eid Int)
Returns Integer AS $Total$
Declare
	Total Integer;
	Rows_Affected Int;
Begin
	If Exists (Select 1 From Emp Where Empid = Eid) Then 
   	   Update Emp Set Salary = Salary * 0.20 Where Empid = Eid  ;
	   Get Diagnostics Rows_Affected = ROW_COUNT;
	End If;
         RETURN 1;
END;
$Total$ Language plpgsql;
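
Because ROW_COUNT is 0 when no row matched, a variant of this function (hypothetical name Empsal_Rows) can return the affected row count directly and drop the separate EXISTS check:

Create Or Replace Function Empsal_Rows (Eid Int)
Returns Integer As $Total$
Declare
	Rows_Affected Int := 0;
Begin
	Update Emp Set Salary = Salary * 0.20 Where Empid = Eid;
	Get Diagnostics Rows_Affected = Row_Count;
	Return Rows_Affected;  -- 0 when no matching employee was found
End;
$Total$ Language plpgsql;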

Pattern match and search

It’s common practice to use the wildcard character % or _ with the LIKE (or ILIKE for case-insensitive searches) expression while retrieving data from tables. If the wildcard character is at the start of the given pattern, the query planner can’t use an index even if one exists, so a sequential scan is used, which is a time-consuming operation. To get better performance with millions of records, place the wildcard character in the middle or at the end of the predicate rather than at the beginning, which lets the planner use the available indexes.

In addition to the LIKE expression, you can also use the pg_trgm module/extension for pattern matching. The pg_trgm module provides functions and operators that you can use to determine the similarity of alphanumeric text. It also provides index operator classes that support fast searching for similar strings. For more information, see the pg_trgm documentation on the PostgreSQL website.
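
The following is a minimal sketch, assuming an emp table with a last_name column; a GIN trigram index can speed up LIKE/ILIKE searches even with a leading wildcard:

Create Extension If Not Exists pg_trgm;

Create Index idx_emp_lastname_trgm On emp Using Gin (last_name gin_trgm_ops);

Select * From emp Where last_name ILike '%son%';  -- can use the trigram index despite the leading wildcard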

Conversion mapping between Oracle, SQL Server, and PostgreSQL

This section provides database-specific comparisons for writing SQL statements across Oracle, SQL Server, and PostgreSQL databases.

Default FROM clause

For Oracle, the FROM clause is mandatory, so you would use the code Select 1 From Dual;. For PostgreSQL and SQL Server, the FROM clause is optional, so you can use the code Select 1;.

Generating a series of values

You can generate a series of values from the start to the end number.

In Oracle, you don’t need a starting number; you can give just an end number. See the following code as an example.

Select Rownum As Rownum_Value From Dual Connect By Level <= 64

When using a start and end number, use the following code.

With t(n) As (
  Select 1 from dual
  Union All
    Select n+1 From t Where n < 64
)
Select * From t;

In PostgreSQL, use the following code.

Select Generate_Series(1,64) AS Rownum_Value

In SQL Server, use the following code.

;With n(n) As
(
    Select 1
    Union All
    Select n+1 From n Where n < 64
)
Select n From n Order By n

Join with (+) operator

In Oracle, for a left join, use the following code.

Select b.id, b.title,  b.author, b.year_published, l.name language
From  books b, library.languages l
Where  l.id (+)= b.language_id
Order By  b.id 

For a right join, use the following code.

Select b.id, b.title,  b.author, b.year_published, l.name language
From  books b, library.languages l
Where  l.id = b.language_id (+)
Order BY  b.id

For more information, see SQL for Beginners (Part 5): Joins on the Oracle database site.

There is no “(+)” operator in PostgreSQL or SQL Server for left or right joins. Instead, use explicit join syntax, as in the following two queries.

Select b.id, b.title,  b.author, b.year_published, l.name language
From  books b
Left Join library.languages l
On   l.id = b.language_id
Order By  b.id

Select b.id, b.title,  b.author, b.year_published, l.name language
From  books b
Right Join library.languages l
On   l.id = b.language_id
Order By  b.id

Type as a parameter to functions

In SQL Server, you can pass multiple records to a procedure using a table type. To implement the same in PostgreSQL, you can pass the records as a JSON or text parameter in JSON format, or as an array. The following example uses a text parameter in JSON format containing multiple records. You can insert the records into a table and process them further with the following code.

Create Table emptable1
(
    empid integer,
    last_name varchar(100),
    first_name varchar(100),
    deptid integer,
    salary double precision
)

Oracle

The following code shows how multiple records can be passed in the varchar data type in Oracle.

DECLARE 
 StructType	Varchar2(1000) Default '[{"empid" : 1, "last_name":"AccName1", "first_name":"AccName1", "deptid":"1", "salary":"1234.578"}
 					,{"empid" : "2", "last_name":"AccName2", "first_name":"AccName2", "deptid":"2", "salary":"4567.578"}
                    ]';
Begin
Insert Into emptable1 (empid,last_name,first_name,deptid,salary)
With Json As  
( Select StructType
  from   dual  
)  
Select empid,last_name,first_name,deptid,salary
From  json_table( (Select StructType from json) , '$[*]'  
                Columns ( empid PATH '$.empid'
                        ,last_name Path '$.last_name'  
                        , first_name Path '$.first_name'  
                        ,deptid Path '$.deptid'
                        ,salary Path '$.salary'
                        )  
               );
End;

SQL Server

The following code shows how multiple records can be passed in a table type in SQL Server for the same functionality given above in Oracle.

--Create Type structure

Create Type empTableType as Table
(
   empid integer,
    last_name varchar(100),
    first_name varchar(100),
    deptid integer,
    salary double precision
);

--Create Procedure

Create Procedure InsertEmpTable
@InsertEmpt_TVP empTableType READONLY
As
Insert Into emptable1(empid,last_name,first_name,deptid,salary)
Select * FROM @InsertEmpt_TVP;
Go

--Calling the SP with dynamic block and type

Declare @EmpTVP AS empTableType;
Insert Into @EmpTVP(empid,last_name,first_name,deptid,salary)
Values (1,'FirstName','Last_name',1,1234.566),
(2,'FirstName','Last_name',1,1234.566),
(3,'FirstName','Last_name',1,1234.566),
(4,'FirstName','Last_name',1,1234.566),
(5,'FirstName','Last_name',1,1234.566);
Exec InsertEmpTable @EmpTVP;
Go

PostgreSQL

The following code shows how multiple records can be passed in as text type in PostgreSQL for the same functionality given above in Oracle and SQL Server.

Do $$
Declare 
 StructType	Text Default '[{"empid" : "1", "last_name":"AccName1", "first_name":"AccName1", "deptid":"1", "salary":"1234.578"},
 					{"empid" : "2", "last_name":"AccName2", "first_name":"AccName2", "deptid":"2", "salary":"4567.578"}]';

Begin
Insert Into emptable1
Select * From json_to_recordset(StructType::json) 
	as x("empid" Int, "last_name" Varchar, "first_name" Varchar, "deptid" Int, "salary" Double Precision);
End $$;
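
To mirror the SQL Server table-valued parameter pattern more closely, you can wrap the same logic in a function that takes the JSON text as a parameter. The following is a sketch with a hypothetical function name insert_emps:

Create Or Replace Function insert_emps(emps Text) Returns Void As $$
Begin
	Insert Into emptable1
	Select * From json_to_recordset(emps::json)
		As x("empid" Int, "last_name" Varchar, "first_name" Varchar, "deptid" Int, "salary" Double Precision);
End;
$$ Language plpgsql;

Select insert_emps('[{"empid" : "1", "last_name":"AccName1", "first_name":"AccName1", "deptid":"1", "salary":"1234.578"}]');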

Converting pivoting

In PostgreSQL, pivoting functionality is not built in and requires an extension. The tablefunc extension enables the crosstab function, which you can use to create pivot tables, similar to SQL Server and Oracle. The following is the pivoting functionality code in Oracle, SQL Server, and PostgreSQL.

Create Table crosstabFunc (
  id            Number,
  customer_id   Number,
  product_code  Varchar2(5),
  quantity      Number
);

Insert Into crosstabFunc values (1, 1, 'A', 10);
Insert Into crosstabFunc Values (2, 1, 'B', 20);
Insert Into crosstabFunc Values (3, 1, 'C', 30);
Insert Into crosstabFunc Values (4, 2, 'A', 40);
Insert Into crosstabFunc Values (5, 2, 'C', 50);
Insert Into crosstabFunc Values (6, 3, 'A', 60);
Insert Into crosstabFunc Values (7, 3, 'B', 70);
Insert Into crosstabFunc Values (8, 3, 'C', 80);
Insert Into crosstabFunc Values (9, 3, 'D', 90);
Insert Into crosstabFunc Values (10, 4, 'A', 100);

Oracle

Implement the pivoting functionality in Oracle with the following code.

Select *
From   (Select customer_id, product_code, quantity
        From   crosstabFunc)
Pivot  (Sum(quantity) As sum_quantity For (product_code) In ('A' AS a, 'B' AS b, 'C' AS c))
Order By customer_id;

SQL Server

Implement the pivoting functionality in SQL Server with the following code.

Select * From   
(Select customer_id, product_code, quantity
        From   crosstabFunc) as cf
Pivot  (Sum(quantity) For product_code In (A,B,C))
as cf1
Order By customer_id

PostgreSQL

Create the extension for PostgreSQL with the following code.

Create Extension tablefunc;

Select * From Crosstab
(' Select customer_id, product_code, quantity
   From   crosstabFunc
   Order By 1, 2' )
   as T ( customer_id Int, "A" Int, "B" Int, "C" Int);
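
Note that this single-argument form of crosstab fills the output columns positionally, so with sparse data (for example, customer 2 above has no product B), a value can land under the wrong column. The two-argument form of crosstab, which takes a separate category query, avoids this:

Select * From Crosstab(
   'Select customer_id, product_code, quantity From crosstabFunc Order By 1, 2',
   'Values (''A''), (''B''), (''C'')'
) As T ( customer_id Int, "A" Int, "B" Int, "C" Int);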

Unpivoting to an array

There is no Unpivot function available in PostgreSQL. When converting from SQL Server or Oracle to PostgreSQL, the unpivot is mapped to an array. See the following code for an example.

Create Table Students
(
	Id Int Primary Key Identity,
	Student_Name Varchar (50),
	Math_marks Int,
	English_marks Int,
	History_marks Int,
	Science_marks Int
)
Go
 
Insert Into Students Values ('Sally', 87, 56, 78, 91 )
Insert Into Students Values ('Edward', 69, 80, 92, 98)

Oracle

Implement the unpivoting functionality in Oracle with the following sample code.

Select Student_Name, course, score
From   Students
Unpivot (score For course In (Math_marks AS 'Maths', English_marks AS 'English', History_marks AS 'History', Science_marks As 'Science'));

SQL Server

Implement the unpivoting functionality in SQL Server with the following sample code.

Select Student_Name, Course, Score
From Students
Unpivot
(
	Score
	For Course in (Math_marks, English_marks, History_marks, Science_marks)
) AS SchoolUnpivot

PostgreSQL

Implement the unpivoting functionality in PostgreSQL with the following sample code.

Select Student_Name, course, score From
 (
 Select   
  Student_Name, 
  Unnest (Array[ 'Math', 'English','History', 'Science']
    ) As course,
         
  Unnest (Array[ Math_marks, English_marks,History_marks,Science_marks] 
    ) As score
 From Students
 ) AS Unpvt

Returning multiple result sets from a function

It is straightforward for SQL Server to return multiple result sets with multiple rows. You can accomplish the same in PostgreSQL and Oracle with cursors, as shown in the samples below.

Oracle

Return multiple result sets from a procedure in Oracle with the following code.

Create Procedure Spgetdept23
(P_Cur Out Sys_Refcursor, P_Cur12 Out Sys_Refcursor)
Is
Begin
Open P_Cur For
Select * From employees;


Open P_Cur12 For
Select * From dept;

End;


  var cur Refcursor
  var cur2 Refcursor
Exec Spgetdept23(:cur,:cur2);
Print cur;
Print cur2;

SQL Server

Return multiple result sets from a procedure in SQL Server with the following code. No extra parameters are required in SQL Server.

Create  Procedure Dbo.Multiple_Reseultset
As

Begin 
  
	Select * From HumanResources.Employee
	
	Select * From HumanResources.Department
	
End

To execute the procedure in SQL Server, enter the following code.

Exec Dbo.Multiple_Reseultset

PostgreSQL

Return multiple result sets from a procedure in PostgreSQL with the following code.

Create Or Replace Function Multiple_Reseultset() 
Returns Setof Refcursor As
$$
Declare
   cur1 Refcursor;
   cur2 Refcursor;
Begin 
   Open cur1 For
	Select * From HumanResources.employee;
	Return Next cur1;
	
	Open cur2 For
	Select * From HumanResources.Department;
	Return Next cur2;
End;
$$ Language 'plpgsql';

To execute the function in PostgreSQL, enter the following code. The cursors must be fetched inside a transaction.

Begin;
Select * From Public.Multiple_Reseultset();
Fetch All In "<unnamed portal 1>";
Fetch All In "<unnamed portal 2>";
End;

Inline queries with alias

PostgreSQL refers to inline views as subselects or subqueries. Oracle supports omitting aliases for the inner statement, but in PostgreSQL and SQL Server, aliases are mandatory. The following code examples use b as an alias.

Oracle

The following code is a sample inline query in Oracle.

Select a.col1, col2_fromSubquery  -- you can reference columns directly from the subquery without any prefix, unless there are common column names
from employee a,
  (select * from salary )
where  active=true

SQL Server and PostgreSQL

The same inline query written in Oracle requires an alias in SQL Server and PostgreSQL.

Select a.col1, b.col2_fromSubquery
from employee a,
  (select * from salary ) b
where  active=true

Data order

After migrating data from either Oracle or SQL Server to PostgreSQL, the retrieval order of the data may vary. The reason could be the order of insertion, the data type of the column and its values, or the collation.

To get the correct order of the data, identify the business need and apply an Order By clause to the query.
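
For example, if the application expects employees ordered by ID, make the ordering explicit (using the emptable1 table from the earlier example):

Select empid, last_name, first_name
From emptable1
Order By empid;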

dblink and foreign data wrappers

dblink is functionality used to communicate across homogeneous and heterogeneous databases. As of this post, Amazon RDS and Aurora PostgreSQL don’t offer heterogeneous support, but they do support communication across PostgreSQL databases.

Communicating across homogeneous databases

PostgreSQL supports cross-database communication with dblink and foreign data wrappers (FDWs). This section discusses how to use dblink and FDWs.

Using foreign data wrappers

PostgreSQL supports FDWs, which you can use to access data stored in external sources. Amazon RDS and Aurora PostgreSQL support only PostgreSQL FDW currently. To configure the PostgreSQL FDW, complete the following steps.

  1. Create the extension with the following code.
    Create Extension postgres_fdw;
  2. Create the server and link to external databases with the following code.
    Create Server server_name1 Foreign Data Wrapper postgres_fdw
        Options (host 'abcd.rds.amazonaws.com', dbname 'abcd', port '5432');
  3. Create the user mapping to access the tables from an external database with the following code.
    Create User Mapping For Current_User
     Server server_name1
            Options  (user 'pgar1234', password 'pgar1234'); 
    

    Create user mapping for every user who would like to communicate via FDW.

  4. Import the external tables into a local schema to access the data from the external tables just like regular tables. Here is sample code to import the tables from an external database and schema.
    Create Schema imported_public2; -- create a local schema

    Import Foreign Schema public From Server server_name1
         Into imported_public2; -- this imports all the tables

    Select * From imported_public2.emptable;

Communicating across heterogeneous databases

PostgreSQL doesn’t support heterogeneous cross-database communication directly, and Amazon Aurora PostgreSQL has limitations here. However, you can implement a dblink on the source environment (for example, Oracle or SQL Server) pointing to the target (PostgreSQL), and either pull or push the data.

For more information, see Cross-Database Querying in Compose PostgreSQL.

Creating a view for a foreign database table with dblink

dblink is a PostgreSQL contrib extension that allows you to perform short ad hoc queries in other databases. With the dblink option, the user must provide and store the password in clear text, which is visible to users. This option is not recommended unless you have no other choice.

For more information, see Foreign Data Wrapper and postgres_fdw documentation.

Option 1: Provide target database access details in the SQL statement itself

In this option, the host connection details and database credentials must be provided in every statement. If anything changes, such as the host or connection details, you must update it in multiple places.

Create Or Replace View emptable_dblink As
Select emptable.empid, emptable.last_name, emptable.first_name
   	From Dblink('host=abcd.rds.amazonaws.com user=abcd 
		 password=abcd dbname=abcd  port=5432',
		'Select empid, last_name, first_name From emptable')
		As emptable(empid Int, last_name Varchar, first_name Text);
		
Select * From emptable_dblink;

Option 2: Separate out access details and use connection object

In this option, the host and connection details are defined in one place, and queries use the connection name to make the cross-database connection.

Select Dblink_Connect('conName','dbname=abcd user=abcd 
		password=abcd host= abcd.rds.amazonaws.com ');
	
Create Or Replace View mytabview1 As
Select mytable.* From
    	Dblink('conName', 'Select empid, last_name, first_name From emptable')
As mytable(empid Int, last_name Varchar, first_name Text);
	
Select * From mytabview1;

Function call with dblink

The following code is a function from a foreign PostgreSQL database that returns an integer.

Select * From 
Dblink('host=abcd.rds.amazonaws.com user=abcd 
		 password=abcd dbname=postgres  port=5432',
		'Select public.add_ten(10)')
		As add_ten(a Int);

The following code is a function from a foreign PostgreSQL database that returns a table type.

Select Dblink_Connect('conName','dbname=pgar1234 user=pgar1234 
		password=pgar1234 host=pgar1234.ctegx79rcs0q.ap-south-1.rds.amazonaws.com');

Select Dblink_Open('conName','foo2',
		'Select * From public.tabletypetest(10)');
	
Select * From Dblink_Fetch('conName','foo2', 5) As (empid Int, last_name Varchar);

Finding the maximum and minimum value of a set of numbers

You may need the maximum or minimum of a set of values when migrating to PostgreSQL. PostgreSQL includes the Greatest and Least functions to find these values, as demonstrated in the following code.

Select Greatest(1,2,3,50,100)
-> 100

Select Least(1,2,3,50,100) 
-> 1

Considering self-join for updates

Updates work differently in PostgreSQL than in SQL Server when the table being updated is also referenced in the FROM clause. In PostgreSQL, the second reference in the FROM clause is independent of the first, unlike SQL Server, so the changes are applied to the entire table.

The following code example updates salaries for employees from Department 1.

Update employee 
		Set salary = employee.salary + employee.salary * 0.10
	From Employee e
	Join dept d on d.deptid = e.deptid
	Where d.deptid=1 

This statement works as intended in SQL Server, but after migration, the same SQL updates the entire table rather than a single department. PostgreSQL works differently: it assumes the two employee references are independent of each other, unlike SQL Server.

To update a single department, convert the DML to the following code.

Update Employee e
		Set salary = e.salary + e.salary * 0.10
	From dept d  
	Where d.deptid = e.deptid
 		And d.deptid=1 

If using Oracle, convert the DML to the following code.

Update Employee e
		Set Salary = e.salary + e.salary * 0.10
	Where Exists (Select 1 from dept d where d.deptid = e.deptid
 		And d.deptid=1 )

Summary

This post shared tips and best practices for developers working on migrations from commercial databases to PostgreSQL. It highlighted many of the decisions you must make during the migration and how they can impact your database performance. Keeping these performance aspects in mind during the conversion can help you avoid performance issues later in the migration.

If you have any questions or comments about this post, please share your thoughts in the comment section.


About the Author

Viswanatha Shastry Medipalli is a Consultant with the AWS ProServe team in India. His background spans a wide depth and breadth of expertise and experience in SQL database migrations. He has architected and designed many successful database solutions addressing challenging business requirements. He has provided solutions using Oracle, SQL Server, and PostgreSQL for reporting, business intelligence, applications, and development support. He also has good knowledge of automation and orchestration. His focus area is homogeneous and heterogeneous migrations of on-premises databases to Amazon RDS and Aurora PostgreSQL.