Migrating user-defined types from Oracle to PostgreSQL

Migrating from commercial databases to open source is a multistage process with different technologies, starting from assessment, data migration, data validation, and cutover. One of the key aspects for any heterogenous database migration is data type conversion. In this post, we show you a step-by-step approach to migrate user-defined types (UDT) from Oracle to Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL. We also provide an overview of custom operators to use in SQL queries to access tables with UDT in PostgreSQL.

Migrating UDT from Oracle to Aurora PostgreSQL or Amazon RDS for PostgreSQL isn’t always straightforward, especially with UDT member functions. UDT defined in Oracle and PostgreSQL store structured business data in its natural form and work efficiently with applications using object-oriented programming techniques. UDT in Oracle can have both the data structure and the methods that operate on that data within the relational model. Though similar, the approaches to implement UDT in Oracle and PostgreSQL with member functions have subtle differences.

Overview

At a high level, migrating tables with UDT from Oracle to PostgreSQL involves following steps:

Converting UDT – You can use the AWS Schema Conversion Tool (AWS SCT) to convert your existing database schema from one database engine to another. Unlike PostgreSQL, user-defined types in Oracle allow PL/SQL-based member functions to be a part of UDT. Because PostgreSQL doesn’t support member functions in UDT, you need to handle them separately during UDT conversion.
Migrating data from tables with UDT – AWS Database Migration Service (AWS DMS) helps you migrate data from Oracle databases to Aurora PostgreSQL and Amazon RDS for PostgreSQL. However, as of this writing, AWS DMS doesn’t support UDT. This post explains using the open-source tool Ora2pg to migrate tables with UDT from Oracle to PostgreSQL.

Prerequisites

Before getting started, you must have the following prerequisites:

The AWS SCT installed on a local desktop or an Amazon Elastic Compute Cloud (Amazon EC2) instance. For instructions, see Installing, verifying, and updating the AWS SCT.
Ora2pg installed and set up on an EC2 instance. For instructions, see the Ora2pg installation guide. Ora2pg is an open-source tool distributed via GPLv3 license.
EC2 instances used for Ora2pg and the AWS SCT should have connectivity to the Oracle source and PostgreSQL target databases.

Dataset

This post uses a sample dataset of a sporting event ticket management system. For this use case, the table DIM_SPORT_LOCATION_SEATS with event location seating details has been modified to include location_t as a UDT. location_t has information of sporting event locations and seating capacity.

Oracle UDT location_t

The UDT location_t has attributes describing sporting event location details, including an argument-based member function to compare current seating capacity of the location with expected occupancy for a sporting event. The function takes expected occupancy for the event as an argument and compares it to current seating capacity of the event location. It returns t if the sporting event location has enough seating capacity for the event, and f otherwise. See the following code:

create or replace type location_t as object (
    LOCATION_NAME             VARCHAR2 (60 ) ,
    LOCATION_CITY             VARCHAR2 (60 ),
    LOCATION_SEATING_CAPACITY NUMBER (7) ,
    LOCATION_LEVELS           NUMBER (1) ,
    LOCATION_SECTIONS         NUMBER (4) ,
    MEMBER FUNCTION COMPARE_SEATING_CAPACITY(capacity in number) RETURN VARCHAR2
);
/

create or replace type body location_t is
    MEMBER FUNCTION COMPARE_SEATING_CAPACITY(capacity in number) RETURN VARCHAR2 is
    seat_capacity_1 number ;
    seat_capacity_2 number ;
  begin
      if ( LOCATION_SEATING_CAPACITY is null ) then 
          seat_capacity_1 := 0;
          else
          	seat_capacity_1 := LOCATION_SEATING_CAPACITY;
      end if;
      if ( capacity is null ) then
          seat_capacity_2 := 0;
          else
          	seat_capacity_2 := capacity;
      end if;
      if seat_capacity_1 >= seat_capacity_2 then
      	return 't';
      	else
      		return 'f';
      end if;
  end COMPARE_SEATING_CAPACITY;
end;
/

Oracle table DIM_SPORT_LOCATION_SEATS

The following code shows the DDL for DIM_SPORT_LOCATION_SEATS table with UDT location_t in Oracle:

CREATE TABLE DIM_SPORT_LOCATION_SEATS
  (
    SPORT_LOCATION_SEAT_ID    NUMBER NOT NULL ,
    SPORT_LOCATION_ID         NUMBER (3) NOT NULL ,
    LOCATION                  location_t,
    SEAT_LEVEL                NUMBER (1) NOT NULL ,
    SEAT_SECTION              VARCHAR2 (15) NOT NULL ,
    SEAT_ROW                  VARCHAR2 (10 BYTE) NOT NULL ,
    SEAT_NO                   VARCHAR2 (10 BYTE) NOT NULL ,
    SEAT_TYPE                 VARCHAR2 (15 BYTE) ,
    SEAT_TYPE_DESCRIPTION     VARCHAR2 (120 BYTE) ,
    RELATIVE_QUANTITY         NUMBER (2)
  ) ;

Converting UDT

Let’s start with the DDL conversion of location_t and the table DIM_SPORT_LOCATION_SEATS from Oracle to PostgreSQL.

You can use the AWS SCT to convert your existing database schema from Oracle to PostgreSQL. Because the target PostgreSQL database doesn’t support member functions in UDT, the AWS SCT ignores the member function during UDT conversion from Oracle to PostgreSQL. In PostgreSQL, we can create functions in PL/pgSQL with operators to have similar functionality as Oracle UDT does with member functions.

For this sample dataset, we can convert location_t, to PostgreSQL using the AWS SCT. The AWS SCT doesn’t convert the DDL of member functions for location_t from Oracle to PostgreSQL.

The following screenshot shows our SQL code.

PostgreSQL UDT location_t

The AWS SCT converts LOCATION_LEVELS and LOCATION_SECTIONS from the location_t UDT to SMALLINT for Postgres optimizations based on schema mapping rules. See the following code:

create TYPE location_t as (
    LOCATION_NAME             CHARACTER VARYING(60)  ,
    LOCATION_CITY             CHARACTER VARYING(60) ,
    LOCATION_SEATING_CAPACITY INTEGER  ,
    LOCATION_LEVELS           SMALLINT  ,
    LOCATION_SECTIONS         SMALLINT 
);

For more information about schema mappings, see Creating mapping rules in the AWS SCT.

Because PostgreSQL doesn’t support member functions in UDT, the AWS SCT ignores them while converting the DDL from Oracle to PostgreSQL. You need to write a PL/pgSQL function separately. In order to write a separate entity, you may need to add additional UDT object parameters to the member function. For our use case, the member function compare_seating_capacity is rewritten as a separate PL/pgSQL function. The return data type for this function is bool instead of varchar2 (in Oracle), because PostgreSQL provides a bool data type for true or false. See the following code:

CREATE or REPLACE FUNCTION COMPARE_SEATING_CAPACITY (event_loc_1 location_t,event_loc_2 integer) RETURNS bool AS
$$
declare
seat_capacity_1 integer;
seat_capacity_2 integer ;
begin
if ( event_loc_1.LOCATION_SEATING_CAPACITY is null ) then 
         seat_capacity_1 = 0 ;
else
	seat_capacity_1 = event_loc_1.LOCATION_SEATING_CAPACITY;
end if;
if ( event_loc_2 is null ) then 
         seat_capacity_2 = 0 ;
else 
	seat_capacity_2 = event_loc_2 ;
end if;
if seat_capacity_1 >= seat_capacity_2 then
	return true;
else
	return false;
end if;
end;
$$ LANGUAGE plpgsql;

The UDT conversion is complete yielding the PL/pgSQL function and the UDT in PostgreSQL. You can now create the DDL for tables using this UDT in the PostgreSQL target database using the AWS SCT, as shown in the following screenshot.

In the next section, we dive into migrating data from tables containing UDT from Oracle to PostgreSQL.

Migrating data from tables with UDT

In this section, we use the open-source tool Ora2pg to perform a full load of the DIM_SPORT_LOCATION_SEATS table with UDT from Oracle to PostgreSQL. To install and set up Ora2pg on an EC2 instance, see the Ora2pg installation guide.

After installing Ora2pg, you can test connectivity with the Oracle source and PostgreSQL target databases.

To test the Oracle connection, see the following code:

-bash-4.2$ cd $ORACLE_HOME/network/admin
-bash-4.2$ echo "oratest=(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = oratest.xxxxxxx.us-west-2.rds.amazonaws.com )(PORT =1526))(CONNECT_DATA =(SERVER = DEDICATED)    (SERVICE_NAME = UDTTEST)))" >> tnsnames.ora
-bash-4.2$ sqlplus username/password@oratest

SQL*Plus: Release 11.2.0.4.0 Production on Fri Aug 7 05:05:35 2020

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

To test the Aurora PG connection, see the following code:

-bash-4.2$ psql -h pgtest.xxxxxxxx.us-west-2.rds.amazonaws.com -p 5436 -d postgres master
Password for user master:
psql (9.2.24, server 11.6)
WARNING: psql version 9.2, server version 11.0.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

postgres=>

You use a configuration file to migrate data from Oracle to PostgreSQL with Ora2pg. The following is the configuration file used for this sample dataset. Ora2pg has many options to copy and export different object types. In this example, we use COPY to migrate tables with UDT:

-bash-4.2$ cat ora2pg_for_copy.conf
ORACLE_HOME     /usr/lib/oracle/11.2/client64
ORACLE_DSN      dbi:Oracle:sid=oratest
ORACLE_USER     master
ORACLE_PWD      xxxxxxx
DEBUG           1
EXPORT_SCHEMA   1
SCHEMA          dms_sample
CREATE_SCHEMA   0
COMPILE_SCHEMA  0
PG_SCHEMA
TYPE            COPY
PG_DSN          dbi:Pg:dbname=postgres;host=pgtest.xxxxxxxxx.us-west-2.rds.amazonaws.com;port=5436
PG_USER         master
PG_PWD          xxxxxxxx
ALLOW           DIM_SPORT_LOCATION_SEATS
BZIP2
DATA_LIMIT      400
BLOB_LIMIT      100
LONGREADLEN6285312
LOG_ON_ERROR
PARALLEL_TABLES 1
DROP_INDEXES    1
WITH_OID        1
FILE_PER_TABLE

The configuration file has the following notable settings:

SCHEMA – Sets the list of schemas to be exported as part of data migration.
ALLOW – Provides a list of objects to migrate. Object names could be space- or comma-separated. You can also use regex like DIM_* to include all objects starting with DIM_ in the dms_sample schema.
DROP_INDEXES – Improves data migration performance by dropping indexes before data load and recreating them in the target database post-data migration.
TYPE – Provides an export type for data migration. For our use case, we’re migrating data to the target table using COPY statements. This parameter can only have a single value.

For more information about the available options in Ora2pg to migrate data from Oracle to PostgreSQL, see the Ora2pg documentation.

In the following code, we migrate the DIM_SPORT_LOCATION_SEATS table from Oracle to PostgreSQL using the configuration file created previously:

-bash-4.2$ ora2pg -c ora2pg_for_copy.conf -d
Ora2Pg version: 18.1
Trying to connect to database: dbi:Oracle:sid=oratest
Isolation level: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
Retrieving table information...
[1] Scanning table DIM_SPORT_LOCATION_SEATS (2 rows)...
Trying to connect to database: dbi:Oracle:sid=oratest
Isolation level: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
Retrieving partitions information...
Dropping indexes of table DIM_SPORT_LOCATION_SEATS...
Looking how to retrieve data from DIM_SPORT_LOCATION_SEATS...
Data type LOCATION_T is not native, searching on custom types.
    Found Type: LOCATION_T
Looking inside custom type LOCATION_T to extract values...
Fetching all data from DIM_SPORT_LOCATION_SEATS tuples...
Dumping data from table DIM_SPORT_LOCATION_SEATS into PostgreSQL...
Setting client_encoding to UTF8...
Disabling synchronous commit when writing to PostgreSQL...
DEBUG: Formatting bulk of 400 data for PostgreSQL.
DEBUG: Creating output for 400 tuples
DEBUG: Sending COPY bulk output directly to PostgreSQL backend
Extracted records from table DIM_SPORT_LOCATION_SEATS: total_records = 2 (avg: 2 recs/sec)
[========================>] 2/2 total rows (100.0%) - (1 sec., avg: 2 recs/sec).
Restoring indexes of table DIM_SPORT_LOCATION_SEATS...
Restarting sequences

The data from the DIM_SPORT_LOCATION_SEATS table with UDT is now migrated to PostgreSQL. Setting search_path in PostgreSQL allows dms_sample to be the schema searched for objects referenced in SQL statements in this database session, without qualifying them with the schema name. See the following code:

postgres=> set search_path=dms_sample;
SET

postgres=> select sport_location_seat_id,location,seat_level,seat_section,seat_row,seat_no from DIM_SPORT_LOCATION_SEATS;
 sport_location_seat_id |          location          | seat_level | seat_section | seat_row | seat_no
------------------------+----------------------------+------------+--------------+----------+---------
                      1 | (Germany,Munich,75024,2,3) |          3 | S            | 2        | S-8
                      1 | (Germany,Berlin,74475,2,3) |          3 | S            | 2        | S-8
(2 rows)

Querying UDT in PostgreSQL

Now that both the DDL and data for the table DIM_SPORT_LOCATION_SEATS are migrated to PostgreSQL, we can query the UDT using the newly created PL/pgSQL functions.

Querying Oracle with the UDT member function

The following code is an example of a SQL query to determine if any stadiums in Germany have a seating capacity of more than 75,000 people. The dataset provides seating capacity information of stadiums in Berlin and Munich:

SQL> select t.location.LOCATION_CITY CITY,t.LOCATION.COMPARE_SEATING_CAPACITY(75000) SEATS_AVAILABLE from DIM_SPORT_LOCATION_SEATS  t where t.location.LOCATION_NAME='Germany';

CITY							     SEATS_AVAILABLE
---------------------------------	 ----------------
Munich							     t
Berlin							     f

The result of this SQL query shows that a stadium in Munich has sufficient seating capacity. However, the event location in Berlin doesn’t have enough seating capacity to host a sporting event of 75,000 people.

Querying PG with the PL/pgSQL function

The following code is the rewritten query in PostgreSQL, which uses the PL/pgSQL function COMPARE_SEATING_CAPACITY to show the same results:

postgres=> select (location).LOCATION_CITY,COMPARE_SEATING_CAPACITY(location,75000) from DIM_SPORT_LOCATION_SEATS where (location).LOCATION_NAME='Germany';
 location_city | compare_seating_capacity
---------------+--------------------------
 Munich        | t
 Berlin        | f
(2 rows)

Using operators

You can also use PostgreSQL operators to simplify the previous query. Every operator is a call to an underlying function. PostgreSQL provides a large number of built-in operators for system types. For example, the built-in integer = operator has the underlying function as int4eq(int,int) for two integers.

You can invoke built-in operators using the operator name or its underlying function. The following queries get sport location IDs with only two levels using the = operator and its built-in function int4eq:

postgres=> select sport_location_id,(location).location_levels from DIM_SPORT_LOCATION_SEATS where (location).location_levels =  2;
 sport_location_id | location_levels
-------------------+-----------------
                 2 |               2
                 3 |               2
(2 rows)

postgres=> select sport_location_id,(location).location_levels from DIM_SPORT_LOCATION_SEATS where int4eq((location).location_levels,2);
 sport_location_id | location_levels
-------------------+-----------------
                 2 |               2
                 3 |               2
(2 rows)

You can use operators to simplify the SQL query that finds stadiums in Germany with a seating capacity of more than 75,000 people. As shown in the following code, the operator >= takes the UDT location_t as the left argument and integer as the right argument to call the compare_seating_capacity function. The COMMUTATOR clause, if provided, names an operator that is the commutator of the operator being defined. Operator X is the commutator of operator Y if (a X b) equals (b Y a) for all possible input values of a and b. In this case, <= acts as commutator to the operator >=. It’s critical to provide commutator information for operators that are used in indexes and join clauses because this allows the query optimizer to flip such a clause for different plan types.

CREATE OPERATOR >= (
    LEFTARG = location_t,
    RIGHTARG = integer,
    PROCEDURE = COMPARE_SEATING_CAPACITY,
    COMMUTATOR = <=
);

The following PostgreSQL query with an operator shows the same results as the Oracle query with the UDT member function:

postgres=> select (location).LOCATION_CITY CITY,(location).LOCATION_SEATING_CAPACITY >=75000 from DIM_SPORT_LOCATION_SEATS where (location).LOCATION_NAME='Germany';
  city  | ?column?
--------+----------
 Munich | t
 Berlin | f
(2 rows)

You can also use the operator >= in the where clause with UDT location_t, just like any other comparison operator. With the help of the user-defined operator >= defined earlier, the SQL query takes the location_t data type as the left argument and integer as the right argument. The following SQL query returns cities in Germany where seating capacity is more than 75,000.

postgres=> select (location).LOCATION_CITY from DIM_SPORT_LOCATION_SEATS where (location).LOCATION_NAME='Germany' and location >=75000;
 location_city
---------------
 Munich
(1 row)

Conclusion

This post showed you a solution to convert and migrate UDT with member functions from Oracle to PostgreSQL and how to use operators in queries with UDT in PostgreSQL. We hope that you find this post helpful. For more information about moving your Oracle workload to Amazon RDS for PostgreSQL or Aurora PostgreSQL, see Oracle Database 11g/12c To Amazon Aurora with PostgreSQL Compatibility (9.6.x) Migration Playbook.

As always, AWS welcomes feedback. If you have any comments or questions on this post, please share them in the comments.

About the Authors

Manuj Malik is a Senior Data Lab Solutions Architect at Amazon Web Services. Manuj helps customers architect and build databases and data analytics solutions to accelerate their path to production as part of AWS Data Lab. He has an expertise in database migration projects and works with customers to provide guidance and technical assistance on database services, helping them improve the value of their solutions when using AWS.

Devika Singh is a Solutions Architect at Amazon Web Services. Devika has expertise in database migrations to AWS and as part of AWS Data Lab, works with customers to design and build solutions in databases, data and analytics platforms.