AWS Database Blog
Migrating user-defined types from Oracle to PostgreSQL
Migrating from commercial databases to open source is a multistage process with different technologies, starting from assessment, data migration, data validation, and cutover. One of the key aspects for any heterogenous database migration is data type conversion. In this post, we show you a step-by-step approach to migrate user-defined types (UDT) from Oracle to Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL. We also provide an overview of custom operators to use in SQL queries to access tables with UDT in PostgreSQL.
Migrating UDT from Oracle to Aurora PostgreSQL or Amazon RDS for PostgreSQL isn’t always straightforward, especially with UDT member functions. UDT defined in Oracle and PostgreSQL store structured business data in its natural form and work efficiently with applications using object-oriented programming techniques. UDT in Oracle can have both the data structure and the methods that operate on that data within the relational model. Though similar, the approaches to implement UDT in Oracle and PostgreSQL with member functions have subtle differences.
Overview
At a high level, migrating tables with UDT from Oracle to PostgreSQL involves following steps:
- Converting UDT – You can use the AWS Schema Conversion Tool (AWS SCT) to convert your existing database schema from one database engine to another. Unlike PostgreSQL, user-defined types in Oracle allow PL/SQL-based member functions to be a part of UDT. Because PostgreSQL doesn’t support member functions in UDT, you need to handle them separately during UDT conversion.
- Migrating data from tables with UDT – AWS Database Migration Service (AWS DMS) helps you migrate data from Oracle databases to Aurora PostgreSQL and Amazon RDS for PostgreSQL. However, as of this writing, AWS DMS doesn’t support UDT. This post explains using the open-source tool Ora2pg to migrate tables with UDT from Oracle to PostgreSQL.
Prerequisites
Before getting started, you must have the following prerequisites:
- The AWS SCT installed on a local desktop or an Amazon Elastic Compute Cloud (Amazon EC2) instance. For instructions, see Installing, verifying, and updating the AWS SCT.
- Ora2pg installed and set up on an EC2 instance. For instructions, see the Ora2pg installation guide. Ora2pg is an open-source tool distributed via GPLv3 license.
- EC2 instances used for Ora2pg and the AWS SCT should have connectivity to the Oracle source and PostgreSQL target databases.
Dataset
This post uses a sample dataset of a sporting event ticket management system. For this use case, the table DIM_SPORT_LOCATION_SEATS
with event location seating details has been modified to include location_t
as a UDT. location_t
has information of sporting event locations and seating capacity.
Oracle UDT location_t
The UDT location_t
has attributes describing sporting event location details, including an argument-based member function to compare current seating capacity of the location with expected occupancy for a sporting event. The function takes expected occupancy for the event as an argument and compares it to current seating capacity of the event location. It returns t
if the sporting event location has enough seating capacity for the event, and f
otherwise. See the following code:
Oracle table DIM_SPORT_LOCATION_SEATS
The following code shows the DDL for DIM_SPORT_LOCATION_SEATS
table with UDT location_t
in Oracle:
Converting UDT
Let’s start with the DDL conversion of location_t
and the table DIM_SPORT_LOCATION_SEATS
from Oracle to PostgreSQL.
You can use the AWS SCT to convert your existing database schema from Oracle to PostgreSQL. Because the target PostgreSQL database doesn’t support member functions in UDT, the AWS SCT ignores the member function during UDT conversion from Oracle to PostgreSQL. In PostgreSQL, we can create functions in PL/pgSQL with operators to have similar functionality as Oracle UDT does with member functions.
For this sample dataset, we can convert location_t
, to PostgreSQL using the AWS SCT. The AWS SCT doesn’t convert the DDL of member functions for location_t
from Oracle to PostgreSQL.
The following screenshot shows our SQL code.
PostgreSQL UDT location_t
The AWS SCT converts LOCATION_LEVELS
and LOCATION_SECTIONS
from the location_t
UDT to SMALLINT
for Postgres optimizations based on schema mapping rules. See the following code:
For more information about schema mappings, see Creating mapping rules in the AWS SCT.
Because PostgreSQL doesn’t support member functions in UDT, the AWS SCT ignores them while converting the DDL from Oracle to PostgreSQL. You need to write a PL/pgSQL function separately. In order to write a separate entity, you may need to add additional UDT object parameters to the member function. For our use case, the member function compare_seating_capacity
is rewritten as a separate PL/pgSQL function. The return data type for this function is bool
instead of varchar2
(in Oracle), because PostgreSQL provides a bool data type for true or false. See the following code:
The UDT conversion is complete yielding the PL/pgSQL function and the UDT in PostgreSQL. You can now create the DDL for tables using this UDT in the PostgreSQL target database using the AWS SCT, as shown in the following screenshot.
In the next section, we dive into migrating data from tables containing UDT from Oracle to PostgreSQL.
Migrating data from tables with UDT
In this section, we use the open-source tool Ora2pg to perform a full load of the DIM_SPORT_LOCATION_SEATS
table with UDT from Oracle to PostgreSQL. To install and set up Ora2pg on an EC2 instance, see the Ora2pg installation guide.
After installing Ora2pg, you can test connectivity with the Oracle source and PostgreSQL target databases.
To test the Oracle connection, see the following code:
To test the Aurora PG connection, see the following code:
You use a configuration file to migrate data from Oracle to PostgreSQL with Ora2pg. The following is the configuration file used for this sample dataset. Ora2pg has many options to copy and export different object types. In this example, we use COPY to migrate tables with UDT:
The configuration file has the following notable settings:
- SCHEMA – Sets the list of schemas to be exported as part of data migration.
- ALLOW – Provides a list of objects to migrate. Object names could be space- or comma-separated. You can also use regex like
DIM_*
to include all objects starting withDIM_
in thedms_sample
schema. - DROP_INDEXES – Improves data migration performance by dropping indexes before data load and recreating them in the target database post-data migration.
- TYPE – Provides an export type for data migration. For our use case, we’re migrating data to the target table using COPY statements. This parameter can only have a single value.
For more information about the available options in Ora2pg to migrate data from Oracle to PostgreSQL, see the Ora2pg documentation.
In the following code, we migrate the DIM_SPORT_LOCATION_SEATS
table from Oracle to PostgreSQL using the configuration file created previously:
The data from the DIM_SPORT_LOCATION_SEATS
table with UDT is now migrated to PostgreSQL. Setting search_path
in PostgreSQL allows dms_sample
to be the schema searched for objects referenced in SQL statements in this database session, without qualifying them with the schema name. See the following code:
Querying UDT in PostgreSQL
Now that both the DDL and data for the table DIM_SPORT_LOCATION_SEATS
are migrated to PostgreSQL, we can query the UDT using the newly created PL/pgSQL functions.
Querying Oracle with the UDT member function
The following code is an example of a SQL query to determine if any stadiums in Germany have a seating capacity of more than 75,000 people. The dataset provides seating capacity information of stadiums in Berlin and Munich:
The result of this SQL query shows that a stadium in Munich has sufficient seating capacity. However, the event location in Berlin doesn’t have enough seating capacity to host a sporting event of 75,000 people.
Querying PG with the PL/pgSQL function
The following code is the rewritten query in PostgreSQL, which uses the PL/pgSQL function COMPARE_SEATING_CAPACITY
to show the same results:
Using operators
You can also use PostgreSQL operators to simplify the previous query. Every operator is a call to an underlying function. PostgreSQL provides a large number of built-in operators for system types. For example, the built-in integer = operator has the underlying function as int4eq(int,int)
for two integers.
You can invoke built-in operators using the operator name or its underlying function. The following queries get sport location IDs with only two levels using the = operator and its built-in function int4eq
:
You can use operators to simplify the SQL query that finds stadiums in Germany with a seating capacity of more than 75,000 people. As shown in the following code, the operator >= takes the UDT location_t
as the left argument and integer
as the right argument to call the compare_seating_capacity
function. The COMMUTATOR clause, if provided, names an operator that is the commutator of the operator being defined. Operator X is the commutator of operator Y if (a X b) equals (b Y a) for all possible input values of a and b. In this case, <= acts as commutator to the operator >=. It’s critical to provide commutator information for operators that are used in indexes and join clauses because this allows the query optimizer to flip such a clause for different plan types.
The following PostgreSQL query with an operator shows the same results as the Oracle query with the UDT member function:
You can also use the operator >= in the where
clause with UDT location_t
, just like any other comparison operator. With the help of the user-defined operator >= defined earlier, the SQL query takes the location_t data type as the left argument and integer as the right argument. The following SQL query returns cities in Germany where seating capacity is more than 75,000.
Conclusion
This post showed you a solution to convert and migrate UDT with member functions from Oracle to PostgreSQL and how to use operators in queries with UDT in PostgreSQL. We hope that you find this post helpful. For more information about moving your Oracle workload to Amazon RDS for PostgreSQL or Aurora PostgreSQL, see Oracle Database 11g/12c To Amazon Aurora with PostgreSQL Compatibility (9.6.x) Migration Playbook.
As always, AWS welcomes feedback. If you have any comments or questions on this post, please share them in the comments.
About the Authors
Manuj Malik is a Senior Data Lab Solutions Architect at Amazon Web Services. Manuj helps customers architect and build databases and data analytics solutions to accelerate their path to production as part of AWS Data Lab. He has an expertise in database migration projects and works with customers to provide guidance and technical assistance on database services, helping them improve the value of their solutions when using AWS.
Devika Singh is a Solutions Architect at Amazon Web Services. Devika has expertise in database migrations to AWS and as part of AWS Data Lab, works with customers to design and build solutions in databases, data and analytics platforms.