AWS Database Blog
Extract and migrate data from nested tables with user-defined nested types from Oracle to PostgreSQL
As organizations seek to modernize their database infrastructure, many migrate their workloads from Oracle to PostgreSQL. However, a migration requires understanding how the capabilities of each engine map to one another, and closing the gaps can require additional steps that slow the migration down. One example is the Oracle nested table feature, which supports multi-valued attributes through sub-tables. Because Oracle and PostgreSQL handle user-defined types (UDTs) and their associated member functions differently at a conceptual level, these objects require careful consideration during the migration process.
Oracle offers three different ways to implement UDTs; we explore two of them in this blog post. A single-level UDT is a basic data structure with no nested parts, making its migration straightforward. You can migrate a single-level UDT using the open source ora2pg tool; refer to Migrating user-defined types from Oracle to PostgreSQL. Migrating a multi-nested UDT is more complex because of its hierarchical structure, where one UDT can contain other UDTs as attributes. This nesting creates multiple layers of data that must be navigated and understood during migration.
In Oracle, UDTs can have member functions written in PL/SQL that are integrated directly into the UDT. In contrast, PostgreSQL currently doesn’t allow member functions within UDTs. As a result, when migrating UDTs from Oracle to PostgreSQL, you must separately handle these member functions and implement their functionality using different methods in PostgreSQL.
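To make the difference concrete, here is a hedged, hypothetical sketch (price_t and price_formatted are illustrative names, not objects from the migration in this post) of how an Oracle member function becomes a standalone PostgreSQL function that takes the type as an argument:

```sql
-- Oracle: the member function lives inside the UDT
CREATE OR REPLACE TYPE price_t AS OBJECT (
  amount   NUMBER,
  currency VARCHAR2(3),
  MEMBER FUNCTION formatted RETURN VARCHAR2
);
/
CREATE OR REPLACE TYPE BODY price_t AS
  MEMBER FUNCTION formatted RETURN VARCHAR2 IS
  BEGIN
    RETURN TO_CHAR(amount) || ' ' || currency;
  END;
END;
/
```

```sql
-- PostgreSQL: the type carries data only; the member function
-- is reimplemented as a standalone function over the composite type
CREATE TYPE price_t AS (
  amount   numeric,
  currency varchar(3)
);

CREATE FUNCTION price_formatted(p price_t) RETURNS text
LANGUAGE sql AS $$
  SELECT (p).amount::text || ' ' || (p).currency;
$$;

-- Usage: SELECT price_formatted(ROW(9.99, 'USD')::price_t);
```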
In this post, we dive deep into these differences and provide guidance for a smooth migration, helping ensure that the integrity of your data models is maintained throughout the process. We also walk you through the details of converting complex member functions in multi-nested UDTs from Oracle to PostgreSQL. After the conversion is complete, we show how you can use AWS Database Migration Service (AWS DMS) to migrate the data for the tables that reference the UDTs.
Prerequisites
To implement this solution, you will need the following prerequisites:
- An Amazon Elastic Compute Cloud (Amazon EC2) instance with an Oracle database.
- A properly sized AWS DMS replication instance. See Choosing the right AWS DMS replication instance for your migration.
- Source and target endpoints with connectivity to the source and target databases. See Working with AWS DMS Endpoints.
- A target Amazon Relational Database Service (Amazon RDS) for PostgreSQL instance.
While these resources are necessary for hands-on implementation, you can still follow along with the conceptual steps and processes described in this guide without having the actual environment set up.
In the following sections, we discuss two use cases, one with a single layer of nesting and one with multiple layers of nested UDTs, and show how to migrate the tables from Oracle to PostgreSQL.
- Scenario 1 – Entertainment company managing movie data
  - Structure – Movie synopses with a nested movie_synopsis_list_t UDT
- Scenario 2 – Movie artists data
  - Structure – The Artists table has roles, which have nested artist_role_t and artist_role_list_t UDTs, and location attributes, which have nested artist_local_t and artist_local_list_t UDTs
Scenario 1 – Single layer nested UDT
In this scenario, we see how single layer nested UDT data is extracted and transformed before being migrated into a PostgreSQL database table.
The table MOVIE_SYNOPSIS_TBL has a column SYNOPSES, which is a user-defined data type. The following is the definition of the type and table involved.
Oracle definition of the UDT and table:
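Based on the columns visible in the query output later in this section, the definitions look roughly like the following sketch; the attribute sizes and the nested table storage clause are assumptions:

```sql
-- Element type for one synopsis entry (attribute names taken from
-- the output shown later; sizes are assumptions)
CREATE OR REPLACE TYPE movie_synopsis_t AS OBJECT (
  seqNo    NUMBER,
  language VARCHAR2(10),
  region   VARCHAR2(10),
  synopsis VARCHAR2(4000)
);
/
-- Collection (nested table) type holding a list of synopsis entries
CREATE OR REPLACE TYPE movie_synopsis_list_t AS TABLE OF movie_synopsis_t;
/
-- Base table with the nested table column SYNOPSES
CREATE TABLE movie_synopsis_tbl (
  movie_spec  VARCHAR2(100),
  release_key NUMBER,
  hash        NUMBER,
  created     DATE,
  synopses    movie_synopsis_list_t
) NESTED TABLE synopses STORE AS movie_synopsis_nt;
```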
The following screenshot shows the data stored in the MOVIE_SYNOPSIS_TBL table in Oracle.
The following screenshot shows the data stored in the SYNOPSES column, whose type is MOVIE_SYNOPSIS_LIST_T.
SQL query:
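A minimal query that returns each row with its nested collection might look like the following (illustrative; the original query isn't reproduced here):

```sql
SELECT movie_spec, release_key, hash, created, synopses
FROM   movie_synopsis_tbl;
```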
Output:
| MOVIE_SPEC | RELEASE_KEY | HASH | CREATED | SYNOPSES (seqNo, language, region, synopsis) |
| --- | --- | --- | --- | --- |
| ABC | 657673673 | 4367727764 | 18-DEC-23 | MOVIE_SYNOPSIS_LIST_T (MOVIE_SYNOPSIS_T (1, ‘en’, ‘US’, ‘In a vast desert, explorers discover a hidden oasis. Mysterious ruins hold the key to ancient secrets. As they unravel the past, a malevolent force awakens. The oasis transforms into a perilous battleground. Teamwork is crucial as they face deadly traps and supernatural foes. Racing against time, they must find the legendary artifact, a beacon of hope. “Lost Oasis” is an adrenaline-fueled adventure, blending mystery, danger, and teamwork in a race against ancient forces.’)) |
| FGH | 967986985 | 365556546 | 18-DEC-23 | MOVIE_SYNOPSIS_LIST_T (MOVIE_SYNOPSIS_T (… tordus.’)) |
We use the AWS Schema Conversion Tool (AWS SCT) to convert existing schema objects from one engine to another, in this case from Oracle to PostgreSQL. In Scenario 1, we convert the tables and UDT using AWS SCT. AWS SCT ignores member functions because it currently doesn't support converting them when PostgreSQL is the target.
The following screenshot shows the SQL code after the AWS SCT conversion.
The following screenshot shows the table conversion using AWS SCT.
After conversion, the Data Definition Language (DDL) looks like the following:
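A representative sketch of the converted table, with the data types as assumptions, is:

```sql
-- PostgreSQL table produced by the conversion; the nested collection
-- column becomes a plain text column (sizes/types are assumptions)
CREATE TABLE movie_synopsis_tbl (
  movie_spec  varchar(100),
  release_key numeric,
  hash        numeric,
  created     date,
  synopses    text
);
```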
From the table definition, notice that synopses is a text field in the table movie_synopsis_tbl and contains a brief summary of the movie that includes punctuation. Because of the punctuation in the values, it's difficult to extract this data as an array type with comma-separated values, which is traditionally done using tools such as ora2pg. Instead, we can customize the data extraction using a character combination such as $^$ as the separator.
Use the following procedure to process the data from the source table into an intermediate table.
Create a stored procedure called MOVIE_SYNOPSIS_LOAD_V1 to extract the data from the base nested table, transform it into array format, and load it into the staging table.
This procedure processes records from the MOVIE_SYNOPSIS_TBL table, extracts and formats the nested data (synopsis), and inserts the processed data into a staging table (MOVIE_SYNOPSIS_INT_TBL). It handles empty arrays, single-element arrays, and multi-element arrays differently, ensuring the data is appropriately formatted before insertion. Special characters such as quotes and commas are escaped using REGEXP_REPLACE to avoid issues during insertion.
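The original procedure code isn't reproduced here; the following is a minimal PL/SQL sketch of the approach, using the assumed type shape from the earlier DDL sketch. The composite-literal element format is illustrative and must match whatever the later PostgreSQL load step expects:

```sql
CREATE OR REPLACE PROCEDURE movie_synopsis_load_v1 AS
  v_synopses VARCHAR2(32767);
  v_elem     VARCHAR2(32767);
BEGIN
  FOR rec IN (SELECT movie_spec, release_key, hash, created, synopses
              FROM movie_synopsis_tbl) LOOP
    v_synopses := NULL;  -- stays NULL for an empty collection
    IF rec.synopses IS NOT NULL AND rec.synopses.COUNT > 0 THEN
      FOR i IN 1 .. rec.synopses.COUNT LOOP
        -- Escape quotes and commas so the value parses cleanly later
        v_elem := '(' || rec.synopses(i).seqNo || ',' ||
                  rec.synopses(i).language || ',' ||
                  rec.synopses(i).region || ',"' ||
                  REGEXP_REPLACE(rec.synopses(i).synopsis,
                                 '([",])', '\\\1') || '")';
        IF i = 1 THEN
          v_synopses := v_elem;                         -- single element
        ELSE
          v_synopses := v_synopses || '$^$' || v_elem;  -- multi element
        END IF;
      END LOOP;
    END IF;
    INSERT INTO movie_synopsis_int_tbl
      (movie_spec, release_key, hash, created, synopses)
    VALUES (rec.movie_spec, rec.release_key, rec.hash,
            rec.created, v_synopses);
  END LOOP;
  COMMIT;
END;
/
```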
Scenario 2 – Multiple layers of nested UDT
In this scenario, we look at extracting data from multi-nested UDTs and loading the data into a PostgreSQL table. We create the following objects (a consolidated DDL sketch follows the list):
- SQL to create the movie_artist_t object type
- SQL to create a table type for movie_artist_t

The movie_artist_t type has more columns, including roles and location attributes, which are further nested UDTs:

- SQL to create the artist_role_t object type
- SQL to create a table type for artist_role_t
- SQL to create the artist_local_t object type
- SQL to create a table type for artist_local_t
- Table DDL
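The statements themselves aren't reproduced above, so the following is a minimal Oracle sketch of the objects in the list. The attribute names and sizes are assumptions for illustration, and the types are ordered so that each one exists before it's referenced; adjust the definitions to match your actual schema:

```sql
-- Role entry and its collection type (attribute names are assumptions)
CREATE OR REPLACE TYPE artist_role_t AS OBJECT (
  role_name VARCHAR2(50),
  credited  VARCHAR2(1)
);
/
CREATE OR REPLACE TYPE artist_role_list_t AS TABLE OF artist_role_t;
/
-- Location entry and its collection type (attribute names are assumptions)
CREATE OR REPLACE TYPE artist_local_t AS OBJECT (
  region   VARCHAR2(10),
  language VARCHAR2(10)
);
/
CREATE OR REPLACE TYPE artist_local_list_t AS TABLE OF artist_local_t;
/
-- Artist type embedding the nested role and location collections
CREATE OR REPLACE TYPE movie_artist_t AS OBJECT (
  artist_name VARCHAR2(100),
  roles       artist_role_list_t,
  locations   artist_local_list_t
);
/
CREATE OR REPLACE TYPE movie_artist_list_t AS TABLE OF movie_artist_t;
/
-- Base table; each nested table column needs its own storage clause
-- (the exact storage syntax may need adjustment for your environment)
CREATE TABLE movie_artist_tbl (
  movie_spec  VARCHAR2(100),
  release_key NUMBER,
  artists     movie_artist_list_t
) NESTED TABLE artists STORE AS movie_artist_nt
  (NESTED TABLE roles     STORE AS artist_role_nt
   NESTED TABLE locations STORE AS artist_local_nt);
```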
The following screenshot shows the data stored in the table MOVIE_ARTIST_TBL in Oracle.
The following screenshot shows data stored in the field ARTISTS, which is defined as a custom data type.
The following screenshot shows the SQL code after the AWS SCT conversion.
The following screenshot shows the SQL code for the table conversion.
To create a stored procedure called MOVIE_ARTISTS_INT_TBL_LOAD_V1, which processes data from the MOVIE_ARTIST_TBL table and inserts the transformed data into a staging table called MOVIE_ARTIST_INT_TBL, use the following code.
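The procedure body isn't reproduced above; the following is a minimal PL/SQL sketch under the same assumed type shapes as the earlier DDL sketch (artist_name, role_name, region, and locations are illustrative names). It flattens each artist, with its nested roles and locations, into one delimited string per row:

```sql
CREATE OR REPLACE PROCEDURE movie_artists_int_tbl_load_v1 AS
  v_artists VARCHAR2(32767);
  v_roles   VARCHAR2(32767);
  v_locals  VARCHAR2(32767);

  -- Escape quotes and commas so the text survives the later cast
  FUNCTION esc(p_val VARCHAR2) RETURN VARCHAR2 IS
  BEGIN
    RETURN REGEXP_REPLACE(p_val, '([",])', '\\\1');
  END;
BEGIN
  FOR rec IN (SELECT movie_spec, release_key, artists
              FROM movie_artist_tbl) LOOP
    v_artists := NULL;
    IF rec.artists IS NOT NULL AND rec.artists.COUNT > 0 THEN
      FOR i IN 1 .. rec.artists.COUNT LOOP
        -- Flatten the inner roles collection (assumes it's non-null;
        -- add guards for atomically null collections as needed)
        v_roles := NULL;
        FOR j IN 1 .. rec.artists(i).roles.COUNT LOOP
          v_roles := v_roles || CASE WHEN j > 1 THEN '$^$' END ||
                     esc(rec.artists(i).roles(j).role_name);
        END LOOP;
        -- Flatten the inner locations collection the same way
        v_locals := NULL;
        FOR j IN 1 .. rec.artists(i).locations.COUNT LOOP
          v_locals := v_locals || CASE WHEN j > 1 THEN '$^$' END ||
                      esc(rec.artists(i).locations(j).region);
        END LOOP;
        -- One delimited element per artist
        v_artists := v_artists || CASE WHEN i > 1 THEN '$^$' END ||
                     '("' || esc(rec.artists(i).artist_name) || '","' ||
                     v_roles || '","' || v_locals || '")';
      END LOOP;
    END IF;
    INSERT INTO movie_artist_int_tbl (movie_spec, release_key, artists)
    VALUES (rec.movie_spec, rec.release_key, v_artists);
  END LOOP;
  COMMIT;
END;
/
```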
Migrate the extracted data using AWS DMS
The extracted data in the intermediate staging tables now needs to be migrated to PostgreSQL tables.
We use AWS DMS to migrate the data as-is to PostgreSQL database staging tables, where it's then further processed using the string_to_array function and moved into the PostgreSQL main tables.
The following screenshots show the table data in the intermediate tables in the source after transformation using PL/SQL.
MOVIE_SYNOPSIS_INT_TBL
MOVIE_ARTIST_INT_TBL
Data in these tables is migrated using AWS DMS. The following is a sample JSON document for the AWS DMS migration task settings.
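The exact settings depend on your workload; the following illustrative snippet shows the kind of task settings that matter here, because the flattened text columns can be long enough to be handled as LOBs (all values are assumptions):

```json
{
  "TargetMetadata": {
    "SupportLobs": true,
    "FullLobMode": false,
    "LimitedSizeLobMode": true,
    "LobMaxSize": 1024
  },
  "FullLoadSettings": {
    "TargetTablePrepMode": "DO_NOTHING",
    "CommitRate": 10000
  }
}
```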
After the data is moved to the staging tables through AWS DMS, it's loaded into the PostgreSQL database using string_to_array to cast the values to the custom data type, as shown in the following example.
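Assuming the main table's synopses column is defined as movie_synopsis_t[] (a PostgreSQL composite type created during conversion) and that each staging element is a valid composite literal, a sketch of the load is:

```sql
-- Split the staging text on the custom separator, then cast each
-- composite literal (e.g. '(1,en,US,"...")') to the UDT array
INSERT INTO movie_synopsis_tbl
  (movie_spec, release_key, hash, created, synopses)
SELECT movie_spec,
       release_key,
       hash,
       created,
       string_to_array(synopses, '$^$')::movie_synopsis_t[]
FROM movie_synopsis_int_tbl;
```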
For more complex nested types, we extract the data from the staging table into array format and then load it into the target table using explicit type casting.
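For the artist data, the same pattern applies; here each element's text form must also encode the nested roles and locations arrays in the composite's literal syntax (again a sketch under the same assumptions):

```sql
-- Each artist element must be in movie_artist_t's text form,
-- including its nested role and location arrays
INSERT INTO movie_artist_tbl (movie_spec, release_key, artists)
SELECT movie_spec,
       release_key,
       string_to_array(artists, '$^$')::movie_artist_t[]
FROM movie_artist_int_tbl;
```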
Using the preceding custom procedures, we load the data from staging tables into the actual tables in the target Amazon RDS for PostgreSQL database.
Clean up
To avoid incurring future charges, complete the following steps:
- Terminate the EC2 instance (if your source Oracle database was deployed on one).
- Clean up the AWS DMS tasks created.
- Delete the AWS DMS replication instance.
- Delete the RDS for PostgreSQL instance.
Conclusion
In this post, we discussed strategies for effectively extracting and transforming complex multi-nested UDTs from Oracle's nested tables and migrating them to Amazon RDS for PostgreSQL. By addressing the intricacies of these migrations, such as the hierarchical nature of multi-nested UDTs, we provided methods for managing and converting these data structures.
If you have any comments or questions, leave them in the comments section.
About the authors
Sujitha Sasikumaran is a Lead Database Consultant and Migration Expert with extensive experience in optimizing and transitioning complex database systems. Specializing in both traditional database management and modern cloud technologies, Sujitha helps organizations streamline their data environments and execute flawless migrations. Her role also involves integrating the latest advancements in data management to drive innovation and efficiency.
Tirumala Rama Chandra Murty Dasari is a Database Consultant with AWS Professional Services (Hyderabad, India). With extensive knowledge on relational and non-relational databases, he helps the customers migrate to the AWS Cloud. He’s curious to learn and implement new technologies, which is reflected in customer success. He uses relevant technologies to automate manual processes and enhance existing automation.