AWS Database Blog
Migrate Oracle hierarchical queries to Amazon Aurora PostgreSQL
We have seen a number of organizations are migrating their database workloads from commercial database engines to the Amazon Aurora database environment. These organizations have reduced their overall efforts on common database administration tasks, data center maintenance, and have moved away from proprietary database features and commercial licenses.
AWS provides the AWS Schema Conversion Tool (AWS SCT) which simplifies schema conversion for heterogeneous database migration. AWS SCT reports objects and SQL requiring manual efforts for conversion. Hierarchical queries conversion requires additional efforts to convert, validate, and test in comparison to objects automatically converted by AWS SCT.
In this post, we show you how to migrate different hierarchical queries from Oracle to Amazon Aurora PostgreSQL-Compatible Edition using recursive queries in common table expressions (CTE). We also look at a few limitations of the tablefunc extension supported by Aurora PostgreSQL.
Hierarchical queries
In Oracle, hierarchical queries are used to query data that has a parent-child relationship where each child can have only one parent, whereas a parent can have multiple children. This is very useful when trying to build reporting queries such as product lineage, manager reports, and a family tree. A hierarchical query displays organized rows in a tree structure, so in order to retrieve the data, it has to be traversed starting from the root. The following diagram illustrates a sample tree structure.
In the above diagram, the first node at the top of the hierarchy, A1, is called the root node, and the rest of the nodes such as B1, B2, and B3 are the children nodes. If a query needs to find the hierarchy of node D1, it will scan the tree from A1 to B3 and then to C2, traversing down to D1.
Prerequisites
We use the following sample table and data throughout our examples in this post.
Oracle | PostgreSQL |
Although PostgreSQL doesn’t have functions or predefined keywords to handle the hierarchical queries directly, you can define custom solutions with the help of the tablefunc extension and CTEs. Tablefunc
is useful for hierarchical queries with connect by
and level keywords, but with CTE we can support various types of keywords such as LEVEL
, NOCYCLE
, SYS_CONNECT_BY_PATH
, ORDER SIBLINGS BY
, CONNECT_BY_ISLEAF
, and CONNECT_BY_ROOT
of hierarchical queries. We are going to discuss and look at these scenarios in details in the following sections.
tablefunc extension
The tablefunc
extension has a function called connectby
, which produces a display of hierarchical data that is stored in a table.
For the connectby
function to work, the table needs the following:
- A key field that uniquely identifies rows
- A parent-key field that references the parent (if any) of each row
This function can display the sub-tree descending from a row. The primary use case of this function is to display parent-child connections (hierarchy data).
The following code demonstrates our PostgreSQL query:
The following screenshot shows our output.
The connectby
function works best when only the parent, child, and level attributes are selected. For example, in the following query, fetching the ename
attribute results in an error:
We get the following output.
Additionally, connectby
cannot be used with hierarchical queries built using keywords like NOCYCLE
, CONNECT_BY_ISLEAF
, SYS_CONNECT_BY_PATH
, ORDER SIBLINGS BY
, and CONNECT_BY_ROOT
.
Recursive queries
We can achieve hierarchical queries using CTE recursive queries. A recursive query is one that is defined by a UNION ALL
with an initialization fullselect
that seeds the recursion. The iterative (recursive) fullselect
contains a direct reference to itself in the FROM
clause. See the following code:
Let’s go through various use cases of Oracle hierarchical queries and achieve similar functionality using recursive SQL in PostgreSQL.
Scenario 1: Display employee level along with other details
The keyword Level
describes the depth of node in the hierarchy. The first level is the root of hierarchy.
The following code shows our Oracle query:
The following screenshot shows our results.
In PostgreSQL, the non-recursive part generates the root of the hierarchy (top-down), which is the employee with no manager ( manager_no is null
) or with a specific manager (manager_n = 10
). The recursive part generates the hierarchy by joining the main table with the output of the non-recursive query until the join condition (e.manager_no = c.emp_no
) is true. See the following PostgreSQL query:
We get the following output.
Scenario 2: Hierarchical queries with SYS_CONNECT_BY_PATH
The keyword SYS_CONNECT_BY_PATH returns the path of a column value from root to node, with column values separated by char(delimiter) for each row returned by the CONNECT BY condition.
Use the following Oracle query:
We get the following results.
In PostgreSQL, we can achieve a functionality similar to SYS_CONNECT_BY_PATH
by concatenating the parent and child record attributes with a char/delimiter in every iteration. See the following code:
We get the following output.
Scenario 3: Hierarchical queries with NOCYCLE
The NOCYCLE
parameter instructs Oracle Database to return rows from a query even if a CONNECT BY LOOP
exists in the data.
If there is data in which the child is the parent and the parent is the child, then the hierarchical query goes into a data loop. The NOCYCLE
keyword helps us avoid this loop.
To create a cycle in our data, we added a record of another employee as emp_no
13 with their manager as emp_no
14. With our new record, the sample data looks like the following screenshot.
When you run the SQL code from the previous scenario in Oracle without adding NOCYCLE
, you encounter an error:
The following screenshot shows our output.
Adding the NOCYCLE
keyword in the Oracle query gives you expected results. See the following code:
We get the following results.
In PostgreSQL, we use two fields, route
and cycle
, to achieve functionality similar to NOCYCLE
. route
is an array of already visited values and cycle
is a flag that gets set based on if a value is already present in route
and the condition cycle = false
filters out cyclic records. See the following code:
The following screenshot shows our output.
Scenario 4: Hierarchical queries with ORDER SIBLINGS BY
The optional SIBLINGS keyword specifies an order that first sorts the parent rows, then sorts the child rows of each parent for each level within the hierarchy. See the following Oracle query:
The following screenshot shows our results.
In PostgreSQL, we can achieve a functionality similar to ORDER BY SIBLINGS
by ordering the CTE output by path. The path is a concatenation of attributes mentioned in the ORDER BY clause in the Oracle query.
In the following PostgreSQL query, the path attribute has ename
(emp_no
is optionally included to handle scenarios of different emp_no
values with the same ename
under the same manager):
We get the following output.
Scenario 5: Hierarchical queries with CONNECT_BY_ISLEAF
The CONNECT_BY_ISLEAF pseudo column returns 1
if the current row is a leaf of the tree defined by the CONNECT BY condition. Otherwise it returns 0
. This information indicates whether a given row can be further expanded to show more of the hierarchy.
Use the following Oracle query:
We get the following results.
In PostgreSQL, we can achieve a functionality similar to CONNECT_BY_ISLEAF
by checking whether the child node is a part of the parent nodes returned by the CTE or not. See the following code:
The following screenshot shows our output.
Scenario 6: Hierarchical queries with CONNECT_BY_ROOT
CONNECT_BY_ROOT is a unary operator that is valid only in hierarchical queries. When you qualify a column with this operator, Oracle returns the column value using data from the root row. See the following query:
We get the following results.
In PostgreSQL, we can achieve functionality similar to CONNECT_BY_ROOT
by using substr
or a split function on the path attributed in the CTE output:
We get the following output.
Considerations
Consider the following when using this solution in your environment:
- The dataset we used for this post is simple in nature and may not reflect the data complexity of your environment
- Not all scenarios or keywords of Oracle’s hierarchical queries are discussed in this post
- Test the solution and queries you’re going to build referencing the sample queries for functional and performance requirements
Conclusion
In this post, we demonstrated via sample queries how you can migrate Oracle hierarchical queries using keywords LEVEL
, NOCYCLE
, SYS_CONNECT_BY_PATH
, ORDER SIBLINGS BY
, CONNECT_BY_ISLEAF
, and CONNECT_BY_ROOT
to PostgreSQL. We also talked about use cases and shortcomings of the tablefunc
extension when migrating Oracle hierarchical queries.
Check out Database Migration—What Do You Need to Know Before You Start? to get started. Also review the recommended best practices, including the migration process and infrastructure considerations, source database considerations, and target database considerations for the PostgreSQL environment.
If you have any questions, comments, or other feedback, share your thoughts on the Amazon Aurora Discussion Forums.
About the Authors
Rakesh Raghav is a Database Specialist with the AWS Professional Services in India, helping customers have a successful cloud adoption and migration journey. He is passionate about building innovative solutions to accelerate their database journey to cloud.
Anuradha Chintha is a Lead Consultant with Amazon Web Services. She works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. Her focus area is homogeneous and heterogeneous database migrations.