AWS Database Blog

Migrate Oracle bulk binds to Amazon Aurora PostgreSQL-Compatible Edition or Amazon RDS for PostgreSQL

To migrate an Oracle database to Amazon Aurora PostgreSQL-Compatible Edition, you usually need to perform both automated and manual tasks. The automated tasks include schema conversion and data migration, which can be handled with the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS), respectively. The manual tasks involve post-migration touch-ups for the database objects that AWS SCT can't convert automatically.

The AWS SCT automatically converts the source database schema and a majority of the custom code to a format compatible with the target database. In a database migration from Oracle to PostgreSQL, the AWS SCT automates the conversion of Oracle PL/SQL code to equivalent PL/pgSQL code in PostgreSQL.

When migrating from Oracle to PostgreSQL, it’s very common to come across bulk binds (such as BULK COLLECT and FORALL). A bulk bind is a feature in Oracle databases that allows multiple rows of data to be fetched or processed at once, rather than one row at a time. It works by binding a collection of values to a single variable, which is then processed by the database in a single operation.

In this post, we provide a brief overview of how Oracle bulk binds work, and show you how to migrate them to a database running on Amazon Aurora PostgreSQL-Compatible Edition. We also review some challenges you may encounter during the conversion and how you can resolve them.

Solution overview

Oracle PL/SQL allows you to fetch records in bulk rather than one by one. BULK COLLECT lets the SQL engine fetch all the records at once, which improves performance by reducing the number of context switches between the SQL and PL/SQL engines.
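To make the difference concrete, the following minimal PL/SQL sketch (written against the sample test_table created later in this post) contrasts the two fetch styles; conceptually, the row-by-row loop pays one context switch per row, whereas the bulk fetch pays a single one:

DECLARE
  TYPE t_name_tab IS TABLE OF test_table.name%TYPE;
  l_names t_name_tab;
  l_name  test_table.name%TYPE;
BEGIN
  /* Row-by-row style: one SQL-to-PL/SQL context switch per fetched row */
  FOR r IN (SELECT name FROM test_table) LOOP
    l_name := r.name;
  END LOOP;

  /* Bulk style: all rows fetched in a single context switch */
  SELECT name BULK COLLECT INTO l_names FROM test_table;
  DBMS_OUTPUT.put_line(l_names.COUNT || ' rows fetched in bulk');
END;
/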

PostgreSQL provides the aggregate function ARRAY_AGG, which you can use to implement similar processing logic. In this post, we discuss different ways of using BULK COLLECT and how to migrate each of them to PostgreSQL. We also discuss common mistakes you can make when using ARRAY_AGG as an alternative to BULK COLLECT, and their solutions.

Prerequisites

To get started with the solutions described in this post, you need the following:

  • An active AWS account
  • A source Oracle database (on premises or Amazon RDS for Oracle)
  • A target Aurora PostgreSQL database
  • A database user with the following privileges on the target database (replace YOUR_DATABASE, YOUR_SCHEMA_NAME, and YOUR_DB_USER with your database, schema, and user, respectively; a quick way to verify the grants follows this list):
GRANT CONNECT ON DATABASE YOUR_DATABASE TO YOUR_DB_USER;
GRANT USAGE ON SCHEMA YOUR_SCHEMA_NAME TO YOUR_DB_USER;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA YOUR_SCHEMA_NAME TO YOUR_DB_USER;
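To confirm the grants took effect, you can run a quick check with PostgreSQL's built-in privilege inspection functions; the following is a sketch (substitute your own user, schema, and database names):

SELECT has_database_privilege('YOUR_DB_USER', 'YOUR_DATABASE', 'CONNECT') AS has_connect,
       has_schema_privilege('YOUR_DB_USER', 'YOUR_SCHEMA_NAME', 'USAGE') AS has_usage;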

Use the following code to create the sample tables in Oracle and PostgreSQL and insert data into them.

The following is the Oracle code:

create table test_table as WITH t(n) AS (
  SELECT 1 FROM dual
  UNION ALL
  SELECT n + 1 FROM t WHERE n < 5000
)
SELECT n AS id, 'test_' || n AS name, sysdate + n AS login_date FROM t;

The following is the PostgreSQL code:

CREATE TABLE test_table AS
SELECT n AS id,
       'test_' || n AS name,
       CURRENT_DATE + n AS login_date
FROM generate_series(1, 5000) AS n;
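You can sanity-check the generated sample data with a quick query; for example, in PostgreSQL:

SELECT COUNT(*) AS total_rows, MIN(id) AS min_id, MAX(id) AS max_id FROM test_table;
-- Expected: total_rows = 5000, min_id = 1, max_id = 5000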

The following sections demonstrate how BULK COLLECT works in Oracle and how to migrate functions using BULK COLLECT into PostgreSQL.

BULK COLLECT using FETCH

The FETCH statement is used to retrieve a single row of data from a cursor. When used with BULK COLLECT, the FETCH statement retrieves multiple rows of data at once.

The following PL/SQL block declares a cursor and a collection (a nested table of test_table%ROWTYPE), and demonstrates how to use BULK COLLECT to retrieve all rows in a single FETCH:

SET SERVEROUTPUT ON
 
DECLARE
 
  /* Declaring the collection type */
 
  TYPE t_bulk_collect_test_tab IS TABLE OF test_table%ROWTYPE;
 
  /* Declaring the collection variable */
 
  l_tab t_bulk_collect_test_tab;
 
  CURSOR c_data IS SELECT * FROM test_table;
 
BEGIN
 
  /* Populate the collection using BULK COLLECT, which retrieves all rows in a
     single FETCH instead of fetching row by row in a loop */
 
  OPEN c_data;
  FETCH c_data BULK COLLECT INTO l_tab;
  CLOSE c_data;
 
  -- Process contents of collection here.
  DBMS_OUTPUT.put_line(l_tab.count || ' rows');
 
/* Accessing the collection type - Before Modify */
    FOR i IN l_tab.FIRST .. l_tab.LAST 
    LOOP
       EXIT WHEN i = 3;
       dbms_output.put_line('Before Modify- Row- '|| i || ': is '||l_tab(i).name);
    END LOOP;
 
/* Modifying collection element values */
    l_tab(2).name := 'Change Me';
 
/* Accessing the collection type – After Modify */
    FOR i IN l_tab.FIRST .. l_tab.LAST 
    LOOP
       EXIT WHEN i = 3;
       dbms_output.put_line('After Modify- Row- '|| i || ': is '||l_tab(i).name);
    END LOOP;
 
   dbms_output.put_line('Program executed successfully.');
 
END;
/

We get the following output.

Now let’s convert the Oracle example to the equivalent PL/pgSQL code.

PL/pgSQL in PostgreSQL doesn’t currently have the BULK COLLECT syntax. However, we can use the ARRAY_AGG aggregate function to run queries that perform bulk processing.
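Before walking through a full conversion, here is the core idea in isolation: ARRAY_AGG collapses a query result into a single array value that a PL/pgSQL variable can hold. For example:

SELECT array_agg(name ORDER BY id) AS names FROM test_table WHERE id <= 3;
-- Result: {test_1,test_2,test_3}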

The first step is to declare an array of a type that matches the Oracle collection type. To do that, you have two options:

  • Create a domain in PostgreSQL that is a type similar to the collection type in Oracle (for example, CREATE DOMAIN l_tab AS CHARACTER VARYING[])
  • Directly declare an array variable in PL/pgSQL code (for example, l_tab CHARACTER VARYING[])

In the following example, we use ARRAY_AGG to fetch c_data cursor data into the l_tab variable to mimic the bulk processing functionality of BULK COLLECT in Oracle by directly declaring an array variable in PL/pgSQL code:

DO $$
DECLARE

    /* l_tab is an array variable, the equivalent of the collection type in Oracle */

    l_tab test_table[];

    /* i is a loop counter; rec is a record variable used to access the array elements */

    i   INTEGER := 1;
    rec RECORD;

BEGIN

    /* BULK COLLECT has no PL/pgSQL equivalent, so the fetch is rewritten with array_agg
       while migrating the code from Oracle PL/SQL */

    SELECT array_agg((id, name, login_date)::test_table) INTO l_tab FROM test_table;

    RAISE NOTICE 'Bulk count: (%', COALESCE(array_length(l_tab, 1), 0) || ' rows): ';

    /* Accessing the collection type - Before Modify */

    FOREACH rec IN ARRAY l_tab
    LOOP
        EXIT WHEN i = 3;
        RAISE NOTICE USING message := concat_ws('', 'Before Modify- Row- ', i, ': is ', rec.name);
        i := i + 1;
    END LOOP;

    /* Modifying collection element values */

    rec := l_tab[2];
    rec.name := 'Change Me';
    l_tab[2] := rec;

    i := 1;

    /* Accessing the collection type - After Modify */

    FOREACH rec IN ARRAY l_tab
    LOOP
        EXIT WHEN i = 3;
        RAISE NOTICE USING message := concat_ws('', 'After Modify- Row- ', i, ': is ', rec.name);
        i := i + 1;
    END LOOP;

    RAISE NOTICE 'Program executed successfully.';

END $$;

The example produces the following output.

We recommend converting Oracle global objects to domains in PostgreSQL. Oracle global objects are accessible publicly across all the procedures and functions inside a package, and creating a domain in PostgreSQL replicates similar functionality to an Oracle global array.
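As a sketch of that recommendation (the domain and function names below are illustrative, not part of the example above), a single domain can be shared across multiple functions, much like an Oracle package-level collection type:

CREATE DOMAIN name_list AS CHARACTER VARYING[];

CREATE OR REPLACE FUNCTION collect_names() RETURNS name_list AS $$
DECLARE
    l_names name_list;
BEGIN
    /* Any function in the schema can declare variables of the shared domain */
    SELECT array_agg(name) INTO l_names FROM test_table WHERE id <= 3;
    RETURN l_names;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION print_names() RETURNS void AS $$
BEGIN
    /* Reuses the same domain-typed value returned by collect_names() */
    RAISE NOTICE 'names: %', collect_names();
END;
$$ LANGUAGE plpgsql;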

BULK COLLECT in a SELECT statement

The previous section demonstrated how to migrate PL/SQL code using the BULK COLLECT feature with FETCH to its equivalent constructs in PL/pgSQL. Now let’s explore the different aspects of the BULK COLLECT mechanism and the steps required to migrate using a SELECT statement.

The following is an example of an Oracle anonymous block that uses BULK COLLECT in a SELECT statement:

SET SERVEROUTPUT ON
DECLARE
  TYPE t_bulk_collect_test_tab IS TABLE OF test_table%ROWTYPE;
  l_tab    t_bulk_collect_test_tab := t_bulk_collect_test_tab();
BEGIN
 
  /* Populate the collection using BULK COLLECT, which retrieves all rows
     at once through the SELECT INTO clause */
 
  SELECT *
  BULK COLLECT INTO l_tab
  FROM   test_table;
 
  DBMS_OUTPUT.put_line('Bulk count: (' || l_tab.count || ' rows): ' );
END;
/

The example produces the following output.

In PostgreSQL, we can use the ARRAY_AGG function to achieve the bulk SELECT INTO statement functionality:

/* The following statement creates a type matching the test_table row type */

create type typ_l_tab AS (id integer, name character varying(100), login_date date);

/* The following statements create a domain over the type typ_l_tab, matching the Oracle collection type */

DROP DOMAIN IF EXISTS dom_l_tab;

CREATE DOMAIN dom_l_tab AS typ_l_tab[];

/* In the declaration section of the following block, l_tab is declared using the domain, making it an array of typ_l_tab */

DO $$
DECLARE
  /* Declare the array variable using the domain */
  l_tab dom_l_tab;
BEGIN

  /* BULK COLLECT with the SELECT INTO clause is replaced with array_agg
     while migrating the code from Oracle PL/SQL */

  SELECT array_agg((id, name, login_date)::typ_l_tab) INTO l_tab FROM test_table;

  RAISE NOTICE 'Bulk count: (%', COALESCE(array_length(l_tab, 1), 0) || ' rows): ';

END;
$$;

The example produces the following output.

In PostgreSQL, to achieve similar functionality to the BULK COLLECT statement, we used the ARRAY_AGG function to aggregate the result of the query into an array. The result of the query is then stored in the l_tab variable, which is an array of typ_l_tab records declared through the dom_l_tab domain.
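Once the rows are in the array, you can reach an individual element with a subscript and an individual field with parenthesized field access. The following minimal sketch illustrates the syntax:

DO $$
DECLARE
    l_tab test_table[];
BEGIN
    SELECT array_agg((id, name, login_date)::test_table) INTO l_tab FROM test_table;

    /* Parentheses around the subscripted element are required before the field name */
    RAISE NOTICE 'first id: %, first name: %', (l_tab[1]).id, (l_tab[1]).name;
END $$;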

BULK COLLECT with LIMIT

When using BULK COLLECT, you can also specify a LIMIT clause, which indicates the maximum number of rows to be fetched at once. This can be useful for managing memory usage and preventing the collection variable from becoming too large.

The following is an example of an Oracle anonymous block that uses BULK COLLECT with LIMIT:

SET SERVEROUTPUT ON
DECLARE
  CURSOR c1 IS SELECT ID FROM TEST_TABLE;
  TYPE V_TAB IS TABLE OF INTEGER INDEX BY BINARY_INTEGER;
  V_ID V_TAB;
  len INTEGER;
BEGIN
  OPEN c1;
  LOOP
    /* LIMIT with BULK COLLECT caps the number of rows fetched in each iteration at 1000 */
    FETCH c1 BULK COLLECT INTO V_ID LIMIT 1000;
    len := V_ID.COUNT;
    DBMS_OUTPUT.PUT_LINE('len-'||len);
    EXIT WHEN c1%NOTFOUND;
  END LOOP;
  CLOSE c1;
END;
/

The example produces the following output.

In PostgreSQL, there is no LIMIT clause for processing multiple rows into an array, but we can achieve similar functionality by batching with LIMIT and OFFSET:

DO $$
DECLARE
    V_ID     INTEGER[];
    len      INTEGER;
    v_limit  INTEGER;
    v_offset INTEGER;
BEGIN
    v_limit := 1000;
    v_offset := 0;
    LOOP
        /* Aggregate at most v_limit rows per iteration, mimicking BULK COLLECT ... LIMIT */
        SELECT array_agg(id) INTO V_ID
        FROM (SELECT id FROM test_table ORDER BY id LIMIT v_limit OFFSET v_offset) t;

        len := COALESCE(array_length(V_ID, 1), 0);
        RAISE NOTICE 'len-%', len;

        EXIT WHEN len = 0;
        v_offset := v_offset + v_limit;
    END LOOP;
END;
$$;

The example produces the following output.
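One caveat with this approach: OFFSET forces PostgreSQL to scan and discard all preceding rows, so each successive batch gets more expensive on large tables. Where the batching column is unique and indexed, as id is here, keyset pagination is a common alternative; the following is a sketch of that idea:

DO $$
DECLARE
    v_id   INTEGER[];
    v_last INTEGER := 0;
    len    INTEGER;
BEGIN
    LOOP
        /* Each batch resumes from the last id processed, avoiding the OFFSET scan */
        SELECT array_agg(id) INTO v_id
        FROM (SELECT id FROM test_table WHERE id > v_last ORDER BY id LIMIT 1000) t;

        len := COALESCE(array_length(v_id, 1), 0);
        EXIT WHEN len = 0;

        RAISE NOTICE 'len-%', len;
        v_last := v_id[len];
    END LOOP;
END $$;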

BULK COLLECT with FORALL

The FORALL statement allows you to perform a single DML operation on all elements of a collection, rather than processing each element one at a time. When used with BULK COLLECT, it can significantly reduce the number of context switches between the SQL engine and the PL/SQL engine, which speeds up execution.

The following example shows how to use BULK COLLECT with FORALL with Oracle:

CREATE TABLE TEST_TABLE2 AS SELECT * FROM TEST_TABLE WHERE 1=2;
 
SET SERVEROUTPUT ON 

DECLARE
   TYPE V_TEST IS TABLE OF TEST_TABLE%ROWTYPE;
   V_TAB   V_TEST;
   V_COUNT INTEGER;
BEGIN

   SELECT t.*
     BULK COLLECT INTO V_TAB
     FROM TEST_TABLE t;

   FOR i IN V_TAB.FIRST .. V_TAB.LAST
   LOOP
      IF MOD(V_TAB(i).ID, 2) = 0 THEN
         /* Modifying the value of array elements */
         V_TAB(i).NAME := 'EVEN';
      END IF;
   END LOOP;

   DBMS_OUTPUT.put_line('Retrieved-'||TO_CHAR(V_TAB.COUNT)||' rows');

   SELECT COUNT(1) INTO V_COUNT FROM TEST_TABLE2;
   DBMS_OUTPUT.put_line('BEFORE TABLE COUNT-'||V_COUNT);

   /* FORALL runs the INSERT for every element of the collection with a single
      context switch to the SQL engine */
   FORALL i IN 1 .. V_TAB.COUNT
      INSERT INTO TEST_TABLE2 (ID, NAME, LOGIN_DATE)
      VALUES (V_TAB(i).ID, V_TAB(i).NAME, V_TAB(i).LOGIN_DATE);

   SELECT COUNT(1) INTO V_COUNT FROM TEST_TABLE2;
   DBMS_OUTPUT.put_line('AFTER TABLE COUNT-'||V_COUNT);

END;
/

We get the following output.

In PostgreSQL, we can achieve similar bulk DML functionality using UNNEST, which loads the complete contents of the array variable into the table with a single INSERT ... SELECT statement.
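UNNEST is effectively the inverse of ARRAY_AGG: it expands an array back into a set of rows. A quick standalone illustration:

SELECT * FROM unnest(ARRAY[10, 20, 30]) AS n;
-- Returns three rows: 10, 20, 30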

We use the table test_table2 for BULK COLLECT with FORALL:

create table test_table2
 (
 id integer,
 name text,
 login_date timestamp without time zone
 );

The following code achieves the BULK COLLECT with FORALL functionality in PostgreSQL using UNNEST:

DO $$
DECLARE
   /* TEST_TABLE is the table name and, at the same time, a composite data type */
   V_TAB   TEST_TABLE[];
   V_COUNT INTEGER;
BEGIN

   SELECT array_agg(t)
     INTO V_TAB
   FROM TEST_TABLE t;

   V_COUNT := cardinality(V_TAB);
   RAISE NOTICE 'Retrieved % rows', V_COUNT;

   SELECT COUNT(1) INTO V_COUNT FROM TEST_TABLE2;
   RAISE NOTICE 'BEFORE TABLE COUNT- %', V_COUNT;

   /* This converts FORALL to PostgreSQL using arrays and unnest */
   INSERT INTO TEST_TABLE2 (ID, NAME, LOGIN_DATE)
   SELECT r.*
   FROM UNNEST(V_TAB) AS r;

   SELECT COUNT(1) INTO V_COUNT FROM TEST_TABLE2;
   RAISE NOTICE 'AFTER TABLE COUNT- %', V_COUNT;

END $$;

The example produces the following output.

Debugging common errors

This section reviews common errors that you may encounter when using ARRAY_AGG as an alternative to BULK COLLECT and how you can resolve them.

We use the following sample source Oracle code to illustrate common errors we may face while refactoring the code to PostgreSQL.

The following is the Oracle code:

SET SERVEROUTPUT ON
DECLARE
  TYPE t_bulk_collect_test_tab IS TABLE OF test_table%ROWTYPE;
  l_tab t_bulk_collect_test_tab := t_bulk_collect_test_tab();
BEGIN

  SELECT *
  BULK COLLECT INTO l_tab
  FROM test_table;

  DBMS_OUTPUT.PUT_LINE('Bulk count: (' || l_tab.count || ' rows) ');

  FOR i IN l_tab.first .. l_tab.last LOOP
    dbms_output.put_line('ID :'||l_tab(i).id||' '||'NAME :'||l_tab(i).name||' '||'LOGIN_DATE :'||l_tab(i).login_date);
  END LOOP;

END;
/
PL/SQL procedure successfully completed.

Bulk count: (100 rows) 
ID :1 NAME :test_1 LOGIN_DATE :29-02-20
ID :2 NAME :test_2 LOGIN_DATE :01-03-20
...

The following sections show PostgreSQL code.

ARRAY_AGG on a single column

This section focuses on common issues while using ARRAY_AGG on a single column of a table.

In our first example, we use a single column with the row operator in ARRAY_AGG:

create table test_table_demographic
(
	id integer,
	place character varying
);

 INSERT INTO test_table_demographic VALUES (1, 'Alaska'), (2, 'Arizona'), (3, 'California'), (4, 'Colorado'), (5, 'Chicago');


do $$ 
declare 
arr integer[];
var varchar;
v_place varchar;
begin 
select array_agg(row(a.id)) into arr from test_table a where id in (1,2,3,4,5);
RAISE NOTICE 'arr = %', arr;
for i in 1 .. COALESCE(array_length(arr,1),0)
loop
	select place into v_place from test_table_demographic where id=arr[i];
	RAISE NOTICE '% , % ',arr[i],v_place;
end loop;
end $$;

We get the following error when using a row operator on a single column along with ARRAY_AGG.

In the preceding output, each value in the array is a row rather than an integer, which is why we get an error.

To resolve this issue when working with ARRAY_AGG on a single column, don't convert the column value into a row. Use only the column whose values you want returned as an array in the ARRAY_AGG expression.

The following is the correct approach of using ARRAY_AGG on a single column:

do $$ 
declare 
arr integer[];
var varchar;
v_place varchar;
begin 
select array_agg(a.id) into arr from test_table a where id in (1,2,3,4,5);
RAISE NOTICE 'arr = %', arr;
for i in 1 .. COALESCE(array_length(arr,1),0)
loop
	select place into v_place from test_table_demographic where id=arr[i]::integer;
	RAISE NOTICE '% , % ',arr[i],v_place;
end loop;
end $$;

We get the following output.

ARRAY_AGG on multiple columns

You may also run into issues when using ARRAY_AGG on multiple columns. For example, see the following code:

/* Let's create a table to discuss ARRAY_AGG in a multi-column scenario */

create table test2 as WITH t AS (
SELECT * 
FROM generate_series(1, 100) as n
)
SELECT n as c1,'test_'||n as c2 FROM t;

/* Let's create a type for this scenario */

create type typ_test2 as (a integer, b text);

/* Example of a common mistake when using ARRAY_AGG in a multi-column scenario */

do $$ 
declare 
	arr test2[];
	var test2;
begin 
	select array_agg(a.c1, a.c2) f from public.test2 a into arr;
	RAISE NOTICE 'arr = %', arr;
	RAISE NOTICE 'arr[1] = %', arr[1];
	var := arr[1];
	RAISE NOTICE 'var = %', var;
	RAISE NOTICE 'var.c1 = %, var.c2 = %', var.c1, var.c2;
end $$;

The example produces the following output.

In this example, we used individual column names with the ARRAY_AGG function. As a result, we get an error because the function ARRAY_AGG(integer, text) doesn’t exist. To resolve this error, let’s discuss two approaches while using multiple columns in ARRAY_AGG functions.

In our first approach, we use a row operator on top of ARRAY_AGG, which first converts the columns into rows. Then the ARRAY_AGG() function loads the same into an arr variable, which is an array of the test2 table type. See the following code:

do $$ 
declare 
        arr test2[];
        var test2;
begin 
        select array_agg(row(a.c1, a.c2)) f from public.test2 a into arr;
        RAISE NOTICE 'arr = %', arr;
        RAISE NOTICE 'arr[1] = %', arr[1];
        var := arr[1];
        RAISE NOTICE 'var = %', var;
        RAISE NOTICE 'var.c1 = %, var.c2 = %', var.c1, var.c2;
end $$;

The example produces the following output.

In our second approach, let's say we have a large number of columns to aggregate. In this scenario, instead of specifying each column individually, we can use the table alias with the * operator:

do $$ 
declare 
	arr test2[];
	var test2;
begin 
	/* Row operator with a table alias and the '*' operator to include all columns
	   instead of mentioning them individually in the array_agg() function */

	select array_agg(row(a.*)) f from public.test2 a into arr;
	RAISE NOTICE 'arr[1] = %', arr[1];
	var := arr[1];
	RAISE NOTICE 'var = %', var;
	RAISE NOTICE 'var.c1 = %, var.c2 = %', var.c1, var.c2;
end $$;

We get the following output.
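A related tip when working with arrays of composite types: UNNEST expands the array back into rows, with each element split into its columns, so you can query the array contents with plain SQL. The following is a small sketch using the test2 table from this section:

DO $$
DECLARE
    arr   test2[];
    names text;
BEGIN
    SELECT array_agg(a ORDER BY a.c1) INTO arr FROM public.test2 a WHERE a.c1 <= 3;

    /* unnest() expands the composite array back into rows with columns c1 and c2 */
    SELECT string_agg(u.c2, ', ') INTO names FROM unnest(arr) AS u;

    RAISE NOTICE 'expanded names: %', names;  -- test_1, test_2, test_3
END $$;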

Conclusion

In this post, we shared step-by-step instructions for migrating Oracle bulk binds to Amazon Aurora PostgreSQL-Compatible Edition or Amazon RDS for PostgreSQL, and provided code templates, SQL scripts, and best practices. We also discussed common mistakes and solutions while using ARRAY_AGG as an alternative to BULK COLLECT.

For more information about arrays and types in PostgreSQL, refer to Arrays and Aggregate Functions.

If you have any questions or suggestions about this post, leave a comment.


About the Authors

Vinay Paladi is a Lead Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping customers build highly available, cost-effective database solutions and migrate their commercial engines to the AWS Cloud. He is passionate about building innovative solutions to accelerate the database journey to the cloud.

Bikash Chandra Rout is a Lead Database Consultant with the Professional Services team at Amazon Web Services. Bikash focuses on helping customers build highly available, cost-effective database solutions and migrate their commercial engines to the AWS Cloud. He is passionate about databases and analytics.