AWS Database Blog

How to migrate BLOB and CLOB tables from Oracle to PostgreSQL using Ora2pg and AWS DMS

Many enterprises are considering migrating their Oracle databases to PostgreSQL because of the close compatibility between the platforms and the enterprises’ desire to reduce licensing costs. Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL compatibility make it easy to set up, operate, and scale PostgreSQL deployments in the cloud in a cost-efficient manner.

AWS offers two services to migrate Oracle databases running in either the AWS environment or your on-premises data centers. AWS Schema Conversion Tool (AWS SCT) helps you convert your existing database schema to the target database schema. If AWS SCT can’t convert a schema automatically, it gives an assessment report telling how you can create an equivalent schema in the target database. AWS Database Migration Service (AWS DMS) helps you migrate the actual data from source database to target database.

At times, you might want to migrate large BLOBs and CLOBs outside of AWS DMS. If you are already familiar with the open-source tool Ora2Pg, you can use this tool to bulk load your data and then use AWS DMS for change data capture (CDC).

In this blog post, we discuss how to migrate your Amazon RDS for Oracle database to Amazon RDS for PostgreSQL. We do this by using the open source Ora2Pg tool for the initial load of tables with CLOB column data and using the CDC feature of AWS DMS for ongoing replication. Using Ora2Pg and AWS DMS together accomplishes the migration and sets the stage for ongoing replication.

To test the migration of data from Amazon RDS for Oracle to Amazon RDS for PostgreSQL, we provide an Amazon RDS for Oracle snapshot for you to use. This snapshot contains one schema and user, both with the name awsorauser, and five tables. Two tables in this schema (CUSTOMER and PART) each have a CLOB column. Each CLOB column has XML content, and the size of the CLOB column in each row is approximately 5 MB. The two tables have 5,000 rows each. In addition, the SUPPLIER table has 110,000 rows but doesn’t have any CLOB columns. The other two tables, NATION and REGION, are very small.

Migration overview

At a high level, we do the following in this post:

  1. Create a virtual private cloud (VPC) based on Amazon VPC with two subnets and a security group.
  2. Provision an Amazon RDS for Oracle database from a publicly available snapshot.
  3. Provision and configure an Amazon EC2 instance to load the test data into an Amazon RDS for Oracle database.
  4. Provision an Amazon RDS for PostgreSQL database.
  5. Enable archive log mode, supplemental logging, and automatic backups on the source Oracle database.
  6. Provision an Amazon EC2 instance for installing the Ora2Pg tool.
  7. Install and configure the Ora2Pg tool.
  8. Load new data into the source Oracle database to test change data capture (CDC).
  9. Create AWS DMS tasks with CDC to enable replication from the source RDS for Oracle database to the target RDS for PostgreSQL database.

Prerequisites and assumptions

To perform this exercise, you need the following:

  1. An Oracle account to download the Oracle client packages.
  2. An AWS account that provides access to AWS services.
  3. An AWS Identity and Access Management (IAM) user with an access key and secret access key to configure the AWS CLI.
  4. For testing purposes, the services that this post works with need to be created in the us-east-1a Availability Zone in the us-east-1 Region.

Additionally, be aware of the following:

  1. We configure all services in the same VPC to simplify networking considerations.
  2. The predefined database schema name and the password in the source RDS for Oracle database are both awsorauser.
  3. The predefined root user name for the source RDS for Oracle database is admin.
  4. The predefined root password for the source RDS for Oracle database is admin123.
  5. While creating the RDS for PostgreSQL database, use lowercase letters for the database name, master user name, and schema names.
  6. Important: The AWS CloudFormation templates that we provide use hard-coded user names and passwords and open security groups. These are just for testing purposes and aren’t intended for production use without any modifications.

AWS components that are used in the migration

You need the following AWS components for successful testing of this migration.

Component Purpose
1 Amazon RDS for Oracle instance The source Oracle DB instance to restore from a provided snapshot.
2 Amazon RDS for PostgreSQL instance The target PostgreSQL instance. Our source Oracle database is migrated to this RDS for PostgreSQL instance.
3 First Amazon EC2 machine (m4.xlarge), tag name Ora2pg-blog-EC2InstanceFor-SourceDataLoading This EC2 machine is used to add new data into the source Oracle database while the initial migration is running from the source RDS for Oracle database to the target PostgreSQL database. This machine uses Oracle client version 12.2 because the Oracle SQL Loader tool sqlldr comes with this version and is needed for adding new data into source RDS for Oracle database.
4 Second Amazon EC2 machine (m4.4xlarge), tag name Ora2pg-blog-EC2InstanceFor-Ora2pg-Configuration This EC2 machine is used to move the data from RDS for Oracle database to RDS for PostgreSQL database using Ora2pg. We install and configure Ora2pg on this EC2 instance. We also need to install 11.2 version of Oracle client on this machine instead of 12.2 version. Thus, we need two separate EC2 instances.
5 AWS DMS instance This DMS instance is needed to enable ongoing replication from the source RDS for Oracle database to the target RDS for PostgreSQL database.

Implementing the migration

In this section, we describe how to implement the migration.

1. Use the AWS CloudFormation template for initial setup

In this step, we set up a VPC, two subnets, and a security group. Restore the source RDS for Oracle database from the snapshot. Create the target RDS for PostgreSQL database. Create two EC2 instances, one to use for loading the test data and one to use for configuring the Ora2pg tool and software.
You can use this downloadable CloudFormation template to set up the VPC security group, source RDS for Oracle database, target RDS for PostgreSQL database, and two EC2 instances. To launch directly through the console, click on the below Launch Stack button:

 

The CloudFormation template takes the following parameters. Some of the parameters are predefined, and you can’t edit these. For some of the parameters, you need to provide the values. The table gives details.

For this parameter Use this Predefined or manual entry?
Stack name Provide a unique stack name, for example ora2pg-dms-stack-1 Manual entry
OracleDatabaseName AWSORA Predefined
OracleInstanceType db.m4.xlarge Predefined
DBSnapshotIdentifier aws-blogs-oracle2postgres-migration-using-ora2pg-dms-source-data Predefined
OracleDBAllocatedStorage 500 GB Predefined
OracleDBInstanceIdentifier awsorains Predefined
PostgreSQLDBInstanceIdentifier awspgins Predefined
PostgreSQLDBInstanceClass db.m4.xlarge Predefined
PostgreSQLDBName awspgdb Predefined
PostgreSQLDBUserName Root Predefined
PostgreSQLDBPassword Rootroot123 Predefined
PostgreSQLDBAllocatedStorage 500 GB Predefined
KeyName Provide the name of the existing EC2 key pair Manual entry
ClientIP Provide the IP address range to use to connect to the RDS and EC2 instances from your local machine. This must be a valid IP CIDR range of the form x.x.x.x/x. Get your address using checkip.amazonaws.com or whatsmyip.org. Manual entry
DataLoadingIntoOracleEC2InstanceType m4.xlarge Predefined
Ora2PGEC2InstanceType m4.4xlarge Predefined

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, choose Create.

The output from the preceding template is as follows.

Key Value
StackName Name
VPCID Vpc-xxxxxxx
Regionname AWS Region—make sure that it shows us-east-1
SourceRDSOracleEndPoint Source RDS for Oracle endpoint
TargetRDSPostgreSQLEndPoint Target RDS for PostgreSQL endpoint
EC2InstanceForOra2PgInstanceId Instance ID of the second EC2 instance, where we install and configure the Ora2pg tool
EC2InstanceForOra2PgAvailabilityZone Availability Zone where this EC2 instance is created
EC2InstanceForOra2PgPublicDNS Public DNS address of the second EC2 instance, where we install and configure the Ora2pg tool
EC2InstanceForSourceDataLoadingInstanceId Instance ID of the first EC2 instance. We use this instance to load the test data into the source RDS for Oracle database, to simulate the data loading while the initial load is running with Ora2pg.
EC2InstanceForSourceDataLoadingPublicDNS Public DNS address of the first EC2 instance
EC2InstanceForSourceDataLoadingAZ us-east-1a—Availability Zone where this EC2 instance must be created. Select the corresponding subnet associated with your VPC.
SubnetID1A ID of subnet created in Availability Zone 1A
SubnetID1B ID of subnet created in Availability Zone 1B

It takes approximately 15–16 minutes for the deployment to complete. When the stack launch is complete, it returns the output with information about the resources that were created. Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:

$ aws cloudformation describe-stacks --stack-name <stack_name> --region us-east-1 --query 'Stacks[0].Outputs'

2. Configure the first EC2 instance

Next, configure the first EC2 instance for loading test data into Amazon RDS for Oracle database. This EC2 instance has the tag name Ora2pg-blog-EC2InstanceFor-SourceDataLoading. For this configuration, do the following:

  1. Download the Oracle instant client .rpm files from the Instant Client Downloads for Linux x86-64 (64-bit) page on the Oracle website to the first EC2 instance that was created using the CloudFormation template. The download process requires you to log in to your own OTN network and accept the license agreement. The files to download are oracle-instantclient12.2-tools-12.2.0.1.0-1.x86_64.rpm, oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm, and oracle-instantclient12.2-sqlplus-12.2.0.1.0-1.x86_64.rpm. Make sure that you are downloading the 12.2 version of Oracle client .rpm files in this first EC2 instance, because the 12.2 version client comes with the Oracle SQL Loader tool, sqlldr.
  2. Copy the .rpm files into the /tmp directory of the first Amazon EC2 instance (the instance with the tag Name Ora2pg-blog-EC2InstanceFor-SourceDataLoading). This instance is for loading the test data. You can get the EC2 instance’s public DNS value by referring to the CloudFormation template output table’s key EC2InstanceForSourceDataLoadingPublicDNS. In some cases, you might have trouble connecting to the cluster using Secure Shell (SSH). If so, check the cluster’s security group to make sure that it allows inbound SSH connection (through TCP port 22) from your client’s IP address.
  3. Follow the instructions listed in the AWS CLI Getting Started topic to configure the AWS CLI using the aws configure The user for this configuration needs to be set as ec2-user.
    ec2-user@ip-172-31-2-77 tmp]$ aws configure
    AWS Access Key ID [None]: XXXXXXXXXXXXXXXXXXXXX
    AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxx
    Default region name [None]: us-east-1
    Default output format [None]:
  4. Download the wrapper script from the Amazon S3 bucket setting_oracle_client_host.sh. As the user ec2-user, run the following command to download the scripts to configure.
    cd; aws s3 cp s3://aws-bigdata-blog/artifacts/awsora2pgblogfiles/data-loading/scripts/setting_oracle_client_host.sh . 
  5. Execute the script using the following command.
    cd; sh -x ./setting_oracle_client_host.sh <RDS_ORACLE_END_POINT> <EC2_INSTANCE_AVAILABILTY_ZONE> <DB_NAME>

    Provide the Amazon RDS for Oracle endpoint that was created preceding, without the port number and colon. Also provide the Availability Zone as us-east-1a and the database name that was selected while restoring the RDS for Oracle snapshot. Following is an example:

    cd; sh -x ./setting_oracle_client_host.sh awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com us-east-1a AWSORA

    When you execute the wrapper script, it also downloads the additional scripts that are required to create tables, sqlldr control files, and so on. In addition, it attaches an Amazon EBS volume and is mounted on /local/mnt6. The EBS volume contains the necessary CLOB files that are used for loading some rows into the source RDS for Oracle database while the Ora2pg job is running. This script also enables archive log mode and supplemental logging on the source RDS for Oracle database. Note: When the script execution is complete, it mounts a disk with a volume 25 GB in size. This disk contains the necessary files to generate the dataset for loading the test data. You can check this by running the command df –h. When you clean up at the end of the exercise, you manually delete this EBS volume.

3. Configure the second EC2 instance

Next, install and configure the Ora2pg tool on the second Amazon EC2 instance. This EC2 instance has the tag name Ora2pg-blog-EC2InstanceFor-Ora2pg-Configuration.

To install Ora2pg, we need to configure the Oracle instant client, Comprehensive Perl Archive Network (CPAN), Perl-DBI module, Perl-YAML module, and the PostgreSQL client (for testing the psql connection). To set these up, we provide scripts. These scripts also create a schema, a user, and a role in the PostgreSQL database.

Download the Oracle instant client .rpm files from rhw Instant Client Downloads for Linux x86-64 (64-bit) page. The download process requires you to log in to your own OTN network. The files to download are oracle-instantclient11.2-basic-11.2.0.4.0-1.x86_64.rpm, oracle-instantclient11.2-sqlplus-11.2.0.4.0-1.x86_64.rpm, and oracle-instantclient11.2-devel-11.2.0.4.0-1.x86_64.rpm. Make sure that you are downloading the 11.2.0.4 version .rpm files.

For this configuration, do the following:

  1. Copy the Oracle client .rpm files to the /tmp directory of the second Amazon EC2 instance that was created using the CloudFormation template. You can get the second EC2 instance’s public DNS value by referring to the preceding template output table’s key EC2InstanceForOra2PgPublicDNS.
  2. Follow the instructions listed in AWS CLI Getting Started to configure the AWS CLI using the aws configure The user for this configuration needs to be set as ec2-user.
    [ec2-user@ip-172-31-2-77 tmp]$ aws configure
    AWS Access Key ID [None]: XXXXXXXXXXXXXXXXXXXXX
    AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxx
    Default region name [None]: us-east-1
    Default output format [None]:
  3. Download the Ora2pg configuration scripts from Amazon S3. The script names are setting_ora2pg_host_step1.sh and setting_ora2pg_host_step2.sh. The setting_ora2pg_host_step1.sh. The setting_ora2pg_host_step1.sh script installs Oracle 11.2 client .rpm files, sets up Oracle environment variables, and installs the PERL-CPAN repository, Perl Database Interface (PERL-DBI), Perl-Yaml module, PostgreSQL client, and Perl DBD::Pg module. These are required for successful configuration of Ora2pg software. The setting_ora2pg_host_step2.sh script downloads and installs Ora2pg version 18.1 from GitHub. It then checks the database connections to source and target, creates Ora2pg configuration files for each table, and generates the target schema structure. It creates a user and a role, and it creates a schema structure in the target RDS for PostgreSQL database. It then gets the latest Oracle system change number (SCN) and converts it into Unix time in milliseconds. Unix time is used for the CDC start time value when you create DMS tasks. Finally, the script initiates the copy of the source RDS for Oracle table’s data into the target RDS for PostgreSQL database.
    As the ec2-user user, run the following commands to download two scripts to configure the Ora2pg tool.

    cd; aws s3 cp s3://aws-bigdata-blog/artifacts/awsora2pgblogfiles/data-loading/scripts/setting_ora2pg_host_step1.sh . 
    cd; aws s3 cp s3://aws-bigdata-blog/artifacts/awsora2pgblogfiles/data-loading/scripts/setting_ora2pg_host_step2.sh .
  4. Execute the first script using the following command, passing these parameters to the script:
    • The RDS for Oracle database endpoint that was created preceding. Get this value from the output section of the first CloudFormation template.
    • The Amazon RDS for Oracle DB name that was selected as part of the RDS for Oracle database restore.
    cd; 
    chmod 755 setting_ora2pg_host_step1.sh
    ./setting_ora2pg_host_step1.sh <RDS_ORACLE_END_POINT> <ORA_DB_NAME> 

    Following is an example:

    ./setting_ora2pg_host_step1.sh awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com AWSORA

    Modify awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com to be the correct RDS for Oracle endpoint.

  5. Get the CPAN Test::More, DBD::Pg, and DBD::Oracle modules and install and compile them as described following. The DBD::Oracle and DBD::Pg modules are Perl modules that work with the database interface (DBI) to provide access to the source RDS for Oracle instance and target RDS for PostgreSQL instance. You need Test::More for successful installation of Oracle and Pg modules. To install and compile these modules, run the following commands:
    1. Run this command at the Unix shell prompt: cpan. The first time that you run the cpan command, you’re prompted to answer a few questions. Just accept the default values by pressing the Enter key on your keyboard. When cpan is set up, it takes you to the cpan shell. At the cpan shell, run this command:
      look Test::More

      This command installs the Test::More module, needed for successful installation of the DBD::Pg and DBD::Oracle After this command runs successfully, it exits from the cpan shell and takes you to the Unix shell prompt. At that prompt, run the following commands to build the module:

      perl Makefile.PL
      sudo make
      sudo make install
      

      When these commands are complete, exit from the shell by running the exit command as many times as needed, until you fully exit the cpan shell.

    2. Install the DBD::Pg module using the following commands at the cpan prompt:
      1. Log in to the cpan shell using the cpan command.
      2. Run this command at the cpan prompt: look DBD::Pg After you run this command, you return to the Unix shell prompt.
      3. At that prompt, run this command: perl Makefile.PL
      4. Then run this command: sudo make After running this command, you might see warning messages like this:
        “/usr/lib64/perl5/vendor_perl/auto/DBI/DBIXS.h:491:25: warning: ‘dbi_get_state’ defined but not used [-Wunused-function]” 

        Ignore these messages.

      5. Run this command: sudo make install
      6. When this command is complete, exit the shell by running the exit command.
    3. Install the DBD::Oracle module using the following commands at the cpan prompt:
      1. Log in to the cpan shell using the cpan command
      2. Run this command at the cpan prompt: look DBD::Oracle After you run this command, you return to the Unix shell prompt.
      3. At that prompt, run this command: perl Makefile.PL
      4. Then run this command: sudo make After running this command, you might see some warning messages. Ignore these messages.
      5. Run this command: sudo make install
      6. When this command is complete, exit the shell by running the exit command as many times as needed, until you fully exit the cpan shell.
    4. Execute the second script as described following. Make sure that you have completed the first script before proceeding to the second script. Start by running the following command.
      cd; 
      chmod 755 setting_ora2pg_host_step2.sh
      ./setting_ora2pg_host_step2.sh <RDS_ORACLE_END_POINT> <ORA_DB_NAME> <PG_RDS_END_POINT> <PG_MASTER_USERNAME> <PG_MASTER_PASSWORD> <PG_DATABASE_NAME> <PG_MIGRATION_USER> <PG_MIGRATION_PASSWORD> <PG_MIGRATION_SCHEMA> <PG_CONTROL_SCHEMA> 

      The parameters to pass to this script are described in the following table.

        Script parameter Value Comments
      1 <RDS_ORACLE_END_POINT> awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com Endpoint for the source RDS for Oracle database. Change this value to your appropriate endpoint.
      2 <ORA_DB_NAME> AWSORA Database name in the source RDS for Oracle database. This value was provided as part of the database restore.
      3 <PG_RDS_END_POINT> awspgins.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com Target RDS for PostgreSQL endpoint. Change this value to your appropriate endpoint.
      4 <PG_MASTER_USERNAME> root Admin user name that was selected as part of the RDS for PostgreSQL database creation.
      5 <PG_MASTER_PASSWORD> Rootroot123 Admin user password that was selected as part of the RDS for PostgreSQL database creation.
      6 <PG_DATABASE_NAME> awspgdb Database name that was selected as part of the RDS for PostgreSQL database creation.
      7 <PG_MIGRATION_USER> miguser User name that is created in the PostgreSQL database and is used for data migration.
      8 <PG_MIGRATION_PASSWORD> migpasswd Password for migration user.
      9 <PG_MIGRATION_SCHEMA> awspgschema Name of the schema that is created in PostgreSQL database. All tables are migrated to this schema. Make sure that you use lowercase letters for the schema name.
      10 <PG_CONTROL_SCHEMA> awspgctrlschema Name of the control schema that is created in PostgreSQL database. This schema name is used by the control tables that are created as part of the AWS DMS setup. Make sure that you use lowercase letters for the control schema name.

      Following is an example:

      ./setting_ora2pg_host_step2.sh awsorains.xxxxxx.us-east-1.rds.amazonaws.com AWSORA awspgins.xxxxxx.us-east-1.rds.amazonaws.com root Rootroot123 awspgdb miguser migpasswd awspgschema awspgctrlschema

      When you execute the script, at the end it executes the Ora2pg jobs in nohup mode (that is, they execute in the background). To exit the program, just press the Enter key on your keyboard.

      Note: The second script creates a schema and required tables in the target RDS for PostgreSQL database. It also starts the data copy from the source RDS for Oracle database to the target RDS for PostgreSQL database. It gets the latest SCN number and the time stamp from the RDS for Oracle database before copying the data using Ora2pg. This information is needed when starting the DMS tasks. This information is saved in a file with the name scn_information_from_source.out in the home directory of the ec2-user user on this EC2 instance—that is, the instance with the tag name Ora2pg-blog-EC2InstanceFor-Ora2pg-Configuration.

      Sample output from this logfile is as follows:

      -bash-4.2$ cat scn_information_from_source.out
      ||******************************* Current SCN Number from RDS Oracle is : 666311 ***********||
      ||************Timestamp value after subtracting 15 minutes is : 2018-04-18 16:58:43 *******||
      ||******************************************************************************************||
      ||******************************************************************************************||
      ||********* TIME STAMP VALUE IN MILLISECONDS TO USE FOR DMS TASKS IS  : 1524070723000 ******||
      ||******************************************************************************************||
      CUSTOMER TABLE COPY PID IS : 6933
      PART TABLE COPY PID IS : 6934
      SUPPLIER TABLE COPY PID IS : 6935
      REGION AND NATION TABLE COPY PID IS : 6935
      -bash-4.2$

      This script executes four separate Ora2pg jobs or tasks for the CUSTOMER, PART, and SUPPLIER tables and a single job for REGION and NATION tables. These jobs run in nohup mode (that is, they execute in the background).

      The CUSTOMER and PART tables each have a CLOB data column. While migrating the data into the target RDS for PostgreSQL database, we simulate loading more transactions to the SUPPLIER and CUSTOMER tables and running separate jobs for these tables. Because that PART table has a CLOB column, we also run a separate job for this table. NATION and REGION tables don’t have any new inserts coming in, so we group them into a single Ora2pg job.

      For these separate jobs, we generate separate log files. The log file names are ora2pg_aws_rds_oracle_to_pg_customer_blog.out, ora2pg_aws_rds_oracle_to_pg_part_blog.out, ora2pg_aws_rds_oracle_to_pg_region_nation_blog.out, and ora2pg_aws_rds_oracle_to_pg_supplier_blog.out. These are stored in the home directory.

      In our testing, executing this script took around 1 hour and 20 mins. While executing the script, you can check these log files to see if the copy is still running. When the script is completed, you see a message like this: 5080.91user 55.69system 1:18:57elapsed. The CUSTOMER and PART tables’ jobs take a long time to finish because they have CLOB column data to copy from the source RDS for Oracle database to the target PostgreSQL database.

4. Load new data into the source Oracle database

When the copy from the source RDS for Oracle database to the target RDS for PostgreSQL database has started, simulate new data load into the source RDS for Oracle database:

  1. Log in to the first Amazon EC2 instance, with the tag name Ora2pg-blog-EC2InstanceFor-SourceDataLoading, for loading the test data. You can get the EC2 instance’s public DNS value by referring to the output table’s key from the CloudFormation template, that is EC2InstanceForSourceDataLoadingPublicDNS.
  2. Run the following command to load new data into the CUSTOMER table. This command loads new 100 rows into the CUSTOMER table in the source Oracle database. Make sure that you run this command in the EC2InstanceForSourceDataLoadingPublicDNS instance.
    cd; date; nohup time sh -x /home/ec2-user/data-loading/scripts/load_tables.sh customer /home/ec2-user/data-loading/input-files/customer/customer.tbl.1.100rows awsorauser awsorauser AWSORA > load_tables_customer_tbl.100rows.log &

    Here, AWSORA is the name of the database that was created in the source RDS for Oracle database.

  3. Run the following command to load new data into the SUPPLIER table. This command loads AN additional 1,000,000 rows into supplier table. Make sure that you run this command in the first EC2 instance.
    cd; date; nohup time sh -x /home/ec2-user/data-loading/scripts/load_tables_supplier.sh supplier /home/ec2-user/data-loading/input-files/supplier/only_supplier_files_tbl.3.lst awsorauser awsorauser AWSORA > only_supplier_files_tbl.2.log &

5. Create the dms-vpc-role IAM role if this doesn’t exist and add an IAM policy

AWS accounts that have never used AWS DMS before get a failure when you create the AWS::DMS::ReplicationSubnetGroup resource using the preceding CloudFormation template. This failure occurs because the dms-vpc-role doesn’t exist. Also, creation of the CloudFormation stack in the next step fails if the dms-vpc-role doesn’t exist or isn’t configured properly.

You can check whether this IAM role exists and create it if it doesn’t exist by using the following AWS CLI commands:

  1. Check if the dms-vpc-role IAM role exists using the following CLI command.
    aws iam get-role --role-name dms-vpc-role
  2. Create dms-vpc-role if it doesn’t exist using the following CLI command.
    aws iam create-role --role-name dms-vpc-role --assume-role-policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"dms.amazonaws.com\"},\"Action\":\"sts:AssumeRole\"}]}"

Next, to accomplish the following steps, attach the AWS managed policy AmazonDMSVPCManagementRole to the dms-vpc-role IAM role using the following CLI command.

aws iam attach-role-policy --role-name dms-vpc-role --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole

6. Create the AWS DMS source and target endpoints and DMS instance

Using this CloudFormation template, set up the source DMS source endpoint, target endpoint, and DMS replication instance. To launch directly the template through the console, click on the below Launch Stack button:

 

This template takes the following parameters. Some of the parameters are predefined, and you can’t edit these. For some of the parameters, you need to provide the values. The table gives details.

For this parameter Use this Predefined or manual entry?
Stack name Provide a unique stack name, for example ora2pg-dms-stack-1 Manual entry
DMSInstanceType dms.c4.4xlarge Manual entry
SourceRDSOracleEndpointID Provide the RDS for Oracle source endpoint Manual entry
SourceRDSOracleDatabaseName AWSORA Predefined
SourceRDSOracleUserName awsorauser Predefined
SourceRDSOracleDBPassword awsorauser Predefined
SourceRDSOracleDBPort 1521 Predefined
TargetRDSPostgreSQLEndpointID Provide the RDS for PostgreSQL endpoint Manual Entry
TargetRDSPostgreSQLDatabaseName awspgdb Predefined
TargetRDSPostgreSQLUserName root Predefined
TargetRDSPostgreSQLDBPassword Rootroot123 Predefined
TargetPostgreSQLDBPort 5432 Predefined
VPCID Select the VPC that was created in the CloudFormation template mentioned preceding Manual entry
VPCSecurityGroupId Provide the ID of the security group that was created using the first CloudFormation template. You can get the value from the Output section of the first CloudFormation template. Manual entry
DMSSubnet1 Provide first subnet ID corresponding to your VPC. This subnet was created as part of the first CloudFormation template. Get the value from the Output section of the first CloudFormation template. Manual entry
DMSSubnet2 Provide second subnet ID corresponding to your VPC. This subnet was created as part of the first CloudFormation template. Get the value from the Output section of the first CloudFormation template. Manual entry
DMSCDCTaskStartTime Provide the CDC start time as 15 minutes before the time stamp value from the scn_information_from_source.out log file generated from the Ora2pg tool full copy. This time must be in this format: YYYY-MM-DD HH:MI:SS Manual entry

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, select the I acknowledge that AWS CloudFormation might create IAM resources with custom names check box, and then choose Create.

When the stack launch is complete, it returns output with information about the resources that were created. Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:

$ aws cloudformation describe-stacks --stack-name <stack_name> --region us-east-1 --query 'Stacks[0].Outputs'

The output from the preceding template is as follows.

Key Value
DMSRepulicationInstance Name
DMSPostgreSQLEndpoint us-east1
DMSOracleEndpoint Output of the Oracle endpoint

7. Check that the scripts are completed

Before proceeding with replication, first make sure that the setting_ora2pg_host_step2.sh script is completed.

You can check this by checking the log files ora2pg_aws_rds_oracle_to_pg_part_blog.out and ora2pg_aws_rds_oracle_to_pg_customer_blog.out. These files are in the EC2 instance with the identifier EC2InstanceForOra2PgPublicDNS that was created using the first CloudFormation template. If the script is completed, at the end of the log files you should see a message like 5080.91user 55.69system 1:18:57 elapsed.

You can also verify this by checking the number of rows the script has copied into target RDS for PostgreSQL database. To check the number of rows, run the following commands to connect to the PostgreSQL database from EC2InstanceForOra2PgPublicDNS instance:

  1. Connect to the PostgreSQL database with the following command.
    psql --host=<RDS_Postgresql_EndPoint> --port=5432 --username=root --password --dbname=awspgdb

    In this command, change the value for <RDS_Postgresql_EndPoint> to the correct RDS for PostgreSQL endpoint. You’re prompted for the password for the root user, and it is Rootroot123.

  2. When you’re logged in, change the schema path using the following command.
    SET search_path = awspgschema;
  3. Run the following select commands to check the number of rows in the PostgreSQL database.
    select count(*) from customer;
    select count(*) from part;

    The CUSTOMER and PART tables should show 5000, because we migrated 5000 rows from the CUSTOMER and PART tables in the source RDS for Oracle database to the target RDS for PostgreSQL database.

  4. Use the \q command to exit from the psql shell.

8. Create DMS tasks with CDC to enable replication

Now that we have completed the initial data migration from RDS for Oracle to RDS for PostgreSQL, we are ready to replicate the remaining data and can enable ongoing replication. To do so, we use AWS DMS with the CDC option. Because we divided our migration process into four tasks, we create four AWS DMS tasks to replicate the changes made on the source RDS for Oracle database to the target RDS for PostgreSQL database.

Before creating tasks, make sure that you test the connections from the DMS replication instance to the source and target RDS databases using the endpoints. To do so, first go to the Endpoints section in the DMS console, select your source endpoint, and choose Test Connection. Next, select the correct replication instance from the list, and choose Run test. Then, go to the Endpoints section again, select the target endpoint, and choose Test Connection. Follow the same steps to make sure both connections are successful.

Then, using this CloudFormation template, set up the source DMS tasks. To launch directly through the console, click on the below Launch Stack button:

 

This template takes the following parameters. Some of the parameters are predefined, and you can’t edit these. For some of the parameters, you need to provide the values. The table gives details.

For this parameter Use this Predefined or manual entry?
DMSCDCTaskStartTimeInMilliSeconds Provide the timestamp value from the output of the scn_information_from_source.out log file. Manual entry
DMSRepulicationInstanceARN Provide the Amazon Resource Name (ARN) for the replication instance. You can get this value from the output of the previous template. Manual entry
DMSSourceOracleEndpointARN Provide the endpoint ARN for the DMS source Oracle instance. You can get this value from the output of the previous template. Manual entry
DMSTargetSPostgreSQLEndpointARN Provide the endpoint ARN for the PostgreSQL endpoint. You can get this value from the output of the previous template. Manual entry

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, choose Create.

The preceding CloudFormation template creates the tasks and moves to the Ready state. When the stack launch is complete, manually start the DMS tasks.

To do this, sign in to the DMS console and navigate to the Tasks page. Select one task at a time and choose Start/Resume for each. Doing so initiates change data capture starting at the specified time from the source RDS for Oracle database to the target RDS for PostgreSQL database.

In a real-world scenario, once the DMS tasks have replicated all the data changes to the RDS for PostgreSQL database, you stop all connections to the source Oracle database. You then wait to catch up all the remaining transactions, if any, to the target database. When they have, you can stop the DMS task in the DMS console.

To finish the process, recreate triggers, indexes, and foreign key constraints on the target database. Drop existing sequences in the PostgreSQL database, and migrate the sequences using Ora2pg separately with CACHE value 1. Make sure that data validations are performed, and cut over your applications to point to PostgreSQL and restart them.

Cleaning up after the migration

After completing and testing this solution, clean up the resources by stopping your tasks and deleting the CloudFormation stacks:

  1. Go to the DMS console, navigate to the Tasks page, and stop the tasks manually one after another.
  2. Go to the CloudFormation web UI and delete the stacks in reverse order of their creation.
  3. Delete the Amazon EBS volume that was created as part of this blog post:
    1. Go to EC2 web console, and choose Volumes in the Elastic Block Store
    2. Select the EBS volume with the name oracle-client-for-rds-oracle-restored-volume1
    3. For Actions, choose Delete Volume.
    4. Also delete any RDS snapshots that you created for these databases.

Conclusion

In this post, we give you an overview of migrating an Amazon RDS for Oracle database to Amazon RDS for PostgreSQL using the open-source tool Ora2pg and AWS DMS. You can also use the same procedure to migrate your on-premises Oracle database to an Amazon RDS for PostgreSQL database or Amazon Aurora PostgreSQL database. To accomplish this, you can migrate the initial data load using various methods and then migrate the remaining data using the custom CDC start-time option available in AWS DMS, as we have shown. You enable the ongoing replication until the cutover is complete.


About the Author

Photo of Srikanth KodaliSrikanth Kodali is a big data consultant at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance about database projects, helping them improve the value of their solutions when using AWS.