AWS Database Blog

Split data across Regions as it’s migrated from on-premises Oracle to Amazon Aurora PostgreSQL while maintaining data integrity using AWS DMS – Part 2

As enterprises migrate large applications and large on-premises backend databases to AWS, you likely want to place the databases closer to your customers for low latency and better application performance. This isn’t always easy because your customers may be located all over and on-premises data centers are usually at a handful of locations. This means that data in the on-premises databases need to be split across various AWS Regions while maintaining data integrity, which is critical for smooth running of the application and the end-user experience.

The way we can split data is based on the amount of data that we need to process, the downtime, and the cutover window duration. In this series we discuss two approaches that you can use to split data from one Oracle source database to two Amazon Aurora PostgreSQL-Compatible Edition instances in different Regions even if some tables don’t have a particular column on which the data can be split. The approaches use AWS Database Migration Service (AWS DMS) to successfully split the data as it gets migrated to AWS without compromising data integrity.

Part 1: Use AWS DMS to split the data into different Regions

We can use AWS DMS for the parent tables to split the data in such a way that data is migrated closer to the customer, for example, a West coast-based customer in the US has their data migrated to a us-west Region. The corresponding child tables and other tables are copied entirely into the particular Region. When the migration is complete and application is up and running, a cleanup exercise is performed during a low-activity window to clean up unwanted data from the child table and other tables. This approach is described in part 1.

Part 2: Use Oracle materialized views and AWS DMS to split the data into different Regions

With our second approach, we use AWS DMS to split the data from the parent tables as we did in the first approach. For the child tables, we create materialized views on the source database, which we use as source tables for AWS DMS, which are mapped to the child tables on the Aurora target database. This way, the corresponding child table’s data is migrated to the corresponding Region without compromising data integrity and without duplicating the data. This approach is described in this post.

As in any real-life scenario, each approach has advantages and disadvantages. We recommend working closely with your application team and cloud architect (for large enterprises, this can be cloud Center of Excellence team) to decide on either of the approaches.

Comparing approaches

The following table summarizes the advantages and disadvantages of each of the approach, which can help you make a quicker decision when choosing an approach that suits your data migration needs.

Part 1: Use AWS DMS Part 2: Use AWS DMS and materialized views
No additional setup needed on the Oracle source Materialized views need to be built and tested on the source database
No performance impact on the Oracle source because all the work of splitting and cleaning data happens on the Aurora target database There can be a potential performance impact on the existing Oracle source database because the materialized view gets refreshed every time data gets committed
Data is split and cleaned on the target Aurora database, so a larger amount of data needs to be processed and transferred in various Regions, which increases the AWS cost Data is pre-split on the Oracle source database, so no additional AWS cost is incurred for transferring and storing excess data
No additional storage is necessary to accommodate for Oracle materialized views Additional storage is necessary on the Oracle source to create and maintain materialized views
Additional cleanup steps need to be developed and tested on the Aurora target No additional cleanup is necessary on the Aurora target
Post-migration cleanup on the Aurora target database needs to be performed during a low-activity window No cleanup necessary
Higher cost for Aurora is incurred for additional storage of data until cleanup is completed No additional data is stored, so no cleanup is necessary and no additional costs are incurred

For more information about best practices, see Best practices for AWS Database Migration Service

Solution overview

This solution assumes you have an on-premises Oracle OLTP database, which we migrate over to Amazon Aurora for PostgreSQL.

The Oracle database for this post consists of a large number of tables with parent and child relationships. You can decide the customer location using certain logic and criteria on the parent tables, but you can’t apply similar logic on all the child tables because the necessary columns pertaining to a customer’s Regional tenancy aren’t present in the child tables. This creates a significant challenge to migrate all the data for a given customer to a particular Region.

For our use case, we consider a large enterprise, with its data centers located on the East coast of the United States. The enterprise embarks on the journey of migrating its applications and databases to AWS, taking advantages of various Regions. The application team decides to separate out the data in us-east and us-west Regions based on customer location. We use AWS DMS to split the data across the Regions based on the logic provided by the application team. As the data gets split, maintaining the data integrity is crucial for the smooth operation of the application and also for the customer.

The following diagram represents the data flow from AWS DMS to the Aurora database in different Regions.

Example table design

In this post, we use the following table design to migrate data.

The column TENANT_ID in the table PARENTTABLE has the customer-related information that the application can map to individual customers and their locations and thereby decide if the data corresponding to a particular TENANT_ID goes into a particular Region.

The table CHILDTABLE doesn’t have the TENANT_ID column, so data migration to a particular Region needs to rely on the foreign key relation to the PARENTTABLE.

For this post, we assume that the tenant IDs that are under 12,000 are migrating to an Aurora instance in the us-east Region, and those greater than 12,000 are migrating to the us-west Region. We create the tables with the following code:

The following code is the DDL for the table for an Oracle database.

CREATE TABLE DEMOSCHEMA.PARENTTABLE (
PARENTTABLE_ID NUMBER NOT NULL,
FIRST_NAME VARCHAR2(100) NOT NULL,
LAST_NAME VARCHAR2(100) NOT NULL,
TENANT_ID VARCHAR2(100) NOT NULL,
CONSTRAINT PARENTTABLE_PK PRIMARY KEY (PARENTTABLE_ID)
);
CREATE TABLE DEMOSCHEMA.CHILDTABLE (
CHILDTABLE_ID NUMBER NOT NULL,
PARENTTABLE_ID NUMBER NOT NULL,
ADDRESS VARCHAR2(100) NOT NULL,
CONSTRAINT CHILDTABLE_PK PRIMARY KEY (CHILDTABLE_ID),
CONSTRAINT CHILDTABLE_FK FOREIGN KEY (PARENTTABLE_ID) REFERENCES DEMOSCHEMA.PARENTTABLE(PARENTTABLE_ID) 
);

The following code is the DDL for the table for Amazon Aurora database.

CREATE TABLE DEMOSCHEMA.PARENTTABLE (
PARENTTABLE_ID DOUBLE PRECISION NOT NULL,
FIRST_NAME CHARACTER VARYING(100) NOT NULL,
LAST_NAME CHARACTER VARYING(100) NOT NULL,
TENANT_ID CHARACTER VARYING(100) NOT NULL,
CONSTRAINT PARENTTABLE_PK PRIMARY KEY (PARENTTABLE_ID)
);
CREATE TABLE DEMOSCHEMA.CHILDTABLE (
CHILDTABLE_ID DOUBLE PRECISION NOT NULL,
PARENTTABLE_ID DOUBLE PRECISION NOT NULL,
ADDRESS CHARACTER VARYING(100) NOT NULL,
CONSTRAINT CHILDTABLE_PK PRIMARY KEY (CHILDTABLE_ID),
CONSTRAINT CHILDTABLE_FK FOREIGN KEY (PARENTTABLE_ID) REFERENCES PARENTTABLE(PARENTTABLE_ID) 
);

Prerequisites

For this walkthrough, you should have the following prerequisites:

The services involved in this solution incur costs. You’re responsible for the costs related to your use of any AWS services while running this reference deployment. Prices are subject to change. For more information, refer to the pricing calculator.

Approach: Use Oracle materialized views and AWS DMS to split the data into different Regions

This approach uses Oracle materialized views to migrate data into Aurora. A materialized view is a database object that contains the results of a query. The materialized views are created on the condition of the tenancy for the data in the source Oracle database prior to migrating it onto the Aurora databases.

Step 1: Create materialized views in Oracle database

We create materialized views in the Oracle database with a condition on tenancy based on the Regions in which data needs to be split. In this example, because we’re splitting the data in two Regions, we create two materialized views for CHILDTABLE, which one in each Region. Data in PARENTTABLE is still split using a AWS DMS mapping table condition on the TENANT_ID column, as explained in the first approach. See the following code:

Create materialized view log on DEMOSCHEMA.PARENTTABLE with sequence, rowid including new values;

Create materialized view log on DEMOSCHEMA.CHILDTABLE with sequence, rowid including new values;

Create materialized view DEMOSCHEMA.CHILDTABLE_USEAST_MVIEW refresh fast on commit as 
Select a.rowid m_rowid, b.rowid c_rowid, b.childtable_id, b.parenttable_id, b.address from DEMOSCHEMA.PARENTTABLE a, DEMOSCHEMA.CHILDTABLE b where a.PARENTTABLE_ID=b.PARENTTABLE_ID and a.TENANT_ID < 12000; 

Create materialized view DEMOSCHEMA.CHILDTABLE_USWEST_MVIEW refresh fast on commit as 
Select a.rowid m_rowid, b.rowid c_rowid, b.childtable_id, b.parenttable_id, b.address from DEMOSCHEMA.PARENTTABLE a, DEMOSCHEMA.CHILDTABLE b where a.PARENTTABLE_ID=b.PARENTTABLE_ID and a.TENANT_ID > 12001;

Step 2: Verify data in the materialized views

Verify the data in the materialized views reflects the data that needs to be split based on Regions.

For this post, we run the following SQL code to verify the data in us-east:

select * from "DEMOSCHEMA"."CHILDTABLE_USEAST_MVIEW";

The following screenshot shows our results.

We run the following code to verify the data in us-west:

Step 3: Create an AWS DMS task to migrate data into Aurora from Oracle

We create an AWS DMS task with PARENTTABLE, which is split based on the tenancy condition. Materialized views in Oracle are mapped to CHILDTABLE in Aurora, thereby removing the need to split the data again on the target or clean up data after the migration.

  1. Create an AWS DMS task with migration type Migrate existing changes and replicate ongoing changes for full load and CDC to copy the existing data from source and continue with the replication of data for any changes that occur.
  2. For the target table preparation mode, choose Do nothing because the DDL of the materialized view is different from that of the target CHILDTABLE.
  3. Create the table mapping with the source schema and the table name as the materialized view you created earlier.
  4. Add a transformation rule in the task with the table name of the materialized view renamed to CHILDTABLE.
    This ensures that the Oracle materialized view data is transferred to CHILDTABLE in the Aurora database.
  5. Add another table mapping rule for PARENTTABLE with a condition on the tenant_id column for values greater than or equal to 12,000.

After you create the task, you can start it. When the initial load is complete, the rows have been loaded on to the target database.

Step 3: Verify data in the Aurora database and insert new values to demonstrate the CDC transfer

After the table is completely loaded by the AWS DMS task, you can verify the data in the source and target databases and insert new data on the source database.

In this example, we insert a new row into the Oracle PARENTTABLE and CHILDTABLE. On commit, because the fast refresh materialized view is created, the data is reflected in the materialized view immediately.

We enter the following code:

Select count(*) from DEMOSCHEMA.CHILDTABLE;
Select count(*) from DEMOSCHEMA.PARENTTABLE;
Select count(*) from DEMOSCHEMA.CHILDTABLE_USEAST_MVIEW;
Select count(*) from DEMOSCHEMA.CHILDTABLE_USWEST_MVIEW;
Insert into DEMOSCHEMA.PARENTTABLE values (202,'First','Last','14000'); 
Insert into DEMOSCHEMA.CHILDTABLE values (2000,202,'First Last address');
Commit;
Select count(*) from DEMOSCHEMA.CHILDTABLE_USWEST_MVIEW;

The following screenshot shows our results.

This data change is automatically replicated into the target Aurora database. The following screenshot shows the data in CHILDTABLE in the Aurora database, updated by AWS DMS.

Conclusion

When you’re migrating on-premises data to AWS and need to place this data close to your customers for low latency and better application performance, you may need to split data across various Regions. This post showed you the approach using Oracle Materialized Views and AWS DMS to successfully split the data without compromising the data integrity. Using Oracle Materialized Views, the data is split in Oracle database prior to migration to AWS and then migrated using AWS DMS.

Part 1 describes the method of using AWS DMS alone to split data across Regions without needing to create Oracle Materialized Views.

As always, AWS welcomes feedback. If you have any comments or questions on this post, please share them in the comments.


About the authors

Saumya Mula is a Database Consultant with the Professional Services team at Amazon Web Services. She works as a database migration consultant to provide Amazon customers with technical guidance to migrate their on-premises databases to AWS.

 

 

 

Mandar Paranjpe is Practice Manger, Worldwide Public Sector with the Professional Services team at Amazon Web Services. He helps customers on their digital and cloud transformation journeys, with a focus on big data and database migrations.