How to handle AWS DMS replication when used with Oracle database in fail-over scenarios

AWS Database Migration Service (AWS DMS) is used for migrating databases to AWS, but you can also use it for replicating data with low latency from any supported source to any supported target.

When working with Oracle database, AWS DMS stops replicating data changes due to reset logs when the standby database is transitioned to the primary database during a fail-over of the primary database.

In this post, we describe the sequence of the necessary steps to overcome this issue and also demonstrate how to handle AWS DMS source endpoints during database failover. We also demonstrate how to add an extra connection attribute to support database fail-over scenarios and the steps you need to take before you re-point AWS DMS to the standby database.

Solution Overview

The following diagram illustrates our fail-over architecture.

For AWS DMS to keep replicating data changes from an Oracle standby database to Oracle Primary database, after a database fail-over, we add the supportResetlog=TRUE parameter as an extra connection attribute in the AWS DMS source endpoint settings. By adding this parameter, AWS DMS is able to read the new set of archive log sequence numbers generated post database fail-over.

Should a fail-over happen while we’re using AWS DMS with a standby database without setting supportResetlog=TRUE, we would see the following error on the AWS DMS task logs

Errorlog

AWS DMS when used with Oracle Data Guard setup, can take advantage of Oracle Standby database to offload the workload from the primary database. However, should a fail-over occur, the DMS replication tasks connected to the Oracle Standby database would stop replicating changes as a result of an event of reset logs, which is not supported with standby databases. In an Oracle Data Guard environment, the standby database is in sync with the primary database by receiving and applying redo data from the primary database. If the primary database becomes unavailable, you can use Oracle Data Guard to switch or fail over the standby database to the primary role.

To overcome this issue, we need to add an extra connection attribute supportResetlog=TRUE to AWS DMS source endpoint settings repoint AWS DMS source to primary database and resume the DMS task either by specifying custom CDC start point or a log sequence number or by specifying a recovery checkpoint. For more information, refer to Creating tasks for ongoing replication using AWS DMS.

Note: You need to set supportResetlog=TRUE if your AWS DMS version is 3.4.7 and below, since this parameter by default is set to TRUE in future releases.

In the following sections, we walk through the process of adding an extra connection attribute (supportResetlog=TRUE) to the AWS DMS source endpoint. We then reconnect AWS DMS to the standby database after the change data capture (CDC) of AWS DMS tasks bypasses the sequence numbers of archive logs that have been generated post database fail-over.

Prerequisites

For this walk-through, the following prerequisites are necessary:

An AWS account.
Oracle databases in a primary and standby setup and in real-time sync. For instructions, refer to Creating a Physical Standby Database.
Replication instance in AWS DMS should be able to access primary and standby hosts. For instructions, refer to Setting up a network for a replication instance
Source and destination target endpoints. For instructions, refer to Creating source and target endpoints.
An AWS DMS migration task. For instructions, refer to Creating a task.

Create source and target endpoints

Based on the Oracle Data Guard setup, we configure the source endpoint to connect to the standby database and the target endpoint to any supported target.

For this post, we use an Oracle standby database as the source and Amazon Simple Storage Service (Amazon S3) as the target.

Configure the source endpoint to connect to the new primary database host and add the extra connection attribute

After a fail-over of the database, we want AWS DMS source endpoint to connect to the new primary database host and add the supportResetlog=TRUE parameter to the extra connection attribute of the source endpoint. Complete the following steps:

On the AWS DMS console, choose Endpoints in the navigation pane.
Choose the source endpoint you want to modify.
For Server name, enter the name of the primary database host.
Select Use endpoint connection attributes.
Add the supportResetlog=TRUE parameter in the Extra Connection Attributes.

During fail-over or switchover scenarios, there is a possibility that new primary might be using different archive DEST_ID values compared to original primary. In this case, AWS DMS needs information about what destination to get archive redo logs from to read changes.

Oracle uses random DEST_ID which ranges from 0 to 6 . Therefore it’s a best practice to add the ExtraArchivedLogDestIds=[1,2,3,4,5,6] to cover DEST_ID from 1 to 6 and useZeroDestid=true parameters as well to cover DEST_ID value as 0.

The following screenshot has been taken after Use Endpoint Connection Attributes has been Enabled and values has been added to Extra Connection attributes.

For more information about extra connection attributes, refer to Extra connection attributes when using Oracle as a source for AWS DMS.

Resume the AWS DMS task and verify the logs

When you resume a task, AWS DMS continues replication from the last point before the task stopped or failed. When you restart a task, AWS DMS begins replication from the start, and uses the table preparation mode that you chose when you created the task.

For this scenario, we need the task to continue from where it left off and therefore we will resume the AWS DMS task and verify the logs in Amazon CloudWatch. Complete the following steps:

On the AWS DMS console, choose Database migration tasks in the navigation pane.
Select the task you want to resume and on the Actions menu, choose Restart/Resume.
Select Resume and choose Start task.
Choose the name of your task, then review the Overview details section.
Choose View CloudWatch logs.

In the following screenshot, you can see the archived redo log sequence processing from sequence number 5, which indicates that the AWS DMS process started reading the new sequence archive logs of the database post fail-over.

For more information about viewing your AWS DMS task details, refer to How do I use Overview details to access my AWS DMS task information.

Stop the Replication Task and Configure the AWS DMS source endpoint back to the standby database

At this point, it’s safe to reconfigure the AWS DMS source endpoint back to the standby database to avoid any performance impact to the primary database.

On the AWS DMS console, choose Database migration tasks in the navigation pane.
Select the task you want to stop and on the Actions menu, choose Stop.
On the AWS DMS console, choose Endpoints in the navigation pane.
Choose the source endpoint you want to modify.
For Server name, enter the name of the standby database host.

Resume the AWS DMS task

Now you can resume the AWS DMS task.

On the AWS DMS console, choose Database migration tasks in the navigation pane.
Select the task you want to resume and on the Actions menu, choose Restart/Resume.
Select Resume and choose Start task.

The following screenshot was taken after resuming the task on the standby Database. We can also see in the last line of screenshot that CAPTURE process is reading from ONLINELOG which means there is NO lag in our replication and redo changes are current.

Conclusion

Replication using AWS DMS is a common solution for many customers, and an Oracle standby database is a common method to capture data changes by AWS DMS.

In this post, we demonstrated how to handle AWS DMS replication task in a fail-over scenario, with the help of extra connection attribute supportResetlog=TRUE .We also discussed the steps needed to point AWS DMS back to the standby database, which can help offload the primary database in a sensitive production environment.

We invite you to leave feedback in the comments sections in this post.

About the Author

Prashanth Ramaswamy is a Senior Database Consultant with the Professional Services team at Amazon Web Services. Prashanth focuses on leading the database migration efforts to AWS as well as provide technical guidance including cost optimization,monitoring and modernization expertise to Amazon customers.

Jeemy Patel is a Database Consultant with the Professional Services team at Amazon Web Services. Jeemy helps customer with migration to AWS, performance optimization as well as provide technical guidance on various Disaster Recovery solutions for Amazon customers.

Saumya Mula is a Senior Database Consultant with the Professional Services team at Amazon Web Services. She provides overall guidance on database migrations from on-premises to AWS along with automation , cost management and performance tuning of the critical production systems for Amazon customers.