How can I use the DMS batch apply feature to improve CDC replication performance?

Last updated: 2020-08-24

I'm running a full load and a change data capture (CDC) AWS Database Migration Service (AWS DMS) task. The source latency is not high, but the target latency is high or it is increasing. How can I speed up the CDC replication phase?

Short description

AWS DMS uses the following methods to replicate data in the change data capture (CDC) phase:

  • Transactional apply
  • Batch apply

By default, the AWS DMS CDC process is single-threaded (transactional apply). It uses the same method for SQL replication as other online transactional processing (OLTP) database engines. DMS CDC replication depends entirely on the source database transaction logs. During the ongoing replication phase, DMS applies changes using the transactional apply method, as follows:

  1. DMS reads changes from the source transaction log into the memory of the replication DB instance.
  2. DMS translates the changes and then passes them to a sorter component.
  3. The sorter component sorts transactions in commit order, and then forwards them to the target sequentially.

If the rate of change on the source DB is high, this process can take time. You might see a spike in the CDC target latency metrics when DMS receives a high incoming workload from the source DB.

Because transactional apply is single-threaded, DMS provides the task-level setting BatchApplyEnabled to process changes on the target more quickly by using batches. BatchApplyEnabled is useful if you have a high workload on the source DB and a task with high target CDC latency. By default, BatchApplyEnabled is disabled. You can enable it using the AWS Command Line Interface (AWS CLI).

How batch apply works

If you run a task with BatchApplyEnabled, DMS processes changes in the following way:

  1. DMS collects the changes in batches from the source DB transaction logs.
  2. DMS creates a table, called the net changes table, that contains all changes from the batch.
  3. This table resides in the memory of the replication DB instance, and is passed on to the target DB instance.
  4. DMS applies a net changes algorithm that nets out all changes from the net changes table to the actual target table.

For example, if you run a DMS task with BatchApplyEnabled, and you have a new row insert, 10 updates to that row, and a delete for that row in a single batch, then DMS nets out all these transactions and doesn’t carry them over. It does this because the row is eventually deleted and no longer exists. This process reduces the number of actual transactions that are applied on the target.

BatchApplyEnabled applies the net changes algorithm at the row level of a table, within a batch of a particular task. So, if the source database has frequent changes (update, delete, and insert), or a combination of those workloads, on the same rows, you get the most benefit from BatchApplyEnabled, because it minimizes the changes to be applied to the target. If every change in the collected batch affects a different row, then the net changes algorithm can't filter out any events, and all batch events are applied to the target in batch mode. Tables must have either a primary key or a unique key for batch apply to work.
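The netting-out idea can be illustrated with a small toy model. The following is only a sketch under simplifying assumptions: the function name net_changes, the input format (one "OPERATION row_key" pair per line, in commit order), and the rule for a delete followed by an insert are all hypothetical, and this is not the algorithm that DMS actually runs.

```shell
# Toy model of netting out changes per row key within one batch.
# Input lines: "<OPERATION> <row_key>" in commit order.
# Illustration only; not the real AWS DMS implementation.
net_changes() {
  awk '
    {
      op = $1; key = $2
      if (!(key in net)) {           # first change seen for this row
        order[++n] = key
        net[key] = op
        next
      }
      prev = net[key]
      if (prev == "INSERT" && op == "UPDATE")
        net[key] = "INSERT"          # still a new row, with the latest values
      else if (prev == "INSERT" && op == "DELETE")
        net[key] = "NONE"            # row never needs to reach the target
      else if (prev == "DELETE" && op == "INSERT")
        net[key] = "UPDATE"          # assumption: treat as an overwrite
      else
        net[key] = op                # otherwise the last change wins
    }
    END {
      for (i = 1; i <= n; i++)
        if (net[order[i]] != "NONE") print net[order[i]], order[i]
    }'
}

# Row 42 is inserted, updated, and deleted; row 7 is updated twice.
printf '%s\n' 'INSERT 42' 'UPDATE 42' 'UPDATE 42' 'DELETE 42' \
              'UPDATE 7' 'UPDATE 7' | net_changes
```

The example call prints only "UPDATE 7": the six source events collapse into a single change, because everything that happened to row 42 cancels out within the batch.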

DMS also provides the BatchApplyPreserveTransaction setting for change-processing tuning. If you enable BatchApplyEnabled, then BatchApplyPreserveTransaction is turned on by default. When it is set to true, transactional integrity is preserved, and a batch is guaranteed to contain all the changes within a transaction from the source. This setting applies only to Oracle target endpoints.

Note: Pay attention to the advantages and disadvantages of this setting. When the BatchApplyPreserveTransaction setting is true, DMS captures the entire long-running transaction in the memory of the replication DB instance. It does this according to the task settings MemoryLimitTotal and MemoryKeepTime, and swaps to disk as needed, before it sends changes to the net changes table. When the BatchApplyPreserveTransaction setting is false, changes from a single transaction can span multiple batches. This can lead to data loss if a transaction is only partially applied, for example, because the target database is unavailable.
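As a rough sketch, the settings named above can be collected in a task-settings fragment like the following. The file name batch-apply-settings.json is arbitrary, and the numeric values are assumptions (they are believed to match the documented defaults, in MB and seconds respectively), so check them against the current task-settings documentation before you use them.

```shell
# Write a task-settings fragment to a local file.
# File name and numeric values are illustrative assumptions.
cat > batch-apply-settings.json <<'EOF'
{
  "TargetMetadata": {
    "BatchApplyEnabled": true
  },
  "ChangeProcessingTuning": {
    "BatchApplyPreserveTransaction": true,
    "MemoryLimitTotal": 1024,
    "MemoryKeepTime": 60
  }
}
EOF
```

A file like this can be passed to the modify-replication-task command with --replication-task-settings file://batch-apply-settings.json instead of an inline JSON string, which avoids escaping the quotation marks.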

For more information about DMS latency and the batch apply process, see Part 2 and Part 3 of the Debugging your AWS DMS migrations blogs.

Use cases for batch apply

You can use batch apply in the following circumstances:

  • The task has a high number of transactions captured from the source and this is causing target latency.
  • The task has a workload from source that is a combination of insert, update, and delete on the same rows.
  • The target has no requirement to keep strict referential integrity (foreign keys are disabled).

Limitations

Batch apply currently has the following limitations:

Resolution

BatchApplyEnabled is disabled by default. You can enable this setting only by using the AWS Command Line Interface (AWS CLI). Complete the following steps to check the current setting and then enable batch apply:

Check the batch setting status of an existing task

  1. Open the AWS DMS Console.
  2. From the navigation pane, choose Database migration tasks.
  3. Choose your task, and then choose Task Setting (JSON). In the JSON, BatchApplyEnabled is listed as disabled.
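The same check can also be sketched from the command line. This is a minimal example, assuming the AWS CLI is already installed and configured; the function name show_batch_apply_setting is hypothetical, and the grep pattern assumes that the setting appears in the returned JSON as "BatchApplyEnabled" followed by true or false.

```shell
# Print the BatchApplyEnabled value of a task, given its ARN.
# Assumes the AWS CLI is installed and configured for the right Region.
show_batch_apply_setting() {
  task_arn="$1"
  aws dms describe-replication-tasks \
    --filters "Name=replication-task-arn,Values=${task_arn}" \
    --query 'ReplicationTasks[0].ReplicationTaskSettings' \
    --output text |
    grep -o '"BatchApplyEnabled": *[a-z]*'
}
```

Call it with your own task ARN, for example: show_batch_apply_setting arn:aws:dms:us-east-1:123456789123:task:4VUCZ6ROH4ZYRIA25M3SE6NXCM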

Enable batch setting

  1. Open a terminal on the system where the AWS CLI is installed.
  2. Run the aws configure command to start the configuration prompts.
  3. Type your AWS access key ID and then press Enter.
  4. Type your AWS secret access key and then press Enter.
  5. Type the Region name of your DMS resources and then press Enter.
  6. Type the output format and then press Enter.
  7. Run the modify-replication-task command with the task ARN and the batch apply setting.

Note: Confirm that the task is in the stopped state before you modify the task. Replace the ARN in the following command with the ARN of your task, and then run the command to change the task setting.

aws dms modify-replication-task --replication-task-arn arn:aws:dms:us-east-1:123456789123:task:4VUCZ6ROH4ZYRIA25M3SE6NXCM --replication-task-settings "{\"TargetMetadata\":{\"BatchApplyEnabled\":true}}"

After the command runs successfully in the CLI, open the DMS console and check the batch setting status of your task again. BatchApplyEnabled is now listed as enabled in the Task Setting (JSON).

You can now start the DMS task and observe the migration performance.
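The status check, the modification, and the restart described in this section can be wrapped in small helper functions. This is only a sketch: the function names enable_batch_apply and resume_task are hypothetical, it assumes a configured AWS CLI, and it assumes that you want to resume CDC from where the task stopped rather than reload the target.

```shell
# Enable batch apply on a task, but only if the task is stopped.
enable_batch_apply() {
  task_arn="$1"

  # Look up the current task status.
  status=$(aws dms describe-replication-tasks \
    --filters "Name=replication-task-arn,Values=${task_arn}" \
    --query 'ReplicationTasks[0].Status' \
    --output text) || return 1

  if [ "$status" != "stopped" ]; then
    echo "Task status is '${status}'; stop the task before modifying it." >&2
    return 1
  fi

  # Single quotes avoid having to escape the JSON quotation marks.
  aws dms modify-replication-task \
    --replication-task-arn "$task_arn" \
    --replication-task-settings '{"TargetMetadata":{"BatchApplyEnabled":true}}'
}

# Resume the task after the modification has completed.
resume_task() {
  aws dms start-replication-task \
    --replication-task-arn "$1" \
    --start-replication-task-type resume-processing
}
```

Run enable_batch_apply with your task ARN, wait until the task returns to the stopped state, and then run resume_task with the same ARN.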

Troubleshoot high CDCLatencyTarget after running the task in batch mode

If CDCLatencyTarget remains high after you run the task in batch mode, the latency could be caused by the following:

  • Long-running transactions on the target due to missing primary or secondary indexes
  • Insufficient resources on the target to process the workload
  • High resource contention on the DMS replication instance

Follow the DMS best practices to troubleshoot these issues.
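To see whether a change actually helps, you can pull the CDCLatencyTarget metric from Amazon CloudWatch before and after enabling batch mode. The following is a minimal sketch: the function name get_target_latency is hypothetical, and the dimension names and the task identifier format are assumptions that you should verify against the metrics shown for your task in the CloudWatch console.

```shell
# Print five-minute averages of CDCLatencyTarget (in seconds) for a task.
# Arguments: replication instance identifier, task identifier,
# start time, and end time (ISO 8601, for example 2020-08-24T00:00:00Z).
# The dimension names below are assumptions; verify them in CloudWatch.
get_target_latency() {
  instance_id="$1"; task_id="$2"; start_time="$3"; end_time="$4"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/DMS \
    --metric-name CDCLatencyTarget \
    --dimensions "Name=ReplicationInstanceIdentifier,Value=${instance_id}" \
                 "Name=ReplicationTaskIdentifier,Value=${task_id}" \
    --start-time "$start_time" --end-time "$end_time" \
    --period 300 --statistics Average \
    --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Average]' \
    --output text
}
```

A latency that keeps growing over the sampled window suggests that the target still can't keep up, which points to the index, target resource, or replication instance causes listed above.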