How can I use the DMS batch apply feature to improve CDC replication performance?

Last updated: 2021-02-16

I'm running a full load and a change data capture (CDC) AWS Database Migration Service (AWS DMS) task. The source latency isn't high, but the target latency is high or it's increasing. How can I speed up the CDC replication phase?

Short description

AWS DMS uses the following methods to replicate data in the change data capture (CDC) phase:

  • Transactional apply
  • Batch apply

By default, the AWS DMS CDC process is single threaded (transactional apply). This is the same method that all other online transactional processing (OLTP) database engines use for SQL replication. DMS CDC replication depends on the source database transaction logs. During the ongoing replication phase, DMS applies changes using the transactional apply method, as follows:

  1. DMS reads changes from the source transaction log into the memory of the replication DB instance.
  2. DMS translates changes and then passes them on to a sorter component.
  3. The sorter component sorts transactions in commit order, and then forwards them to the target, sequentially.

If the rate of change is high on the source DB, this process can take time. You might observe a spike in the CDC target latency metrics when DMS receives a high incoming workload from the source DB.

Because transactional apply is single threaded, DMS provides the task-level setting BatchApplyEnabled to process changes on the target more quickly by using batches. BatchApplyEnabled is useful if you have a high workload on the source DB and a task with high target CDC latency. By default, BatchApplyEnabled is turned off (false). You can turn it on using the AWS Command Line Interface (AWS CLI) or the AWS DMS console.

How batch apply works

If you run a task with BatchApplyEnabled, DMS processes changes in the following way:

  1. DMS collects the changes in a batch from the source DB transaction logs.
  2. DMS creates a table called the net changes table, with all changes from the batch.
  3. This table resides in the memory of the replication DB instance, and is passed on to the target DB instance.
  4. DMS applies a net changes algorithm that nets out all changes from the net changes table to the actual target table.

For example, if you run a DMS task with BatchApplyEnabled, and you have a new row insert, ten updates to that row, and a delete for that row in a single batch, then DMS nets out all these transactions and doesn’t carry them over. It does this because the row is eventually deleted and no longer exists. This process reduces the number of actual transactions that are applied on the target.

BatchApplyEnabled applies the net changes algorithm at the row level of a table, within a batch of a particular task. So, if the source database has frequent changes (update, delete, and insert), or a combination of those workloads on the same rows, then you get the most benefit from BatchApplyEnabled because it minimizes the changes to be applied to the target. If every change in the collected batch affects a different row (update, delete, or insert changes for different row records), then the net changes algorithm can't filter out any events. As a result, all batch events are applied on the target in batch mode. Tables must have either a primary key or a unique key for batch apply to work.
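The netting behavior described above can be illustrated with a small sketch. This is not the algorithm that DMS implements internally; it's a simplified model under stated assumptions. Each change event is a hypothetical line of the form "<operation> <primary key>", and the function name net_changes is invented for illustration. The function collapses all events for the same key within one batch into at most one net operation.

```shell
# Simplified, illustrative model of per-row netting within one batch.
# Assumption: input on stdin is one event per line,
#   "<insert|update|delete> <primary_key>"  (a made-up format, not a DMS format).
# Output: at most one net operation per key, in first-seen key order.
net_changes() {
  awk '
    NF < 2 { next }                 # ignore blank or malformed lines
    {
      op = $1; key = $2
      if (!(key in first)) {        # remember the first operation and key order
        first[key] = op
        order[++n] = key
      }
      last[key] = op                # remember the most recent operation
    }
    END {
      for (i = 1; i <= n; i++) {
        key = order[i]
        if (first[key] == "insert" && last[key] == "delete") {
          continue                  # row was created and removed in the batch
        } else if (first[key] == "insert") {
          print "insert", key       # new row, written once with its final values
        } else if (last[key] == "delete") {
          print "delete", key       # existing row ends up deleted
        } else {
          print "update", key       # existing row ends up modified
        }
      }
    }
  '
}

# Example:
#   printf '%s\n' 'insert 1' 'update 1' 'delete 1' 'update 2' 'update 2' | net_changes
#   prints: update 2
```

In this model, a batch where every event touches a different key passes through unchanged, which matches the case where the algorithm can't filter out any events.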

DMS also provides the BatchApplyPreserveTransaction setting for change-processing tuning. If you enable BatchApplyEnabled, then BatchApplyPreserveTransaction turns on, by default. If you set it to true, then transactional integrity is preserved, and a batch is guaranteed to contain all the changes within a transaction from the source. This setting applies only to Oracle target endpoints.

Note: Pay attention to the advantages and disadvantages of this setting. When the BatchApplyPreserveTransaction setting is true, DMS captures the entire long-running transaction in the memory of the replication DB instance. It does this according to the task settings MemoryLimitTotal and MemoryKeepTime, and swaps as needed, before it sends changes to the net changes table. When the BatchApplyPreserveTransaction setting is false, changes from a single transaction can span across multiple batches. This can lead to data loss when partially applied, for example, due to target database unavailability.
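As a rough illustration of where these settings sit in the task settings JSON, the following sketch prints a settings fragment. It assumes that BatchApplyEnabled belongs to the TargetMetadata section and that BatchApplyPreserveTransaction, MemoryLimitTotal, and MemoryKeepTime belong to the ChangeProcessingTuning section. The function name batch_apply_settings is hypothetical and the values are illustrative only, not tuning recommendations, so check them against the DMS task settings reference before you use them.

```shell
# Print an illustrative task settings fragment.
# Assumption: the numeric values are placeholders, not tuning advice.
batch_apply_settings() {
  cat <<'EOF'
{
  "TargetMetadata": {
    "BatchApplyEnabled": true
  },
  "ChangeProcessingTuning": {
    "BatchApplyPreserveTransaction": true,
    "MemoryLimitTotal": 1024,
    "MemoryKeepTime": 60
  }
}
EOF
}

# Example: save the fragment to a file for later use with the AWS CLI.
#   batch_apply_settings > batch-apply-settings.json
```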

For more information about DMS latency and the batch apply process, see Part 2 and Part 3 of the Debugging your AWS DMS migrations blogs.

Use cases for batch apply

You can use batch apply in the following circumstances:

  • The task has a high number of transactions captured from the source and this is causing target latency.
  • The task has a workload from the source that is a combination of insert, update, and delete on the same rows.
  • The target has no requirement to keep strict referential integrity (foreign keys are disabled).

Limitations

Batch apply currently has the following limitations:

  • The Amazon Redshift target uses batch apply, by default. The Amazon Simple Storage Service (Amazon S3) target is forced to use transactional apply.
  • Batch apply works only on tables with a primary key or unique index. For tables with no primary key or unique index, batch apply applies only the inserts in bulk mode, and performs updates and deletes one by one. If the table has a primary key or unique index but you still observe a switch to one-by-one mode, see How can I troubleshoot why Amazon Redshift switched to one-by-one mode because a bulk operation failed during an AWS DMS task?
  • When LOB columns are included in the replication, you can use BatchApplyEnabled only in limited LOB mode. For more information, see Target metadata task settings.
  • When BatchApplyEnabled is set to true, AWS DMS generates an error message if a target table has a unique constraint.

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.

BatchApplyEnabled is disabled by default. You can enable this setting using either the AWS CLI or the AWS DMS Console. Complete the following setup tasks on your system before enabling the batch setting:

Check the batch setting status of an existing task

  1. Open the AWS DMS Console.
  2. From the navigation panel, choose Database migration tasks.
  3. Choose your task, and then choose Task settings (JSON). In the JSON, BatchApplyEnabled is listed as disabled (false).
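The same check can be sketched with the AWS CLI, assuming that the CLI is already configured with credentials and a Region. The function names get_task_settings and batch_apply_status are hypothetical helpers, and TASK_ARN is a placeholder that reuses the example ARN from this article. The first function prints the task settings JSON, and the second extracts the BatchApplyEnabled value from JSON on standard input.

```shell
# Placeholder ARN; replace it with the ARN of your own task.
TASK_ARN="arn:aws:dms:us-east-1:123456789123:task:4VUCZ6ROH4ZYRIA25M3SE6NXCM"

# Print the task settings JSON for the task whose ARN is passed as $1.
get_task_settings() {
  aws dms describe-replication-tasks \
    --filters "Name=replication-task-arn,Values=$1" \
    --query 'ReplicationTasks[0].ReplicationTaskSettings' \
    --output text
}

# Read settings JSON on stdin and print the BatchApplyEnabled value
# (true or false). Whitespace is stripped first so the match doesn't
# depend on how the JSON is formatted.
batch_apply_status() {
  tr -d ' \n\t' | grep -o '"BatchApplyEnabled":[a-z]*' | cut -d: -f2
}

# Example:
#   get_task_settings "$TASK_ARN" | batch_apply_status
```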

Enable batch setting using the AWS CLI

  1. Open the system with AWS CLI installed.
  2. Run the aws configure command to open the AWS CLI prompt.
  3. Enter your AWS access key ID and then press Enter.
  4. Enter your AWS secret access key and then press Enter.
  5. Enter the Region name of your DMS resources and then press Enter.
  6. Enter the output format and then press Enter.
  7. Run the modify-replication-task command with task ARN and batch setting conditions.

Note: Confirm that the task is in the stopped state before you modify the task. Change the ARN in the following command to match your task, and then run the command to change the task setting.

aws dms modify-replication-task --replication-task-arn arn:aws:dms:us-east-1:123456789123:task:4VUCZ6ROH4ZYRIA25M3SE6NXCM --replication-task-settings "{\"TargetMetadata\":{\"BatchApplyEnabled\":true}}"

After the command runs successfully in the AWS CLI, open the DMS console and check the batch setting status of your task again. BatchApplyEnabled is now listed as enabled in Task settings (JSON).

You can now start the DMS task and observe the migration performance.
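The note above requires the task to be stopped before it's modified. The following is a minimal sketch of scripting that sequence, assuming that the task is currently running and that the AWS CLI is configured. The function names are hypothetical, and TASK_ARN reuses the example ARN from this article. Resuming the task is kept as a separate step because the modification might take a moment to complete before the task can be started again.

```shell
# Placeholder ARN; replace it with the ARN of your own task.
TASK_ARN="arn:aws:dms:us-east-1:123456789123:task:4VUCZ6ROH4ZYRIA25M3SE6NXCM"

# Build the JSON passed to --replication-task-settings.
# $1 is the desired BatchApplyEnabled value: true or false.
batch_settings_json() {
  printf '{"TargetMetadata":{"BatchApplyEnabled":%s}}' "$1"
}

# Stop a running task, wait until it is stopped, then turn on batch apply.
# Assumption: the task is running; stopping an already stopped task fails.
enable_batch_apply() {
  task_arn=$1
  aws dms stop-replication-task --replication-task-arn "$task_arn" || return 1
  aws dms wait replication-task-stopped \
    --filters "Name=replication-task-arn,Values=$task_arn" || return 1
  aws dms modify-replication-task --replication-task-arn "$task_arn" \
    --replication-task-settings "$(batch_settings_json true)"
}

# Resume CDC from the point where the task stopped.
resume_task() {
  aws dms start-replication-task --replication-task-arn "$1" \
    --start-replication-task-type resume-processing
}

# Example:
#   enable_batch_apply "$TASK_ARN"
#   resume_task "$TASK_ARN"   # after the modification has completed
```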

Enable batch setting using the AWS DMS Console

  1. Open the AWS DMS Console.
  2. From the navigation panel, choose Database migration tasks.
  3. Choose your task, and then choose Modify.
  4. From the Task settings section, choose JSON editor.
  5. Modify the task settings you want to change. For example, from the TargetMetadata section, change BatchApplyEnabled to true (default is false).
  6. Choose Save to modify the task.

Verify that the changes have taken effect by following these steps:

  1. From the Task list page, choose the task you modified.
  2. From the Overview details tab, expand Task settings (JSON).
  3. Review the task settings for the task.

Troubleshoot high CDCLatencyTarget after running the task in batch mode

If the CDCLatencyTarget is high after running the task in batch mode, the latency could be caused by the following:

  • Long-running transactions on the target due to missing primary or secondary indexes
  • Insufficient resource availability to process the workload on the target
  • High resource contention on the DMS replication instance

Follow the DMS best practices to troubleshoot these issues.
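To see whether CDCLatencyTarget improves after a change, you can pull the metric from Amazon CloudWatch. The following sketch assumes that the metric is published in the AWS/DMS namespace with the ReplicationInstanceIdentifier and ReplicationTaskIdentifier dimensions, and that the task dimension value is the resource ID at the end of the task ARN. Confirm both in the CloudWatch console for your task. The function names and the instance identifier in the example are hypothetical.

```shell
# Print the resource ID at the end of a task ARN.
# Assumption: this ID is the value of the ReplicationTaskIdentifier dimension.
task_id_from_arn() {
  printf '%s\n' "${1##*:}"
}

# Print Average and Maximum CDCLatencyTarget in 5-minute periods.
# $1 = replication instance identifier
# $2 = task resource ID
# $3 = start time, $4 = end time (ISO 8601, for example 2021-02-16T00:00:00Z)
cdc_latency_target() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/DMS \
    --metric-name CDCLatencyTarget \
    --dimensions "Name=ReplicationInstanceIdentifier,Value=$1" \
                 "Name=ReplicationTaskIdentifier,Value=$2" \
    --start-time "$3" --end-time "$4" \
    --period 300 --statistics Average Maximum \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp,Average,Maximum]' \
    --output text
}

# Example (my-replication-instance is a made-up identifier):
#   task_id=$(task_id_from_arn "arn:aws:dms:us-east-1:123456789123:task:4VUCZ6ROH4ZYRIA25M3SE6NXCM")
#   cdc_latency_target my-replication-instance "$task_id" \
#     2021-02-16T00:00:00Z 2021-02-16T06:00:00Z
```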