How can I troubleshoot why Amazon Redshift switched to one-by-one mode because a bulk operation failed during an AWS DMS task?

Last updated: 2019-10-02

I have an AWS Database Migration Service (AWS DMS) task that is migrating data to Amazon Redshift as the target. But my task changed to one-by-one mode because a bulk operation failed. How do I troubleshoot this issue?

Short Description

During the change data capture (CDC) phase of a task, AWS DMS uses a single thread to read change data from the source and apply changes to the target. Because the thread can handle only a certain number of transactions at a time, depending on the rate of changes on the source, sometimes the thread can't keep the target in sync with the source. This happens more often when Amazon Redshift is the target for AWS DMS because commits are expensive in OLAP engines. By default, AWS DMS uses Batch Apply mode to process the changes in batches. When using Batch Apply mode, AWS DMS does the following:

  1. Collects changes from a batch that is controlled by Batch Apply settings.
  2. Creates a net changes table that contains all the changes from the batch to the target instance. 
  3. Uses an algorithm that groups transactions and applies them in bulk to the target.

When a migration task that is replicating data to Amazon Redshift has an issue applying a batch, AWS DMS doesn't fail the whole batch. AWS DMS breaks the batch down and switches to one-by-one mode to apply transactions. When AWS DMS encounters the transaction that caused the batch to fail, AWS DMS logs the transaction to the awsdms_apply_exceptions table on the Amazon Redshift target. Then, AWS DMS applies the other transactions in the batch one by one until all transactions from that batch are applied onto the target. Finally, AWS DMS switches back to Batch Apply mode for a new batch and continues to use Batch Apply unless another batch fails.

Resolution

You can see whether your batch failed and AWS DMS used one-by-one mode by checking the AWS DMS task log. Each time a batch fails and AWS DMS switches to one-by-one mode, you see the following log entry:

Messages
[TARGET_APPLY ]I: Bulk apply operation failed. Trying to execute bulk statements in 'one-by-one' mode (bulk_apply.c:2175)

When this happens, AWS DMS applies transactions sequentially onto the target until AWS DMS encounters an issue with any transaction in the batch. If AWS DMS encounters an issue, it logs the transaction and you see a log entry similar to the following:

Messages
[TARGET_APPLY ]W: Source changes that would have had no impact were not applied to the target database. Refer to the 'awsdms_apply_exceptions' table for details. (endpointshell.c:5984)

Note: By default, the awsdms_apply_exceptions table is created in the public schema, unless you specify a control table schema in the task settings of your AWS DMS task.

After AWS DMS logs the transaction, it completes the application of all the transactions from that batch. Then, AWS DMS switches to Batch Apply again, and you see a message in the log that is similar to the following:

Messages
[TARGET_APPLY ]I: Switch back to bulk apply mode (bulk_apply.c:4751)

Amazon Redshift is an OLAP data warehouse that is optimized to run complex analytic queries. However, Amazon Redshift performance can be affected when running transactional changes from an OLTP database. As a result, when Batch Apply fails and AWS DMS switches to one-by-one mode, you can see that target latency increases for the duration of the time that AWS DMS runs transactions in a one-by-one mode. After AWS DMS switches back to Bulk Apply, the target latency reduces.

To resolve this issue, connect to the Amazon Redshift target. Then, get the output from the awsdms_apply_exceptions table to identify the query that caused the batch to fail:

select * from public.awsdms_apply_exceptions order by 4 desc;

After you find the query that caused the batch to fail (for example, update conflicts or constraint violations), then you can resolve the conflicts to prevent tasks from moving to one-by-one mode.