Why did my AWS DMS task fail with no errors?
Last updated: 2021-02-16
I am using AWS Database Migration Service (AWS DMS) to migrate my data from a source engine to a target engine. But the task is failing without any errors. How do I troubleshoot this issue?
When an AWS DMS task fails, the task logs usually show the cause as error messages (]E:) or warning messages (]W:). In some cases, an AWS DMS task can fail without logging any errors or warnings, which makes the cause difficult to troubleshoot. Most often, this happens for one of the following three reasons:
1. Resource contention on the replication instance
CPU and memory are the two most important resources that are required for a migration task:
- CPU is required to convert the source data type to the AWS DMS data type, and then to the target data type.
- Memory is required because AWS DMS creates streams to the source and target. AWS DMS stores information in the stream buffers in memory on the replication instance.
CPU and memory are also used by the internal monitoring system to monitor the replication instance. Any contention on either CPU or memory can cause a migration task to fail silently.
2. Storage Full status on the replication instance
If the replication instance storage is full, a migration task can fail silently with no errors. For more information, see Why is my AWS DMS replication instance in a storage-full status?
3. An internal error occurred
AWS DMS tasks can also fail silently because of internal errors that aren't visible in the task logs at the default logging level.
First, check the time of the last entry in the task logs after the task failed silently. Then, verify the CPU, memory, and disk utilization on the replication instance around the same time that the failure was logged.
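The following Python sketch shows one way to find the last log entry and derive a time window for checking the metrics. It assumes that each task log line starts with a timestamp in the form 2021-02-10T12:34:56. The function names, the sample lines, and the 15-minute window are hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: each log entry starts with a timestamp such as
# "2021-02-10T12:34:56", followed by the component and the message.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def last_log_time(lines):
    """Return the timestamp of the last line that starts with one, or None."""
    for line in reversed(lines):
        match = TIMESTAMP.match(line)
        if match:
            return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return None


def metric_window(last_entry, minutes=15):
    """Return (start, end) covering the minutes around the last log entry."""
    delta = timedelta(minutes=minutes)
    return last_entry - delta, last_entry + delta


# Hypothetical log excerpt; the last line has no timestamp and is skipped.
sample_lines = [
    "2021-02-10T12:34:56 [TASK_MANAGER ]I: Task is running",
    "2021-02-10T12:35:10 [SOURCE_UNLOAD ]I: Unload in progress",
    "    continuation of the previous message",
]

last_entry = last_log_time(sample_lines)
start, end = metric_window(last_entry)
print(last_entry, start, end)
```

Use the resulting start and end times as the range when you review the replication instance metrics.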
If you see a combination of low FreeableMemory and high SwapUsage, then there might be memory contention on the replication instance. Be sure to check both metrics, because neither one alone confirms contention. For more information, see Data Migration Service metrics.
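The combined check can be expressed as a small Python sketch. The thresholds here (512 MiB of freeable memory and 256 MiB of swap) are hypothetical placeholders, not AWS recommendations. Adjust them for the size of your replication instance. The function assumes that both metrics are reported in bytes.

```python
MIB = 1024 ** 2


def memory_contention(freeable_memory, swap_usage,
                      low_memory=512 * MIB, high_swap=256 * MIB):
    """Return True only when BOTH signals appear in the sampled window.

    freeable_memory and swap_usage are lists of metric values in bytes.
    The default thresholds are illustrative assumptions.
    """
    if not freeable_memory or not swap_usage:
        return False
    return min(freeable_memory) < low_memory and max(swap_usage) > high_swap


# Hypothetical samples taken around the time of the failure.
print(memory_contention([900 * MIB, 300 * MIB], [100 * MIB, 700 * MIB]))
print(memory_contention([900 * MIB, 300 * MIB], [10 * MIB, 20 * MIB]))
```

The second call returns False because low memory without swap activity does not satisfy the combined condition.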
To view the CloudWatch metrics, follow these steps:
1. Open the AWS DMS console, and choose Database migration tasks from the navigation pane.
2. Choose the name of the task that failed.
3. Note the name of the Replication instance from the Overview details section.
4. Choose Replication instances from the navigation pane.
5. Choose the name of the replication instance that you noted in step 3.
6. In the Migration task metrics section, view the CPUUtilization, SwapUsage, FreeableMemory, and FreeStorageSpace metrics.
7. To view more details, hover over the metric, and choose the more options icon (three vertical dots).
8. Choose View in metrics.
This opens the CloudWatch console where you can view the metric's utilization at the time that the task failed.
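If you prefer to retrieve the same metrics programmatically, the following Python sketch builds the request parameters for the CloudWatch GetMetricStatistics API. It assumes that the metrics are published in the AWS/DMS namespace with the ReplicationInstanceIdentifier dimension. The instance name my-replication-instance, the 30-minute window, and the function name are hypothetical. The sketch only builds the parameters, so it runs without AWS credentials.

```python
from datetime import datetime, timedelta

METRICS = ("CPUUtilization", "SwapUsage", "FreeableMemory", "FreeStorageSpace")


def metric_requests(instance_id, failure_time, minutes=30, period=60):
    """Build one GetMetricStatistics parameter set per metric."""
    start = failure_time - timedelta(minutes=minutes)
    end = failure_time + timedelta(minutes=minutes)
    return [
        {
            "Namespace": "AWS/DMS",  # assumed namespace for AWS DMS metrics
            "MetricName": name,
            "Dimensions": [
                {"Name": "ReplicationInstanceIdentifier", "Value": instance_id}
            ],
            "StartTime": start,
            "EndTime": end,
            "Period": period,  # seconds per data point
            "Statistics": ["Average"],
        }
        for name in METRICS
    ]


# Hypothetical instance name and failure time.
requests = metric_requests("my-replication-instance",
                           datetime(2021, 2, 10, 12, 0, 0))
for request in requests:
    print(request["MetricName"], request["StartTime"], request["EndTime"])

# With boto3 installed and credentials configured, each request could be sent as:
#   boto3.client("cloudwatch").get_metric_statistics(**request)
```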
If you see constant CPU or memory contention, consider reducing the number of tasks that are running on the replication instance. You can do this by launching new replication instances and distributing the tasks across multiple replication instances. Or, consider scaling up the replication instance to a larger instance type.
Note: T2 instances provide only a baseline level of CPU performance after the CPU credits are exhausted. For example, a T2.micro instance provides a baseline performance of 10%. For this reason, interpret the CPU utilization relative to the baseline of the instance type that you use. A T2.micro instance that reports a steady 10% might already be CPU constrained. For more information about CPU credits and baseline performance, see CPU Credits and Baseline Performance for Burstable Performance Instances.
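As a rough illustration of reading CPU utilization against the baseline, the following Python sketch reports the fraction of samples that sit at or near the baseline. The 10% baseline for T2.micro comes from the note above. The function name, the 1-point margin, and the sample values are hypothetical. The result is meaningful only if the CPU credits were exhausted during the sampled window.

```python
def near_baseline(cpu_samples, baseline_percent, margin=1.0):
    """Return the fraction of samples at, above, or within `margin` points
    below the baseline. A high fraction suggests the instance may be capped."""
    if not cpu_samples:
        return 0.0
    hits = sum(1 for sample in cpu_samples
               if sample >= baseline_percent - margin)
    return hits / len(cpu_samples)


# Hypothetical CPUUtilization samples (percent) for a T2.micro instance.
samples = [9.5, 10.0, 9.8, 3.0]
print(near_baseline(samples, baseline_percent=10.0))
```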
After you identify the source of the silent failure, restart the task.
If there isn't contention on CPU, memory, or disk space, then the task most likely failed because of an internal error. To troubleshoot internal errors, enable detailed debugging on all five log components. After detailed debugging is enabled, restart the task, and then review the task logs to identify why the task failed.
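The following Python sketch builds a Logging section for the task settings with detailed debugging on five components. It assumes that the five components are SOURCE_UNLOAD, SOURCE_CAPTURE, TARGET_LOAD, TARGET_APPLY, and TASK_MANAGER, and that the severity value is LOGGER_SEVERITY_DETAILED_DEBUG. Verify both against the task settings that your AWS DMS version exports. The function name is hypothetical.

```python
import json

# Assumed identifiers for the five log components.
LOG_COMPONENTS = (
    "SOURCE_UNLOAD",
    "SOURCE_CAPTURE",
    "TARGET_LOAD",
    "TARGET_APPLY",
    "TASK_MANAGER",
)


def detailed_debug_logging(components=LOG_COMPONENTS):
    """Return a task settings fragment with detailed debug on each component."""
    return {
        "Logging": {
            "EnableLogging": True,
            "LogComponents": [
                {"Id": component, "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"}
                for component in components
            ],
        }
    }


# Print the JSON fragment so that it can be reviewed before it is applied.
print(json.dumps(detailed_debug_logging(), indent=2))
```

Detailed debugging can produce a large volume of logs and use additional storage on the replication instance, so consider returning the components to the default severity after you identify the cause.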