Why did my AWS DMS task fail with no errors?

To migrate my data from a source engine to a target engine, I use AWS Database Migration Service (AWS DMS). But the task fails without any errors.

Short description

When an AWS DMS task fails, an entry is made in the task log. The task log provides information about the failure cause with either error messages (]E:) or warning messages (]W:). In some cases, an AWS DMS task can fail without any errors or warnings, which makes it difficult to troubleshoot.
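To confirm that a task log really contains no ]E: or ]W: entries, you can scan an exported copy of the log. The following is a minimal sketch, not an official tool. It assumes that each log line has the shape "timestamp [COMPONENT ]level: message", and the sample lines and the function name find_problems are hypothetical.

```python
import re

# Assumed line shape: "<timestamp> [<COMPONENT>   ]<level>: <message>".
# The component name is padded with spaces before the closing bracket.
LOG_LINE = re.compile(
    r"\[(?P<component>[A-Z_]+)\s*\](?P<level>[A-Z]):\s*(?P<message>.*)"
)


def find_problems(lines):
    """Return (level, component, message) for every error or warning line."""
    problems = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("level") in ("E", "W"):
            problems.append(
                (
                    match.group("level"),
                    match.group("component"),
                    match.group("message").strip(),
                )
            )
    return problems


# Hypothetical sample lines, for illustration only.
sample = [
    "2024-01-01T10:00:00 [TASK_MANAGER    ]I:  Task started",
    "2024-01-01T10:05:00 [TARGET_APPLY    ]W:  Retrying batch",
    "2024-01-01T10:06:00 [SOURCE_CAPTURE  ]E:  Connection lost",
]
print(find_problems(sample))
```

An empty result for a task that stopped suggests a silent failure of the kind that this article describes.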

Most often, the AWS DMS task fails for one of these reasons:

Resource contention on the replication instance

CPU and memory are the two most important resources that are required for a migration task:

CPU is required because AWS DMS must first convert the source data type to the AWS DMS data type, and then convert it to the target data type.
Memory is required because AWS DMS creates streams to the source and target. AWS DMS stores information in the stream buffers in memory on the replication instance.

The internal monitoring system also uses CPU and memory to monitor the replication instance. Any contention on either CPU or memory can cause a migration task to silently fail.

Storage Full status on the replication instance

If the replication instance storage is full, then a migration task can silently fail with no errors.

An internal error occurred

AWS DMS tasks can also silently fail if there are internal errors. Internal errors don't appear in task logs at the default logging level.

Resolution

Note: If your task uses a non-relational database management system, then you might want to run the task without parallel settings. For more information, see Target metadata task settings.

Review your AWS DMS, source, and target logs for more information. Check the time of the last entry in the task logs after the task silently failed. Then, review the CPU, memory, and disk utilization on the replication instance around that time.
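To find the time of the last log entry and a window around it to inspect, you can use a small helper such as the following. This is a sketch under assumptions: it expects each log line to start with a timestamp in the form YYYY-MM-DDTHH:MM:SS, and the function name failure_window and the 15-minute margin are illustrative choices.

```python
from datetime import datetime, timedelta


def failure_window(lines, margin_minutes=15):
    """Return (start, end) around the last timestamped line, or None."""
    last = None
    for line in lines:
        try:
            # Assumption: the first 19 characters hold the timestamp.
            last = datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            # Continuation lines and blank lines carry no timestamp.
            continue
    if last is None:
        return None
    margin = timedelta(minutes=margin_minutes)
    return last - margin, last + margin


# Hypothetical log lines, for illustration only.
log_lines = [
    "2024-01-01T10:00:00 [TASK_MANAGER    ]I:  Task started",
    "   continuation of the previous message",
    "2024-01-01T10:06:00 [TARGET_APPLY    ]I:  Applying batch",
]
print(failure_window(log_lines))
```

Use the returned start and end times as the range when you review the replication instance metrics.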

If you see a combination of low FreeableMemory and high SwapUsage, then there might be memory contention on the replication instance. For more information, see AWS Database Migration Service metrics.
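The check for low FreeableMemory combined with high SwapUsage can be expressed as a simple rule over metric samples. The following sketch assumes that both series are in bytes and are aligned sample by sample. The thresholds of 256 MiB and 128 MiB are placeholders, not AWS recommendations, so choose values that fit your instance class.

```python
MIB = 1024 ** 2


def memory_contention(freeable_bytes, swap_bytes,
                      low_free=256 * MIB, high_swap=128 * MIB):
    """Return the indexes of samples with low free memory and high swap."""
    flagged = []
    for index, (free, swap) in enumerate(zip(freeable_bytes, swap_bytes)):
        # Both conditions must hold in the same sample to count as contention.
        if free < low_free and swap > high_swap:
            flagged.append(index)
    return flagged


# Hypothetical samples: the second one shows low memory with heavy swapping.
freeable = [1024 * MIB, 100 * MIB, 100 * MIB]
swap = [0, 512 * MIB, 10 * MIB]
print(memory_contention(freeable, swap))
```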

To view the CloudWatch metrics, complete the following steps:

Open the AWS DMS console.
In the navigation pane, choose Database migration tasks.
Choose the name of the task that failed.
From the Overview details section, note the name of the replication instance.
From the navigation pane, choose Replication instances.
Choose the name of the replication instance that you noted.
In the Migration task metrics section, review the CPUUtilization, SwapUsage, FreeableMemory, and FreeStorageSpace metrics.
To view more details, hover over the metric, and choose the more options icon.
Choose View in metrics. This opens the CloudWatch console.

In the CloudWatch console, view the metric's utilization at the time that the task failed.
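If you prefer to pull the same metrics programmatically, the following sketch builds CloudWatch GetMetricStatistics requests for the four metrics and sends them with boto3. It assumes the AWS/DMS namespace with the ReplicationInstanceIdentifier dimension, configured AWS credentials, and an installed boto3 package. The function names, the instance identifier, the 30-minute margin, and the 60-second period are illustrative.

```python
from datetime import datetime, timedelta

METRICS = ("CPUUtilization", "SwapUsage", "FreeableMemory", "FreeStorageSpace")


def metric_requests(instance_id, failed_at, margin_minutes=30, period=60):
    """Build one get_metric_statistics request per metric."""
    margin = timedelta(minutes=margin_minutes)
    return [
        {
            "Namespace": "AWS/DMS",
            "MetricName": name,
            "Dimensions": [
                {"Name": "ReplicationInstanceIdentifier", "Value": instance_id}
            ],
            "StartTime": failed_at - margin,
            "EndTime": failed_at + margin,
            "Period": period,
            "Statistics": ["Average", "Maximum"],
        }
        for name in METRICS
    ]


def fetch_metrics(instance_id, failed_at):
    """Query CloudWatch and return datapoints per metric, oldest first."""
    import boto3  # imported here so the request builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    results = {}
    for request in metric_requests(instance_id, failed_at):
        response = cloudwatch.get_metric_statistics(**request)
        results[request["MetricName"]] = sorted(
            response["Datapoints"], key=lambda point: point["Timestamp"]
        )
    return results


# "my-replication-instance" is a hypothetical identifier.
requests = metric_requests("my-replication-instance", datetime(2024, 1, 1, 10, 0))
print([request["MetricName"] for request in requests])
```

Call fetch_metrics with the replication instance name that you noted and the time of the last task log entry.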

If you see constant CPU or memory contention, then reduce the number of tasks that are running on the replication instance. To reduce the number of tasks, you can launch new replication instances and distribute the tasks across multiple replication instances. Or, scale up the replication instance to a larger instance type.

Note: T2 instances are limited to a baseline performance after the CPU credits are exhausted. For example, a t2.micro instance provides a baseline performance of 10%. Take the instance type into account when you verify the CPU utilization. For more information, see Key concepts and definitions for burstable performance instances.
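On a burstable instance, CPU utilization that stays flat near the baseline can indicate exhausted CPU credits even though the percentage looks low. The following sketch flags that pattern. The 10% default matches the t2.micro example, and the tolerance and run length are assumed values that you might need to adjust.

```python
def pinned_at_baseline(cpu_percent_samples, baseline_percent=10.0,
                       tolerance=1.0, min_run=5):
    """Return True if min_run consecutive samples sit near the baseline."""
    run = 0
    for value in cpu_percent_samples:
        if abs(value - baseline_percent) <= tolerance:
            run += 1
            if run >= min_run:
                return True
        else:
            # A sample away from the baseline breaks the streak.
            run = 0
    return False


# Hypothetical series: a burst followed by a flat line near 10%.
print(pinned_at_baseline([80, 60, 10.2, 9.8, 10, 10.1, 9.9]))
```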

After you identify and address the source of the silent failure, restart the task. If there isn't contention on CPU, memory, or disk space, then the task likely failed because of an internal error. To troubleshoot internal errors, turn on detailed debugging. Review the last log entries that were written before the task failed, and then turn on detailed debugging for the components that wrote them. For example, if the last entries are from TARGET_APPLY, then turn on detailed debugging for SORTER and TARGET_APPLY. After you turn on detailed debugging, restart the task, and then review the task logs to identify why the task failed.
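The following sketch shows one way to build the Logging task settings that turn on detailed debugging for SORTER and TARGET_APPLY, and to apply them with the boto3 modify_replication_task call. The function names are illustrative, the task ARN is yours to supply, and the sketch assumes configured AWS credentials and an installed boto3 package.

```python
import json


def debug_logging_settings(components=("SORTER", "TARGET_APPLY")):
    """Return task settings JSON with detailed debug logging per component."""
    settings = {
        "Logging": {
            "EnableLogging": True,
            "LogComponents": [
                {"Id": component, "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"}
                for component in components
            ],
        }
    }
    return json.dumps(settings, indent=2)


def apply_settings(task_arn, settings_json):
    """Apply the settings to a task. Assumes that the task is stopped."""
    import boto3  # imported here so the settings builder works without boto3

    dms = boto3.client("dms")
    return dms.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=settings_json,
    )


print(debug_logging_settings())
```

Detailed debugging can produce large logs and consume storage on the replication instance, so consider returning the components to the default severity after you finish troubleshooting.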

Note: The issue might be caused by problems with validation, and not with your data. To test whether the validation component is the cause of your issue, run a validation-only task to see if the issue occurs.
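As a starting point for a validation-only task, the following sketch builds the relevant task settings. It assumes the ValidationSettings options EnableValidation and ValidationOnly, together with a TargetTablePrepMode of DO_NOTHING so that the task doesn't alter the target tables. Verify these settings against the AWS DMS documentation for your engine version before you use them. The function name is illustrative.

```python
import json


def validation_only_settings():
    """Return task settings JSON for a task that validates without migrating."""
    settings = {
        # Assumption: leave existing target tables untouched.
        "FullLoadSettings": {"TargetTablePrepMode": "DO_NOTHING"},
        "ValidationSettings": {
            "EnableValidation": True,
            "ValidationOnly": True,
        },
    }
    return json.dumps(settings, indent=2)


print(validation_only_settings())
```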

Related information

Troubleshooting migration tasks in AWS Database Migration Service

How do I get technical support from AWS?

Why is my AWS DMS replication DB instance in the storage-full status?
