Why is my AWS DMS task not retrying?


I have an AWS Database Migration Service (AWS DMS) task that has stopped and is not retrying. How can I resume operation of my AWS DMS task?

Resolution

AWS DMS is a managed service that is designed to have self-healing behavior. This means that when issues occur, AWS DMS attempts to fix the issue and then resume operation without you needing to take any action. However, there are some situations when migration stops and doesn't retry.

First, it's important to understand the two types of errors that you can encounter when using AWS DMS:

Fatal errors
Recoverable errors

Fatal errors

If AWS DMS encounters an error that stops it from proceeding with migration, then the task is stopped and it enters a FAILED state. This is called a fatal error. Some examples include:

The source endpoint isn't configured, which is a prerequisite for migration.
The AWS DMS replication instance can't fetch source objects from the source database.

In the task logs, you see messages similar to this:

"2022-05-28T16:07:35 [TASK_MANAGER ]E: Task 'K7YJOFK7GYXIK44C2KLGFNG7ZONLZGPWPD5RWHA' encountered a fatal error"

When AWS DMS encounters a fatal error, it tries to restart six times. If your task is no longer retrying, then it has likely already completed these attempts.

Recoverable errors

AWS DMS treats all environmental errors as recoverable errors. If a task or replication instance encounters an environmental error, then the task is interrupted, recovers on its own, and then retries.

Examples of recoverable errors include:

AWS DMS replication instance connectivity to the source/target database is interrupted.
The replication instance restarts because of maintenance.

In the task logs, you see messages similar to this:

"Last Error Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2673] [1022502] Stop Reason RECOVERABLE_ERROR Error Level RECOVERABLE"
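To tell the two error types apart when scanning a task log, a minimal sketch such as the following might help. It assumes that the substrings shown in the two sample messages ("encountered a fatal error" and "RECOVERABLE_ERROR") are reliable markers, which might not hold for every log format. The function names are hypothetical.

```python
# Markers taken from the sample log messages in this article.
# Assumption: real task logs use the same wording.
FATAL_MARKER = "encountered a fatal error"
RECOVERABLE_MARKER = "RECOVERABLE_ERROR"


def classify_log_line(line):
    """Return 'fatal', 'recoverable', or 'other' for one task log line."""
    if FATAL_MARKER in line:
        return "fatal"
    if RECOVERABLE_MARKER in line:
        return "recoverable"
    return "other"


def summarize_log(lines):
    """Count how many lines fall into each category."""
    counts = {"fatal": 0, "recoverable": 0, "other": 0}
    for line in lines:
        counts[classify_log_line(line)] += 1
    return counts
```

Passing the lines of a downloaded task log to summarize_log gives a quick indication of which error type dominates.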

By default, a task with a recoverable error retries indefinitely. The RecoverableErrorCount setting controls this behavior. This parameter sets the maximum number of attempts that AWS DMS makes to restart a task when it encounters an environmental error. After AWS DMS has tried to restart the task the designated number of times, the task stops and manual intervention is needed. The default value is -1, which tells AWS DMS to restart the task indefinitely.
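The retry limit described above can be modeled with a short sketch. This is only an illustration of the documented semantics, not AWS DMS's internal logic. The function name is hypothetical, and the sketch ignores the retry interval and throttling settings.

```python
def should_retry(attempts_made, recoverable_error_count=-1):
    """Model whether another restart attempt is allowed.

    attempts_made: restart attempts already made for this error.
    recoverable_error_count: the RecoverableErrorCount task setting.
    """
    if recoverable_error_count < -1:
        raise ValueError("RecoverableErrorCount must be -1 or greater")
    if recoverable_error_count == -1:
        # Default value: retry indefinitely.
        return True
    # Custom value: stop after the designated number of attempts.
    return attempts_made < recoverable_error_count
```

For example, with RecoverableErrorCount set to 3, the sketch allows attempts while attempts_made is 0, 1, or 2, and then reports that the task would stop and need manual intervention.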

If a recoverable error causes a task to stop and it no longer retries, then check whether:

The RecoverableErrorCount parameter is set to a custom value.
The replication instance itself is down.
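The two checks above could be scripted roughly as follows. The sketch takes an AWS DMS client object as a parameter (for example, one created with boto3.client("dms")) and uses its describe_replication_tasks and describe_replication_instances calls. The function name is hypothetical. The sketch also assumes that a healthy replication instance reports the status "available".

```python
import json


def check_retry_blockers(dms_client, task_arn):
    """Return a list of findings that might explain why a task stopped retrying."""
    findings = []

    # Look up the task and parse its settings, which are returned as a JSON string.
    task = dms_client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )["ReplicationTasks"][0]
    settings = json.loads(task["ReplicationTaskSettings"])

    # Check 1: is RecoverableErrorCount set to a custom value?
    count = settings.get("ErrorBehavior", {}).get("RecoverableErrorCount", -1)
    if count != -1:
        findings.append(f"RecoverableErrorCount is {count} (default is -1)")

    # Check 2: is the replication instance itself down?
    instance = dms_client.describe_replication_instances(
        Filters=[{
            "Name": "replication-instance-arn",
            "Values": [task["ReplicationInstanceArn"]],
        }]
    )["ReplicationInstances"][0]
    status = instance["ReplicationInstanceStatus"]
    if status != "available":  # assumption: "available" means healthy
        findings.append(f"Replication instance status is '{status}'")

    return findings
```

An empty list means that neither condition was detected. In that case, the other error-handling settings are the next place to look.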

Check whether other non-default settings are preventing retries

If any of the following error-handling settings is set to a non-default value, it might prevent the AWS DMS task from retrying. The annotations mark the policies that can be set to STOP_TASK:

"ErrorBehavior": {
        "FailOnNoTablesCaptured": false,
        "ApplyErrorUpdatePolicy": "LOG_ERROR",  --- can be set to STOP_TASK
        "FailOnTransactionConsistencyBreached": false,
        "RecoverableErrorThrottlingMax": 1800,
        "DataErrorEscalationPolicy": "SUSPEND_TABLE",  --- can be set to STOP_TASK
        "ApplyErrorEscalationCount": 0,
        "RecoverableErrorStopRetryAfterThrottlingMax": false,
        "RecoverableErrorThrottling": true,
        "ApplyErrorFailOnTruncationDdl": false,
        "DataTruncationErrorPolicy": "LOG_ERROR",  --- can be set to STOP_TASK
        "ApplyErrorInsertPolicy": "LOG_ERROR",  --- can be set to STOP_TASK
        "EventErrorPolicy": "IGNORE",
        "ApplyErrorEscalationPolicy": "LOG_ERROR",  --- can be set to STOP_TASK
        "RecoverableErrorCount": -1,
        "DataErrorEscalationCount": 0,
        "TableErrorEscalationPolicy": "STOP_TASK",
        "RecoverableErrorInterval": 5,
        "ApplyErrorDeletePolicy": "IGNORE_RECORD",  --- can be set to STOP_TASK
        "TableErrorEscalationCount": 0,
        "FullLoadIgnoreConflicts": true,
        "DataErrorPolicy": "LOG_ERROR",
        "TableErrorPolicy": "SUSPEND_TABLE"
    },
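To spot non-default values quickly, a task's settings can be compared against the listing above. The following is a minimal sketch. It assumes that the values in the listing are the defaults for your AWS DMS version and that the task settings have already been parsed into a Python dictionary. The names are hypothetical.

```python
# Values copied from the ErrorBehavior listing in this article.
# Assumption: these are the defaults for the AWS DMS version in use.
DEFAULT_ERROR_BEHAVIOR = {
    "FailOnNoTablesCaptured": False,
    "ApplyErrorUpdatePolicy": "LOG_ERROR",
    "FailOnTransactionConsistencyBreached": False,
    "RecoverableErrorThrottlingMax": 1800,
    "DataErrorEscalationPolicy": "SUSPEND_TABLE",
    "ApplyErrorEscalationCount": 0,
    "RecoverableErrorStopRetryAfterThrottlingMax": False,
    "RecoverableErrorThrottling": True,
    "ApplyErrorFailOnTruncationDdl": False,
    "DataTruncationErrorPolicy": "LOG_ERROR",
    "ApplyErrorInsertPolicy": "LOG_ERROR",
    "EventErrorPolicy": "IGNORE",
    "ApplyErrorEscalationPolicy": "LOG_ERROR",
    "RecoverableErrorCount": -1,
    "DataErrorEscalationCount": 0,
    "TableErrorEscalationPolicy": "STOP_TASK",
    "RecoverableErrorInterval": 5,
    "ApplyErrorDeletePolicy": "IGNORE_RECORD",
    "TableErrorEscalationCount": 0,
    "FullLoadIgnoreConflicts": True,
    "DataErrorPolicy": "LOG_ERROR",
    "TableErrorPolicy": "SUSPEND_TABLE",
}


def non_default_error_behavior(task_settings):
    """Map each changed ErrorBehavior key to (current value, listed default)."""
    current = task_settings.get("ErrorBehavior", {})
    return {
        key: (current[key], default)
        for key, default in DEFAULT_ERROR_BEHAVIOR.items()
        if key in current and current[key] != default
    }
```

Any policy that the result reports as STOP_TASK, or a RecoverableErrorCount other than -1, is a likely reason for a task that no longer retries.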

Related information

Error handling task settings

Troubleshooting migration tasks in AWS Database Migration Service
