When should I restart and resume my AWS DMS task that is in a Stopped or Failed status?

Last updated: 2022-11-22

I have an AWS Database Migration Service (AWS DMS) task that is in the Stopped or Failed state. When should I resume or restart my AWS DMS task to continue replication?

Short description

When your AWS DMS task is in a Stopped or Failed status, you have two options to continue replication:

  • Resume - When you resume a task, AWS DMS continues replication from the last point before the task stopped or failed.
  • Restart - When you restart a task, AWS DMS begins replication from the start, and uses the table preparation mode that you chose when you created the task. For example, table preparation modes include Drop table on target, Truncate, and Do nothing. For more information, see Full-load task settings.
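The two actions can also be requested through the API instead of the console. The following is a minimal sketch, not a definitive implementation. It only builds the keyword arguments for the boto3 DMS client method start_replication_task and makes no AWS call. The helper name start_task_kwargs and the example ARN are hypothetical. The mapping of the restart action to reload-target (and to start-replication for CDC only tasks) is an assumption based on the DMS API reference, so verify it against the current documentation before you rely on it.

```python
# Sketch: build arguments for boto3's dms.start_replication_task.
# The helper name and the action-to-type mapping are illustrative assumptions.

VALID_MIGRATION_TYPES = ("full-load", "full-load-and-cdc", "cdc")


def start_task_kwargs(task_arn: str, action: str, migration_type: str) -> dict:
    """Return kwargs for start_replication_task for a 'resume' or 'restart' action."""
    if action not in ("resume", "restart"):
        raise ValueError(f"unknown action: {action!r}")
    if migration_type not in VALID_MIGRATION_TYPES:
        raise ValueError(f"unknown migration type: {migration_type!r}")

    if action == "resume":
        # Continue from the last point before the task stopped or failed.
        start_type = "resume-processing"
    elif migration_type == "cdc":
        # Assumption: CDC only tasks are restarted with start-replication.
        start_type = "start-replication"
    else:
        # Begin again from the start, using the table preparation mode.
        start_type = "reload-target"

    return {
        "ReplicationTaskArn": task_arn,
        "StartReplicationTaskType": start_type,
    }


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"
    print(start_task_kwargs(arn, "resume", "full-load-and-cdc"))
```

To send the request, you would pass the result to a boto3 client, for example boto3.client("dms").start_replication_task(**kwargs).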

The behavior of the resume and restart actions varies based on which of the three AWS DMS migration methods you're using: full load, full load and change data capture (CDC), or CDC only. For more information, see Creating a task.

Resolution

Migrate existing data (full load)

For full load tasks, if you chose to start the task manually when you created it, then use the restart action in the AWS DMS console to start replication. This reloads all of the tables in the migration.

You can also use the reload table data option to reload specific tables that failed during migration. This means that tables that are already loaded don't need to load again, and any tables that didn't finish loading are loaded again.
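Reloading specific tables can also be expressed as an API request. The following sketch, under stated assumptions, builds the keyword arguments for the boto3 DMS client method reload_tables without calling AWS. The helper name reload_tables_kwargs, the ARN, and the schema and table names are hypothetical.

```python
# Sketch: build arguments for boto3's dms.reload_tables.
# The helper name and all example identifiers are hypothetical.
from typing import Iterable, Tuple


def reload_tables_kwargs(task_arn: str, tables: Iterable[Tuple[str, str]]) -> dict:
    """Return kwargs that ask AWS DMS to reload only the given (schema, table) pairs."""
    entries = [{"SchemaName": schema, "TableName": table} for schema, table in tables]
    if not entries:
        raise ValueError("at least one table is required")
    return {
        "ReplicationTaskArn": task_arn,
        "TablesToReload": entries,
        # 'data-reload' reloads the table data; 'validate-only' is the other option.
        "ReloadOption": "data-reload",
    }


if __name__ == "__main__":
    arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"
    print(reload_tables_kwargs(arn, [("sales", "orders"), ("sales", "customers")]))
```

You would pass the result to boto3.client("dms").reload_tables(**kwargs). Tables that aren't listed are left as they are.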

If you use the resume option while migrating multiple tables, then tables that didn't finish loading start over. Tables that completed migration aren't affected. If you're migrating a single table, it's a best practice to restart the task rather than use the resume option.

Migrate existing data and replicate ongoing changes (full load and CDC)

When you use a full load and CDC task, AWS DMS migrates table data, and then applies data changes that occur on the source. If you restart the task, this loads all the tables again, and starts capturing source changes from the restart time. If your task is configured with the Do nothing preparation mode, then manually empty target tables before restarting the task.

If you resume the task, then only changes that were captured after the last stop point are applied to the database. If the migration task stops during the CDC phase, then AWS DMS maintains the checkpoint information for future use. You can view the task checkpoint in the Overview details tab of the AWS DMS console.
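The checkpoint is also exposed outside the console. As a rough sketch, the following helper reads the RecoveryCheckpoint field from a response shaped like the output of the boto3 DMS client method describe_replication_tasks. The helper name recovery_checkpoints and the sample response values are hypothetical, and no AWS call is made.

```python
# Sketch: pull recovery checkpoints out of a describe_replication_tasks response.
# The helper name and the sample data below are hypothetical.
from typing import Dict, Optional


def recovery_checkpoints(response: dict) -> Dict[str, Optional[str]]:
    """Map each task identifier to its RecoveryCheckpoint, or None if absent."""
    return {
        task["ReplicationTaskIdentifier"]: task.get("RecoveryCheckpoint")
        for task in response.get("ReplicationTasks", [])
    }


if __name__ == "__main__":
    # Made-up response fragment with the same shape as the API output.
    sample = {
        "ReplicationTasks": [
            {"ReplicationTaskIdentifier": "task-a", "Status": "stopped",
             "RecoveryCheckpoint": "checkpoint-placeholder"},
            {"ReplicationTaskIdentifier": "task-b", "Status": "failed"},
        ]
    }
    print(recovery_checkpoints(sample))
```

A task that has no checkpoint yet, such as one that stopped before the CDC phase, maps to None in this sketch.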

Replicate data changes only (CDC only)

If you're using a CDC only task, you can start capturing source data changes from either the current time or a CDC start point. If you restart the task when a CDC start point is defined, then the restart operation loads all changes from that point in time. If you restart a task without a CDC start point, then the changes made between the time the task stopped and the time it was restarted are lost.

The resume operation continues replicating changes from the last stop point, regardless of the CDC start point configuration. If you restart a task with the Truncate target table preparation mode, then AWS DMS leaves existing target tables and their metadata in place, but deletes all existing data from these tables before it restarts migration.
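A CDC start point can be supplied when a CDC only task is started again through the API. The following is a minimal sketch that builds the keyword arguments for the boto3 DMS client method start_replication_task, using either the CdcStartTime or the CdcStartPosition parameter. The API reference states that specifying both results in an error, so the sketch rejects that case. The helper name cdc_restart_kwargs and the example values are hypothetical, and using start-replication as the task type for a CDC only restart is an assumption to verify against the API reference.

```python
# Sketch: build arguments for restarting a CDC only task from a start point.
# The helper name and example values are hypothetical; no AWS call is made.
from datetime import datetime, timezone
from typing import Optional


def cdc_restart_kwargs(
    task_arn: str,
    start_time: Optional[datetime] = None,
    start_position: Optional[str] = None,
) -> dict:
    """Return kwargs for start_replication_task with an optional CDC start point."""
    if start_time is not None and start_position is not None:
        # The API accepts only one of the two start point parameters.
        raise ValueError("use either start_time or start_position, not both")

    kwargs = {
        "ReplicationTaskArn": task_arn,
        # Assumption: CDC only tasks are restarted with start-replication.
        "StartReplicationTaskType": "start-replication",
    }
    if start_time is not None:
        kwargs["CdcStartTime"] = start_time
    elif start_position is not None:
        kwargs["CdcStartPosition"] = start_position
    # With neither value set, capture begins at the current time, so changes
    # made while the task was stopped are not replicated.
    return kwargs


if __name__ == "__main__":
    arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"
    when = datetime(2022, 11, 22, 0, 0, tzinfo=timezone.utc)
    print(cdc_restart_kwargs(arn, start_time=when))
```

Resuming doesn't need any of this: a resume request uses the resume-processing task type and continues from the last stop point.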

Common scenarios for resuming or restarting an AWS DMS task

Here are some common examples of when you can use the resume or restart operations on your AWS DMS task.

Restart the AWS DMS task:

  • If the source database management system (DBMS) doesn't contain the recovery log files or transaction log files to resume the CDC process, restart the task. Restarting the task loads all table data and continues capturing changes.
  • If an AWS DMS task is in an Error status, this means that one or more of the tables in the task couldn't be migrated. In an Error status, the task continues to load other tables from the selection rule, but a task with a Failed status stops with fatal errors. After you resolve the errors, reload the tables, or restart the task to resolve the error status. For more information, see Why is my AWS DMS task in an error status?
  • If a full load and CDC or a CDC only task is stopped, then the data changes can spill over from memory to disk. Depending on the volume of change data swapped to disk, a resumed task might take a long time to continue replication, because AWS DMS takes longer to read these changes from disk. If it's feasible, restart the task to avoid this wait time.
  • If you change between using Oracle LogMiner and AWS DMS Binary Reader, then make sure to restart the CDC task.
    Note: After you modify the CDC method, if you restart a CDC only task that is configured with a CDC recovery checkpoint, you might see an error similar to:
    "[SOURCE_CAPTURE ]D: Invalid context provided for the Binary Reader based CDC. Restart task is required."
    To resolve this error, start the task from a timestamp that you specify as the CDC start point.

Resume the AWS DMS task:

  • If you move a task to a new replication instance, resume the task to continue replicating changes from the point it was last stopped.
  • If you plan to upgrade your source or target databases, then stop any AWS DMS tasks that are running on these databases. Resume the tasks after your upgrades are complete. However, to perform a PostgreSQL engine version upgrade, you can't have any replication slots on the instance. In that case, drop any replication slots before you upgrade your engine, and then restart the task to recreate the replication slot.