Troubleshooting AWS DMS error "Last Error Replication task out of memory. Stop Reason FATAL_ERROR Error Level FATAL"

Last updated: 2023-01-09

When using AWS Database Migration Service (AWS DMS), I receive the error "Last Error Replication task out of memory. Stop Reason FATAL_ERROR Error Level FATAL".

Short description

When using AWS DMS, you receive the following error: "Last Error Replication task out of memory. Stop Reason FATAL_ERROR Error Level FATAL". To find the root cause of the error, review the AWS DMS task logs. For the preceding error, the logs provide the following information: “Task process for 'XXXXXXX' failed because it ran out of memory”.

To resolve this error, complete one or more of the following steps:

  • Change the task settings or memory-related parameters.
  • Scale up the replication instance class based on fluctuations in Amazon CloudWatch metrics for replication instances, such as FreeMemory and SwapUsage.
  • Split a single task into multiple tasks based on the size of data that's migrated and the amount of memory that's required for the task.

Resolution

Note: You must stop the task prior to making modifications. After you’ve made the modifications, you must resume the task. In-flight tables are reloaded from scratch if the task stops during the full load phase.
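
The stop, modify, and resume sequence in the preceding note might be scripted as in the following sketch. It assumes that `dms` is a boto3 AWS DMS client (for example, `boto3.client("dms")`). The function name `apply_task_settings` and its arguments are hypothetical and chosen only for illustration.

```python
import json


def apply_task_settings(dms, task_arn, settings):
    """Stop a task, apply new task settings, and then resume the task.

    dms      -- a boto3 DMS client (assumption: boto3.client("dms"))
    task_arn -- ARN of the replication task to change
    settings -- dict of task settings to send as JSON
    """
    arn_filter = [{"Name": "replication-task-arn", "Values": [task_arn]}]

    # The task must be stopped before it can be modified.
    dms.stop_replication_task(ReplicationTaskArn=task_arn)
    dms.get_waiter("replication_task_stopped").wait(Filters=arn_filter)

    dms.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=json.dumps(settings),
    )
    # Assumption: waiting for the "stopped" state again is enough for the
    # modification to finish. A production script may need extra polling.
    dms.get_waiter("replication_task_stopped").wait(Filters=arn_filter)

    # Resume from where the task stopped instead of restarting it.
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="resume-processing",
    )
```

Passing the client in as an argument keeps the function easy to try against a stub before you run it against a real task.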

Change the task settings or memory-related parameters

Check whether you can adjust the task settings or memory-related parameters that demand more memory. The following are some of the most common task settings and parameters to check:

  • LOB settings
  • Validation parameters, such as ThreadCount and PartitionSize
  • Parallel thread parameters, such as ParallelLoadThreads, ParallelLoadBufferSize, ParallelLoadQueuesPerThread, ParallelApplyThreads, ParallelApplyBufferSize, and ParallelApplyQueuesPerThread
  • Batch apply parameters, such as BatchApplyTimeoutMin, BatchApplyTimeoutMax, BatchApplyMemoryLimit, and BatchSplitSize
  • Other memory-related task settings, such as MinTransactionSize, MemoryLimitTotal, MemoryKeepTime, and StatementCacheSize

For more details on the preceding task settings and parameters, see How does AWS DMS use memory for migration?
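
To show where the settings listed above sit in the task settings JSON, the following sketch builds a small fragment in Python. The section names (`TargetMetadata`, `ChangeProcessingTuning`, `ValidationSettings`) follow the AWS DMS task settings layout. The values are illustrative starting points, not recommendations, and the function name `memory_settings_fragment` is hypothetical.

```python
import json


def memory_settings_fragment():
    """Return a task settings fragment with common memory-related values.

    All numbers are illustrative assumptions; tune them for your workload.
    """
    return {
        "TargetMetadata": {
            # LOB settings: limited-size mode caps each LOB at LobMaxSize (KB).
            "SupportLobs": True,
            "FullLobMode": False,
            "LimitedSizeLobMode": True,
            "LobMaxSize": 32,
            # Parallel thread parameters (0 leaves the engine default).
            "ParallelLoadThreads": 0,
            "ParallelLoadBufferSize": 0,
            "ParallelApplyThreads": 0,
            "ParallelApplyBufferSize": 0,
        },
        "ChangeProcessingTuning": {
            # Batch apply parameters.
            "BatchApplyTimeoutMin": 1,
            "BatchApplyTimeoutMax": 30,
            "BatchApplyMemoryLimit": 500,  # MB
            "BatchSplitSize": 0,
            # Other memory-related settings.
            "MinTransactionSize": 1000,
            "MemoryLimitTotal": 1024,  # MB
            "MemoryKeepTime": 60,  # seconds
            "StatementCacheSize": 50,
        },
        "ValidationSettings": {
            "ThreadCount": 5,
            "PartitionSize": 10000,
        },
    }


if __name__ == "__main__":
    print(json.dumps(memory_settings_fragment(), indent=2))
```

Lower values for settings such as ThreadCount, PartitionSize, or the parallel buffer sizes generally trade throughput for a smaller memory footprint, so change one group at a time and watch the task's memory metrics after each change.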

Scale up the replication instance class based on fluctuations in Amazon CloudWatch metrics

Check the replication instance’s FreeMemory and SwapUsage metrics. If FreeMemory decreases or SwapUsage either increases or fluctuates, then consider moving to a larger replication instance.

Also, consider using memory-optimized instances. Memory-optimized instances are suited for memory-intensive workloads, such as ongoing migrations and replications of high-throughput transactions. For more information on replication instance size and types, see Choosing the right AWS DMS replication instance for your migration.
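
The FreeMemory and SwapUsage check described in this section might be automated along the following lines. This is a sketch under stated assumptions: the helper names, the 25 percent drop threshold, and the 10 percent swap spread threshold are all hypothetical, and `fetch_averages` assumes that boto3 is installed and that credentials are configured.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def fetch_averages(instance_id, metric_name, hours=6):
    """Return 5-minute averages for one AWS/DMS instance metric, oldest first."""
    import boto3  # imported lazily so the analysis helper works without it

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName=metric_name,
        Dimensions=[{"Name": "ReplicationInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Average"] for p in points]


def needs_larger_instance(free_memory, swap_usage, free_drop=0.25, swap_spread=0.10):
    """Flag memory pressure from two metric series (oldest value first).

    Returns True if FreeMemory fell by more than free_drop between the first
    and second half of the window, or if SwapUsage moved by more than
    swap_spread of its peak. Both thresholds are illustrative assumptions.
    """
    if len(free_memory) < 2 or not swap_usage:
        raise ValueError("need at least two FreeMemory points and one SwapUsage point")

    mid = len(free_memory) // 2
    early_free = mean(free_memory[:mid])
    late_free = mean(free_memory[mid:])
    free_falling = late_free < early_free * (1 - free_drop)

    peak_swap = max(swap_usage)
    swap_moving = (peak_swap - min(swap_usage)) > swap_spread * max(peak_swap, 1)

    return free_falling or swap_moving
```

For example, `needs_larger_instance(fetch_averages("my-instance", "FreeMemory"), fetch_averages("my-instance", "SwapUsage"))` would evaluate the last six hours for a replication instance with the placeholder identifier `my-instance`.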

Split a single task into multiple tasks based on the size of data that's migrated and the amount of memory that's required for the task

If the replication instance has multiple tasks, then you can use the DMS MemoryUsage metric to observe the amount of memory that the task consumes. To determine why the task is holding memory in the CDC phase, compare CDCChangesMemorySource and CDCChangesMemoryTarget, and then troubleshoot the respective endpoint.
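
The comparison of CDCChangesMemorySource and CDCChangesMemoryTarget can be reduced to a small helper such as the one below. It is a sketch only: the function name is hypothetical, and it assumes that you have already collected both metrics as lists of row counts over the same time window.

```python
from statistics import mean


def endpoint_to_investigate(source_rows, target_rows):
    """Suggest which endpoint to troubleshoot first.

    source_rows -- CDCChangesMemorySource samples (rows held in memory)
    target_rows -- CDCChangesMemoryTarget samples (rows held in memory)
    """
    source_avg = mean(source_rows)
    target_avg = mean(target_rows)
    if source_avg > target_avg:
        return "source"
    if target_avg > source_avg:
        return "target"
    return "both"
```

A result of "both" means that the averages are equal, in which case neither endpoint stands out and the task settings are a better place to look.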

When multiple tasks are running on the replication instance, do one or more of the following actions:

  • Reduce the number and type of tasks that are running on the replication instance.
  • Move the failed task to a different replication instance, and then try again.
  • Increase instance capacity.

For tasks that have multiple tables loading in parallel or many tables and schemas being migrated, do one or more of the following actions:

  • Reduce the number of tables that are loading in parallel.
  • Reduce the number of total tables and schemas being migrated.
  • Use a different task on a different replication instance to offload the migration of some of the tables and schemas.
  • Increase instance capacity.
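
One way to split a large task is to group tables by size and generate a separate table mapping for each new task. The following sketch assumes that you already know the approximate size of each table. The function names, schema name, and sizes are hypothetical, and the grouping uses a simple largest-first heuristic rather than anything AWS DMS prescribes.

```python
import json


def split_tables(table_sizes, task_count):
    """Spread tables over task_count groups, largest tables first.

    table_sizes -- dict of table name to approximate size (any one unit)
    Each table goes to the group with the smallest running total.
    """
    groups = [[] for _ in range(task_count)]
    totals = [0] * task_count
    by_size = sorted(table_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        smallest = totals.index(min(totals))
        groups[smallest].append(name)
        totals[smallest] += size
    return groups


def table_mappings(schema, tables):
    """Build table mapping JSON with one include rule for each table."""
    rules = [
        {
            "rule-type": "selection",
            "rule-id": str(number),
            "rule-name": str(number),
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        }
        for number, table in enumerate(tables, start=1)
    ]
    return json.dumps({"rules": rules})


if __name__ == "__main__":
    sizes_gb = {"orders": 90, "events": 60, "users": 30, "audit": 20}
    for group in split_tables(sizes_gb, task_count=2):
        print(table_mappings("sales", group))
```

Each JSON string can then serve as the table mappings for its own task, possibly on a different replication instance. Within a single task, the number of tables that load in parallel is governed by the MaxFullLoadSubTasks setting in the FullLoadSettings section of the task settings.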