AWS Database Blog

Back up an AWS DMS task using the AWS CLI and a JSON processor

In this post, we demonstrate how to use the AWS Command Line Interface (AWS CLI) and jq (a JSON processor) to back up AWS Database Migration Service (AWS DMS) task settings and other artifacts to a file. This backup file could be modified (if required) and reused to setup a new AWS DMS task.

One of the cost dimensions of AWS DMS is the underlying Amazon Elastic Compute Cloud (Amazon EC2) instance hours. If you are migrating data at specific times, such as monthly, quarterly, or yearly, the replication instance could be idle when the task isn’t running but still incurring charges.

A migration may require adjusting task settings or table mapping with custom rules to add, remove, or change data columns. Therefore, it’s helpful to back up the AWS DMS task for future use and reuse the artifacts whenever required.

Solution overview

You can create a backup of your AWS DMS task in a JSON file, which you could then use to create a new AWS DMS task. This solution also includes a way to resume a full load and CDC or CDC-only AWS DMS task from the same checkpoint.

The following diagram illustrates the workflows to back up and restore the task.

Workflow to backup and restore an AWS DMS Task

The high-level steps to back up an AWS DMS task are:

  1. Script – script out AWS DMS task using AWS CLI and JQ processor.
  2. Transform – remove unneeded AWS DMS task output details and transform the JSON to be ready for creating a new AWS DMS task.
  3. Save – finally save the newly generated json to a file.

Similarly high-level steps to restore an AWS DMS task are:

  1. Restore Task – restore AWS DMS task using backup task JSON file.
  2. Start – AWS DMS task is ready to be started.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Note: You can install AWS CLI and JQ tool on your local machine or an AWS EC2 server.

Retrieve AWS DMS task settings

To retrieve your AWS DMS replication task settings, run the describe-replication-tasks command using the AWS CLI and use the --filters flag. When you use this flag, AWS CLI returns only the AWS DMS task that matches one of these items:

  • The replication task ID
  • The replication task ARN

Run the following query to get the settings for your AWS DMS task using the replication task ID. Replace <dms_task_id> with your AWS DMS task ID.

aws dms describe-replication-tasks --filter Name=replication-task-id,Values=<dms_task_id> --output json

Or run the following query to get the settings for your AWS DMS task using the replication task ARN. Replace <dms_task_arn> with your AWS DMS task ARN.

aws dms describe-replication-tasks --filter Name=replication-task-arn,Values=<dms_task_arn> --output json

Filter and transform the JSON output using jq

Next, you filter and transform the JSON output from the previous step. You need to filter and transform this JSON in order to restore it at a later time.

  1. Use the jq command line JSON processor to make these modifications to the output JSON:
    • Add the key CdcStartPosition, and then set it to RecoveryCheckpoint. (This is only relevant when the task has CDC.)
      Note: even though we are transforming RecoveryCheckpoint to CdcStartPosition for a CDC task it is dependent on availability of that transaction in source DB transaction log.
    • Remove the following data:
      • ReplicationTaskArn
      • ReplicationTaskStartDate
      • ReplicationTaskStats
      • Status
      • StopReason
      • ReplicationTaskCreationDate
      • CloudWatchLogGroup
      • CloudWatchLogStream
      • RecoveryCheckpoint
  2. Run one of the following commands to pipe the output JSON to jq, and then filter and transform the replication task data:
    • Use the following code with the AWS DMS replication task ID:
aws dms describe-replication-tasks --filter Name=replication-task-id,Values=<dms_task_id> --output json | jq '.ReplicationTasks | .[] | .TableMappings |= fromjson | .ReplicationTaskSettings |= fromjson | .["CdcStartPosition"] = .RecoveryCheckpoint | delpaths([["ReplicationTaskArn"],["ReplicationTaskStartDate"],["ReplicationTaskStats"],["Status"],["StopReason"],["ReplicationTaskCreationDate"],["ReplicationTaskSettings","Logging","CloudWatchLogGroup"],["ReplicationTaskSettings","Logging","CloudWatchLogStream"],["RecoveryCheckpoint"]]) | .TableMappings |= tostring |.ReplicationTaskSettings |= tostring'
    • Use the following code with the AWS DMS replication task ARN:
aws dms describe-replication-tasks --filter Name=replication-task-arn,Values=<dms-task-arn> --output json | jq '.ReplicationTasks | .[] | .TableMappings |= fromjson | .ReplicationTaskSettings |= fromjson | .["CdcStartPosition"] = .RecoveryCheckpoint | delpaths([["ReplicationTaskArn"],["ReplicationTaskStartDate"],["ReplicationTaskStats"],["Status"],["StopReason"],["ReplicationTaskCreationDate"],["ReplicationTaskSettings","Logging","CloudWatchLogGroup"],["ReplicationTaskSettings","Logging","CloudWatchLogStream"],["RecoveryCheckpoint"]]) | .TableMappings |= tostring |.ReplicationTaskSettings |= tostring'

Save the output to a JSON file

To save the resulting output as a backup, redirect it to a JSON file. For example, the following command saves the task settings for an AWS DMS task with the task ID dms_original_task to a JSON file named task_backup.json:

aws dms describe-replication-tasks --filter Name=replication-task-id,Values=original-dms-task --output json | jq '.ReplicationTasks | .[] | .TableMappings |= fromjson | .ReplicationTaskSettings |= fromjson | .["CdcStartPosition"] = .RecoveryCheckpoint | delpaths([ ["ReplicationTaskArn"],["ReplicationTaskStartDate"],["ReplicationTaskStats"],["Status"],["StopReason"],["ReplicationTaskCreationDate"],["ReplicationTaskSettings","Logging","CloudWatchLogGroup"],["ReplicationTaskSettings","Logging","CloudWatchLogStream"],["RecoveryCheckpoint"]]) | .TableMappings |= tostring |.ReplicationTaskSettings |= tostring' > task_backup.json

Or using ARN

aws dms describe-replication-tasks --filter Name=replication-task-arn,Values= arn:aws:dms:us-east-1:xxxxxxxxxxxx:task:QABCDEFGHIJK12LM34NO56PQKCUDBNTXD2ZZBVI --output json | jq '.ReplicationTasks | .[] | .TableMappings |= fromjson | .ReplicationTaskSettings |= fromjson | .["CdcStartPosition"] = .RecoveryCheckpoint | delpaths([ ["ReplicationTaskArn"],["ReplicationTaskStartDate"],["ReplicationTaskStats"],["Status"],["StopReason"],["ReplicationTaskCreationDate"],["ReplicationTaskSettings","Logging","CloudWatchLogGroup"],["ReplicationTaskSettings","Logging","CloudWatchLogStream"],["RecoveryCheckpoint"]]) | .TableMappings |= tostring |.ReplicationTaskSettings |= tostring' > task_backup.json

The resulting JSON file shows an output similar to the following:

{
  "SourceEndpointArn": "arn:aws:dms:us-east-1:xxxxxxxxxxxx:endpoint:QKMQN2TCULCPYIK3CMGF2CRDC5DIWZ433BCKHZY",
  "ReplicationTaskIdentifier": "original-dms-task",
  "ReplicationInstanceArn": "arn:aws:dms:us-east-1:xxxxxxxxxxxx:rep:C4CAB7ANKQU2AU7WFH4BYCJSLPS3YYUGCTHL3UA",
  "TableMappings": "{\"rules\":[{\"rule-type\":\"selection\",\"rule-id\":\"293111713\",\"rule-name\":\"293111713\",\"object-locator\":{\"schema-name\":\"%\",\"table-name\":\"%\"},\"rule-action\":\"include\",\"filters\":[]}]}",
  "MigrationType": "full-load-and-cdc",
  "TargetEndpointArn": "arn:aws:dms:us-east-1:xxxxxxxxxxxx:endpoint:6LEFAZCDPQD2HWHJIXJYUU4PGMTKA2AKKFN4KCA",
  "ReplicationTaskSettings": "{\"Logging\":{\"EnableLogging\":true,\"LogComponents\":[{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"TRANSFORMATION\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"SOURCE_UNLOAD\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"IO\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"TARGET_LOAD\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"PERFORMANCE\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"SOURCE_CAPTURE\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"SORTER\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"REST_SERVER\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"VALIDATOR_EXT\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"TARGET_APPLY\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"TASK_MANAGER\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"TABLES_MANAGER\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"METADATA_MANAGER\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"FILE_FACTORY\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"COMMON\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"ADDONS\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"DATA_STRUCTURE\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"COMMUNICATION\"},{\"Severity\":\"LOGGER_SEVERITY_DEFAULT\",\"Id\":\"FILE_TRANSFER\"}]},\"StreamBufferSettings\":{\"StreamBufferCount\":3,\"CtrlStreamBufferSizeInMB\":5,\"StreamBufferSizeInMB\":8},\"ErrorBehavior\":{\"FailOnNoTablesCaptured\":true,\"ApplyErrorUpdatePolicy\":\"LOG_ERROR\",\"FailOnTransactionConsistencyBreached\":false,\"RecoverableErrorThrottlingMax\":1800,\"DataErrorEscalationPolicy\":\"SUSPEND_TABLE\",\"ApplyErrorEscalationCount\":0,\"RecoverableErrorStopRetryAfterThrottlingMax\":true,\"RecoverableErrorThrottling\":true,\"ApplyErrorFailOnTruncationDdl\":false,\"DataTruncationErrorPolicy\":\"LOG_ERROR\",\"ApplyErrorInsertPolicy\":\"LOG_ERROR\",\"EventErrorPolicy\":\"IGNORE\",\"ApplyErrorEscalationPolicy\":\"LOG_ERROR\",\"RecoverableErrorCount\":-1,\"DataErrorEscalationCount\":0,\"TableErrorEscalationPolicy\":\"STOP_TASK\",\"RecoverableErrorInterval\":5,\"ApplyErrorDeletePolicy\":\"IGNORE_RECORD\",\"TableErrorEscalationCount\":0,\"FullLoadIgnoreConflicts\":true,\"DataErrorPolicy\":\"LOG_ERROR\",\"TableErrorPolicy\":\"SUSPEND_TABLE\"},\"ValidationSettings\":{\"ValidationPartialLobSize\":0,\"PartitionSize\":10000,\"RecordFailureDelayLimitInMinutes\":0,\"SkipLobColumns\":false,\"FailureMaxCount\":10000,\"HandleCollationDiff\":false,\"ValidationQueryCdcDelaySeconds\":0,\"ValidationMode\":\"ROW_LEVEL\",\"TableFailureMaxCount\":1000,\"RecordFailureDelayInMinutes\":5,\"MaxKeyColumnSize\":8096,\"EnableValidation\":true,\"ThreadCount\":5,\"RecordSuspendDelayInMinutes\":30,\"ValidationOnly\":false},\"TTSettings\":{\"TTS3Settings\":null,\"TTRecordSettings\":null,\"EnableTT\":false},\"FullLoadSettings\":{\"CommitRate\":10000,\"StopTaskCachedChangesApplied\":false,\"StopTaskCachedChangesNotApplied\":false,\"MaxFullLoadSubTasks\":8,\"TransactionConsistencyTimeout\":600,\"CreatePkAfterFullLoad\":false,\"TargetTablePrepMode\":\"DROP_AND_CREATE\"},\"TargetMetadata\":{\"ParallelApplyBufferSize\":0,\"ParallelApplyQueuesPerThread\":0,\"ParallelApplyThreads\":0,\"TargetSchema\":\"\",\"InlineLobMaxSize\":0,\"ParallelLoadQueuesPerThread\":0,\"SupportLobs\":true,\"LobChunkSize\":0,\"TaskRecoveryTableEnabled\":false,\"ParallelLoadThreads\":0,\"LobMaxSize\":32,\"BatchApplyEnabled\":false,\"FullLobMode\":false,\"LimitedSizeLobMode\":true,\"LoadMaxFileSize\":0,\"ParallelLoadBufferSize\":0},\"BeforeImageSettings\":null,\"ControlTablesSettings\":{\"historyTimeslotInMinutes\":5,\"HistoryTimeslotInMinutes\":5,\"StatusTableEnabled\":false,\"SuspendedTablesTableEnabled\":false,\"HistoryTableEnabled\":false,\"ControlSchema\":\"\",\"FullLoadExceptionTableEnabled\":false},\"LoopbackPreventionSettings\":null,\"CharacterSetSettings\":null,\"FailTaskWhenCleanTaskResourceFailed\":false,\"ChangeProcessingTuning\":{\"StatementCacheSize\":50,\"CommitTimeout\":1,\"BatchApplyPreserveTransaction\":true,\"BatchApplyTimeoutMin\":1,\"BatchSplitSize\":0,\"BatchApplyTimeoutMax\":30,\"MinTransactionSize\":1000,\"MemoryKeepTime\":60,\"BatchApplyMemoryLimit\":500,\"MemoryLimitTotal\":1024},\"ChangeProcessingDdlHandlingPolicy\":{\"HandleSourceTableDropped\":true,\"HandleSourceTableTruncated\":true,\"HandleSourceTableAltered\":true},\"PostProcessingRules\":null}",
  "CdcStartPosition": "checkpoint:V1#102#00000000/B803EBA0#0#0#*#0#277"
}

Restore the AWS DMS task

You can now edit the command below as needed to change the AWS DMS task identifier and restore the JSON file using the create-replication-task AWS CLI command. For example, the following command creates an AWS DMS replication task with the task ID restored-dms-task using the task_backup.json file:

aws dms create-replication-task --replication-task-identifier restored-dms-task --cli-input-json file://task_backup.json

Note that when you include --replication-task-identifier in the command, you override the --replication-task-identifier value in the task_backup.json file.

Similarly, if you have full load and cdc task and you are looking to start a cdc only task, you can override migration-type to cdc. Please note that the source database must have transactions referred by cdc-start-position or else the task will fail.

aws dms create-replication-task --replication-task-identifier restored-dms-task --migration-type cdc --cli-input-json file://task_backup.json

Conclusion

In this post, we demonstrated how to back up an existing AWS DMS task to a JSON file using the AWS CLI and JQ and restore the JSON file to a new AWS DMS task as and when needed.

We encourage you to try out this solution. If you have any feedback or questions, leave a comment in the comments section.


About the Authors

Subhash Raut is a Database Migration Specialist Solutions Architect at AWS. He works on challenges related to database migrations and works closely with customers to help them realize the true potential of AWS DMS. He has helped customers to migrate databases to the AWS Cloud by using AWS DMS and the AWS SCT.

Brian Oliver is a Cloud Support Engineer at AWS. He works with customers to provide guidance and technical assistance on database projects, helping them to improve the value of their solutions when using AWS.