How can I troubleshoot data mismatch issues between the source and target databases in AWS DMS?
Last updated: 2022-09-12
I have an AWS Database Migration Service (AWS DMS) task that is in the full-load complete or replication ongoing stage of migration. But, the data downstream in the target database does not match the data on the source. How can I troubleshoot data mismatch issues between the source and target databases?
There are a number of reasons that you might see data mismatch when you migrate your data in a heterogeneous environment using AWS DMS. For example:
- If you're not using the right LOB settings, then data is truncated as per the MaxLobSize task setting. So, the target LOB column doesn't contain the exact same data as the source.
- During heterogeneous migration, AWS DMS converts the source data type to an internal data type. Then, AWS DMS converts the internal data to the target data type. But, because some source and target data types are not fully supported by AWS DMS, you might see data mismatch between the source and target.
- If any errors or exceptions occur during change data capture (CDC) replication, then the DML transaction can fail at the target database.
Perform initial checks before troubleshooting
Before troubleshooting the issue using the steps in this article, perform these initial checks:
- If you are using a full-load-only task, make sure that the migration is complete, and that the task has come into a stopped state.
- If you are using a full-load- or CDC-only task, then check the CDCLatencySource and CDCLatencyTarget Amazon CloudWatch metrics. Confirm that you don't observe any latency.
- Confirm that there are no other applications connected to the source or target databases. This can cause data manipulation. For example, if you are running a full-load-only task, and the data at the source is modified by another application, then you might see data mismatch at the target. Or, if the target database has DML triggers or another application writes data to the migrated target table, you will also see data mismatch between target and source.
Query the awsdms_validation_failures_v1 table on the target
When data consistency is your main goal, be sure to turn on validation when you create the AWS DMS task.
Troubleshoot when validation is turned on
If validation is turned on, then you can check the awsdms_Validation_failures_v1 table in your target database. If any record enters the ValidationSuspended or ValidationFailed state during migration, then AWS DMS writes diagnostic information to awsdms_validation_failures_V1. Query this table to troubleshoot validation errors by running a command similar to this:
select * from awsdms_validation_failures_v1 where TASK_NAME = 'ABC123FGJASHKNA345';
Check the details column in the output to get information about the failure. Use the Key column to compare the record data between your source and target. For more information, see the Troubleshooting section of the AWS DMS data validation documentation.
Troubleshoot when validation is turned off
If you haven't turned on validation on your AWS DMS task, then create a validation-only task:
- For a one-time migration, use the full-load validation-only feature to quickly compare all rows between the source and the target.
- For ongoing replication, use the CDC validation-only task. A CDC validation-only task validates the existing rows between the source and target tables. The task continues with ongoing changes as they appear, and reports any data validation failures.
Check for limitations in the source and target data
After you identify the mismatched data, check the source and target for limitations associated with their data types. For example, when you use PostgreSQL as a source, you can't migrate ENUM data types.
Check for errors in the task log
Check the task log for errors at the time of validation failure. Or, check the control tables to see if any exceptions were logged during the data replication stage.
Resolve mismatched data caused by truncation
When you use Limited LOB mode, AWS DMS pre-allocates memory in the replication instance. Then, it loads the LOB data in bulk using the LoBMaxSize task setting. AWS DMS truncates any LOBs that exceed the maximum LOB size, and then issues a warning to the log file.
Check the log files for warning messages that indicate that data was truncated, and then check the corresponding LOB column's maximum length. Define a LobMaxSize that is bigger than the LOB column length so that data isn't truncated. Use the diagnostic support script to find out which tables have LOB data, and then query accordingly.
If the LOB column's maximum size is more than 100 MB, then use Full LOB or Inline LOB mode to stop LOB column data truncation.