AWS Database Blog

New features in AWS DMS 3.5.0

We are excited to announce the availability of AWS Database Migration Service (AWS DMS) replication engine version 3.5.0. This version provides improvements to task logging, new data type support, and AWS service integrations requested by many of our customers. For the full list of improvements, refer to AWS Database Migration Service 3.5.0 Beta release notes. In this post, we highlight a few key features:

  • Amazon Simple Storage Service (Amazon S3) target data validation
  • Oracle extended data type
  • Time Travel for SQL Server and Oracle
  • XA transaction support for MySQL sources
  • AWS DMS context logging

Amazon S3 target data validation

Amazon S3 is one of the most commonly used targets for AWS DMS, supporting use cases such as replicating data from OLTP databases for archival and retention, and hydrating data lakes. Before AWS DMS 3.5.0, you had limited options to validate data between your target S3 bucket and your source database. Now, we’re excited to share that if you’re migrating to Parquet files on Amazon S3, AWS DMS supports data validation.

Let’s walk through the steps you can take to enable data validation for an Amazon S3 target.

Add IAM permissions

Add the following AWS Identity and Access Management (IAM) permissions to the Amazon S3 service access role used in your Amazon S3 target endpoint:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:CreateWorkGroup"
            ],
            "Resource": "arn:aws:athena:<region>:<account_id>:workgroup/dms_validation_workgroup_for_task_*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:CreateDatabase",
                "glue:DeleteDatabase",
                "glue:GetDatabase",
                "glue:GetTables",
                "glue:CreateTable",
                "glue:DeleteTable",
                "glue:GetTable"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<account_id>:catalog",
                "arn:aws:glue:<region>:<account_id>:database/aws_dms_s3_validation_*",
                "arn:aws:glue:<region>:<account_id>:table/aws_dms_s3_validation_*/*",
                "arn:aws:glue:<region>:<account_id>:userDefinedFunction/aws_dms_s3_validation_*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>",
                "arn:aws:s3:::<bucket_name>/*"
            ]
        }
    ]
}

Configure endpoint settings

Create (or modify) your target Amazon S3 endpoint to include the settings shown in the following screenshot. For more details on these settings, refer to Amazon S3 target data validation.
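
If you prefer the AWS CLI or an SDK to the console, the same configuration can be expressed as an S3 endpoint settings object. The role ARN and bucket name below are hypothetical placeholders; the key point is that S3 target data validation works with Parquet output (DataFormat set to parquet):

```python
import json

# Hypothetical values -- substitute your own service access role and bucket.
s3_settings = {
    "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/my-dms-s3-role",
    "BucketName": "my-dms-target-bucket",
    "DataFormat": "parquet",  # validation is supported for Parquet files on S3
}

# Emit the JSON to pass to the CLI, for example:
#   aws dms modify-endpoint --endpoint-arn <endpoint_arn> --s3-settings '<json>'
print(json.dumps(s3_settings))
```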

Create an AWS DMS task

Create a task with the target Amazon S3 endpoint that you created or modified previously and make sure to select Enable validation.
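
Selecting Enable validation in the console corresponds to the ValidationSettings block of the task settings JSON; a minimal sketch:

```python
import json

# Minimal task-settings fragment that turns on data validation;
# selecting "Enable validation" in the console is equivalent.
task_settings = {
    "ValidationSettings": {
        "EnableValidation": True,
    }
}

# Pass as --replication-task-settings to create-replication-task
# or modify-replication-task.
print(json.dumps(task_settings))
```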

After the task completes the full load, you can review the validation status for each table via the table statistics for the task on the AWS DMS console.

Refer to Amazon S3 target data validation for more details.

Oracle extended data types

Beginning with Oracle Database 12c, Oracle started supporting extended data types, allowing a maximum size of 32,767 bytes for the VARCHAR2, NVARCHAR2, and RAW data types.

A VARCHAR2 or NVARCHAR2 data type with a declared size greater than 4,000 bytes, or a RAW data type with a declared size greater than 2,000 bytes, is considered an extended data type. Extended data type columns are stored out-of-line, using Oracle’s LOB technology.

Prior to AWS DMS 3.5.0, extended data types were handled inconsistently during replication. Now, with AWS DMS 3.5.0, you can migrate extended data types of various sizes, and AWS DMS will handle them based on the length of the data type as defined in the Oracle source database column. The following table describes how AWS DMS maps Oracle extended data types to PostgreSQL. See Targets for data migration for information about AWS DMS mappings for other engines.

Oracle Data Type                                       | AWS DMS Data Type | PostgreSQL Data Type
VARCHAR2, when the length is 4,000 bytes or less       | STRING            | VARCHAR
VARCHAR2, when the length is greater than 4,000 bytes  | CLOB              | TEXT
NVARCHAR2, when the length is 4,000 bytes or less      | WSTRING           | VARCHAR
NVARCHAR2, when the length is greater than 4,000 bytes | CLOB              | TEXT
RAW, when the length is 2,000 bytes or less            | BYTES             | BYTEA
RAW, when the length is greater than 2,000 bytes       | BLOB              | BYTEA

Because AWS DMS treats extended data types as LOBs, consider the following LOB handling task settings when creating tasks that migrate extended data types:

  • LOB support must be enabled at the task level. Refer to Setting LOB support for source databases in an AWS DMS task for additional details.
  • For tasks using limited LOB mode, data truncation can occur if the max LOB size setting is smaller than the largest extended data type stored in the source database. By default, max LOB size is 32 KB, which accommodates the maximum allowable length (32,767 bytes) for extended data types.
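
The two settings above live under TargetMetadata in the task settings JSON; a sketch of a limited LOB mode configuration sized for extended data types (LOB sizes are in KB):

```python
import json

# LOB handling sized for Oracle extended data types (up to 32,767 bytes).
task_settings = {
    "TargetMetadata": {
        "SupportLobs": True,         # LOB support must be on for extended types
        "FullLobMode": False,        # use limited LOB mode
        "LimitedSizeLobMode": True,
        "LobMaxSize": 32,            # KB; 32 KB covers the 32,767-byte maximum
    }
}
print(json.dumps(task_settings))
```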

Time Travel support for Oracle and SQL Server sources

Time Travel was introduced in AWS DMS 3.4.6 to simplify troubleshooting data discrepancies between the source and target by recording the events and data migrated by AWS DMS to the S3 bucket of your choice. With AWS DMS 3.5.0, we’re adding Oracle and SQL Server to the list of supported source endpoints for Time Travel.

For AWS DMS 3.5.0, you can use Time Travel if your source is Oracle, SQL Server, or PostgreSQL and your target is either MySQL or PostgreSQL.

Let’s walk through the steps you can take to use Time Travel for Oracle and SQL Server sources.

Configure prerequisites

Complete the following prerequisites:

  1. Identify or create the AWS Key Management Service (AWS KMS) key if you want to use AWS KMS encryption for your S3 bucket. For more details, refer to Time Travel task settings.
  2. Create an IAM role named dms-tt-s3-access-role with the following permissions policy and trust policy (update the bucket name and KMS key ARN):
    {
     "Version": "2012-10-17",
     "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "kms:GenerateDataKey",
                    "kms:Decrypt",
                    "s3:ListBucket",
                    "s3:DeleteObject"
                ],
                "Resource": [
                    "arn:aws:s3:::<S3_bucket_name>*",
                    "arn:aws:kms:<region>:<account_id>:key/:<key_id>"
                ]
            }
        ]
    }

    Trust policy:

    {
     "Version": "2012-10-17",
     "Statement": [
             {
                 "Effect": "Allow",
                 "Principal": {
                     "Service": [
                         "dms.amazonaws.com"
                     ]
                 },
                 "Action": "sts:AssumeRole"
            }
        ]
    }

Enable the Time Travel feature in AWS DMS task settings

Add the settings detailed in Turning on the Time Travel logs for a task when creating or modifying your task.
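
As a sketch, the Time Travel settings land under TTSettings in the task settings JSON. The ARNs, bucket, and folder below are hypothetical placeholders; field names follow the Time Travel task settings reference:

```python
import json

# Hypothetical ARNs and names -- substitute your own.
task_settings = {
    "TTSettings": {
        "EnableTT": True,
        "TTS3Settings": {
            "EncryptionMode": "SSE_KMS",
            "ServerSideEncryptionKmsKeyId":
                "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
            "ServiceAccessRoleArn":
                "arn:aws:iam::123456789012:role/dms-tt-s3-access-role",
            "BucketName": "my-tt-bucket",
            "BucketFolder": "tt-logs",
        },
    }
}
print(json.dumps(task_settings))
```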

After the task starts, Time Travel logs appear in the S3 bucket folder that you configured.

MySQL XA transaction support

Customers have requested Extended Architecture (XA) transaction support for MySQL and MariaDB sources. AWS DMS 3.5.0 introduces full support for XA transactions. You can use an XA transaction to group a series of operations from multiple transactional resources into a single reliable global transaction.

In AWS DMS versions prior to 3.5.0, with detailed logging enabled, you would see messages similar to the following example, indicating that XA events were unsupported. These skipped events could result in missing data on the target.

[SOURCE_CAPTURE  ]T:  >>> Unsupported or commented out DDL: 'XA END X'36383432323764622d646564392d343864312d386337652d3165646663666665636666313a3937343235',X'3937343235',1'  (mysql_endpoint_capture.c:1762)

Make sure you meet the following prerequisites to support XA transactions with AWS DMS 3.5.0:

  1. Set the endpoint setting IgnoreOpenXaTransactionsCheck to false on the source endpoint.
  2. Grant the following permission to the AWS DMS user connecting to the source endpoint so that AWS DMS can perform this validation:
grant XA_RECOVER_ADMIN on *.* to 'userName'@'%';
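
Taken together, the prerequisites amount to one endpoint setting and one source-side grant. Exactly where the setting is supplied (the endpoint settings JSON or an extra connection attribute) depends on your tooling, so treat the fragment below as an illustrative sketch:

```python
import json

# Endpoint settings fragment for the MySQL source endpoint; the field name
# comes from the prerequisite above, but its placement here is an assumption.
mysql_source_settings = {"IgnoreOpenXaTransactionsCheck": False}

# Source-side privilege for the AWS DMS user (from the prerequisite above).
grant_sql = "grant XA_RECOVER_ADMIN on *.* to 'userName'@'%';"

print(json.dumps(mysql_source_settings))
```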

AWS DMS context logging

AWS DMS 3.5.0 includes a new feature called context log labeling, which enhances the task’s logging capabilities. This feature periodically adds context information to the task logs and is controlled via the EnableLogContext task setting, which you can set when creating or modifying your migration task. By default, context logging is enabled, runs at a 3-minute interval, and is available in both the full load and ongoing replication phases.
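
EnableLogContext sits under the Logging section of the task settings; a minimal fragment that sets it explicitly:

```python
import json

# Context logging is on by default; this fragment sets it explicitly.
task_settings = {
    "Logging": {
        "EnableLogging": True,
        "EnableLogContext": True,  # set to False to disable context logging
    }
}
print(json.dumps(task_settings))
```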

Because the run interval is set to 3 minutes, the context log labeling feature doesn’t increase the task log size significantly. There are different types of context logging messages (see Logging Examples for more details), but for most context logging messages, the following information is captured:

  • TABLE_NAME
  • SCHEMA_NAME
  • TRANSACTION_ID
  • CONNECTION_ID
  • STATEMENT
  • STREAM_POSITION

The following are common use cases for AWS DMS context logging:

  • Context logging provides the database connection ID in the task’s Amazon CloudWatch logs (refer to Viewing and managing AWS DMS task logs), such as the SID for Oracle and connection ID for MySQL. This can be used to investigate stuck or slow-running queries on the source or target endpoint.
  • Context logging provides the transaction log position (STREAM_POSITION), which can be used to identify replication progress. If you don’t see any change in stream position, it indicates that the task is not replicating any data.
  • For error or warning log messages, context logging provides detailed information about the event, which can be helpful during troubleshooting.
  • Context logging provides helpful insights into full load and change data capture (CDC) AWS DMS processing, such as SQL statements. This information can be used for easier troubleshooting or a better understanding of AWS DMS functionality.

Refer to AWS DMS Context logging for more information.

Context logging messages can be viewed in the task’s CloudWatch logs. For example, the following log line indicates that the task’s source capture component read a transaction with ID 38482952164774 from the source binary log mysql-bin-changelog.008960 at position 45192614 using thread ID 9083:

03171744: 2023-01-05T08:19:35 [SOURCE_CAPTURE ]I: Capture record 1210943 to internal queue from Source {operation:BEGIN (6), txnId:38482952164774, connectionId:9083, streamPosition:mysql-bin-changelog.008960:45192614:-1:45192693:38482952164774:mysql-bin-changelog.008960:45192346} (streamcomponent.c:2916)

The following log line indicates that the task applied the source transaction 38487234064670 from source binary log file mysql-bin-changelog.008961 at position 32124945 to the target:

03171745: 2023-01-05T08:23:18 [TARGET_APPLY ]I: Applied record 6124549 to target {operation:BEGIN (6), txnId:38487234064670, streamPosition:mysql-bin-changelog.008961:32125214:-1:32125293:38487234064670:mysql-bin-changelog.008961:32124945} (endpointshell.c:5107)

In addition, context logging includes messages like the following when no events are received from the source or no records are pending for the target, which helps troubleshoot tasks that aren’t making progress:

03171744: 2023-01-05T07:52:18 [SOURCE_CAPTURE ]I: No Event fetched from binary log (mysql_endpoint_capture.c:4253)

03171745: 2023-01-05T08:10:20 [TARGET_APPLY ]I: No records received to load or apply on target , waiting for data from upstream. The last context is {operation:COMMIT (7), txnId:38478614859805, streamPosition:mysql-bin-changelog.008959:2855186:-1:2855213:38478614859805:mysql-bin-changelog.008959:2855020} (streamcomponent.c:1982)
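
Because the context fields follow a regular shape, they are easy to extract from CloudWatch logs for progress monitoring (the second use case above). A small sketch, with the regex derived only from the example lines in this post:

```python
import re

# One of the SOURCE_CAPTURE lines shown above.
line = ("03171744: 2023-01-05T08:19:35 [SOURCE_CAPTURE ]I: Capture record "
        "1210943 to internal queue from Source {operation:BEGIN (6), "
        "txnId:38482952164774, connectionId:9083, "
        "streamPosition:mysql-bin-changelog.008960:45192614:-1:45192693:"
        "38482952164774:mysql-bin-changelog.008960:45192346} "
        "(streamcomponent.c:2916)")

# Pull the transaction ID and stream position out of the context fields.
m = re.search(r"txnId:(\d+).*?streamPosition:([^}]+)", line)
txn_id, stream_position = m.group(1), m.group(2)

print(txn_id)                         # 38482952164774
print(stream_position.split(":")[0])  # mysql-bin-changelog.008960
```

Comparing the stream position across successive log lines is a quick way to confirm the task is advancing.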

Conclusion

In this post, we highlighted a few of the key features of AWS DMS 3.5.0 that were requested by our customers. With S3 target data validation, you can validate data consistency for S3 targets without developing custom solutions. Support for Oracle extended data types and MySQL XA transactions lets you migrate your workload without changing the source database configuration. With DMS context logging and Time Travel support for Oracle and SQL Server sources, troubleshooting migration issues is now easier. If you have any feedback or questions, please leave them in the comments.

Special thanks to Balaji Baskar and Aswin Sankarapillai for their contribution to this launch.


About the authors

Sridhar Ramasubramanian is a database engineer with the AWS Database Migration Service team. He works on improving the DMS service to better suit the needs of AWS customers.

A. Mosaad is a Senior Database Engineer in Database Migration Service at Amazon Web Services. He works with our customers to provide guidance and technical assistance on database migration projects, helping them improve the value of their solutions when using AWS.