What are the best practices to follow when migrating a source RDS MySQL database to a target RDS MySQL database using AWS DMS?

Last updated: 2022-08-19

I have a MySQL database that I want to migrate to an Amazon Relational Database Service (Amazon RDS) for MySQL database using AWS Database Migration Service (AWS DMS). What best practices can I use to optimize migration between a source MySQL database and a target MySQL database?

Short description

Using AWS DMS, you can migrate data from a source data store to a target data store. These two data stores are called endpoints. You can migrate between source and target endpoints that use the same database engine, such as from one MySQL database to another MySQL database.

Although AWS DMS creates the target schema objects, it creates only the minimum objects that it needs for data to be effectively migrated from the source. So, AWS DMS creates tables, primary keys, and in some cases unique indexes, but it doesn't create objects such as secondary indexes, non-primary key constraints, and data defaults. For more information on what AWS DMS migrates, see High-level view of AWS DMS.

Pre-create the tables on the target database before migration

To preserve the default data definitions, pre-create the tables on the target database before migration. Use one of these approaches, depending on the type of migration you're performing:

  • For homogeneous migrations such as MySQL to MySQL, use the native database engine utilities such as mysqldump to take an export of the table definitions. Then, import these table definitions into the target without the data. After the table definitions are created, use the AWS DMS task with target table preparation mode set to TRUNCATE_BEFORE_LOAD to load the data.
  • For migrations across different database engines, use the AWS Schema Conversion Tool (AWS SCT). You can also use this method for homogeneous databases. The AWS SCT connects to your source and target databases, and then converts the existing database schema from one database engine to another. You can use the AWS SCT to pre-create tables on the target database with the default data definitions intact. Then, use the AWS DMS task with the target table preparation mode set to TRUNCATE_BEFORE_LOAD to load the data. For more information, see Converting your schema by using AWS SCT.

Resolution

Follow the best practices for MySQL to MySQL AWS DMS migration

Use these best practices when you migrate data from a MySQL source database to a MySQL target database.

  • Turn off backups and database-specific logs (such as binary, general, and audit logs) on the target database during migration. If you need to, you can turn them on again to troubleshoot issues.
  • Turn off triggers, other cron jobs, and event schedulers on the target database during migration.
  • Avoid using Multi-AZ on the target Amazon RDS database when performing AWS DMS migration.
  • Avoid applying any other external client traffic to the target database when performing migration.
  • Provision your AWS DMS replication instance, source, and the target databases with the required CPU, memory, storage, and IOPS to avoid resource contention during migration.
  • Configure the source database with the prerequisites for using AWS DMS change data capture (CDC) before you start migration.
  • Use optimized LOB settings such as limited LOB and inline LOB for migration.
  • If the source database contains many tables with a heavy workload, split the tables among multiple tasks. Split the tables based on their size on the source database, the application traffic patterns, and the presence of LOB columns. If a table has many LOB (TEXT or JSON) columns with high write traffic on the source, then create a separate task for that table. Transactional consistency is maintained within a task, so it's important that tables in separate tasks don't participate in common transactions.
  • Use the parallel full load mechanism for heavy source tables to reduce migration time. For more information, see Using parallel load for selected tables, views, and collections.
  • Turn off foreign key constraints on the target table during full load migration.
  • Add secondary indexes on the target database before starting the CDC phase of replication.
  • The Amazon RDS primary user doesn't have drop and recreate privileges on the default schema tables. So, avoid migrating default database or schema tables from the source using AWS DMS.
  • Review the documentation on Using AWS DMS to migrate data from MySQL to MySQL for information on the types of data that AWS DMS can migrate successfully.
  • Test your workload using the default transactional CDC apply before you use the batch apply CDC method. For more information, see How can I use the DMS batch apply feature to improve CDC replication performance?
  • Test the migration using the same production data on any other QA/DEV database environment before starting the production migration. Make sure to use the same AWS DMS configuration when you do the production migration.

For more information, see Improving the performance of an AWS DMS migration.
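As an example of the CDC prerequisite check mentioned in the list above, the following sketch shows the statements you might run against the source. The host name, user, and retention value are placeholder assumptions, not values from this article:

```shell
# Sketch: verify CDC prerequisites on a source RDS MySQL instance.
# Host, user, and retention hours are example values (assumptions).
SRC_HOST="source-db.example.com"

# AWS DMS CDC requires row-based binary logging on the source.
CHECK_SQL="SHOW VARIABLES LIKE 'binlog_format';"   # expect Value = ROW

# On RDS MySQL, keep binary logs long enough for AWS DMS to read them.
RETENTION_SQL="CALL mysql.rds_set_configuration('binlog retention hours', 24);"

# Run these with the mysql client against the source (requires credentials):
#   mysql -h "$SRC_HOST" -u admin -p -e "$CHECK_SQL"
#   mysql -h "$SRC_HOST" -u admin -p -e "$RETENTION_SQL"
echo "$CHECK_SQL"
```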

1. Pre-create the table DDL on the target MySQL database manually. Then, create an AWS DMS task with the target table preparation mode set to DO_NOTHING or TRUNCATE_BEFORE_LOAD to migrate only the data.

Run this command to create a dump without data from the source MySQL database:

mysqldump -h yourhostnameorIP -u root -p --no-data --skip-triggers --single-transaction database_name > schema.sql

This command dumps the DDL structure from the source without any data.

Next, run this command to restore the DDL structure on the target:

mysql -u user -p -h yourhostnameorIP database_name < schema.sql

Or, you can allow AWS DMS to create the tables on the target by using the DROP_AND_CREATE target table preparation mode. Then, skip to step 3 to alter the tables and add missing objects such as secondary indexes before you resume the task for the CDC phase.

Note: By default, AWS DMS creates the table on the target with only the primary key or the unique key. It doesn't migrate any other objects to the target MySQL database.

2. During the full load, AWS DMS doesn't identify the foreign key relational tables. It loads the data randomly, so the table load can fail if the target database has a foreign key check turned on. Use this extra connection attribute (ECA) on the target MySQL endpoint to turn off foreign key checks for this AWS DMS session.

initstmt=SET FOREIGN_KEY_CHECKS=0;

For more information, see Extra connection attributes when using a MySQL-compatible database as a target for AWS DMS.
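If you manage endpoints with the AWS CLI, the attribute can be applied when you create or modify the target endpoint. A sketch (the endpoint ARN is a placeholder):

```shell
# Sketch: attach the foreign-key-check ECA to the target MySQL endpoint.
ECA="initstmt=SET FOREIGN_KEY_CHECKS=0;"

# Apply it with the AWS CLI (the ARN below is a placeholder, not a real endpoint):
#   aws dms modify-endpoint \
#       --endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE \
#       --extra-connection-attributes "$ECA"
echo "$ECA"
```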

3. In the JSON settings, set stop task before applying cached changes to true, and stop task after applying cached changes to true.

"FullLoadSettings": {
    "TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD"
    "CreatePkAfterFullLoad": false,
    "TransactionConsistencyTimeout": 600,
    "MaxFullLoadSubTasks": 8,
    "StopTaskCachedChangesNotApplied": true,   <--- set this to true
    "StopTaskCachedChangesApplied": true,    <--- set this to true
    "CommitRate": 50000,
}

After full load is complete, and before it applies cached changes, the task stops. While the task is stopped, create primary key indexes and secondary indexes on the target.

Next, resume the task because the task stops again after it applies cached changes. Then, verify the migrated data by using AWS DMS validation output or manual verification before resuming the task again for the CDC replication phase. By completing this step, you can identify any issues and address them before resuming for CDC replication.
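The stop/resume cycle described above can be driven from the AWS CLI; a sketch (the task ARN is a placeholder):

```shell
# Sketch: resume a stopped AWS DMS task so it applies cached changes,
# and resume it again later for the ongoing CDC replication phase.
TASK_ARN="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"   # placeholder
START_TYPE="resume-processing"

# Real invocation (requires AWS credentials and a valid task ARN):
#   aws dms start-replication-task \
#       --replication-task-arn "$TASK_ARN" \
#       --start-replication-task-type "$START_TYPE"
echo "$START_TYPE"
```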

4. In the task full load settings, tune the CommitRate setting to speed up the data extraction rate from the source. The default value is 10000, so increase this setting when you migrate a large amount of data from the source table.

CommitRate=50000

Note: Changing CommitRate to a higher value can affect performance, so make sure to monitor the replication instance and confirm that it has enough memory.
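One way to change CommitRate on an existing (stopped) task is through the task settings JSON with the AWS CLI; a sketch (the ARN is a placeholder):

```shell
# Sketch: raise CommitRate by updating the task settings JSON.
SETTINGS='{"FullLoadSettings": {"CommitRate": 50000}}'

# Apply to a stopped task (the ARN below is a placeholder):
#   aws dms modify-replication-task \
#       --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:EXAMPLE \
#       --replication-task-settings "$SETTINGS"
echo "$SETTINGS"
```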

5.    Add this ECA on the target endpoint to specify the maximum size (in KB) of any .csv file used to transfer data to the target MySQL. The default value is 32,768 KB (32 MB), and valid values range from 1–1,048,576 KB (up to 1.1 GB).

maxFileSize=250000;

Note: When you use a target instance such as MySQL, Aurora, or MariaDB for full load, use this option to allow AWS DMS to create a .csv file in the background to load the data into the target instance. Use a value between 32 MB and 1 GB, but also consider how much your target instance can handle. If you have multiple tasks each loading a 1 GB .csv file, this can cause overhead on your target instance. Make sure that you have an instance with high computing power at the target.
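Multiple extra connection attributes are combined in a single semicolon-separated string, so maxFileSize can be set together with the initstmt attribute from step 2; a sketch (the endpoint ARN is a placeholder):

```shell
# Sketch: combine several ECAs on the target MySQL endpoint.
# Attributes are semicolon-separated within one string.
ECA="initstmt=SET FOREIGN_KEY_CHECKS=0;maxFileSize=250000"

# Apply with the AWS CLI (placeholder ARN):
#   aws dms modify-endpoint \
#       --endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE \
#       --extra-connection-attributes "$ECA"
echo "$ECA"
```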

6.    Use the limited LOB or inline LOB settings for better performance.

Limited LOB mode: When using Limited LOB mode, you specify the maximum size of LOB column data. This allows AWS DMS to pre-allocate resources, and then apply LOB in bulk. If the size of the LOB columns exceeds the size that you specified in the task, AWS DMS truncates the data. AWS DMS then sends warnings to the AWS DMS log file. Using Limited LOB mode improves performance—however, before you run the task, you must identify the maximum LOB size of the data on the source. Then, specify the Max LOB size parameter. It's a best practice to make sure that you have enough memory allocated to the replication instance to handle the task.

Inline LOB mode: When using Inline LOB mode, you can migrate LOBs without truncating data or slowing your task performance, by replicating both small and large LOBs. First, specify a value for the InlineLobMaxSize parameter, which is available only when Full LOB mode is set to true. The AWS DMS task transfers the small LOBs inline, which is more efficient. Then, AWS DMS migrates LOBs that are larger than the specified size in Full LOB mode by performing a lookup from the source table. Note, however, that Inline LOB mode works only during the full load phase.

Note: You must set InlineLobMaxSize when you specify the task settings for your task.

Run these queries to check the LOB size, and then populate the Max LOB size.

List the tables that have LOB columns:

select tab.table_name,
count(*) as columns
from information_schema.tables as tab
inner join information_schema.columns as col
on col.table_schema = tab.table_schema
and col.table_name = tab.table_name
and col.data_type in ('blob', 'mediumblob', 'longblob',
'text', 'mediumtext', 'longtext')
where tab.table_schema = 'your database name' -- enter your database name here
and tab.table_type = 'BASE TABLE'
group by tab.table_name
order by tab.table_name;

Check the size of the LOB column:

select (max(length(<COL_NAME>))/1024) as "size in KB" from <TABLE_NAME>;

Check the size of the LOB columns for all tables, and then populate the maximum size in Max LOB size (K).

Using the Max LOB size (K) option with a value greater than 63 KB affects the performance of a full load configured to run in limited LOB mode. During a full load, AWS DMS allocates memory by multiplying the Max LOB size (K) value by the commit rate. Then, the product is multiplied by the number of LOB columns.

When AWS DMS can't pre-allocate that memory, AWS DMS starts consuming SWAP memory. This affects the performance of a full load. So, if you experience performance issues when using limited LOB mode, decrease the commit rate until you achieve an acceptable level of performance. Or, consider using inline LOB mode for supported endpoints after first checking the LOB distribution for the table.
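The allocation described above can be estimated as Max LOB size × commit rate × number of LOB columns; a quick sketch with example values (assumptions, not figures from this article):

```shell
# Sketch: estimate pre-allocated memory for limited LOB mode.
# Example values; replace with your own measurements.
MAX_LOB_KB=64        # Max LOB size (K)
COMMIT_RATE=10000    # default CommitRate
LOB_COLUMNS=2        # number of LOB columns in the table

MEM_KB=$((MAX_LOB_KB * COMMIT_RATE * LOB_COLUMNS))
echo "Estimated allocation: ${MEM_KB} KB"   # 1280000 KB, about 1.2 GB
```

With the default commit rate, even a modest Max LOB size can reserve gigabytes per table, which is why lowering the commit rate is the suggested remedy when memory runs short.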

For more information, see Setting LOB support for source databases in an AWS DMS task.
