AWS Database Migration Service improves migration speeds by adding support for parallel full load and new LOB migration mechanisms
This blog post was last reviewed and updated March, 2022.
We’re happy to introduce AWS DMS replication engine version 3.1.2. The new version includes a better UX and performance improvements that were requested by many customers. We listened and delivered on our promise to make DMS better. In this blog post, I’ll cover few key new features. For the entire list, see the AWS DMS release notes, which includes detailed information about features and bug fixes for this and previous versions of DMS.
DMS replication engine version 3.1.2 new features:
- UTF-8 4-byte character set support
- Improved full load performance for partitioned and sub partitioned tables
- Improved large objects (LOB) performance
- Table load order during full load
- Support for the update of primary key values on a PostgreSQL source
The summaries in this blog post include tests and examples that you can run on your own. To do this you’ll need the following AWS resources:
- AWS Account
- AWS Database Migration Service
- Oracle database as a source
- PostgreSQL database as a target
UTF-8 4-byte character set support
In previous versions of AWS DMS UTF-8 4-byte character sets were not supported. For example, U+FF83 ﾃ, U+FF8C ﾌ, U+FF9A ﾚ, U+00AE ®, U+20AC €, U+4E9A 亚, or U+F900 豈 would cause unexpected behavior during a migration. Migration tasks would fail with “invalid byte sequence” errors when 4-byte characters were encountered.
Here is an example of such an error.
With version 3.1.2, you can migrate 4-byte characters without any issue!
I’ll perform a quick test with DMS version 3.1.2 to show how you can migrate UTF-8 4-byte characters.
Run the following commands at source an Oracle database to create a table and insert records with 4-byte UTF-8 characters in your Oracle source database.
Create a DMS task to migrate the source table that contains utf8mb characters.
Run the following command to query the records on the target database table.
The following table shows records migrated with UTF-8 4-byte characters migrated to the target database table.
Improved full load performance for partitioned and sub-partitioned tables
AWS DMS version 3.1.2 now supports the parallel full load feature. During a full load migration task, you can accelerate the migration of large or partitioned tables by splitting the table into segments and loading the segments in-parallel in the same migration task.
To use parallel loading, create a rule of type table-settings with the parallel-load option. Within the table-settings rule, specify the selection criteria for the table or tables you want to load in-parallel. To specify the selection criteria, set the type element for parallel-load to be one of the following. The AWS DMS documentation includes detailed information about these settings.
The task must be defined with a combination of the following source and target endpoints.
Source endpoints: Oracle, Microsoft SQL Server, MySQL, PostgreSQL, IBM Db2 for LUW, SAP Sybase ASE
Target endpoints: Oracle, Microsoft SQL Server, PostgreSQL, Sybase ASE, MySQL
Setting up parallel load example
I’ll use the examples below to perform a parallel full load of tables during a migration. In this example, I create a list of partitioned tables in Oracle and migrate it using the partitions-auto parallel load method.
I’ll insert the example data in the source Oracle table by using the statements below.
Now I create a migration task with the table-mapping JSON below that includes the new table-settings rule.
After the DMS task is complete, I can verify that the tables were loaded in parallel from the logs.
If source table is not partitioned you may use below configuration with the range-segmentation option
The above configuration will migrate the data in three segments (three threads in parallel) as below.
Improved LOB performance
This new version of DMS replication engine allows you to improve loading LOB data in couple of ways. Now you can specify different LOB optimization settings while migrating LOB data.
Per table LOB settings
Using this feature, you can override task-level LOB settings for some or all of your tables. To do this, you define the lob-settings in your table-settings rule. Here is an example table that includes some large LOB values.
Next I create a migration task and modify the LOB handling for our table using the new lob-settings rule. The bulk-max-size determines the maximum LOB size (KB) and it’s truncated if it’s bigger than size specified.
Even if this DMS task is created with FullLobMode: true, the per table LOB settings direct DMS to truncate LOB data in this particular table to 16,000. You can check the task logs to confirm this.
Inline LOB settings.
As you may know, while creating a DMS task, the LOB mode determines how LOBs are handled.
- Full LOB mode – In full LOB mode, AWS DMS migrates all LOBs from source to target regardless of size. In this configuration, AWS DMS has no information about the maximum size of LOBs to expect. Thus, LOBs are migrated one at a time, piece by piece. Full LOB mode can be slow.
- Limited LOB mode – In limited LOB mode, you set a maximum size LOB that AWS DMS should accept. Doing so allows AWS DMS to pre-allocate memory and load the LOB data in bulk. LOBs that exceed the maximum LOB size are truncated and a warning is recorded in the log file. In limited LOB mode, you get significant performance gains over full LOB mode.
Full LOB mode and limited LOB, each has their own pros and cons. We listened to our customers and combined the advantages of both LOB modes in inline LOB mode.
Now you can select this option when you need to replicate both small and large LOBs, and most of the LOBs are small. When this option is selected, during full load, AWS DMS task transfers the small LOBs inline, which is more efficient, and the large LOBs by performing a lookup from the source table.
During Change Processing however, both small and large LOBs are replicated by performing a lookup from the source table.
When you choose this option, the AWS DMS task checks all of the LOB sizes to determine which ones to transfer inline. LOBs larger than the specified size are replicated using full LOB mode. Therefore, if you know that most of the LOBs are larger than the specified setting, it is better not to use this option and allow an unlimited LOB size.
You can configure this option using a new attribute in task setting InlineLobMaxSize, which is only available when FullLobMode is set to true. The default value is 0. The range is 1 KB – 2 GB.
For example, you may use below AWS DMS task settings and can specify InlineLobMaxSize to a value of 5 and all LOBs larger than 5,000 are transferred inline.
Tables load order during full load
Now, with DMS version 3.1.2, you can control the order in which tables are loaded. This may be useful in some situations, for example, if your selected tables list contains tables of different sizes and you want the smaller tables to be loaded before the larger tables.
This is supported using a new attribute of selection rule called load-order. It’s allowed only for the rule-action include and you should specify a non-negative number. Zero is a default, the maximum is 2147483647 for the load order. Tables with higher load-order numbers begin loading before tables with lower load-order numbers.
For example, if you want to load two tables named BIG_Table and SMALL_Table in a particular order, you can control this using the load-order attributes in the task JSON as specified below. So, in this example, the table BIG_Table with load order 2 is loaded before table SMALL_Table with load order 1.
You should be able to confirm this behavior looking at AWS DMS task logs as shown below.
When a group of tables are set with the same load order, AWS DMS loads the tables according to the alphabetical order.
Support the update of primary key values on a PostgreSQL source
Previous versions of DMS would ignore updates to a primary key while working with PostgreSQL as source. For example, if you are running the following statements on a PostgreSQL source and having full load and ongoing replication tasks, you might see the behavior shown here.
Run the following command to create a table and insert a few records in the source PostgreSQL database.
Run the following command to check inserted records on the source PostgreSQL database.
Create an AWS DMS task to migrate existing data and ongoing replication for the preceding table.
Run the following command to update the primary key column on the source table.
The following snippet of a DMS task log shows the errors. The task ignores the update to the primary key column associated with the update and writes it to the DMS control table ‘awsdms_apply_exceptions’ on the target database.
Now, with DMS version 3.1.2, you can update the primary key values for PostgreSQL as source. The changes are replicated to the target database server.
With AWS DMS replication engine version 3.1.2, you can use all of the above exciting features. You can migrate 4-byte UTF-8 characters from any source database, use parallel full load to speed up migration, and use optimized LOB settings while dealing with different size of LOBs. You can also specify table load-order during migration, and update the primary key values for PostgreSQL source databases.
If you have questions or suggestions, add your comments below. Good luck and happy migrating!
About the Author
Mahesh Kansara is a database engineer at Amazon Web Services. He works with our customers to provide guidance and technical assistance on various database and analytical projects, helping them improving the value of their solutions when using AWS.