AWS Database Blog

AWS Database Migration Service and Amazon DynamoDB: What You Need to Know

Many organizations consider migrating from commercial relational databases like Microsoft SQL Server or Oracle to Amazon DynamoDB—a fully managed, fast, highly scalable, and flexible NoSQL database service. Amazon DynamoDB can increase or decrease capacity based on traffic, in accordance with business needs. It is a fully managed cloud database that supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

This post describes the performance aspects of migrating data to Amazon DynamoDB using AWS DMS, minimizing downtime, and converting the data that is more suitable for DynamoDB using either map to record or map to document types. For the sake of this post, we will focus on these two considerations when migrating from a relational database management system (RDBMS) to Amazon DynamoDB:

  • Data migration speed and throughput are critical to reducing the impact of downtime for services, especially when the service must be available 24×7.
  • There are different schema design considerations when going from RDBMS to DynamoDB. For more information, see the blog post Choosing the Right DynamoDB Partition Key.

You can use AWS Database Migration Service (AWS DMS) to migrate your data to and from most widely used commercial and open source databases. It supports homogeneous and heterogeneous migrations between different database platforms. AWS DMS supports migration to a DynamoDB table as a target. It uses table mapping rules to map data from the source to the target DynamoDB table.

To map data to an Amazon DynamoDB target, you use a type of table mapping rule called object mapping. Object mapping lets you define the attribute names and the data to be migrated to them. There are two types of rules you can use with AWS DMS to map your relational data to DynamoDB: record-to-record and record-to-document.

The following diagram shows the overall architecture of how AWS DMS performs migration from RDBMS to Amazon DynamoDB:

Improving migration performance
AWS DMS performs migration to Amazon DynamoDB using a DynamoDB connector. The connector resides between the AWS DMS replication instance and the target (DynamoDB). Internally, this connector uses the AWS SDK for Amazon DynamoDB for pushing the data from the replication instance to DynamoDB. By default, when AWS DMS migrates for the DynamoDB target, the push of records during the FULL_LOAD is a single-threaded call.

With that said, the good news is that this single-threaded call can be changed. An enhancement to this connector now lets you configure the number of threads. This capability increases the parallel processing of AWS DMS in migration from a single thread. This functionality is applicable only when the FULL_LOAD runs. The change data capture (CDC) phase in AWS DMS for any target engine is transactional, so data integrity and consistency are being maintained. This DynamoDB connector is now available with all AWS DMS versions.

To improve the performance of your migration to DynamoDB, there are three parameters. Note that you can’t edit these parameters on the AWS DMS console. The parameters are set through the task settings. The task settings are initialized when a task that is related to DynamoDB as a target is created, which you can view under Task Settings (JSON).

The following sections describe the parameters that can help you manage the performance of your migration.

ParallelLoadThreads

  • Default Value: 0
  • Range: 2–32

The ParallelLoadThreads parameter helps increase the migration speeds with regards to the number of threads to migrate the data. Just be aware that setting a higher value can cause a load on the source instance. A large number of threads can have an adverse effect on database performance because a separate connection is required for each thread. Additionally, there is a provision where the value of this parameter can be increased to more than 32. To do that, you must raise a case with the AWS Support team.

ParallelLoadBufferSize

  • Default Value: 50
  • Range: 50–1000

The ParallelLoadBufferSize setting specifies the number of data records to store in the buffer. The default is 50 records. This has no impact on large objects (LOBs) for Amazon DynamoDB. Furthermore, the DynamoDB connector only supports character large objects (CLOBs) and national character large objects (NCLOBs), which are just treated as string data types on the DynamoDB target.

LoadMaxFileSize

  • Default Value: 0
  • Range: 0–100

The LoadMaxFileSize parameter specifies the maximum size of any CSV file used to transfer data to Amazon DynamoDB. This parameter is applicable only when MySQL and PostgreSQL are being used as a source in AWS DMS. Additionally, this parameter is in megabytes.

Configuring the parameters
To configure these parameters, follow these steps:

  1. In AWS DMS, create the task with DynamoDB as the target. You do this because the three performance improvements parameters are generated only when a task with DynamoDB as the target is created. These parameters can only be modified using the generated Task Settings JSON, which you can get from the AWS DMS console.
  2. After creating the task, copy the JSON for this created task from the Overview tab of the task, which is displayed in the bottom pane:
  3. Copy the JSON, and optionally, format it the same way. You can use the JSON Formatter tool for that purpose.Copy the output that you get from the formatting tool.
  4. In the JSON, you can see the performance improvement variables:
    "TargetMetadata":{  
          "TargetSchema":"",
          "SupportLobs":true,
          "FullLobMode":false,
          "LobChunkSize":0,
          "LimitedSizeLobMode":true,
          "LobMaxSize":32,
          "LoadMaxFileSize":0,
          "ParallelLoadThreads":0,
          "ParallelLoadBufferSize":0,
          "BatchApplyEnabled":false
       }
    
    Bash
  5. After performing the required changes, use the AWS Command Line Interface (AWS CLI) to modify the replication task:
    aws dms modify-replication-task --replication-task-arn <task-arn> --replication-task-settings file://task-settings1.json
    Bash

Example migration

To demonstrate how these attributes can help improve the speed of your migration, I conducted the following test. In the test, I migrated 1 million rows from Oracle to Amazon DynamoDB.

The following shows the table that I used in my test scenario on the Oracle side:

CREATE TABLE "FDRGIIT"."BIG_TABLE" 
(
"ID" NUMBER, 
"OWNER" VARCHAR2(30 BYTE) NOT NULL ENABLE, 
"OBJECT_NAME" VARCHAR2(30 BYTE) NOT NULL ENABLE, 
"SUBOBJECT_NAME" VARCHAR2(30 BYTE), 
"OBJECT_ID" NUMBER NOT NULL ENABLE, 
"DATA_OBJECT_ID" NUMBER, 
"OBJECT_TYPE" VARCHAR2(19 BYTE), 
"CREATED" DATE NOT NULL ENABLE, 
"LAST_DDL_TIME" DATE NOT NULL ENABLE, 
"TIMESTAMP" VARCHAR2(19 BYTE), 
"STATUS" VARCHAR2(7 BYTE), 
"TEMPORARY" VARCHAR2(1 BYTE), 
"GENERATED" VARCHAR2(1 BYTE), 
"SECONDARY" VARCHAR2(1 BYTE), 
CONSTRAINT "BIG_TABLE_PK" PRIMARY KEY ("ID")
);
SQL

The following sections describe the scenarios used in my tests.

Default scenario
In this scenario, I used the default settings for all the attributes when I created the migration task. I also ran my task with the DROP_AND_CREATE mode, allowing AWS DMS to create the tables for me on the DynamoDB end.

With these settings, it took almost two hours to migrate 1 million rows from Oracle to Amazon DynamoDB. You can view metrics like FullLoadThroughputRowsTarget, which gives the rate at which the rows are leaving the replication instance to commit to the target, denoted in rows per second. The FullLoadThroughputBandwidthTarget metric provides details on the actual network bandwidth used for the full load to push the data between the replication instance to the target respectively. Both of these metrics are depicted in kilobytes per second.

From the preceding graph, you can see that AWS DMS is loading only 130 rows per second on average from the replication instance to the target during the full load phase. This is very slow when compared to the size of the table. This graph can also help you identify whether a bottleneck occurs with AWS DMS preprocessing data before sending it out, or if there are issues with task settings or performance-based issues on the replication instance.

This metric helps you know how much network bandwidth is used between the source and replication instances, and between the replication instance and target per task. As you can see, the bandwidth aligns with the numbers of rows that AWS DMS was able to push from the replication instance to the target. Again, this is very low, causing the migration to take two hours to finish.

Custom scenario
In this scenario, I used all the same settings as in the previous scenario, except for the following parameters:

"ParallelLoadThreads": 10,
"ParallelLoadBufferSize": 500,
Bash

After this change, when I ran the task, the same table with 1 million rows took just eight minutes to finish the FULL_LOAD.

Reviewing the same metrics, as in the previous scenario, I could see an increase in the number of fetches happening from the replication instance per second (FullLoadThroughputRowsTarget) and also the increase in bandwidth consumption (FullLoadThroughputBandwidthTarget).

Best practices
Regarding the parameters mentioned in the previous section, I recommend that you take some time to read the AWS DMS Best Practices whitepaper. In addition, for deeper insight into improving migration throughput, see Best Practices in the AWS Database Migration Service User Guide. I also recommend that you do a proof of concept (PoC) to analyze the results given by AWS DMS to understand what values are ideal to strategize your migration.

On a side note, a number of factors can affect the performance of your migration: resource availability on the source, available network throughput, resource capacity of the replication server, ability of the target to ingest changes, type and distribution of source data, number of objects to be migrated, and so on. In our tests, we have been able to migrate a terabyte of data in approximately 12–13 hours (under “ideal” conditions). Our source databases contained a representative amount of relatively evenly distributed data with a few large tables containing up to 250 GB of data. The performance of your migration will be limited by one or more bottlenecks you encounter along the way.

The following are a few things you can do to increase performance:

  • Load multiple tables in parallel.
  • Remove bottlenecks on the target.
  • Use multiple tasks.
  • Optimize change processing.
  • Reduce load on your source system.
  • Optimize size for a replication instance.

Conclusion
In this post, I introduced aspects related to improving performance for migrating data from RDBMS (Oracle, Microsoft SQL Server, etc.) to Amazon DynamoDB, and minimizing downtime during the migration. I also showed you the default values related to the parameters and their limits in accordance with their usage to improve read/write performance and reduce downtime.

If you have questions or suggestions, please leave a comment below.

Good luck, and happy migrating!


About the Author

Abhinav Singh is a database engineer in Database Migration Service at Amazon Web Services. He works with our customers to provide guidance and technical assistance on database migration projects, helping them improving the value of their solutions when using AWS.