AWS Database Blog
Migrate an internet-scale online transactional system to Amazon DynamoDB using AWS DMS
Database migration is a complex process that usually involves capacity planning, data migration and cutover strategies, hardware and software procurement, and a lengthy debugging and testing schedule. A common challenge in migrating from a relational database to a non-relational database is selecting the appropriate modeling technique and migration technologies.
This is the second part of a two-part series. In the first post, we discussed data modeling for an internet-scale online transactional system using Amazon DynamoDB, a fully managed, serverless, key-value NoSQL database.
In this post, we discuss and build a methodology to migrate an online application from a relational database management system (RDBMS) to DynamoDB using AWS Database Migration Service (AWS DMS).
Solution overview
In the first post, we used the IMDb dataset normalized across multiple tables in a MySQL database as the source data. IMDb is well-known for its online database of films, TV shows, and other media. In this post, we show you how to move a subset of the IMDb dataset from MySQL to a DynamoDB table using a single-table design approach to meet the required access patterns. The solution includes the following steps:
- Prepare the IMDb source application for migration.
- Create an AWS Identity and Access Management (IAM) role for the AWS DMS jobs to assume.
- Create an AWS DMS replication instance.
- Create the source and target endpoints.
- Configure and run AWS DMS replication tasks.
- Verify the access patterns using PartiQL.
Prerequisites
A basic understanding of DynamoDB is required because this post introduces data modeling concepts and a migration strategy for highly transactional, internet-scale applications built on a relational database management system (RDBMS). A basic understanding of AWS CloudFormation and AWS DMS is recommended because this post includes a CloudFormation template to build the target environment. An AWS account with the necessary IAM privileges is also required.
Before starting this post, complete the steps in Data modeling for an internet-scale online transactional system using Amazon DynamoDB. If you use your AWS account, you’ll incur a cost of approximately $30 per day for the resources provisioned by the CloudFormation templates. To avoid future charges, make sure you delete the CloudFormation stacks when you’re done.
Prepare the IMDb source application for migration
In the first post, we discussed sample access patterns for an IMDb application. We used those access patterns to define collections of records using composite sort keys and built a single table design for the IMDb dataset.
Now, you’ll prepare the source dataset for migration. This is a heterogeneous migration: the source application runs on a MySQL database on Amazon Elastic Compute Cloud (Amazon EC2), and the target environment is a key-value table in DynamoDB. You create a denormalized view of the information stored across the IMDb tables title_basics and title_principals. The information stored in the title_akas and title_ratings tables will be migrated one-to-one to the target DynamoDB table. The details of the design strategy are discussed in the first post.
To prepare the IMDb source application
- Go to the AWS Management Console for EC2 and choose Instances from the navigation pane. Select the instance named MySQL-Instance and choose Connect.
Note: The MySQL-Instance EC2 instance was created by the CloudFormation template in the first post.
- Select the Session Manager tab and choose Connect.
- Log in to MySQL using the username and password entered when creating the dynamodb-designlab stack:
mysql -u DbMasterUsername -pDbMasterPassword
- Run the following code to create a denormalized view of IMDb movies:
CREATE VIEW imdbdata.movies AS
SELECT tp.tconst,
       tp.ordering,
       tp.nconst,
       tp.category,
       tp.job,
       tp.characters,
       tb.titleType,
       tb.primaryTitle,
       tb.originalTitle,
       tb.isAdult,
       tb.startYear,
       tb.endYear,
       tb.runtimeMinutes,
       tb.genres,
       nm.primaryName,
       nm.birthYear,
       nm.deathYear,
       nm.primaryProfession,
       tc.directors,
       tc.writers
FROM imdbdata.title_principals tp
LEFT JOIN imdbdata.title_basics tb ON tp.tconst = tb.tconst
LEFT JOIN imdbdata.name_basics nm ON tp.nconst = nm.nconst
LEFT JOIN imdbdata.title_crew tc ON tc.tconst = tp.tconst;
- To review the count of records for movies, run the following command:
select count(*) from imdbdata.movies;
- The denormalized view should contain approximately 866,763 records. Your source database is now ready to be migrated to DynamoDB.
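If you prefer to verify the source programmatically, the following Python sketch samples the view before migration. It’s a minimal sketch assuming network access to the database and the pymysql library; the host and credentials are placeholders for the values from your dynamodb-designlab stack.

import pymysql

# Placeholders: use the DBServerIP and credentials from the dynamodb-designlab stack.
conn = pymysql.connect(host="<DBServerIP>", user="<DbMasterUsername>",
                       password="<DbMasterPassword>", database="imdbdata")
try:
    with conn.cursor() as cur:
        # Confirm the row count of the denormalized view (~866,763 rows).
        cur.execute("SELECT COUNT(*) FROM imdbdata.movies")
        print("movies view rows:", cur.fetchone()[0])

        # Sample a few rows to inspect the denormalized shape.
        cur.execute("SELECT tconst, ordering, category, primaryTitle, primaryName "
                    "FROM imdbdata.movies LIMIT 3")
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()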
Create an IAM role
The first step is to create an IAM role for the AWS DMS jobs to assume.
To create the IAM role
- On the IAM console, choose Roles in the navigation pane.
- Choose Create role.
- Under AWS Services, choose DMS.
- Choose Next: Permissions.
- Attach the AmazonDMSVPCManagementRole policy.
- Choose Next: Tags.
- Choose Next: Review.
- For Role name, enter dms-vpc-role.
- Choose Create role.
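If you prefer to script this step, here is a minimal boto3 sketch that creates the same role and attaches the managed policy. It assumes your credentials carry the necessary IAM permissions.

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the AWS DMS service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "dms.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="dms-vpc-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the AWS managed policy that DMS needs to manage VPC resources.
iam.attach_role_policy(
    RoleName="dms-vpc-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole",
)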
Create an AWS DMS replication instance
Create an AWS DMS replication instance using a CloudFormation template to migrate data from MySQL to DynamoDB.
To create an AWS DMS replication instance
- Launch the CloudFormation template.
- Choose Next.
- For Stack name, enter dynamodb-migrationlab.
- Under Parameters:
  - Keep the default values of DMSSubnet1CIDR and DMSSubnet2CIDR.
  - Enter the imdbdata database password under DBMasterPassword.
  - Copy the DBServerIP and NATGateway ID from the dynamodb-designlab stack’s Outputs tab.
  - Select DB-VPC from the VpcID dropdown.
- Choose Next.
- Select I acknowledge that AWS CloudFormation might create IAM resources with custom names to create the IAM resources.
- Choose Create stack.
You can monitor the stack progress on the CloudFormation console. The stack takes 10–15 minutes to complete. When the stack shows the status CREATE_COMPLETE, you can confirm that the replication instance is listed on the AWS DMS console. You should see a replication instance with instance class dms.c5.2xlarge and the status Available.
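You can also confirm the instance status programmatically. The following boto3 sketch, assuming default credentials in the same Region as the stack, lists the replication instances and their status:

import boto3

dms = boto3.client("dms")

# List replication instances; the one from the stack should be "available".
for ri in dms.describe_replication_instances()["ReplicationInstances"]:
    print(ri["ReplicationInstanceIdentifier"],
          ri["ReplicationInstanceClass"],
          ri["ReplicationInstanceStatus"])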
Create the source and target endpoints
After you create the replication instance, you need to create two endpoints: one for the source MySQL environment and one for the target DynamoDB environment.
To create the source endpoint
- On the AWS DMS console, choose Endpoints in the navigation pane.
- Choose Create endpoint.
- For Endpoint type, select Source endpoint.
- For Endpoint identifier, enter mysql-endpoint.
- For Source engine, select MySQL.
- For Access to endpoint database, select AWS Secrets Manager.
- Copy the DBSecret and DMSReplicationRole values from the dynamodb-migrationlab stack’s Outputs tab and enter them in the Secret ID and IAM role boxes.
- For SSL mode, select none.
- For User name, enter dbuser.
- For Password, enter the secret value of BlogDatabasePassword from AWS Secrets Manager.
- In the Test endpoint connection section, for VPC, select the VPC ending with DB-VPC.
- Choose Run test to confirm your endpoint configuration is correct.
- When the test is complete, choose Create endpoint.
To create the target endpoint
- On the AWS DMS console, choose Endpoints in the navigation pane.
- Choose Create endpoint.
- For Endpoint type, select Target endpoint.
- For Endpoint identifier, enter dynamodb-endpoint.
- For Target engine, select Amazon DynamoDB.
- For Service access role ARN, enter the role ARN of the dynamodb-access role. This role was created by the CloudFormation template at the beginning of this lab. You can find the ARN on the IAM dashboard.
- In the Test endpoint connection section, for VPC, select the VPC ending with DB-VPC.
- Choose Run test to confirm your endpoint configuration is correct.
- When the test is complete, choose Create endpoint.
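Optionally, the same connection tests can be driven from code. In this hedged sketch, the replication instance and endpoint ARNs are placeholders you copy from the DMS console:

import time
import boto3

dms = boto3.client("dms")

# Placeholder ARNs: copy these from the DMS console.
instance_arn = "<replication-instance-arn>"
endpoint_arn = "<endpoint-arn>"

dms.test_connection(ReplicationInstanceArn=instance_arn, EndpointArn=endpoint_arn)

# Poll until the test finishes; the status ends as "successful" or "failed".
while True:
    conn = dms.describe_connections(
        Filters=[{"Name": "endpoint-arn", "Values": [endpoint_arn]}]
    )["Connections"][0]
    if conn["Status"] != "testing":
        print("Connection test:", conn["Status"])
        break
    time.sleep(10)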
Configure and run AWS DMS replication tasks
Create three replication tasks to migrate the denormalized view (movies), the ratings table (title_ratings), and the regions and languages table (title_akas).
To configure and run replication tasks
- On the AWS DMS console, select Database migration tasks in the navigation pane.
- Choose Create task.
- For Task identifier, enter historical-migration01.
- For Replication instance, select mysqltodynamodb-instance-*.
- For Source database endpoint, select mysql-endpoint.
- For Target database endpoint, select dynamodb-endpoint.
- For Migration type, choose Migrate existing data.
- For Editing mode under Task settings, select Wizard.
- For Target table preparation mode, select Do nothing.
Note: If you don’t select this option, the data will be truncated with each subsequent job.
- Select Turn on CloudWatch logs to log the AWS DMS tasks in Amazon CloudWatch.
- For Editing mode under Table mappings, choose the JSON editor tab.
- Copy and paste the following JSON code and then choose Create task.
{ "rules": [ { "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "imdbdata", "table-name": "movies", "table-type": "view" }, "rule-action": "include", "filters": [ { "filter-type": "source", "column-name": "tconst", "filter-conditions": [ { "filter-operator": "eq", "value": "tt0309377" }, { "filter-operator": "eq", "value": "tt12260846" }, { "filter-operator": "eq", "value": "tt1212419" }, { "filter-operator": "eq", "value": "tt1205489" }, { "filter-operator": "eq", "value": "tt1057500" }, { "filter-operator": "eq", "value": "tt0949815" }, { "filter-operator": "eq", "value": "tt0824747" }, { "filter-operator": "eq", "value": "tt0772168" }, { "filter-operator": "eq", "value": "tt0498380" }, { "filter-operator": "eq", "value": "tt0418689" }, { "filter-operator": "eq", "value": "tt0405159" }, { "filter-operator": "eq", "value": "tt0327056" }, { "filter-operator": "eq", "value": "tt2310814" }, { "filter-operator": "eq", "value": "tt2179136" }, { "filter-operator": "eq", "value": "tt2083383" }, { "filter-operator": "eq", "value": "tt1924245" }, { "filter-operator": "eq", "value": "tt1912421" }, { "filter-operator": "eq", "value": "tt1742044" }, { "filter-operator": "eq", "value": "tt1616195" }, { "filter-operator": "eq", "value": "tt6997426" }, { "filter-operator": "eq", "value": "tt6802308" }, { "filter-operator": "eq", "value": "tt3513548" }, { "filter-operator": "eq", "value": "tt3263904" }, { "filter-operator": "eq", "value": "tt3031654" }, { "filter-operator": "eq", "value": "tt8884452" } ] } ] }, { "rule-type": "object-mapping", "rule-id": "2", "rule-name": "2", "rule-action": "map-record-to-record", "object-locator": { "schema-name": "imdbdata", "table-name": "movies", "table-type": "view" }, "target-table-name": "movies", "mapping-parameters": { "partition-key-name": "mpkey", "sort-key-name": "mskey", "exclude-columns": [], "attribute-mappings": [ { "target-attribute-name": "mpkey", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "${tconst}" }, { "target-attribute-name": "mskey", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "DETL|${category}|${ordering}" } ] } } ] }
This code defines the source-to-target mapping, including any transformation of the records performed during migration. The task automatically starts loading the selected movies from the source to the target DynamoDB table.
Note: To reduce the load time, we limited the migration list. The preceding JSON code uses a list of 25 movies, and the remaining steps focus on just those movies. However, feel free to load the remaining data if you want to explore the full dataset. To load the full dataset, remove the filters section from the JSON code in each task. We provide more information about the full dataset at the end of this section.
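Once the task completes, you can spot-check the migrated items outside the console. This boto3 sketch queries the item collection for one of the migrated movies; it assumes the movies table exists in your default Region:

import boto3
from boto3.dynamodb.conditions import Key

# Assumes the movies table created by the AWS DMS task in your default Region.
table = boto3.resource("dynamodb").Table("movies")

# Query one migrated movie; DETL| prefixes the denormalized cast and crew
# records produced by the object-mapping rule above.
resp = table.query(
    KeyConditionExpression=Key("mpkey").eq("tt0309377")
    & Key("mskey").begins_with("DETL|")
)
print("Records migrated for tt0309377:", resp["Count"])
for item in resp["Items"][:3]:
    print(item["mskey"], item.get("primaryTitle"))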
To create the second task
- Repeat the previous procedure to create the second task. For this task, enter historical-migration02 as the Task identifier.
- Use the following JSON code to migrate the title_akas table from the MySQL IMDb database:
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "imdbdata",
        "table-name": "title_akas",
        "table-type": "table"
      },
      "rule-action": "include",
      "filters": [
        {
          "filter-type": "source",
          "column-name": "titleId",
          "filter-conditions": [
            { "filter-operator": "eq", "value": "tt0309377" },
            { "filter-operator": "eq", "value": "tt12260846" },
            { "filter-operator": "eq", "value": "tt1212419" },
            { "filter-operator": "eq", "value": "tt1205489" },
            { "filter-operator": "eq", "value": "tt1057500" },
            { "filter-operator": "eq", "value": "tt0949815" },
            { "filter-operator": "eq", "value": "tt0824747" },
            { "filter-operator": "eq", "value": "tt0772168" },
            { "filter-operator": "eq", "value": "tt0498380" },
            { "filter-operator": "eq", "value": "tt0418689" },
            { "filter-operator": "eq", "value": "tt0405159" },
            { "filter-operator": "eq", "value": "tt0327056" },
            { "filter-operator": "eq", "value": "tt2310814" },
            { "filter-operator": "eq", "value": "tt2179136" },
            { "filter-operator": "eq", "value": "tt2083383" },
            { "filter-operator": "eq", "value": "tt1924245" },
            { "filter-operator": "eq", "value": "tt1912421" },
            { "filter-operator": "eq", "value": "tt1742044" },
            { "filter-operator": "eq", "value": "tt1616195" },
            { "filter-operator": "eq", "value": "tt6997426" },
            { "filter-operator": "eq", "value": "tt6802308" },
            { "filter-operator": "eq", "value": "tt3513548" },
            { "filter-operator": "eq", "value": "tt3263904" },
            { "filter-operator": "eq", "value": "tt3031654" },
            { "filter-operator": "eq", "value": "tt8884452" }
          ]
        }
      ]
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "2",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "imdbdata",
        "table-name": "title_akas",
        "table-type": "table"
      },
      "target-table-name": "movies",
      "mapping-parameters": {
        "partition-key-name": "mpkey",
        "sort-key-name": "mskey",
        "exclude-columns": [],
        "attribute-mappings": [
          {
            "target-attribute-name": "mpkey",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${titleId}"
          },
          {
            "target-attribute-name": "mskey",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "REGN|${region}"
          }
        ]
      }
    }
  ]
}
To create the third task
- Repeat the previous procedure to create the third task. For this task, enter historical-migration03 as the Task identifier.
- Use the following JSON code to migrate the title_ratings table from the MySQL IMDb database:
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "imdbdata",
        "table-name": "title_ratings",
        "table-type": "table"
      },
      "rule-action": "include",
      "filters": [
        {
          "filter-type": "source",
          "column-name": "tconst",
          "filter-conditions": [
            { "filter-operator": "eq", "value": "tt0309377" },
            { "filter-operator": "eq", "value": "tt12260846" },
            { "filter-operator": "eq", "value": "tt1212419" },
            { "filter-operator": "eq", "value": "tt1205489" },
            { "filter-operator": "eq", "value": "tt1057500" },
            { "filter-operator": "eq", "value": "tt0949815" },
            { "filter-operator": "eq", "value": "tt0824747" },
            { "filter-operator": "eq", "value": "tt0772168" },
            { "filter-operator": "eq", "value": "tt0498380" },
            { "filter-operator": "eq", "value": "tt0418689" },
            { "filter-operator": "eq", "value": "tt0405159" },
            { "filter-operator": "eq", "value": "tt0327056" },
            { "filter-operator": "eq", "value": "tt2310814" },
            { "filter-operator": "eq", "value": "tt2179136" },
            { "filter-operator": "eq", "value": "tt2083383" },
            { "filter-operator": "eq", "value": "tt1924245" },
            { "filter-operator": "eq", "value": "tt1912421" },
            { "filter-operator": "eq", "value": "tt1742044" },
            { "filter-operator": "eq", "value": "tt1616195" },
            { "filter-operator": "eq", "value": "tt6997426" },
            { "filter-operator": "eq", "value": "tt6802308" },
            { "filter-operator": "eq", "value": "tt3513548" },
            { "filter-operator": "eq", "value": "tt3263904" },
            { "filter-operator": "eq", "value": "tt3031654" },
            { "filter-operator": "eq", "value": "tt8884452" }
          ]
        }
      ]
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "2",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "imdbdata",
        "table-name": "title_ratings",
        "table-type": "table"
      },
      "target-table-name": "movies",
      "mapping-parameters": {
        "partition-key-name": "mpkey",
        "sort-key-name": "mskey",
        "exclude-columns": [],
        "attribute-mappings": [
          {
            "target-attribute-name": "mpkey",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${tconst}"
          },
          {
            "target-attribute-name": "mskey",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "RTNG"
          }
        ]
      }
    }
  ]
}
The replication job for historical migration moves data from the MySQL imdbdata.movies view, title_akas table, and title_ratings table. If you’re loading records based on the preceding code, it will take 5–10 minutes to complete all three jobs. Loading the full dataset will take longer:
- The historical-migration01 job migrates over 800,000 records and normally takes 2–3 hours.
- The historical-migration02 job migrates over 747,000 records and normally takes 2–3 hours.
- The historical-migration03 job migrates over 79,000 records and normally takes 10–15 minutes.
You can track the status of the data load on the Table statistics tab of the migration task. When the status of all three tasks is Load complete, you can move to the next step.
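The same status check can be scripted. The following boto3 sketch, assuming the task identifiers used earlier, prints each task’s status and full-load progress:

import boto3

dms = boto3.client("dms")

# Check the status and progress of the three historical migration tasks.
tasks = dms.describe_replication_tasks(
    Filters=[{
        "Name": "replication-task-id",
        "Values": ["historical-migration01",
                   "historical-migration02",
                   "historical-migration03"],
    }]
)["ReplicationTasks"]

for task in tasks:
    print(task["ReplicationTaskIdentifier"],
          task["Status"],
          task.get("ReplicationTaskStats", {}).get("FullLoadProgressPercent"))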
Verify the access patterns using PartiQL
DynamoDB supports PartiQL, a SQL-compatible query language that you can use to select, insert, update, and delete data. With PartiQL, you can use the DynamoDB console to interact with tables and run queries. In this exercise, you explore a few access patterns using PartiQL statements.
You use PartiQL scripts to demonstrate the access patterns discussed in the first post. You get the details for each movie using its unique tconst. The denormalized table is created with each row representing a unique combination of movie and crew member (tconst and nconst). Because tconst is part of the partition key for the base table, you can use WHERE conditions to select the details.
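The statements in the next procedure also run outside the console. As a minimal sketch, the boto3 execute_statement API runs the first query below programmatically:

import boto3

dynamodb = boto3.client("dynamodb")

# Run a PartiQL query for all cast and crew of one movie.
resp = dynamodb.execute_statement(
    Statement="SELECT * FROM \"movies\" "
              "WHERE \"mpkey\" = 'tt0309377' AND begins_with(\"mskey\", 'DETL|')"
)
print("Items returned:", len(resp["Items"]))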
To demonstrate the access patterns
- On the DynamoDB console, choose PartiQL editor in the navigation pane.
- Select the movies table that the AWS DMS job created.
- Choose the options menu (three dots) and select Scan table.
- Use the following commands in the PartiQL query editor to verify the access patterns:
- Find all cast and crew of a movie:
SELECT * FROM "movies" WHERE "mpkey" = 'tt0309377' and begins_with("mskey",'DETL|')
Output: Items returned (10)
- Find all actors of a movie:
SELECT * FROM "movies" WHERE "mpkey" = 'tt0309377' and begins_with("mskey",'DETL|actor')
Output: Items returned (2)
- Find details of a movie (such as title, publication year, runtime, and other information):
SELECT * FROM "movies" WHERE "mpkey" = 'tt0309377' and begins_with("mskey",'DETL|') and "ordering" = '1'
Output: Items returned (1)
- Find all published regions, languages, and titles of a movie:
SELECT * FROM "movies" WHERE "mpkey" = 'tt0309377' and begins_with("mskey",'REGN|')
Output: Items returned (36)
- Find a movie title for a specific region of the movie:
SELECT * FROM "movies" WHERE "mpkey" = 'tt0309377' and begins_with("mskey",'REGN|NZ')
Output: Items returned (1)
- Find the original title of a movie published across multiple languages and regions:
SELECT * FROM "movies" WHERE "mpkey" = 'tt0309377' and begins_with("mskey",'REGN|') and "types" = 'original'
Output: Items returned (1)
To access data at the crew member level
To access information at the crew member level, you need to create an additional global secondary index (GSI) with a new partition key, nconst (unique per crew member). This allows you to query on the new partition key of the GSI instead of scanning the base table.
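As an alternative to the console steps that follow, you can create the index with a single update_table call. This sketch assumes the movies table uses on-demand capacity; a provisioned table would also need a ProvisionedThroughput block under Create.

import boto3

dynamodb = boto3.client("dynamodb")

# Add a GSI keyed on crew member (nconst) and startYear.
dynamodb.update_table(
    TableName="movies",
    AttributeDefinitions=[
        {"AttributeName": "nconst", "AttributeType": "S"},
        {"AttributeName": "startYear", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "nconst-startYear-index",
            "KeySchema": [
                {"AttributeName": "nconst", "KeyType": "HASH"},
                {"AttributeName": "startYear", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    }],
)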
- In the DynamoDB console, choose Tables in the navigation pane.
- Select the movies table and then select Create index from the Actions menu.
- For Partition key, enter nconst.
- For Data type, select String.
- For Sort key, enter startYear.
- For Data type, select String.
- For Index name, enter nconst-startYear-index.
- Choose Create index. Index creation can take 5–10 minutes depending on the number of records in the base table.
- When the GSI status changes from Creating to Active, go back to the PartiQL editor to run the following queries on the GSI:
- Find all movies by a specific crew member:
SELECT * FROM "movies"."nconst-startYear-index" WHERE "nconst" = 'nm0000142'
Output: Items returned (25)
- Find all the movies by a specific actor since the year 2002:
SELECT * FROM "movies"."nconst-startYear-index" WHERE "nconst" = 'nm0000142' and "startYear" >= '2002' ORDER BY "startYear"
Output: Items returned (25)
Congratulations! You have completed and verified the RDBMS migration exercise.
Clean up
When you’ve completed your work, remove any resources you created to avoid incurring additional charges:
- IAM role: dms-vpc-role.
- Endpoints: dynamodb-endpoint and mysql-endpoint.
- AWS DMS tasks: historical-migration01, historical-migration02, and historical-migration03.
- CloudWatch log groups created by the AWS DMS tasks.
- CloudFormation stacks: dynamodb-migrationlab and dynamodb-designlab.
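Much of this cleanup can also be scripted. The following boto3 sketch is a rough outline; the task and endpoint ARNs are placeholders you copy from the DMS console, and tasks must be stopped before they can be deleted.

import boto3

dms = boto3.client("dms")
cfn = boto3.client("cloudformation")
iam = boto3.client("iam")

# Placeholder ARNs: copy from the DMS console before running.
for task_arn in ["<task-arn-1>", "<task-arn-2>", "<task-arn-3>"]:
    dms.delete_replication_task(ReplicationTaskArn=task_arn)

for endpoint_arn in ["<mysql-endpoint-arn>", "<dynamodb-endpoint-arn>"]:
    dms.delete_endpoint(EndpointArn=endpoint_arn)

# Detach the managed policy, then delete the IAM role.
iam.detach_role_policy(
    RoleName="dms-vpc-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole",
)
iam.delete_role(RoleName="dms-vpc-role")

# Delete the two CloudFormation stacks.
for stack in ["dynamodb-migrationlab", "dynamodb-designlab"]:
    cfn.delete_stack(StackName=stack)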
Conclusion
In this post, we showed you how to migrate an application from a self-managed MySQL database on Amazon EC2 to DynamoDB by using AWS DMS to transfer the data into a single-table design schema. We demonstrated using PartiQL queries to access specific patterns from the base table and secondary index. Additionally, we discussed DynamoDB modeling based on access patterns.
We welcome your feedback and suggestions. Let us know what you’re planning to build in the comments section, and how this post has helped you in your AWS journey.
About the Authors
Abhishek Srivastav is a Senior Solutions Architect at AWS. He is passionate about enabling customers to accelerate their cloud adoption. He is an IoT enthusiast and holds deep expertise in NoSQL databases, analytics, and AI/ML technologies. He enjoys finding answers to complex problems by drawing on his in-depth understanding of these technologies. He held lead positions in NoSQL Center of Excellence roles at various enterprise customers prior to joining AWS.
Chad Tindel is a DynamoDB Specialist Solutions Architect based out of New York City. He works with large enterprises to evaluate, design, and deploy DynamoDB-based solutions. Prior to joining Amazon, he held similar roles at Red Hat, Cloudera, MongoDB, and Elastic.