AWS Database Blog

AWS DMS homogeneous migration from document-oriented databases to Amazon DocumentDB

Homogeneous data migrations in AWS Database Migration Service (AWS DMS) simplify the migration of self-managed, on-premises databases to their Amazon Relational Database Service (Amazon RDS) equivalents. For example, you can use homogeneous data migrations to migrate an on-premises, document-oriented database with MongoDB compatibility to Amazon DocumentDB (with MongoDB compatibility). AWS DMS automates the data migration process by using native database tools to provide straightforward and performant like-to-like migrations.

Homogeneous data migrations are serverless, which means that AWS DMS automatically provisions the resources required for your migration. With homogeneous data migrations, you can migrate collections and their indexes. When you create a migration project with compatible source and target data providers of the same type, AWS DMS deploys a serverless environment where your data migration runs. AWS DMS then connects to the source data provider, reads the source data, dumps the files to disk, and restores the data to the target using native database tools.

In this post, we discuss how to migrate a self-managed document database or Amazon DocumentDB database to Amazon DocumentDB using AWS DMS homogeneous migration.

Solution overview

The following diagram shows the process of using homogeneous data migrations in AWS DMS to migrate a source document database to Amazon DocumentDB.

For homogeneous data migrations of the full load only type, AWS DMS loads collections from the source database to the target database. AWS DMS uses mongodump to read data from your source database and store it on the disk attached to the serverless environment. After AWS DMS reads all your source data, it uses mongorestore in the target database to restore your data.
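The dump-then-restore flow can be pictured with the two native tools run by hand. The following is a minimal sketch, not the commands AWS DMS actually issues: the URIs and the dump directory are hypothetical. The use of --drop is an assumption that loosely mirrors the "exists on TARGET. Cleaning it." message in the task log shown in the Monitoring section.

```python
from typing import List


def dump_command(uri: str, database: str, out_dir: str) -> List[str]:
    """Build a mongodump invocation that writes one database to out_dir."""
    return ["mongodump", f"--uri={uri}", f"--db={database}", f"--out={out_dir}"]


def restore_command(uri: str, database: str, dump_dir: str) -> List[str]:
    """Build a mongorestore invocation that loads the dumped database.

    --nsInclude limits the restore to the chosen database, and --drop
    (an assumption for this sketch) removes existing target collections first.
    """
    return [
        "mongorestore",
        f"--uri={uri}",
        f"--nsInclude={database}.*",
        "--drop",
        dump_dir,
    ]


if __name__ == "__main__":
    # Hypothetical endpoints; print the commands instead of running them.
    print(" ".join(dump_command("mongodb://source.example:27017", "sample", "/tmp/dump")))
    print(" ".join(restore_command("mongodb://target.example:27017", "sample", "/tmp/dump")))
```

A real connection to Amazon DocumentDB would also need TLS options and credentials, which are omitted here.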

For full load and change data capture (CDC) migrations, AWS DMS replicates ongoing changes on the selected collections after the initial data load. AWS DMS automatically enables the change stream for all collections. You only need to size the operations log (oplog) so that it retains the changes needed for the migration.

To use ongoing replication (CDC) with the source document database, AWS DMS requires access to the source document database oplog.
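As a rough way to think about oplog sizing, the oplog has to hold every change written while the full load is still running, plus some headroom. The following back-of-the-envelope sketch assumes you have measured the write volume yourself. The function names, the example numbers, and the default safety factor of 2 are illustrative assumptions, not AWS guidance.

```python
def required_oplog_gb(write_gb_per_hour: float,
                      full_load_hours: float,
                      safety_factor: float = 2.0) -> float:
    """Estimate the oplog size needed to retain changes during the full load.

    write_gb_per_hour: observed oplog growth on the source (measured by you).
    full_load_hours:   expected duration of the full load phase.
    safety_factor:     headroom multiplier for bursts and slow restores.
    """
    if write_gb_per_hour < 0 or full_load_hours < 0 or safety_factor < 1:
        raise ValueError("rates and durations must be non-negative; factor >= 1")
    return write_gb_per_hour * full_load_hours * safety_factor


def gb_to_mb(size_gb: float) -> float:
    """Convert gigabytes to megabytes (1 GB = 1024 MB)."""
    return size_gb * 1024


if __name__ == "__main__":
    size_gb = required_oplog_gb(write_gb_per_hour=1.5, full_load_hours=6)
    print(f"Suggested oplog size: {size_gb} GB ({gb_to_mb(size_gb)} MB)")
```

On a self-managed replica set, the oplog can be resized with the replSetResizeOplog admin command, which takes the size in megabytes; that is why the sketch includes the conversion.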

To use ongoing replication with Amazon DocumentDB, AWS DMS requires access to the Amazon DocumentDB cluster’s change streams.

The following diagram illustrates how homogeneous data migration works.

AWS DMS homogeneous migration connects to the source database and target database using data providers and network and security details captured from the instance profile. In the following sections, we show you how to create these components.

Prerequisites

Complete the steps in this section to set up the prerequisite resources.

Create a local source database

Create a local source document database and load data (for this post, we use a sample database with sample data).

Create a target Amazon DocumentDB database

Create your target Amazon DocumentDB cluster.

Create an IAM policy and role

Create an AWS Identity and Access Management (IAM) policy and role to use AWS DMS homogeneous migration. AWS DMS requires access to VPC peering, route tables, security groups, and other AWS resources. Also, AWS DMS stores logs, metrics, and progress for each data migration in Amazon CloudWatch. To create a data migration project, AWS DMS needs access to these services.
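To make the shape of such a policy concrete, the following sketch builds an abbreviated IAM policy document as a Python dictionary. It is illustrative only and deliberately incomplete: it covers a few read-only Amazon EC2 networking actions and the CloudWatch logging and metrics actions, the log group ARN is a hypothetical parameter, and the full set of required permissions should be taken from the AWS DMS documentation.

```python
import json


def migration_policy(log_group_arn: str) -> dict:
    """Return an abbreviated, illustrative IAM policy document.

    log_group_arn is a hypothetical value supplied by the caller; the
    action list is a partial example, not the complete required policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only view of the networking resources used for peering.
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeRouteTables",
                    "ec2:DescribeSecurityGroups",
                    "ec2:DescribeVpcPeeringConnections",
                    "ec2:DescribeVpcs",
                ],
                "Resource": "*",
            },
            {
                # Write migration logs to a specific log group.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": log_group_arn,
            },
            {
                # Publish migration metrics.
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    arn = "arn:aws:logs:us-east-1:111122223333:log-group:example-group:*"  # hypothetical
    print(json.dumps(migration_policy(arn), indent=2))
```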

Configure source and target databases

Configure your source and target databases and create database users with the minimum permissions required for homogeneous data migrations in AWS DMS.

For an AWS DMS migration with a source document database, you can create a user with required permissions only on the database to migrate.

In this example, we created a user with the readAnyDatabase role on the source:

db.createUser({
  user: "dms",
  pwd: "xxxxxx",
  roles: [ { role: "readAnyDatabase", db: "admin" } ]
})

The output looks like the following screenshot.

On the target Amazon DocumentDB database, you can create a user account with read/write permissions only on the database to migrate.

To migrate all databases, use an admin user or a user with the readWriteAnyDatabase or dbAdminAnyDatabase role. To migrate only a single database, create an individual user in that database with readWrite access.

The following command creates a new user named dms and grants read and write access to the test database:

db.createUser({user: "dms", pwd: "xxxxxx", roles: [{role: "readWrite", db: "test"}]})

We get the following output.

Configure AWS Secrets Manager

Store your source and target database credentials in AWS Secrets Manager. Homogeneous migrations require the user name and password to be stored in Secrets Manager.

For the source document database, you need to store the values in plain text.

For the target Amazon DocumentDB database, use database credentials.
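For the plain-text source secret, a small JSON document is a common shape. The following sketch serializes one. The key names username and password are an assumption based on the convention Secrets Manager uses for database credential secrets, and the values are placeholders. Check the AWS DMS documentation for the exact format your data provider expects.

```python
import json


def credentials_secret(username: str, password: str) -> str:
    """Serialize database credentials as a JSON string for a plain-text secret.

    The key names are an assumption for this sketch, following the
    username/password convention of database credential secrets.
    """
    if not username or not password:
        raise ValueError("username and password must be non-empty")
    return json.dumps({"username": username, "password": password})


if __name__ == "__main__":
    # Placeholder values; never hard-code real passwords.
    print(credentials_secret("dms", "xxxxxx"))
```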

Create a subnet group

On the AWS DMS console, create a subnet group. A subnet is a range of IP addresses in your VPC. A replication subnet group includes subnets from different Availability Zones that your instance profile can use and is distinct from subnet groups that Amazon Virtual Private Cloud (Amazon VPC) and Amazon RDS use.

Create data providers

On the AWS DMS console, choose Data providers under Convert and migrate in the navigation pane, then create your data providers.

Data providers are similar to source database endpoints in AWS DMS. At the time of writing, source document database versions 4.x, 5.x, and 6.0 and Amazon DocumentDB versions 3.6, 4.0, and 5.0 are supported.
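A quick pre-flight check against these version ranges can catch an unsupported engine before you create the data providers. The following sketch encodes the versions listed above as they stood at the time of writing. The function names are hypothetical, and the version strings would come from your own databases.

```python
SUPPORTED_SOURCE_MAJORS = {"4", "5"}           # 4.x and 5.x
SUPPORTED_SOURCE_MAJOR_MINORS = {"6.0"}        # 6.0 only
SUPPORTED_TARGET_MAJOR_MINORS = {"3.6", "4.0", "5.0"}


def _major_minor(version: str) -> str:
    """Return the 'major.minor' prefix of a dotted version string."""
    return ".".join(version.split(".")[:2])


def source_supported(version: str) -> bool:
    """True if a source document database version is in the supported list."""
    major = version.split(".")[0]
    return (major in SUPPORTED_SOURCE_MAJORS
            or _major_minor(version) in SUPPORTED_SOURCE_MAJOR_MINORS)


def target_supported(version: str) -> bool:
    """True if an Amazon DocumentDB version is in the supported list."""
    return _major_minor(version) in SUPPORTED_TARGET_MAJOR_MINORS


if __name__ == "__main__":
    # Versions taken from the sample task log in this post.
    print(source_supported("5.0.12"), target_supported("5.0.0"))
```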

The following screenshot shows the configuration of the source data provider.

The following screenshot shows the configuration of the target data provider.

Create an instance profile

AWS DMS creates a serverless environment for homogeneous data migrations in a VPC using Amazon VPC.

When you create your instance profile, specify the VPC to use. You can use your default VPC for your account and AWS Region, or you can create a new VPC. For each data migration, AWS DMS establishes a VPC peering connection with the VPC that you use for your instance profile.

Next, AWS DMS adds the CIDR block in the security group that is associated with your instance profile. Because AWS DMS attaches a public IP address to your instance profile, all your data migrations that use the same instance profile have the same public IP address. When your data migration stops or fails, AWS DMS deletes the VPC peering connection.

The following screenshot shows the configuration of the instance profile.

Create an AWS DMS migration project

On the AWS DMS console, create a migration project using the resources that you created in the previous steps. The following screenshot shows the project configuration.

The migration project is created immediately.

Create and run a data migration task

Complete the following steps to create your data migration task:

  1. On the AWS DMS console, choose Migration projects in the navigation pane and open the details page of the migration project.
  2. On the Data migrations tab, choose Create data migration.
  3. Enter all the details for your migration task, as shown in the following screenshot.

    We selected the sample database as the schema and all the collections in the sample database. If no selection rule is specified, the task will migrate all the databases of the source document database instance, excluding local, config, and admin databases.
  4. Run the migration task.
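The effect of a selection rule can be sketched in a few lines. The following example treats % as a wildcard, as in the rule shown in the task log (database: sample, collection: %), and skips the local, config, and admin databases. Skipping those system databases even when a rule is present is a simplifying assumption, and the function names and namespaces are hypothetical.

```python
import re
from typing import Dict, Iterable, List

SYSTEM_DATABASES = {"local", "config", "admin"}


def _matches(pattern: str, value: str) -> bool:
    """Match value against a pattern where % stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value) is not None


def selected_namespaces(namespaces: Iterable[str],
                        rules: List[Dict[str, str]]) -> List[str]:
    """Return the 'database.collection' names a set of selection rules would pick.

    With no rules, every namespace outside the system databases is selected.
    """
    result = []
    for namespace in namespaces:
        database, _, collection = namespace.partition(".")
        if database in SYSTEM_DATABASES:
            continue  # assumption: system databases are never migrated
        if not rules or any(
            _matches(rule["database"], database)
            and _matches(rule["collection"], collection)
            for rule in rules
        ):
            result.append(namespace)
    return result


if __name__ == "__main__":
    names = ["sample.users", "sample.shipwrecks", "other.logs", "admin.system.users"]
    print(selected_namespaces(names, [{"database": "sample", "collection": "%"}]))
```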

After the AWS DMS Serverless resources are created, data migration begins. The task takes approximately 15–20 minutes to start, and Amazon CloudWatch logs don’t appear immediately. They appear after the required resources are created.

After some time, you can see the details of the completed migration task. In the target database, you can see all the collections after the full load.

In AWS DMS homogeneous migrations, secondary indexes are created in addition to the _id primary key index. You can check that the indexes match between the source and target.

The following screenshot shows the source side.

The following screenshot shows the target side.
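If you prefer to compare indexes programmatically rather than by eye, one approach is to collect the output of getIndexes() for each collection on both sides and compare the definitions. The following sketch works on plain lists of index documents and assumes they have the usual key, name, and optional unique fields. The function names and sample documents are hypothetical.

```python
from typing import Dict, List, Tuple


def index_signature(index: Dict) -> Tuple:
    """Reduce an index document to the parts that define its behavior.

    Key order matters for compound indexes, so the key is kept as an
    ordered tuple of (field, direction) pairs.
    """
    return (tuple(index["key"].items()), bool(index.get("unique", False)))


def missing_indexes(source_indexes: List[Dict],
                    target_indexes: List[Dict]) -> List[str]:
    """Return the names of source indexes with no equivalent on the target."""
    target_signatures = {index_signature(index) for index in target_indexes}
    return [
        index["name"]
        for index in source_indexes
        if index_signature(index) not in target_signatures
    ]


if __name__ == "__main__":
    source = [
        {"v": 2, "key": {"_id": 1}, "name": "_id_"},
        {"v": 2, "key": {"email": 1}, "name": "email_1", "unique": True},
    ]
    target = [{"v": 4, "key": {"_id": 1}, "name": "_id_"}]
    print(missing_indexes(source, target))
```

The index version field (v) is ignored on purpose, because it can legitimately differ between engines.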

CDC-only task

At the time of writing this post, for homogeneous data migrations with CDC, only the Immediate option is supported. AWS DMS automatically captures the start point for the replication when the actual data migration starts.

The following screenshot shows the CDC configuration.

Monitoring

After you start the homogeneous data migration, you can monitor its status and progress. Large migrations may take hours to complete the full load phase. To maintain the reliability, availability, and high performance of the data migration, monitor its progress regularly.

You can use CloudWatch alarms or events to closely track your data migration. AWS DMS includes the CDCLatency metrics in CloudWatch.

You can track the CloudWatch logs for the status and errors during the migration by enabling CloudWatch logs when creating your data migration task.

The following CloudWatch log is for the sample migration demonstrated in this post.

[INFO]: Trying to connect to SOURCE database: XXXXX:27017
[INFO]: Successfully connected to SOURCE. The database VERSION: 5.0.12
[INFO]: Trying to connect to TARGET database: docdb-target.cluster-xxxxxxx.us-east-1.docdb.amazonaws.com:27017
[INFO]: Trying to connect to TARGET database: docdb-target.cluster-xxxxxxx.us-east-1.docdb.amazonaws.com:27017
[INFO]: Trying to connect to TARGET database: docdb-target.cluster-xxxxxxxx.us-east-1.docdb.amazonaws.com:27017
[INFO]: Successfully connected to TARGET. The database VERSION: 5.0.0
[INFO]: Selection rules specified for migration:
{RuleId: 557072232, RuleName: 557072232, RuleType: selection, database: sample, collection: %}
[INFO]: Starting the full load dump at Thu Aug 22 07:22:32 UTC 2024. This may take few minutes
[INFO]: Number of source collections: 10
[INFO]: START sample.users full-load
[INFO]: START CDC for sample.users
[INFO]: sample.users full-load finished: 0/185 dumped; 0/185 restored; 0 restore errors
[INFO]: sample.users exists on TARGET. Cleaning it.
[INFO]: sample.users full-load finished: 185/185 dumped; 185/185 restored; 0 restore errors
[INFO]: Target collection sample.users contains 185 documents
[INFO]: END sample.users full load
[INFO]: START CDC for sample.shipwrecks
[INFO]: START sample.shipwrecks full-load
[INFO]: sample.shipwrecks full-load finished: 0/11095 dumped; 0/11095 restored; 0 restore errors
[INFO]: sample.shipwrecks exists on TARGET. Cleaning it.
[INFO]: sample.shipwrecks full-load finished: 11095/11095 dumped; 11095/11095 restored; 0 restore errors
[INFO]: sample.sessions full-load finished: 1/1 dumped; 1/1 restored; 0 restore errors
[INFO]: Target collection sample.sessions contains 1 documents
[INFO]: END sample.sessions full load
[INFO]: Overall CDC latency: 0 seconds
[INFO]: Overall storage consumption: 0 bytes
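Logs in this format lend themselves to a quick automated check. The following sketch parses the "full-load finished" lines and reports any collection whose final counts don't add up. The regular expression is derived from the sample log above, so it is an assumption that the format stays stable. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

FULL_LOAD_LINE = re.compile(
    r"\[INFO\]: (?P<namespace>\S+) full-load finished: "
    r"(?P<dumped>\d+)/(?P<total>\d+) dumped; "
    r"(?P<restored>\d+)/\d+ restored; "
    r"(?P<errors>\d+) restore errors"
)


def full_load_summary(lines: Iterable[str]) -> Dict[str, Tuple[int, int, int, int]]:
    """Map each namespace to (dumped, total, restored, errors).

    A namespace can be reported more than once; the last line wins.
    """
    summary = {}
    for line in lines:
        match = FULL_LOAD_LINE.search(line)
        if match:
            summary[match["namespace"]] = (
                int(match["dumped"]),
                int(match["total"]),
                int(match["restored"]),
                int(match["errors"]),
            )
    return summary


def incomplete(summary: Dict[str, Tuple[int, int, int, int]]) -> List[str]:
    """Return namespaces that were not fully restored or had restore errors."""
    return [
        namespace
        for namespace, (dumped, total, restored, errors) in summary.items()
        if dumped != total or restored != total or errors > 0
    ]


if __name__ == "__main__":
    log = [
        "[INFO]: sample.users full-load finished: 0/185 dumped; 0/185 restored; 0 restore errors",
        "[INFO]: sample.users full-load finished: 185/185 dumped; 185/185 restored; 0 restore errors",
    ]
    print(full_load_summary(log), incomplete(full_load_summary(log)))
```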

Advantages and limitations

The following are advantages of homogeneous data migrations:

  • With homogeneous data migrations, you pay by the hour only for the duration of the data migration. With no replication instances to provision, you don’t need to worry about over-provisioning or manually scaling capacity, saving time and cost.
  • Secondary indexes are also created as part of migration.

The following limitations apply when you use homogeneous data migrations:

  • For the source document database, AWS DMS doesn’t support create, rename, and drop collection operations.

Clean up

AWS resources created by the AWS DMS homogeneous migration incur costs as long as they are in use. When you no longer need the resources, clean them up by deleting the associated data migration under the migration project, along with the migration project.

Conclusion

In this post, we showed how to use homogeneous data migrations in AWS DMS to migrate data between two databases of the same type, in this case from a source document database to Amazon DocumentDB. We also showed how homogeneous migrations address limitations of AWS DMS, such as the need to provision replication instances and to migrate secondary indexes, and discussed key features of homogeneous data migrations.

We welcome your feedback. If you have any questions or suggestions, leave them in the comments section.


About the Authors

Nagarjuna Paladugula is a Senior Cloud Support Engineer at AWS, specialized in Oracle, Amazon RDS for Oracle, and AWS DMS. He has over 19 years’ experience in different database technologies, and uses his experience to offer guidance and technical support to customers migrating their databases to the AWS Cloud. Outside of work, Nagarjuna likes traveling, watching movies and web series, and running.

Chishanu Karombo is a Partner Solutions Architect specializing in databases at AWS. In his role, Chishanu works with AWS Partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.

Anshu Vajpayee is a Senior DocumentDB Specialist Solutions Architect at AWS. He has been helping customers adopt NoSQL databases and modernize applications using Amazon DocumentDB. Before joining AWS, he worked extensively with relational and NoSQL databases.