AWS Database Blog
AWS DMS homogeneous migration from document-oriented databases to Amazon DocumentDB
Homogeneous data migrations in AWS Database Migration Service (AWS DMS) simplify the migration of self-managed, on-premises databases to their Amazon Relational Database Service (Amazon RDS) equivalents. For example, you can use homogeneous data migrations to migrate an on-premises, document-oriented database with MongoDB compatibility to Amazon DocumentDB (with MongoDB compatibility). AWS DMS automates the data migration process by using native database tools to provide straightforward and performant like-to-like migrations.
Homogeneous data migrations are serverless, which means that AWS DMS automatically provisions the resources required for your migration. With homogeneous data migrations, you can migrate collections and their indexes. When you create a migration project with compatible source and target data providers of the same type, AWS DMS deploys a serverless environment where your data migration runs. AWS DMS then connects to the source data provider, reads the source data, dumps the files to disk, and restores the data to the target using native database tools.
In this post, we discuss how to migrate a self-managed document database or Amazon DocumentDB database to Amazon DocumentDB using AWS DMS homogeneous migration.
Solution overview
The following diagram shows the process of using homogeneous data migrations in AWS DMS to migrate a source document database to Amazon DocumentDB.
For homogeneous data migrations of the full load only type, AWS DMS loads collections from the source database to the target database. AWS DMS uses mongodump to read data from your source database and store it on the disk attached to the serverless environment. After AWS DMS reads all your source data, it uses mongorestore in the target database to restore your data.
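To make the dump-then-restore flow concrete, the following is a minimal sketch of how the two native tools could be chained for one database. It only builds the argument lists and does not run anything. The function names, connection strings, and dump directory are hypothetical, and the exact options AWS DMS passes internally are not described in this post, so treat the flags here as an assumption.

```python
from typing import List


def build_dump_command(source_uri: str, database: str, dump_dir: str) -> List[str]:
    """Argument list that dumps one source database to a local directory."""
    return [
        "mongodump",
        f"--uri={source_uri}",
        f"--db={database}",
        f"--out={dump_dir}",
    ]


def build_restore_command(target_uri: str, database: str, dump_dir: str) -> List[str]:
    """Argument list that restores the dumped database into the target."""
    return [
        "mongorestore",
        f"--uri={target_uri}",
        f"--nsInclude={database}.*",  # restore only this database's collections
        dump_dir,
    ]


if __name__ == "__main__":
    # Hypothetical endpoints; the restore step only starts after the dump completes.
    dump = build_dump_command("mongodb://source-host:27017", "sample", "/tmp/dump")
    restore = build_restore_command("mongodb://target-host:27017", "sample", "/tmp/dump")
    print(" ".join(dump))
    print(" ".join(restore))
```

The two steps are strictly sequential, which is why the disk attached to the serverless environment has to hold the full dump before any data reaches the target.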
For full load and change data capture (CDC) migrations, AWS DMS replicates ongoing changes on the selected collections after the initial data load. AWS DMS automatically enables the change stream for all collections. You simply have to size the operations log (oplog) to retain the changes needed for migration.
To use ongoing replication (CDC) with the source document database, AWS DMS requires access to the source document database oplog.
To use ongoing replication with Amazon DocumentDB, AWS DMS requires access to the Amazon DocumentDB cluster’s change streams.
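As a rough aid for sizing the oplog, the following sketch estimates how much change data must be retained while the full load runs. This is a rule of thumb and not an AWS DMS formula: the function name, the write rate, the full load duration, and the safety factor are all assumptions that you would replace with your own measurements.

```python
def required_oplog_mb(write_rate_mb_per_hour: float,
                      full_load_hours: float,
                      safety_factor: float = 1.5) -> float:
    """Estimate the oplog size (MB) needed to retain changes made during the full load.

    The safety factor is a hypothetical margin for write bursts and retries.
    """
    if write_rate_mb_per_hour < 0 or full_load_hours < 0 or safety_factor < 1:
        raise ValueError("rates and durations must be non-negative; safety_factor >= 1")
    return write_rate_mb_per_hour * full_load_hours * safety_factor


if __name__ == "__main__":
    # Example: 200 MB of oplog writes per hour and a 6-hour full load.
    print(required_oplog_mb(200, 6))  # 1800.0
```

If the estimate exceeds the current oplog size, resize the oplog before you start the migration so that CDC can resume from the point where the full load began.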
The following diagram illustrates how homogeneous data migration works.
AWS DMS homogeneous migration connects to the source database and target database using data providers and network and security details captured from the instance profile. In the following sections, we show you how to create these components.
Prerequisites
Complete the steps in this section to set up the prerequisite resources.
Create a local source database
Create a local source document database and load data (for this post, we use a sample database with sample data).
Create a target Amazon DocumentDB database
Create your target Amazon DocumentDB cluster.
Create an IAM policy and role
Create an AWS Identity and Access Management (IAM) policy and role to use AWS DMS homogeneous migration. AWS DMS requires access to VPC peering, route tables, security groups, and other AWS resources. Also, AWS DMS stores logs, metrics, and progress for each data migration in Amazon CloudWatch. To create a data migration project, AWS DMS needs access to these services.
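The following sketch builds a policy document that covers the kinds of access described above (VPC peering, route tables, security groups, and CloudWatch logs and metrics). It is an illustrative subset and not the complete policy: the function name is hypothetical, the action list is our assumption of a reasonable starting point, and the wildcard resource should be narrowed for production use. Check the AWS DMS documentation for the authoritative list.

```python
import json


def build_dms_migration_policy() -> dict:
    """Return an illustrative IAM policy document for homogeneous data migrations."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Networking: VPC peering, route tables, and security groups.
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeVpcs",
                    "ec2:DescribeRouteTables",
                    "ec2:DescribeSecurityGroups",
                    "ec2:DescribeVpcPeeringConnections",
                    "ec2:CreateVpcPeeringConnection",
                    "ec2:AcceptVpcPeeringConnection",
                    "ec2:DeleteVpcPeeringConnection",
                    "ec2:CreateRoute",
                    "ec2:DeleteRoute",
                    "ec2:AuthorizeSecurityGroupIngress",
                    "ec2:RevokeSecurityGroupIngress",
                ],
                "Resource": "*",  # assumption: broad for brevity; scope this down
            },
            {
                # Observability: logs, metrics, and progress in CloudWatch.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "cloudwatch:PutMetricData",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dms_migration_policy(), indent=2))
```

You can paste the printed JSON into the IAM console policy editor and then attach the policy to the role that AWS DMS assumes.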
Configure source and target database
Configure your source and target databases and create database users with the minimum permissions required for homogeneous data migrations in AWS DMS.
For an AWS DMS migration with a source document database, you can create a user with required permissions only on the database to migrate.
In this example, we created a user with minimum privileges in the source:
The output looks like the following screenshot.
On the target Amazon DocumentDB database, you can create a user account with read/write permissions only on the database to migrate.
To migrate all databases, you should use an admin user or a user with a readWriteAnyDatabase or dbAdminAnyDatabase role. To migrate only a single database, you should create an individual user in that database with readWrite access.
The following command creates a new user named dms and grants read and write access to the test database:
We get the following output.
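If you script user creation from Python instead of the shell, the same user can be expressed as a createUser command document. The following is a minimal sketch: the function name and the placeholder password are hypothetical, and the commented usage assumes the pymongo driver, which this post does not otherwise use.

```python
def target_user_command(password: str) -> dict:
    """Build a createUser command for user 'dms' with readWrite on the 'test' database."""
    return {
        "createUser": "dms",  # must be the first key: it names the command
        "pwd": password,
        "roles": [{"role": "readWrite", "db": "test"}],
    }


if __name__ == "__main__":
    command = target_user_command("replace-with-a-strong-password")
    print(command["createUser"], command["roles"])
    # Assumed usage with pymongo (not executed here):
    #   from pymongo import MongoClient
    #   client = MongoClient("<target connection string>")
    #   client["test"].command(command)
```

Keeping the role scoped to a single database matches the minimum-permission approach described in this section.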
Configure AWS Secrets Manager
Store your source and target database credentials in AWS Secrets Manager. For homogeneous migration, you must store the user name and password in Secrets Manager.
For the source document database, you need to store the values in plain text.
For the target Amazon DocumentDB database, use database credentials.
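For the plain text secret, a small helper can produce the JSON string to paste into Secrets Manager. This is a sketch under assumptions: the function name is hypothetical, and the key names username and password follow the common Secrets Manager convention, so confirm them against the AWS DMS documentation before use.

```python
import json


def build_secret_string(username: str, password: str) -> str:
    """Serialize database credentials as a JSON string for a plain text secret."""
    if not username or not password:
        raise ValueError("username and password are both required")
    # Key names are an assumption based on the usual Secrets Manager layout.
    return json.dumps({"username": username, "password": password})


if __name__ == "__main__":
    print(build_secret_string("dms", "replace-with-a-strong-password"))
```

Using json.dumps instead of hand-writing the string avoids escaping mistakes when the password contains quotes or backslashes.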
Create a subnet group
On the AWS DMS console, create a subnet group. A subnet is a range of IP addresses in your VPC. A replication subnet group includes subnets from different Availability Zones that your instance profile can use and is distinct from subnet groups that Amazon Virtual Private Cloud (Amazon VPC) and Amazon RDS use.
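Before you create the subnet group, it can help to confirm that the subnets you picked really sit in different Availability Zones. The following sketch performs that check on a plain mapping of subnet ID to Availability Zone. The function name, the sample IDs, and the minimum of two zones are assumptions for illustration.

```python
from typing import Dict


def spans_multiple_azs(subnet_to_az: Dict[str, str], minimum: int = 2) -> bool:
    """Return True if the chosen subnets cover at least `minimum` distinct Availability Zones."""
    return len(set(subnet_to_az.values())) >= minimum


if __name__ == "__main__":
    # Hypothetical subnet IDs and zones.
    chosen = {
        "subnet-aaa111": "us-east-1a",
        "subnet-bbb222": "us-east-1b",
    }
    print(spans_multiple_azs(chosen))
```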
Create data providers
On the AWS DMS console, choose Data providers under Convert and migrate in the navigation pane, then create your data providers.
Data providers are similar to source and target database endpoints in AWS DMS. At the time of writing, source document database versions 4.x, 5.x, and 6.0 and Amazon DocumentDB versions 3.6, 4.0, and 5.0 are supported.
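The supported version list can be encoded as a quick pre-flight check. The following sketch reflects only the versions named in this post at the time of writing. The function names are hypothetical, and the list will go stale as AWS DMS adds support for newer versions.

```python
SUPPORTED_TARGET_VERSIONS = {"3.6", "4.0", "5.0"}


def _major_minor(version: str) -> str:
    """Reduce a version such as '4.4.18' to '4.4'."""
    return ".".join(version.split(".")[:2])


def is_supported_source(version: str) -> bool:
    """Source document database: any 4.x or 5.x release, or 6.0."""
    major = version.split(".")[0]
    return major in {"4", "5"} or _major_minor(version) == "6.0"


def is_supported_target(version: str) -> bool:
    """Amazon DocumentDB: 3.6, 4.0, or 5.0."""
    return _major_minor(version) in SUPPORTED_TARGET_VERSIONS


if __name__ == "__main__":
    print(is_supported_source("4.4.18"), is_supported_target("5.0.0"))
```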
The following screenshot shows the configuration of the source data provider.
The following screenshot shows the configuration of the target data provider.
Create an instance profile
AWS DMS creates a serverless environment for homogeneous data migrations in a VPC using Amazon VPC.
When you create your instance profile, specify the VPC to use. You can use your default VPC for your account and AWS Region, or you can create a new VPC. For each data migration, AWS DMS establishes a VPC peering connection with the VPC that you use for your instance profile.
Next, AWS DMS adds the CIDR block in the security group that is associated with your instance profile. Because AWS DMS attaches a public IP address to your instance profile, all your data migrations that use the same instance profile have the same public IP address. When your data migration stops or fails, AWS DMS deletes the VPC peering connection.
The following screenshot shows the configuration of the instance profile.
Create an AWS DMS migration project
On the AWS DMS console, create a migration project using the resources that you created in the previous steps. The following screenshot shows the project configuration.
The migration project is created immediately.
Create and run a data migration task
Complete the following steps to create your data migration task:
- On the AWS DMS console, choose Migration projects in the navigation pane and open the details page of the migration project.
- On the Data migrations tab, choose Create data migration.
- Enter all the details for your migration task, as shown in the following screenshot.
We selected the sample database as the schema and all the collections in the sample database. If no selection rule is specified, the task will migrate all the databases of the source document database instance, excluding local, config, and admin databases.
- Run the migration task.
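The default selection behavior can be summarized in a few lines. The following sketch mirrors the rule stated above: with no selection, everything except the local, config, and admin databases is migrated. The function name and the sample database names are hypothetical, and real selection rules are configured in the AWS DMS console, not in code like this.

```python
from typing import Iterable, List, Optional

SYSTEM_DATABASES = {"local", "config", "admin"}


def databases_to_migrate(all_databases: Iterable[str],
                         selected: Optional[Iterable[str]] = None) -> List[str]:
    """Return the databases a task would migrate, preserving the input order."""
    if selected:
        wanted = set(selected)
        return [db for db in all_databases if db in wanted]
    # No selection rule: migrate everything except the system databases.
    return [db for db in all_databases if db not in SYSTEM_DATABASES]


if __name__ == "__main__":
    source = ["admin", "config", "local", "sample", "test"]
    print(databases_to_migrate(source))              # ['sample', 'test']
    print(databases_to_migrate(source, ["sample"]))  # ['sample']
```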
After the AWS DMS Serverless resources are created, data migration begins. It takes approximately 15–20 minutes for the task to start, and you won’t see Amazon CloudWatch logs immediately. They appear after the required resources are created.
After some time, you can see the details of the completed migration task. In the target database, you can see all the collections after the full load.
In AWS DMS homogeneous migrations, secondary indexes are created as well as the _id primary key index. You can check that the indexes match between the source and target.
The following screenshot shows the source side.
The following screenshot shows the target side.
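To compare indexes programmatically instead of by eye, you can diff the index key patterns of a collection on both sides. The following sketch works on dictionaries shaped like the output of the pymongo index_information() method (index name mapped to a specification with a key list). The function names and sample data are hypothetical, and the sketch compares key patterns only, not options such as uniqueness or TTL.

```python
from typing import Dict, Set, Tuple

IndexKey = Tuple[Tuple[str, object], ...]


def index_keys(index_information: Dict[str, dict]) -> Set[IndexKey]:
    """Collect the key pattern of every index as a hashable tuple of (field, direction)."""
    return {
        tuple((field, direction) for field, direction in spec["key"])
        for spec in index_information.values()
    }


def missing_indexes(source_info: Dict[str, dict],
                    target_info: Dict[str, dict]) -> Set[IndexKey]:
    """Return key patterns that exist on the source but not on the target."""
    return index_keys(source_info) - index_keys(target_info)


if __name__ == "__main__":
    # Hypothetical index listings for one collection.
    source = {
        "_id_": {"key": [("_id", 1)]},
        "name_1": {"key": [("name", 1)]},
    }
    target = {"_id_": {"key": [("_id", 1)]}}
    print(missing_indexes(source, target))  # {(('name', 1),)}
```

An empty result means every source key pattern is present on the target for that collection.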
CDC-only task
At the time of writing this post, for homogeneous data migrations with CDC, only the Immediate option is supported. AWS DMS automatically captures the start point for the replication when the actual data migration starts.
The following screenshot shows the CDC configuration.
Monitoring
After you start the homogeneous data migration, you can monitor its status and progress. Large migrations may take hours to complete the full load phase. To maintain the reliability, availability, and high performance of the data migration, monitor its progress regularly.
You can use CloudWatch alarms or events to closely track your data migration. AWS DMS includes the CDCLatency metrics in CloudWatch.
You can track the CloudWatch logs for the status and errors during the migration by enabling CloudWatch logs when creating your data migration task.
The following CloudWatch log is for the sample migration demonstrated in this post.
Advantages and limitations
The following are advantages of homogeneous data migrations:
- With homogeneous data migrations, you pay by the hour only for the duration of the data migration. With no replication instances to provision, you don’t need to worry about over-provisioning or manually scaling capacity, saving time and cost.
- Secondary indexes are also created as part of migration.
The following limitations apply when you use homogeneous data migrations:
- For the source document database, AWS DMS doesn’t support create, rename, and drop collection operations.
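If you want to audit whether an application issues any of these unsupported operations during replication, you can scan change events for them. The following sketch filters a list of event dictionaries by their operationType field. The function name and sample events are hypothetical, and mapping the limitation to the event type names create, rename, and drop is our assumption based on MongoDB change event naming.

```python
from typing import Dict, Iterable, List

UNSUPPORTED_OPERATIONS = {"create", "rename", "drop"}


def unsupported_events(events: Iterable[Dict]) -> List[Dict]:
    """Return the events whose operation type would not be replicated."""
    return [e for e in events if e.get("operationType") in UNSUPPORTED_OPERATIONS]


if __name__ == "__main__":
    # Hypothetical change events, reduced to the fields used here.
    sample = [
        {"operationType": "insert", "ns": {"db": "sample", "coll": "orders"}},
        {"operationType": "drop", "ns": {"db": "sample", "coll": "tmp"}},
        {"operationType": "update", "ns": {"db": "sample", "coll": "orders"}},
    ]
    for event in unsupported_events(sample):
        print(event["operationType"], event["ns"]["coll"])  # drop tmp
```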
Clean up
AWS resources created by the AWS DMS homogeneous migration incur costs as long as they are in use. When you no longer need the resources, clean them up by deleting the associated data migration under the migration project, along with the migration project.
Conclusion
In this post, we showed you how you can use homogeneous data migrations in AWS DMS to migrate data between two databases of the same type, in this case a source document database to Amazon DocumentDB, and how you can overcome AWS DMS limitations using homogeneous migrations. We also discussed key features for homogeneous data migrations in AWS DMS.
We welcome your feedback. If you have any questions or suggestions, leave them in the comments section.
About the Authors
Nagarjuna Paladugula is a Senior Cloud Support Engineer at AWS, specialized in Oracle, Amazon RDS for Oracle, and AWS DMS. He has over 19 years’ experience in different database technologies, and uses his experience to offer guidance and technical support to customers migrating their databases to the AWS Cloud. Outside of work, Nagarjuna likes traveling, watching movies and web series, and running.
Chishanu Karombo is a Partner Solutions Architect specializing in databases at AWS. In his role, Chishanu works with AWS Partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.
Anshu Vajpayee is a Senior DocumentDB Specialist Solutions Architect at AWS. He has been helping customers adopt NoSQL databases and modernize applications using Amazon DocumentDB. Before joining AWS, he worked extensively with relational and NoSQL databases.