Migrating to Amazon DocumentDB with the hybrid method
This blog post was last reviewed and updated February, 2022.
Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB 3.6 or 4.0 application code, drivers, and tools to run, manage, and scale workloads on Amazon DocumentDB without worrying about managing the underlying infrastructure. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data.
There are three primary approaches for migrating from MongoDB to Amazon DocumentDB: offline, online, and hybrid. For more information, see Migration Approaches.
This post discusses how to use the hybrid approach to migrate data from MongoDB to Amazon DocumentDB. The hybrid approach combines the speed of the offline approach with the minimal downtime of the online approach. For more information, see Video: Live migration to Amazon DocumentDB.
The hybrid method is the best option if you want to minimize downtime and your source dataset is greater than 1 TB. The hybrid method takes advantage of the parallelization and speed that you can achieve with mongorestore to migrate the bulk of the data, and then uses AWS Database Migration Service (AWS DMS) to minimize downtime.
If your dataset is smaller than 1 TB, you should use the online or offline approach. For more information about migrating with the offline and online methods, see Migrate from MongoDB to Amazon DocumentDB using the offline method and Migrating to Amazon DocumentDB with the online method.
This post shows you how to use the hybrid approach to migrate data from a MongoDB replica set hosted on Amazon EC2 to an Amazon DocumentDB cluster.
Before you start your migration, complete the following prerequisites:
- Verify your source version and configuration
- Set up and choose the size of your Amazon DocumentDB cluster
- Set up an EC2 instance
Verifying your source version and configuration
If your MongoDB source uses a version of MongoDB earlier than 3.6, upgrade your source deployment and your application drivers before you migrate. Both must be compatible with MongoDB 3.6 to work with Amazon DocumentDB.
You can determine the version of your source deployment by entering the following code in the mongo shell:
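The original command is not reproduced here, so the following is a minimal sketch. It assumes the legacy mongo shell is installed on the machine you run it from; `SOURCE_HOST` and the function name are placeholders introduced for illustration.

```shell
# Placeholder (assumption): replace with your source MongoDB host and port.
SOURCE_HOST="source-mongodb.example.internal:27017"

# Print the server version of the source deployment.
check_source_version() {
  mongo --host "$SOURCE_HOST" --quiet --eval 'print(db.version())'
}
```

Call `check_source_version` after you set the placeholder. In an interactive mongo shell session, `db.version()` on its own returns the same value.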
Additionally, verify that the source MongoDB cluster (or instance) is configured as a replica set. You can determine if a MongoDB cluster is configured as a replica set with the following code:
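As a sketch of that check, assuming the legacy mongo shell and a placeholder `SOURCE_HOST` (add credentials if your deployment requires authentication):

```shell
# Placeholder (assumption): replace with your source MongoDB host and port.
SOURCE_HOST="source-mongodb.example.internal:27017"

# Print the replica set status document returned by rs.status().
check_replica_set() {
  mongo --host "$SOURCE_HOST" --quiet --eval 'printjson(rs.status())'
}
```

Call `check_replica_set` and inspect the printed document for the replica set members or for an error message.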
If the output is an error message similar to "errmsg" : "not running with --replSet", the cluster isn’t configured as a replica set.
Setting up and sizing your target Amazon DocumentDB cluster
For this post, your target Amazon DocumentDB cluster is a replica set that you create with a single db.r5.large instance. When you size your cluster, choose the instance type that is suitable for your production cluster. For more information about Amazon DocumentDB instances and costs, see Amazon DocumentDB (with MongoDB compatibility) pricing.
Setting up an EC2 instance
To connect to the Amazon DocumentDB cluster to migrate indexes and for other tasks during the migration, create an EC2 instance in the same VPC as your cluster and install the mongo shell. For instructions, see Getting Started with Amazon DocumentDB. When creating AWS resources, we recommend that you follow the AWS IAM best practices. To verify the connection to Amazon DocumentDB, enter the following CLI command:
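A minimal sketch of such a connection test follows. The endpoint, user name, and CA file name are placeholders, not values from this post, and the CA file is assumed to be already present in the working directory (the TLS paragraph that follows covers it).

```shell
# Placeholders (assumptions): replace with your cluster endpoint, user, and CA file.
DOCDB_ENDPOINT="sample-cluster.cluster-xxxxxxxx.us-east-1.docdb.amazonaws.com:27017"
DOCDB_USER="<username>"
CA_FILE="global-bundle.pem"

# Open a TLS-encrypted mongo shell session against the cluster.
# With --username and no --password, the mongo shell prompts for the password.
connect_docdb() {
  mongo --ssl --host "$DOCDB_ENDPOINT" --sslCAFile "$CA_FILE" --username "$DOCDB_USER"
}
```

Call `connect_docdb` from the EC2 instance. A shell prompt after you enter the password indicates that networking, TLS, and credentials are all working.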
If you have trouble connecting to either your source instance or Amazon DocumentDB cluster, check the security group configurations for both to make sure that the EC2 instance has permission to connect to each on the correct port (27017 by default). For more information about troubleshooting, see Troubleshooting Amazon DocumentDB.
Amazon DocumentDB uses Transport Layer Security (TLS) encryption by default. To connect over a TLS-encrypted connection with the mongo shell, first download the certificate authority (CA) file. See the following code:
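The following sketch downloads the CA bundle with wget. The URL and file name follow the current AWS documentation for the global bundle and are an assumption here; they may differ from what this post originally used.

```shell
# Assumption: the global CA bundle location from the current AWS documentation.
CA_URL="https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem"
CA_FILE="global-bundle.pem"

# Download the CA bundle into the current directory.
download_ca_bundle() {
  wget -O "$CA_FILE" "$CA_URL"
}
```

After you call `download_ca_bundle`, pass the file to the mongo shell with the `--sslCAFile` option.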
You can also disable TLS. For more information, see Encrypting Data in Transit.
For index and data migration, a key consideration is ensuring the EC2 instance’s Amazon EBS volume is large enough to hold the exported data. You can obtain a rough estimate of a database’s size in bytes by running the db.stats() command in the mongo shell and looking at the value of storageSize. See the following code:
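A sketch of that estimate, assuming the legacy mongo shell; `SOURCE_HOST`, `DB_NAME`, and the helper names are hypothetical. The second helper only converts the byte count to whole GiB so that it is easier to compare with an EBS volume size.

```shell
# Placeholders (assumptions): source host and the database you plan to export.
SOURCE_HOST="source-mongodb.example.internal:27017"
DB_NAME="mydb"

# Print storageSize (in bytes) from db.stats() for the chosen database.
estimate_db_size_bytes() {
  mongo --host "$SOURCE_HOST" --quiet \
    --eval "print(db.getSiblingDB('$DB_NAME').stats().storageSize)"
}

# Convert a byte count to whole GiB, rounding up.
bytes_to_gib() {
  echo $(( ($1 + 1073741823) / 1073741824 ))
}
```

For example, `bytes_to_gib "$(estimate_db_size_bytes)"` prints the estimate in GiB. Treat it as a lower bound: with a compressing storage engine, the exported BSON can be larger than storageSize, so leave headroom on the volume.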
Hybrid migration steps
The following diagram illustrates the seven steps of the hybrid migration process. The steps are as follows:
- Application continues to write to source
- Dump indexes using the Amazon DocumentDB Index Tool
- Dump data using mongodump
- Restore indexes using the Amazon DocumentDB Index Tool
- Restore data using mongorestore
- Replicate data with change data capture (CDC) using AWS DMS
- Change application endpoint to Amazon DocumentDB cluster
Step 1: Application continues to write to source
When you use the hybrid method to migrate to Amazon DocumentDB, your application continues to write to the source MongoDB database. Step 7 discusses ceasing writes to the source database and changing the application to point to the target Amazon DocumentDB cluster.
Step 2: Dumping indexes using the Amazon DocumentDB Index Tool
Before you begin your migration, create the same indexes on your target Amazon DocumentDB cluster that you have on your source MongoDB cluster. Although AWS DMS handles the migration of data, it doesn’t migrate indexes. To migrate the indexes, on the EC2 instance that you created as a prerequisite, use the Amazon DocumentDB Index Tool to export indexes from the MongoDB cluster. You can get the tool by cloning the Amazon DocumentDB tools GitHub repo and following the instructions it provides.
The following code dumps indexes from your source MongoDB cluster to a directory on your EC2 instance:
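The original command is not reproduced here. The sketch below assumes a recent version of the Index Tool that accepts `--dump-indexes`, `--dir`, and `--uri`; older versions take the connection details through different options, so confirm with the tool’s `--help` output. The script path, URI, and directory are placeholders.

```shell
# Placeholders (assumptions): path to the script in your clone, source URI, output directory.
INDEX_TOOL="./documentdb_index_tool.py"
SOURCE_URI="mongodb://<username>:<password>@source-mongodb.example.internal:27017/?authSource=admin"
INDEX_DIR="./index-export"

# Export index definitions from the source cluster into INDEX_DIR.
dump_indexes() {
  mkdir -p "$INDEX_DIR"
  python3 "$INDEX_TOOL" --dump-indexes --dir "$INDEX_DIR" --uri "$SOURCE_URI"
}
```

Call `dump_indexes` after you fill in the placeholders, and keep `INDEX_DIR` for the restore in Step 4.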
After the indexes export successfully, dump the data from the source. You restore the indexes in your Amazon DocumentDB cluster in Step 4.
Step 3: Dumping data using mongodump
Export the data from your MongoDB replica set to the EC2 migration instance using the mongodump tool. Set the --readPreference option to secondary to force the dump to connect to a secondary replica set member. This step reduces the potential impact of the mongodump on the source deployment. To use the --readPreference option, connect to the replica set member using the form replicasetMember. See the following code:
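A minimal sketch of the dump follows. The replica set name, member list, user, and paths are placeholders. The `--host` value uses the `replicaSetName/host1:port,host2:port` form that mongodump accepts for replica sets, and the function also writes the UTC start time to a file because you need that timestamp for the CDC task.

```shell
# Placeholders (assumptions): replace with your replica set name, members, user, and paths.
REPLICA_SET="rs0"
SOURCE_MEMBERS="mongo1.example.internal:27017,mongo2.example.internal:27017"
SOURCE_USER="<username>"
DUMP_DIR="./dump"
START_FILE="./mongodump-start-time.txt"

# Record the start time, then dump from a secondary member.
# With --username and no --password, mongodump prompts for the password.
dump_data() {
  date -u +"%Y-%m-%dT%H:%M:%SZ" > "$START_FILE"
  mongodump --host "$REPLICA_SET/$SOURCE_MEMBERS" \
    --readPreference secondary \
    --username "$SOURCE_USER" \
    --authenticationDatabase admin \
    --out "$DUMP_DIR"
}
```

Call `dump_data` on the EC2 migration instance. Add `--db` if you want to dump a single database instead of all of them.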
The time it takes the data to export depends on the size of the source dataset, the speed of the network between the migration instance and the source, and the migration instance’s resources. Record the start time of the mongodump process; you need this information to know when to start the DMS CDC process later.
After the successful export of the indexes and data, the next step is to restore the data and indexes in your Amazon DocumentDB cluster.
Step 4: Restoring indexes using the Amazon DocumentDB Index Tool
To restore the indexes that you exported in Step 2 to your target cluster, use the Amazon DocumentDB Index Tool.
The following code restores the indexes in your Amazon DocumentDB cluster from your EC2 instance:
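As with the export, this is a sketch that assumes a recent Index Tool version with `--restore-indexes`, `--dir`, and `--uri`; check the tool’s `--help` output for the options your version supports. The script path, cluster URI, and directory are placeholders.

```shell
# Placeholders (assumptions): script path, target cluster URI, and the directory from the export.
INDEX_TOOL="./documentdb_index_tool.py"
DOCDB_URI="mongodb://<username>:<password>@sample-cluster.cluster-xxxxxxxx.us-east-1.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"
INDEX_DIR="./index-export"

# Create the exported indexes on the Amazon DocumentDB cluster.
restore_indexes() {
  python3 "$INDEX_TOOL" --restore-indexes --dir "$INDEX_DIR" --uri "$DOCDB_URI"
}
```

Call `restore_indexes` before you restore any data, so that the indexes already exist when mongorestore loads the collections.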
To confirm that you restored the indexes correctly, connect to your Amazon DocumentDB cluster with the mongo shell and list the indexes for a given collection. See the following code:
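A sketch of that check, assuming the legacy mongo shell; the endpoint, user, database, and collection names are placeholders.

```shell
# Placeholders (assumptions): cluster endpoint, user, CA file, database, and collection.
DOCDB_ENDPOINT="sample-cluster.cluster-xxxxxxxx.us-east-1.docdb.amazonaws.com:27017"
DOCDB_USER="<username>"
CA_FILE="global-bundle.pem"
DB_NAME="mydb"
COLLECTION="mycollection"

# Print the index definitions of one collection on the target cluster.
# With --username and no --password, the mongo shell prompts for the password.
list_indexes() {
  mongo --ssl --host "$DOCDB_ENDPOINT" --sslCAFile "$CA_FILE" --username "$DOCDB_USER" \
    --quiet \
    --eval "printjson(db.getSiblingDB('$DB_NAME').getCollection('$COLLECTION').getIndexes())"
}
```

Compare the output of `list_indexes` with the same `getIndexes()` call on the source collection.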
Step 5: Restoring data using mongorestore
To restore the data that you dumped in Step 3 to your target cluster, use the mongorestore tool. The following code restores the data in your Amazon DocumentDB cluster from your EC2 instance. To increase the speed and parallelize the restore, use the --numInsertionWorkersPerCollection option. As a rule of thumb, set the --numInsertionWorkersPerCollection value to the number of vCPUs on the cluster’s primary instance. Use the --noIndexRestore option to avoid creating indexes twice, because you restored the indexes in Step 4. See the following code:
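A minimal sketch of the restore follows. The endpoint, user, and paths are placeholders, and `WORKERS=2` assumes a db.r5.large primary, which has 2 vCPUs. The function also removes the `admin` directory from the dump before restoring, which matters when the dump covers all databases.

```shell
# Placeholders (assumptions): cluster endpoint, user, CA file, dump directory.
DOCDB_ENDPOINT="sample-cluster.cluster-xxxxxxxx.us-east-1.docdb.amazonaws.com:27017"
DOCDB_USER="<username>"
CA_FILE="global-bundle.pem"
DUMP_DIR="./dump"
WORKERS=2   # vCPUs on the primary instance (2 for db.r5.large)

# Remove the admin database from the dump, then restore in parallel without indexes.
# With --username and no --password, mongorestore prompts for the password.
restore_data() {
  rm -rf "${DUMP_DIR:?}/admin"
  mongorestore --ssl --host "$DOCDB_ENDPOINT" --sslCAFile "$CA_FILE" \
    --username "$DOCDB_USER" \
    --numInsertionWorkersPerCollection "$WORKERS" \
    --noIndexRestore \
    --dir "$DUMP_DIR"
}
```

Call `restore_data` and note how long it runs; the total duration matters for the oplog sizing.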
If your mongodump operation includes all the databases from the source MongoDB cluster (for example, if the --db option doesn’t specify an individual database to dump), remove the admin directory from the resulting dump directory. Otherwise, an error occurs when you attempt to restore to Amazon DocumentDB.
Pay attention to the total duration of the restore. The MongoDB oplog should be large enough to hold the data for this duration as well as the time it takes to complete the online migration that Step 6 covers. The AWS DMS CDC task relies on the oplog to replicate data to Amazon DocumentDB.
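One way to check the current oplog window is the mongo shell’s `db.getReplicationInfo()` helper, which reports the time span between the first and last oplog entries. The sketch below assumes the legacy mongo shell and a placeholder `SOURCE_HOST`.

```shell
# Placeholder (assumption): replace with a member of your source replica set.
SOURCE_HOST="source-mongodb.example.internal:27017"

# Print the hours of history currently covered by the oplog.
check_oplog_window() {
  mongo --host "$SOURCE_HOST" --quiet \
    --eval 'print(db.getReplicationInfo().timeDiffHours)'
}
```

If the value printed by `check_oplog_window` is shorter than the time you expect the migration to take, consider increasing the oplog size before you start.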
Step 6: Replicating data with change data capture (CDC) using AWS DMS
AWS DMS is a managed service that helps you migrate databases to AWS services efficiently and securely. AWS DMS enables database migration using two methods: full data load and CDC. The hybrid migration approach uses CDC to replicate changes to Amazon DocumentDB. For more information about using AWS DMS, see AWS Database Migration Service Step-by-Step Walkthroughs.
To perform the hybrid migration, complete the following steps:
- Create an AWS DMS replication instance. For instructions, see Working with an AWS DMS Replication Instance.
For data migration, this post uses the dms.t2.medium instance type. AWS DMS uses the replication instance to run the task that migrates data from your MongoDB source to the Amazon DocumentDB target cluster.
- Create the MongoDB source and Amazon DocumentDB target endpoints. For more information, see Working with AWS DMS Endpoints.
The following screenshot shows the endpoints for this post for the MongoDB cluster and target Amazon DocumentDB cluster.
- Create a replication task to migrate the data between the source and target endpoints.
a. Choose the task type Replicate data changes only.
b. Enable Start task on create.
Your replication begins immediately after task creation. The following screenshot shows the status of a database migration task that has completed the full load and is currently performing ongoing replication.
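If you prefer the AWS CLI to the console, a CDC-only task can be sketched as follows. All ARNs, the task identifier, the table mappings file, and the timestamp are placeholders. The `--cdc-start-time` value is assumed to be the mongodump start time that you recorded in Step 3; check the AWS DMS documentation to confirm that your source and engine version honor a CDC start time.

```shell
# Placeholders (assumptions): replace with your own identifiers, ARNs, file, and timestamp.
TASK_ID="mongodb-to-docdb-cdc"
SOURCE_ENDPOINT_ARN="arn:aws:dms:us-east-1:111122223333:endpoint:SOURCEEXAMPLE"
TARGET_ENDPOINT_ARN="arn:aws:dms:us-east-1:111122223333:endpoint:TARGETEXAMPLE"
REPLICATION_INSTANCE_ARN="arn:aws:dms:us-east-1:111122223333:rep:INSTANCEEXAMPLE"
TABLE_MAPPINGS_FILE="table-mappings.json"
CDC_START_TIME="2022-02-01T12:00:00"   # UTC start time of the mongodump run

# Create a task that replicates data changes only.
create_cdc_task() {
  aws dms create-replication-task \
    --replication-task-identifier "$TASK_ID" \
    --source-endpoint-arn "$SOURCE_ENDPOINT_ARN" \
    --target-endpoint-arn "$TARGET_ENDPOINT_ARN" \
    --replication-instance-arn "$REPLICATION_INSTANCE_ARN" \
    --migration-type cdc \
    --cdc-start-time "$CDC_START_TIME" \
    --table-mappings "file://$TABLE_MAPPINGS_FILE"
}
```

Unlike the console option Start task on create, a task created this way does not start by itself; start it afterward with `aws dms start-replication-task`.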
If you choose the task mongodbtodocumentbd-online-fullandongoing, you can review more specific details. In the Table statistics section, the task shows the statistics of full data load, followed by the ongoing replication between the source and destination databases. See the following screenshot.
To verify that the number of documents matches in each, run the command db.collection.count() in your source and target databases.
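The comparison can be scripted as in the sketch below. It assumes the legacy mongo shell, a source that does not require credentials from this host, and a `DOCDB_PASS` environment variable holding the target password; all names are placeholders.

```shell
# Placeholders (assumptions): hosts, user, CA file, database, and collection.
SOURCE_HOST="source-mongodb.example.internal:27017"
DOCDB_ENDPOINT="sample-cluster.cluster-xxxxxxxx.us-east-1.docdb.amazonaws.com:27017"
DOCDB_USER="<username>"
DOCDB_PASS="${DOCDB_PASS:-}"   # assumed to be exported in your environment
CA_FILE="global-bundle.pem"
DB_NAME="mydb"
COLLECTION="mycollection"
COUNT_JS="print(db.getSiblingDB('$DB_NAME').getCollection('$COLLECTION').count())"

count_source() {
  mongo --host "$SOURCE_HOST" --quiet --eval "$COUNT_JS"
}

count_target() {
  mongo --ssl --host "$DOCDB_ENDPOINT" --sslCAFile "$CA_FILE" \
    --username "$DOCDB_USER" --password "$DOCDB_PASS" --quiet --eval "$COUNT_JS"
}

# Print both counts and return non-zero when they differ.
compare_counts() {
  src="$(count_source)"
  tgt="$(count_target)"
  if [ "$src" = "$tgt" ]; then
    echo "match: $src"
  else
    echo "mismatch: source=$src target=$tgt"
    return 1
  fi
}
```

While the application is still writing to the source, small and short-lived differences are expected; the counts should converge once writes stop and CDC catches up.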
You can also monitor the migration’s status as an Amazon CloudWatch metric and create a dashboard to show progress. The following screenshot shows the rate of incoming CDC changes from the source database.
Step 7: Changing the application endpoint to an Amazon DocumentDB cluster
After the restore is complete and the CDC process has caught up and is replicating continuously, you are ready to cut over. Stop writes to the source database, let the remaining changes replicate, and then change your application’s database connection string to use your Amazon DocumentDB cluster. For more information, see Understanding Amazon DocumentDB Endpoints and Best Practices for Amazon DocumentDB.
This post described how to migrate data from MongoDB to Amazon DocumentDB by using the hybrid method. For more information about other migration methods, see Migrate from MongoDB to Amazon DocumentDB using the offline method, Migrating to Amazon DocumentDB with the online method, and Ramping up on Amazon DocumentDB (with MongoDB compatibility).
If you have any questions or comments, please leave your thoughts in the comments section.
About the Authors
Vijay Injam is a NoSQL Data Architect at Amazon Web Services.
Jeff Duffy is a Sr NoSQL Specialist Solutions Architect at Amazon Web Services.
Joseph Idziorek is a Principal Product Manager at Amazon Web Services.