AWS Database Blog

Migrating to Amazon DocumentDB with the online method

Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB 3.6 application code, drivers, and tools to run, manage, and scale workloads on Amazon DocumentDB without having to worry about managing the underlying infrastructure. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data.

There are three primary approaches for migrating from MongoDB to Amazon DocumentDB: offline, online, and hybrid. For more information, see Migrating to Amazon DocumentDB.

This post discusses how to use the online approach to migrate self-managed MongoDB clusters that are hosted on premises or on Amazon EC2 to Amazon DocumentDB. The online approach minimizes downtime because AWS Database Migration Service (AWS DMS) continually reads from the source MongoDB oplog and applies those changes in near-real time to the target Amazon DocumentDB cluster. For a demo of the online method, see Video: Live migration to Amazon DocumentDB.

The online method is the best option if you want to minimize downtime and your source dataset is small (less than 1 TB). If your dataset is larger than 1 TB, you should use the hybrid or offline approach to take advantage of parallelization and the speed that you can achieve with mongorestore. For more information about migrating with the offline method, see Migrate from MongoDB to Amazon DocumentDB using the offline method.

This post shows you how to use the online approach to migrate data from a MongoDB replica set hosted on Amazon EC2 to an Amazon DocumentDB cluster.

Prerequisites

Before you start your migration, complete the following prerequisites:

  1. Verify your source version and configuration
  2. Set up and choose the size of your Amazon DocumentDB cluster
  3. Set up an EC2 instance

Verifying your source version and configuration

If your MongoDB source uses a version of MongoDB earlier than 3.6, you should upgrade your source deployment and your application drivers. Both must be compatible with MongoDB 3.6 before you migrate to Amazon DocumentDB. You can determine the version of your source deployment by entering the following code in the mongo shell:

mongoToDocumentDBOnlineSet1:PRIMARY> db.version()
3.4.4
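
If you want to script this check, the comparison against 3.6 can be sketched in Python as follows. This is a minimal illustration, not part of any AWS tooling: the helper names `major_minor` and `needs_upgrade` are hypothetical, and the version string is assumed to be the value that db.version() returns.

```python
import re

# Minimum source version named in this post.
MIN_SUPPORTED = (3, 6)


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '3.4.4'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(version: str) -> bool:
    """Return True if the source version is earlier than 3.6."""
    return major_minor(version) < MIN_SUPPORTED
```

With the sample output shown above, `needs_upgrade("3.4.4")` returns True, so that source deployment would need an upgrade before migration.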

Also, verify that the source MongoDB cluster (or instance) is configured as a replica set. You can determine if a MongoDB cluster is configured as a replica set with the following code:

db.adminCommand( { replSetGetStatus : 1 } )

If the output is an error message similar to "errmsg" : "not running with --replSet", the cluster is not configured as a replica set.
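
As a rough sketch, the same interpretation can be expressed in Python for a reply that has already been fetched and converted to a dictionary. The function name `is_replica_set` is hypothetical, and how you obtain the reply is left out; some drivers raise an exception for a failed command instead of returning the error document.

```python
def is_replica_set(status: dict) -> bool:
    """Interpret a replSetGetStatus reply held as a plain dict.

    A successful reply carries "ok": 1 together with the replica set
    name in "set". A failed reply carries "ok": 0 and an "errmsg".
    """
    return status.get("ok") == 1 and "set" in status


# Illustrative replies (the field values are examples only).
healthy = {"set": "mongoToDocumentDBOnlineSet1", "members": [], "ok": 1}
standalone = {"ok": 0, "errmsg": "not running with --replSet"}
```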

Setting up and sizing your target Amazon DocumentDB cluster

For this post, your target Amazon DocumentDB cluster is a replica set that you create with a single db.r5.large instance. When you size your cluster, choose the instance type that is suitable for your production cluster. For more information about Amazon DocumentDB instances and costs, see Amazon DocumentDB (with MongoDB compatibility) pricing.

Setting up an EC2 instance

To connect to the Amazon DocumentDB cluster to migrate indexes and for other tasks during the migration, create an EC2 instance in the same VPC as your cluster and install the mongo shell. For instructions, see Getting Started with Amazon DocumentDB. To verify the connection to Amazon DocumentDB, enter the following CLI command:

[ec2]$ mongo --ssl --host docdb-cluster-endpoint \
--sslCAFile rds-ca-2019-root.pem --username myuser \
--password mypassword
…
rs0:PRIMARY> db.runCommand('ping')
{ "ok" : 1 }

If you have trouble connecting to either your source instance or Amazon DocumentDB cluster, check the security group configurations for both to make sure that the EC2 instance has permission to connect to each on the correct port (27017 by default). For more information about troubleshooting, see Troubleshooting Amazon DocumentDB.
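
One way to separate a network or security group problem from an authentication or TLS problem is to test plain TCP reachability from the EC2 instance. The following Python sketch uses only the standard library; the helper name `can_reach` and the 3-second timeout are assumptions, and a successful result says nothing about TLS or credentials.

```python
import socket


def can_reach(host: str, port: int = 27017, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `can_reach("docdb-cluster-endpoint")` checks the default port 27017 on the placeholder endpoint used earlier. A False result suggests reviewing security groups and routing before looking at credentials.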

Amazon DocumentDB uses Transport Layer Security (TLS) encryption by default. To connect over a TLS-encrypted connection, download the certificate authority (CA) file to use the mongo shell to connect. See the following code:

[ec2]$ curl -O https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem

You can also disable TLS. For more information, see Encrypting Data in Transit.

Online migration steps

The following diagram illustrates the five steps of the online migration process. The steps are as follows:

  1. Application continues to write to source
  2. Dump indexes using the Amazon DocumentDB Index Tool
  3. Restore indexes using the Amazon DocumentDB Index Tool
  4. Perform full load and replicate data with AWS DMS
  5. Change application endpoint to Amazon DocumentDB cluster

Step 1: Application continues to write to source

When you use the online method to migrate to Amazon DocumentDB, your application continues to write to the source MongoDB database. Step 5 discusses ceasing writes to the source database and changing the application to point to the target Amazon DocumentDB cluster.

Step 2: Dumping indexes using the Amazon DocumentDB Index Tool

Before you begin your migration, create the same indexes on your target Amazon DocumentDB cluster that you have on your source MongoDB cluster. Although AWS DMS handles the migration of data, it does not migrate indexes. To migrate the indexes, on the EC2 instance that you created as a prerequisite, use the Amazon DocumentDB Index Tool to export indexes from the MongoDB cluster. You can get the tool by creating a clone of the Amazon DocumentDB tools GitHub repo and following the instructions in README.md.

The following code dumps indexes from your source MongoDB cluster to a directory on your EC2 instance. The sample user names and passwords provided in this post are for illustrative purposes only; you should always choose strong passwords.

python migrationtools/documentdb_index_tool.py --dump-indexes \
--dir ~/index.js/ \
--host ec2-user.us-west-2.compute.amazonaws.com \
--auth-db admin \
--username user \
--password password

2020-02-11 21:46:50,432: Successfully authenticated to database: admin
2020-02-11 21:46:50,432: Successfully connected to instance ec2-user.us-west-2.compute.amazonaws.com:27017
2020-02-11 21:46:50,432: Retrieving indexes from server...
2020-02-11 21:46:50,440: Completed writing index metadata to local folder: /home/ec2-user/index.js/

After the successful export of the indexes, the next step is to restore those indexes in your Amazon DocumentDB cluster.

Step 3: Restoring indexes using the Amazon DocumentDB Index Tool

To restore in your target cluster the indexes that you exported in the preceding step, use the Amazon DocumentDB Index Tool.

The following code restores the indexes in your Amazon DocumentDB cluster from your EC2 instance:

python migrationtools/documentdb_index_tool.py --restore-indexes \
--dir ~/index.js/ \
--host docdb-2x2x-02-02-19-07-xx.cluster-xxxxxxxx.us-west-2.docdb.amazonaws.com:27017 \
--tls --tls-ca-file ~/rds-ca-2019-root.pem \
--username user \
--password password

2020-02-11 21:51:23,245: Successfully authenticated to database: admin
2020-02-11 21:51:23,245: Successfully connected to instance docdb-2x2x-02-02-19-07-xx.cluster-xxxxxxxx.us-west-2.docdb.amazonaws.com:27017
2020-02-11 21:51:23,264: zips-db.zips: added index: _id

To confirm that you restored the indexes correctly, connect to your Amazon DocumentDB cluster with the mongo shell and list the indexes for a given collection. See the following code:

mongo --ssl \
--host docdb-2020.cluster-xxxxxxxx.us-west-2.docdb.amazonaws.com:27017 \
--sslCAFile rds-ca-2019-root.pem --username documentdb --password documentdb
rs0:PRIMARY> db.zips.getIndexes()
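
To compare the source and target index lists without reading them by eye, you could diff the two getIndexes() results. The following Python sketch assumes each result has been loaded as a list of dictionaries with "key" and "name" fields. The helper names are hypothetical, and only the key pattern and the unique flag are compared, so other index options are ignored.

```python
def index_signature(index: dict) -> tuple:
    """Reduce an index document to (ordered key fields, unique flag)."""
    return (tuple(index["key"].items()), bool(index.get("unique", False)))


def missing_indexes(source: list, target: list) -> list:
    """Return the names of source indexes that have no equivalent on the target."""
    target_signatures = {index_signature(ix) for ix in target}
    return [ix["name"] for ix in source
            if index_signature(ix) not in target_signatures]


# Illustrative index documents (field values are examples only).
source_indexes = [
    {"v": 2, "key": {"_id": 1}, "name": "_id_"},
    {"v": 2, "key": {"city": 1, "state": -1}, "name": "city_1_state_-1"},
]
target_indexes = [{"v": 1, "key": {"_id": 1}, "name": "_id_"}]
```

An empty list from `missing_indexes(source_indexes, target_indexes)` would indicate that every source index has a counterpart on the target. Field order within a compound key is significant in this comparison.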

Step 4: Performing full load and replicating data with AWS DMS

AWS DMS is a managed service that helps you migrate databases to AWS services efficiently and securely. AWS DMS enables database migration using two methods: full data load and change data capture (CDC). The online migration approach uses AWS DMS to perform a full data copy and uses CDC to replicate changes to Amazon DocumentDB. For more information about using AWS DMS, see AWS Database Migration Service Step-by-Step Walkthroughs.
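
For readers who prefer to script the task rather than use the console, the following Python sketch assembles the request parameters for a task that combines full load and CDC. It is an illustration under stated assumptions: the function names and the ARN arguments are hypothetical placeholders, `full-load-and-cdc` is the AWS DMS API migration type for a full load followed by ongoing replication, and the table mapping treats the MongoDB database as the schema name and the collection as the table name.

```python
import json


def build_table_mappings(database: str, collection: str = "%") -> str:
    """Build a selection rule that includes the given collections ('%' matches all)."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {
                    "schema-name": database,   # MongoDB database
                    "table-name": collection,  # MongoDB collection
                },
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)


def build_task_request(task_id, source_arn, target_arn, instance_arn, database):
    """Assemble keyword arguments for a full load plus CDC replication task."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",
        "TableMappings": build_table_mappings(database),
    }
```

If boto3 is installed and credentials are configured, a dictionary built this way could be passed to the DMS client as `boto3.client("dms").create_replication_task(**request)`. The sketch itself makes no AWS calls.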

To perform the online migration, complete the following steps:

  1. Create an AWS DMS replication instance. For instructions, see Working with an AWS DMS Replication Instance.
    For data migration, this post uses the dms.t2.medium instance type. AWS DMS uses the replication instance to run the task that migrates data from your MongoDB source to the Amazon DocumentDB target cluster.
    Additionally, AWS DMS offers free replication instances for up to six months for certain instance types and migration targets. For more information, see AWS Database Migration Service: Free DMS.
  2. Create the MongoDB source and Amazon DocumentDB target endpoints. For more information, see Working with AWS DMS Endpoints.
    The following screenshot shows the endpoints for this post for the source MongoDB cluster and target Amazon DocumentDB cluster.
  3. Create a replication task to migrate the data between the source and target endpoints.
    1. Choose the task type Full data load followed by ongoing data replication.
    2. Enable Start task on create.
      Your replication begins immediately after task creation. The following screenshot shows the status of a database migration task that has completed the full load and is currently performing ongoing replication.
      If you choose the task mongodbtodocumentbd-online-fullandongoing, you can review more specific details. In the Table statistics section, the task shows the statistics of full data load, followed by the ongoing replication between the source and destination databases. See the following screenshot.
      To verify that the number of documents matches in each, run the command db.collection.count() in your source and target databases.
      You can also monitor the migration’s status as an Amazon CloudWatch metric and create a dashboard to show progress. The following screenshot shows the rate of incoming CDC changes from the source database.
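
The document-count check from the steps above can be sketched as a comparison of per-collection counts. This Python example assumes you have already collected the output of db.collection.count() for each collection into two dictionaries; the function name `count_mismatches` and the sample numbers are hypothetical. While the application is still writing to the source, counts can differ briefly until CDC catches up.

```python
def count_mismatches(source_counts: dict, target_counts: dict) -> dict:
    """Return {collection: (source_count, target_count)} where the counts differ.

    A collection that is missing on one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(name)
        tgt = target_counts.get(name)
        if src != tgt:
            mismatches[name] = (src, tgt)
    return mismatches
```

An empty dictionary means that every collection reported the same count on both sides at the time the counts were taken.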

Step 5: Changing the application endpoint to an Amazon DocumentDB cluster

After the full load is complete and the CDC process is replicating continuously, you are ready to change your application’s database connection string to use your Amazon DocumentDB cluster. For more information, see Understanding Amazon DocumentDB Endpoints and Best Practices for Amazon DocumentDB.
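
As an illustration of the cutover, the following Python sketch builds a MongoDB connection URI that points at the Amazon DocumentDB cluster endpoint. The function name `build_connection_uri` is hypothetical, the replica set name rs0 is taken from the shell prompt shown earlier, and the exact set of URI options your driver needs (including how the CA file is supplied) is an assumption to verify against your driver's documentation.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_uri(host, username, password, port=27017, replica_set="rs0"):
    """Build a TLS-enabled MongoDB URI with percent-encoded credentials."""
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    options = urlencode({"ssl": "true", "replicaSet": replica_set})
    return f"mongodb://{credentials}@{host}:{port}/?{options}"
```

Percent-encoding the user name and password keeps characters such as @ or / in a password from being misread as URI delimiters.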

Summary

This post described how to migrate data from MongoDB to Amazon DocumentDB by using the online method. For more information, see Migrate from MongoDB to Amazon DocumentDB using the offline method and Ramping up on Amazon DocumentDB (with MongoDB compatibility).

If you have any questions or comments, please leave your thoughts in the comments section.

About the Authors

Vijay Injam is a NoSQL Data Architect at Amazon Web Services.

Jeff Duffy is a Sr NoSQL Specialist Solutions Architect at Amazon Web Services.

Joseph Idziorek is a Principal Product Manager at Amazon Web Services.