AWS Database Blog

Migrate from Azure Cosmos DB API for MongoDB to Amazon DocumentDB (with MongoDB Compatibility) using the online method

Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. The Amazon DocumentDB Migration Guide outlines three primary approaches for migrating from MongoDB to Amazon DocumentDB: offline, online, and hybrid. Although the migration guide refers to MongoDB, its offline approach works for Azure Cosmos DB as well; the guide's online and hybrid approaches, however, don't apply to Cosmos DB migrations.

In this post, I explain how you can use the Azure Cosmos DB to Amazon DocumentDB migration utility tool to migrate Azure Cosmos DB API for MongoDB (with the v3.6 wire protocol) to Amazon DocumentDB using the online migration approach.

Solution overview

The Azure Cosmos DB to Amazon DocumentDB migration utility tool is an application created to migrate a Cosmos DB database to Amazon DocumentDB with minimal downtime. The tool keeps the target Amazon DocumentDB cluster in sync with the source Cosmos DB until the client applications are cut over to the Amazon DocumentDB cluster. It uses the change feed in Azure Cosmos DB to record the changes and replay them on the Amazon DocumentDB cluster.

To accomplish this goal, I use Amazon EC2, Amazon S3, AWS Lambda, Amazon SQS, Amazon DynamoDB, AWS CloudFormation, AWS Secrets Manager, Amazon CloudWatch, and Amazon DocumentDB.

A high-level overview of the migration process is as follows:

  1. Prepare the environment for migration:
    1. Create an Amazon Elastic Compute Cloud (Amazon EC2) instance.
    2. Install the required packages using yum.
    3. Download the source code and binaries, and install the dependencies.
    4. Create an S3 bucket and copy Lambda files using the AWS Command Line Interface (AWS CLI).
    5. Create core resources using an AWS CloudFormation template.
    6. Create Amazon DocumentDB resources using the CloudFormation template.
    7. Save the Amazon DocumentDB connection string in Secrets Manager.
  2. The migration process:
    1. From the provided source code, run the migrator-app application to capture the change feed data.
    2. Create a backup of the Cosmos DB cluster using mongodump.
    3. Create indexes on the target cluster using the Amazon DocumentDB Index Tool.
    4. Restore the backup on the target cluster using the mongorestore tool.
    5. Configure the application settings to apply the change feeds on the target Amazon DocumentDB cluster.
    6. Validate the target cluster is in sync with the source cluster.
  3. The cutover process:
    1. Stop the application from writing to the source Cosmos DB cluster.
    2. Stop the migrator-app application that records the change feed data.
    3. Restart the client applications with the connection string pointing to the Amazon DocumentDB cluster.

The following diagram illustrates the high-level architecture.

 

Required resources

For this post, I provide CloudFormation templates to simplify the deployment of the required resources. The prerequisites for the CloudFormation templates are as follows:

Additionally, the Cosmos DB cluster experiences higher activity than normal during the migration. Review the Request Units capacity needs for your Cosmos DB cluster.

Prepare the environment for migration

The migrator-app application supports migrating data from the Azure Cosmos DB API for MongoDB (v3.6 wire protocol). If the source cluster uses wire protocol v3.2, upgrade the source deployment and MongoDB application drivers to v3.6 or above.

Step 1a: Create an EC2 instance

From the AWS Management Console, create an EC2 instance in a private subnet of a VPC with settings as shown in the following screenshot. I attached security groups that allow inbound SSH traffic to the instance on port 22 and inbound Amazon DocumentDB traffic on port 27017 from this instance. For more information on how to create the security groups, refer to Work with security groups.

I’m using an m5ad.xlarge instance type, with 4 vCPUs and 16 GB of RAM. If your source cluster has multiple collections with millions of documents, consider creating an EC2 instance with more vCPUs and RAM to take advantage of parallel processing.
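
If you prefer to script this step, the following AWS CLI sketch creates a security group for the instance, opens the ports described above, and launches the instance in a private subnet. The security group name is illustrative, and all IDs, the CIDR range, and the key pair name are placeholders for your environment.

# Create a security group for the migration instance and allow inbound SSH on port 22
# (the group name, IDs, and CIDR below are placeholders; replace them with your own values)
aws ec2 create-security-group --group-name migration-ec2-sg \
    --description "Security group for the migration EC2 instance" --vpc-id <your-vpc-id>
aws ec2 authorize-security-group-ingress --group-id <your-ec2-sg-id> \
    --protocol tcp --port 22 --cidr <your-admin-cidr>

# Allow the Amazon DocumentDB security group to accept traffic on port 27017 from this instance
aws ec2 authorize-security-group-ingress --group-id <your-documentdb-sg-id> \
    --protocol tcp --port 27017 --source-group <your-ec2-sg-id>

# Launch the migration instance in a private subnet
aws ec2 run-instances --image-id <your-amazon-linux-ami-id> --instance-type m5ad.xlarge \
    --subnet-id <your-private-subnet-id> --security-group-ids <your-ec2-sg-id> \
    --key-name <your-key-pair>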

Step 1b: Install the required packages using yum

Connect to the EC2 instance you just created using SSH and install the required yum packages using the following bash script. For more information on how to connect to an EC2 instance in private subnet using a bastion, refer to Securely Connect to Linux Instances Running in a Private Amazon VPC.

# Configure the package manager to include MongoDB v3.6 repo
cat <<EOF | sudo tee /etc/yum.repos.d/mongodb-org-3.6.repo
[mongodb-org-3.6]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.6/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc
EOF

# Install python3, pip3, MongoDB shell, and MongoDB tools
sudo yum update -y
sudo yum install -y amazon-linux-extras
sudo yum install -y python3-pip python3 python3-setuptools mongodb-org-shell mongodb-org-tools
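
Before continuing, you can quickly confirm that the tools installed correctly by checking their versions:

# Verify the MongoDB shell, MongoDB tools, and Python tooling are available
mongo --version
mongodump --version
mongorestore --version
python3 --version
pip3 --version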

Step 1c: Download the source code and binaries, and install the dependencies

Use the following bash script to download the cosmosdb-migrator tool binaries and install the Python module dependencies:

# Change directory to your favorite directory and download source from GitHub repo
cd ~
curl -OL 'https://github.com/awslabs/amazon-documentdb-tools/archive/refs/heads/master.zip'
unzip master.zip
sh amazon-documentdb-tools-master/cosmos-db-migration-utility/scripts/build-package.sh
cd amazon-documentdb-tools-master/cosmos-db-migration-utility/build
tar -xvzf cosmosdb-migrator.tgz
rm -f cosmosdb-migrator.tgz
export BASE_DIR=$(pwd)

# Download the module dependencies for Migrator App
cd ${BASE_DIR}/migrator-app
pip3 install -r requirements.txt --user

# Download the module dependencies for Configure App
cd ${BASE_DIR}/configure
pip3 install -r requirements.txt --user

Step 1d: Create an S3 bucket and copy the Lambda files using the AWS CLI

The cloudformation/core-resources.yaml CloudFormation template requires that the Lambda functions and Lambda layers are uploaded to an S3 bucket. If you already have an S3 bucket and want to use it, upload the lambda/*.zip files to the /lambda/ path of your S3 bucket as shown in the following screenshot. Otherwise, create a new S3 bucket with a globally unique name and upload the files to the /lambda/ path.
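
The following AWS CLI sketch creates the bucket and copies the packages. The bucket name and Region are placeholders, and it assumes the lambda/*.zip packages sit under the build directory (${BASE_DIR}) from Step 1c.

# Create the S3 bucket (skip this command if you're reusing an existing bucket)
aws s3 mb s3://<your-bucket-name> --region <your-region>

# Copy the Lambda function and layer packages to the /lambda/ path of the bucket
# (assumes the lambda/*.zip files are under ${BASE_DIR} from Step 1c)
cd ${BASE_DIR}
aws s3 cp lambda/ s3://<your-bucket-name>/lambda/ --recursive --exclude "*" --include "*.zip"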

Step 1e: Create core resources using a CloudFormation template

The cloudformation/core-resources.yaml CloudFormation template is a shared resource stack that you can reuse across multiple migrations from Cosmos DB to Amazon DocumentDB clusters. When this template runs successfully, all the required resources for the migration, such as the S3 bucket, Amazon SQS queues, Lambda functions, and DynamoDB tables, are created and configured automatically. If you prefer the AWS CLI for this step, a deployment sketch follows the console steps below.

  1. Create a new stack using the cloudformation/core-resources.yaml template as shown in the following screenshot.

    On the next screen, you specify the stack details as shown in the following screenshot.
  2. Choose the VPC network and private subnets appropriate to your environment.
  3. Specify the Amazon S3 bucket name that you used in Step 1d.
    As a best practice, I recommend naming your Amazon DocumentDB cluster with the same name as the source Cosmos DB cluster. This makes it easier to identify the mapping between the source and the target during the migration process.
  4. Review the core resources stack, then choose Deploy.
  5. Confirm the CloudFormation deployment shows a status of CREATE_COMPLETE before continuing.
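
The following sketch deploys the same shared stack from the AWS CLI. The parameter keys shown are assumptions; check the Parameters section of cloudformation/core-resources.yaml for the exact names your copy of the template expects.

# Deploy the shared core resources stack from the CLI
# (parameter keys are illustrative; confirm them in cloudformation/core-resources.yaml)
aws cloudformation deploy \
    --stack-name cosmosdb-migration-core \
    --template-file cloudformation/core-resources.yaml \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
    --parameter-overrides S3BucketName=<your-bucket-name> VpcId=<your-vpc-id> SubnetIds=<your-private-subnet-ids>

# Confirm the stack reached CREATE_COMPLETE
aws cloudformation describe-stacks --stack-name cosmosdb-migration-core \
    --query 'Stacks[0].StackStatus' --output text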

Step 1f: Create Amazon DocumentDB resources using a CloudFormation template

The cloudformation/documentdb.yaml CloudFormation template helps you create an Amazon DocumentDB cluster with three compute instances. Use this template file to create a new Amazon DocumentDB cluster for every Cosmos DB cluster being migrated. An AWS CLI sketch for monitoring the stack and retrieving the cluster endpoint follows the console steps below.

  1. Create a new stack using the cloudformation/documentdb.yaml template as shown in the following screenshot.

    On the next screen, you specify the stack details.
  2. Enter a unique stack name for the migration.
  3. Choose the VPC network, private subnets, and security group for Amazon DocumentDB.
  4. For your instance type, I recommend using an Amazon DocumentDB instance type that is close to your Cosmos DB cluster size.
  5. Enter an administrator password with values appropriate for your environment.
  6. Confirm the CloudFormation deployment shows a status of CREATE_COMPLETE before continuing.
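
Rather than polling the console for CREATE_COMPLETE, you can wait on the stack and then read the new cluster endpoint from the CLI; the stack name and cluster identifier below are placeholders for your environment.

# Block until the Amazon DocumentDB stack finishes creating (the command fails if the stack rolls back)
aws cloudformation wait stack-create-complete --stack-name <your-documentdb-stack-name>

# Look up the cluster endpoint used in the connection string for the next step
aws docdb describe-db-clusters --db-cluster-identifier <your-cluster-name> \
    --query 'DBClusters[0].Endpoint' --output text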

Step 1g: Save your Amazon DocumentDB connection string in Secrets Manager

To save your Amazon DocumentDB connection string, complete the following steps:

  1. On the Amazon DocumentDB console, navigate to your cluster.
  2. On the Connectivity & Security tab, choose Connect.
  3. Choose Copy next to Connect to this cluster with an application.
  4. Copy and paste the text in your preferred text editor and replace <insertYourPassword> with the password used in the previous step.
  5. Save the connection string information in Secrets Manager as shown in the following code.
    # Please set these variables for your AWS environment
    export AWS_DEFAULT_REGION="<your-region>"
    export AWS_ACCESS_KEY_ID="<your-access-key>"
    export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
    # Configure the connection string for your application
    # Note: Save the DocumentDB connection string in AWS Secrets Manager.
    cd ${BASE_DIR}/configure
    python3 main.py --cluster-name <your-cluster-name> --connection-string "<your-documentdb-connection-string>"

Update the values for AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, <your-cluster-name>, and <your-documentdb-connection-string> with appropriate values for your environment. The Lambda function batch-request-reader uses this connection string to apply the change feed data on the target Amazon DocumentDB cluster.
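
Before starting the migration, it's worth confirming that the EC2 instance can reach the new cluster with the same credentials you stored. The following is a quick connectivity check with the mongo shell, assuming the endpoint and credentials from the previous steps:

# Download the CA certificate for Amazon DocumentDB and ping the cluster from the EC2 instance
curl -OL https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
mongo --ssl --host "<your-documentdb-endpoint>:27017" --sslCAFile rds-combined-ca-bundle.pem \
    --username "<your-username>" --password '<your-password>' --eval 'db.runCommand({ ping: 1 })'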

The migration process

Step 2a: Start the migrator-app application

The next step in the live migration process is to use the migration application to capture the change feed data from the Cosmos DB cluster. The application saves the data to Amazon S3 and stores the metadata and tracking information in DynamoDB tables. Start the migration application using the following commands. Update the values for <your-cosmosdb-connection-string> and <your-cluster-name> with values appropriate for your cluster.

Keep the migrator-app application running until the cutover period. For large database migrations, I strongly recommend running the following commands in a screen session or with the nohup command so that migrator-app doesn't stop when you log out (a nohup example follows the code below).

cd ${BASE_DIR}/migrator-app
export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
export S3_CHANGE_FEED_BUCKET_NAME="${ACCOUNT_ID}-${AWS_DEFAULT_REGION}-change-feed"
export SOURCE_URI="<your-cosmosdb-connection-string>"
# Start the migrator app. Use nohup or screen session for large database migrations
python3 main.py --cluster-name <your-cluster-name>
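
For example, a nohup invocation like the following keeps migrator-app running after you disconnect from the instance; the log file name is arbitrary.

# Run migrator-app in the background so it survives the SSH session ending
nohup python3 main.py --cluster-name <your-cluster-name> > migrator-app.log 2>&1 &
# Follow the application log
tail -f migrator-app.log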

After running the preceding command, you should observe an output similar to the following:

# Sample output: 
# 2021-03-02 17:24:13 INFO     [commandline_parser.py:23]  Command line arguments given: {"cluster_name": "app-name"}
# 2021-03-02 17:24:13 INFO     [ClusterMigrator.py:25]  Initializing the cluster migrator with connection string: mongodb://...
# ...
# 2021-03-02 17:24:13 INFO     [main.py:49]  Found the following namespaces on cluster_name: app-name. Namespaces: {"RetailApp": ["employees", "regions", "customers", "order-details", "orders", "suppliers", "employee-territories", "products", "territories", "shippers", "categories"], "TradeBlotter": ["trades"], "appname": ["info"], "social": ["people"], "test": ["movies"]}
# 2021-03-02 17:24:13 INFO     [dynamodb_helper.py:50]  Getting the watcher item by id: app-name::RetailApp.employees
# 2021-03-02 17:24:14 INFO     [dynamodb_helper.py:58]  Successfully found the watcher item for id: app-name::RetailApp.employees.
# ...
# 2021-03-02 17:24:15 INFO     [dynamodb_helper.py:19]  database: RetailApp, collections: ["employees", "regions", "customers", "order-details", "orders", "suppliers", "employee-territories", "products", "territories", "shippers", "categories"]
# ...
# 2021-03-02 17:24:16 INFO     [CollectionMigrator.py:124]  Inititated change stream on the db: RetailApp, collection: employees. Resume Token: {"_data": {"$binary": "W3sidG9rZW4iOiJcIjBcIiIsInJhbmdlIjp7Im1pbiI6IiIsIm1heCI6IkZGIn19XQ==", "$type": "00"}}
# 2021-03-02 17:24:16 INFO     [DatabaseMigrator.py:21]  Fetching collections from Database: appname
# 2021-03-02 17:24:16 INFO     [CollectionMigrator.py:137]  Watching for the changes on the cluster: app-name, db: RetailApp, collection: employees
# ...

Step 2b: Create a backup of the Cosmos DB cluster using the mongodump tool

In a new terminal session, export the data and indexes from the source Cosmos DB cluster using the mongodump tool (see the following code). The time it takes to perform the dump and the size of the dump depend on the data size of the source Cosmos DB cluster. Make sure that the disk device where you’re exporting the data has enough free disk space to hold the mongodump output. Other factors that may impact the overall execution time include the speed of the network between the EC2 instance and the source cluster, and the CPU and RAM resources of the EC2 instance. Update the values for <your-cosmosdb-connection-string>, <your-cosmos-db-server>, <port-number>, <your-username>, and <your-password> with values appropriate for your cluster.

export SOURCE_URI="<your-cosmosdb-connection-string>"
mongodump --host "<your-cosmos-db-server>" --port <port-number> --ssl -u "<your-username>" -p '<your-password>' --readPreference secondary

To minimize the impact of the migration on any workload running against the source cluster’s primary, export the data using a secondary read preference. If your source cluster doesn’t have a secondary, exclude the --readPreference command line argument. If you have multiple collections to export, use the argument --numParallelCollections <number-of-cpu-cores> to dump multiple collections in parallel.
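
For example, on the 4-vCPU instance used in this post, a parallel dump reading from a secondary might look like the following:

# Dump up to 4 collections in parallel while reading from a secondary
mongodump --host "<your-cosmos-db-server>" --port <port-number> --ssl \
    -u "<your-username>" -p '<your-password>' \
    --readPreference secondary --numParallelCollections 4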

Step 2c: Create indexes on the target cluster using the Amazon DocumentDB Index Tool

Use the Amazon DocumentDB Index Tool to create the required indexes on the Amazon DocumentDB cluster.
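
The index tool ships in the amazon-documentdb-tools repository you downloaded in Step 1c. Because its location and arguments can vary between versions of the repository, locate the script first and review its built-in help before dumping the indexes from your mongodump output and restoring them on the target cluster:

# Locate the index tool inside the repository downloaded in Step 1c and review its options
find ~/amazon-documentdb-tools-master -name documentdb_index_tool.py
python3 <path-from-previous-command> --help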

Step 2d: Restore the backup on the target cluster using the mongorestore tool

Restore the mongodump data from Step 2b with the following code. If you have multiple collections to import, use the argument --numParallelCollections <number-of-cpu-cores> to restore multiple collections in parallel. Increasing the value of the --numInsertionWorkersPerCollection argument to the number of vCPU cores on the Amazon DocumentDB cluster’s primary instance may increase the speed of the import. Update the values for <your-documentdb-server>, <number-of-vcpus>, <your-username>, and <your-password> with values appropriate for your cluster.

# Download the ca certificate for connecting to Amazon DocumentDB
curl -OL https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
# Restore the dump data on the target Amazon DocumentDB cluster using mongorestore
mongorestore --host "<your-documentdb-server>" --ssl --sslCAFile rds-combined-ca-bundle.pem --username "<your-username>" --password '<your-password>' --numInsertionWorkersPerCollection <number-of-vcpus> --noIndexRestore

Step 2e: Configure the event writer to apply the change feed data

The mongodump and mongorestore processes take time depending on the Cosmos DB and Amazon DocumentDB cluster configuration, and the size of the data and indexes being exported or imported. When the mongorestore step is complete, you should configure the migration application to start applying the change feed data on the target Amazon DocumentDB cluster.

The following commands help you configure the event writer to start processing the change feed data. Update the values for <your-cluster-name> with values appropriate for your cluster.

# Use Configure App to start applying the change feed data
cd ${BASE_DIR}/configure
python3 main.py --cluster-name '<your-cluster-name>' --event-writer start

You should observe an output similar to the following:

# Sample output: 
# 2021-03-02 17:30:04 INFO     [commandline_parser.py:27]  Command line arguments given: {"cluster_name": "app-name", "connection_string": null, "event_writer": "start", "status": false, "watch_status": false}
# 2021-03-02 17:30:04 INFO     [commandline_parser.py:46]  Validated Command line arguments are: {"cluster_name": "app-name", "connection_string": null, "event_writer": "start", "status": false, "watch_status": false, "command": "event_writer"}
# 2021-03-02 17:30:04 INFO     [main.py:41]  Starting to configure application components with commandline_options: {"cluster_name": "app-name", "connection_string": null, "event_writer": "start", "status": false, "watch_status": false, "command": "event_writer"}
# 2021-03-02 17:30:04 INFO     [application.py:40]  Setting the event writer status as start
# 2021-03-02 17:30:04 INFO     [application.py:49]  Starting to send SQS requests to queue: app-request-queue. Payload: {"cluster_name": "app-name", "component": "event_writer", "operation": "start"}
# 2021-03-02 17:30:05 INFO     [application.py:55]  Successfully completed sending SQS requests to queue: app-request-queue. Response: {'MD5OfMessageBody': '61dcb7532416d2b837e918bc74bdea9a', 'MessageId': '144febb8-d4e9-47b7-8e31-bdd9207ae7c0', 'ResponseMetadata': {'RequestId': '3906f72c-84f0-5e7f-b701-a1d0f1ddf39e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3906f72c-84f0-5e7f-b701-a1d0f1ddf39e', 'date': 'Tue, 02 Mar 2021 22:30:05 GMT', 'content-type': 'text/xml', 'content-length': '378'}, 'RetryAttempts': 0}}
# 2021-03-02 17:30:05 INFO     [application.py:45]  Successfully completed setting the event writer status as start
# 2021-03-02 17:30:05 INFO     [main.py:51]  Successfully completed configuring the application components.

Step 2f: Validate the target cluster is in sync with the source cluster

The Lambda functions from the CloudFormation stack start applying the change feeds on the target Amazon DocumentDB cluster in the order in which they happened on the source. You can observe the status of the migration application using the following command to see how far the target cluster is behind the source cluster:

# Watch the status of the migration
cd ${BASE_DIR}/configure
python3 main.py --cluster-name '<your-cluster-name>' --watch-status

You should observe an output similar to the following:

# Sample output: 
# 2021-03-02 17:30:35 INFO     [commandline_parser.py:27]  Command line arguments given: {"cluster_name": "app-name", "connection_string": null, "event_writer": null, "status": false, "watch_status": true}
# 2021-03-02 17:30:35 INFO     [commandline_parser.py:46]  Validated Command line arguments are: {"cluster_name": "app-name", "connection_string": null, "event_writer": null, "status": false, "watch_status": true, "command": "watch_status"}
# 2021-03-02 17:30:35 INFO     [main.py:41]  Starting to configure application components with commandline_options: {"cluster_name": "app-name", "connection_string": null, "event_writer": null, "status": false, "watch_status": true, "command": "watch_status"}
# 2021-03-02 17:30:35 INFO     [application.py:64]  Status: {
#  "gap_in_seconds": 9,
#  "details": [
#   {
#    "cluster_name": "app-name",
#    "namespace": "social.people",
#    "batch_id": 673,
#    "created_timestamp": "2021-03-02T02:57:38.589018",
#    "processed_timestamp": "2021-03-02T02:57:47.299929",
#    "time_gap_in_seconds": 9
#   },
#   {
#    "cluster_name": "app-name",
#    "namespace": "appname.info",
#    "batch_id": 598,
#    "created_timestamp": "2021-03-02T02:57:41.889158",
#    "processed_timestamp": "2021-03-02T02:57:48.716314",
#    "time_gap_in_seconds": 7
#   }
#  ],
#  "current_time": "2021-03-02T22:30:29.562611",
#  "cluster_name": "app-name"
# }

The gap_in_seconds value represents the time gap between the source and target cluster operations. The time_gap_in_seconds value represents the time gap between the source and the target at the collection level. When the gap_in_seconds value is under 10 seconds, you can continue to the next step.

The cutover process

This process involves updating your source application to connect to the target Amazon DocumentDB cluster. Because the migration application has multiple components, the process is as follows:

  1. Stop the applications connected to Cosmos DB or place them in read-only mode.
  2. Wait for the configure --watch-status application to report that the gap_in_seconds value is 0 seconds.
  3. Stop the migrator-app and configure --watch-status applications by stopping the Python processes (press Ctrl+C).
  4. Stop the event writer by running the following commands:
    # Stop applying the change feed data
    cd ${BASE_DIR}/configure
    python3 main.py --cluster-name '<your-cluster-name>' --event-writer stop
  5. Restart the client applications with the connection string pointing to the Amazon DocumentDB cluster endpoint.

After you perform the cutover steps successfully, your database is fully migrated to the Amazon DocumentDB cluster with minimal downtime.

Troubleshooting tips

Errors while running migrator-app

If the configure application with --watch-status isn’t making any progress, try stopping and restarting the application using the following commands:

cd ${BASE_DIR}/configure
# Stop applying the change feed data
python3 main.py --cluster-name '<your-cluster-name>' --event-writer stop
# Wait for 2 minutes
# Start applying the change feed data
python3 main.py --cluster-name '<your-cluster-name>' --event-writer start

If that still doesn’t fix the issue, search for error text on the log group details page on the Amazon CloudWatch console to identify what’s causing the issue.
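
You can also search the logs from the command line. The log group name below is a placeholder; list the log groups created by the core resources stack first, then filter the relevant one for errors.

# List the available log groups, then search the relevant one for error messages
aws logs describe-log-groups --query 'logGroups[].logGroupName' --output table
aws logs filter-log-events --log-group-name "<your-lambda-log-group>" \
    --filter-pattern "ERROR" --max-items 25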

CloudFormation template stack doesn’t make progress

If the core-resources.yaml CloudFormation template doesn’t make progress while creating or deleting resources, AWS CloudFormation may be having trouble creating or deleting the EventSourceMapping resources. Sign in to the AWS CloudTrail console and examine the event history for FAILED events, such as the following:

2021-03-01 10:22:17 UTC-0500	lambdaEventMappingGapWatchRequest	CREATE_FAILED	An event source mapping with SQS arn (" arn:aws:sqs:us-east-2:xxxxxxxxxxxxxx:gap-watch-request-queue ") and function (" gap-watch-request-reader ") already exists. Please update or delete the existing mapping with UUID 28c11ac1-407d-4eec-8063-c2d1545e4f24 (Service: AWSLambda; Status Code: 409; Error Code: ResourceConflictException; Request ID: 3aa276bc-4124-4af0-a2e3-38a94a478997)

Capture the UUID from the log output and manually delete the resource using the AWS CLI:

# Change the --uuid value with the value you found in the logs
aws lambda delete-event-source-mapping --uuid 28c11ac1-407d-4eec-8063-c2d1545e4f24
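
If the UUID doesn't appear in the CloudTrail event, you can look up the existing mappings for the function named in the error:

# Find the UUID of the existing event source mapping for the affected function
aws lambda list-event-source-mappings --function-name gap-watch-request-reader \
    --query 'EventSourceMappings[].UUID' --output text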

For more information on identifying which resource is blocking progress, see Why is my AWS CloudFormation stack stuck in the state CREATE_IN_PROGRESS, UPDATE_IN_PROGRESS, UPDATE_ROLLBACK_IN_PROGRESS, or DELETE_IN_PROGRESS?

Summary

In this post, I showed how you can perform a live migration of the Azure Cosmos DB API for MongoDB database to Amazon DocumentDB with minimal downtime. The Azure Cosmos DB to Amazon DocumentDB migration utility tool keeps the target Amazon DocumentDB cluster in sync with the changes on the source Cosmos DB cluster, and helps minimize the overall application downtime as you perform the migration. The source code referred to in this post is available in the GitHub repo.

If you have any questions or comments about this post, please share them in the comments. If you have any feature requests for Amazon DocumentDB, email us at documentdb-feature-request@amazon.com.


About the authors

Shyam Arjarapu is a Sr. Data Architect at Amazon Web Services and leads Worldwide Professional Services for Amazon DocumentDB. Shyam is passionate about solving customer problems and channels his energy into helping AWS professional service clients build highly scalable applications using Amazon DocumentDB. Before joining AWS, Shyam held a similar role at MongoDB and worked as an Enterprise Solution Architect at JP Morgan & Chase, SunGard Financial Systems, and John Hancock Financial.

 

 

Ravi Tallury is a Principal Solution Architect at Amazon Web Services (AWS) and has over 25 years of experience in architecting and delivering IT solutions. Prior to joining AWS, he led solution architecture and enterprise architecture for the automotive and life sciences verticals.