AWS Database Blog

How to use Amazon DocumentDB (with MongoDB compatibility) to build and manage applications at scale

Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. Data in Amazon DocumentDB is stored as JSON-like documents and maps naturally to how data is modeled in applications. This approach makes the storing, querying, and processing of data between your application and Amazon DocumentDB quick and intuitive.

The flexible, semistructured, and hierarchical nature of documents enables each document to evolve with your application’s needs. The document model lends itself particularly well to use cases such as catalogs, user profiles, and content management systems where each document can be unique and evolve over time. With the MongoDB API, you can write powerful, intuitive queries for your Amazon DocumentDB cluster. These queries can access data in a single document, across multiple documents, or in aggregations across documents.
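To make the three query shapes concrete, here is a minimal sketch that builds them as plain Python dictionaries, which is the form a MongoDB driver such as PyMongo accepts. The field names (sku, category, price) are hypothetical and chosen only for illustration; they are not from any real schema.

```python
# Query documents as plain dicts. The field names below are hypothetical.

def single_document_filter(sku):
    """Filter that targets one document by a unique field."""
    return {"sku": sku}


def multi_document_filter(category, max_price):
    """Filter that can match many documents in a collection."""
    return {"category": category, "price": {"$lte": max_price}}


def average_price_pipeline():
    """Aggregation pipeline: average price per category, highest first."""
    return [
        {"$match": {"price": {"$gt": 0}}},
        {"$group": {"_id": "$category", "avgPrice": {"$avg": "$price"}}},
        {"$sort": {"avgPrice": -1}},
    ]
```

With a driver such as PyMongo, these would typically be passed to collection.find_one(...), collection.find(...), and collection.aggregate(...) respectively.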

Amazon DocumentDB is the latest purpose-built database offered by AWS. With Amazon DocumentDB on-demand pricing, you can pay by the hour with no long-term commitments or upfront fees. This frees you from the cost and complexity of planning and purchasing database capacity ahead of your needs. For more information about pricing and supported AWS Regions, see Amazon DocumentDB pricing.

In this blog post, I introduce Amazon DocumentDB and highlight some of its unique aspects. I also discuss how the architecture of Amazon DocumentDB helps you build and manage applications at scale.

Get started with Amazon DocumentDB

It’s easy to get started with Amazon DocumentDB (see the following video). You can spin up an Amazon DocumentDB cluster in minutes by using the AWS Management Console, the AWS CLI, or AWS CloudFormation, and you can delete the cluster when you no longer need it. You can then use the same application code, drivers, and tools that you use with MongoDB today to start developing with Amazon DocumentDB.

For the sake of this post, let’s say you provision a cluster called myBlogCluster with three db.r4.large instances, as shown in the following screenshot.
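If you prefer to script this step rather than use the console, the sketch below shows one way to assemble the request parameters for such a cluster. It assumes you would pass them to the boto3 "docdb" client (create_db_cluster and create_db_instance). The instance naming scheme and the credential arguments are my own placeholders, not values from this post.

```python
# Builds request parameters only; nothing here calls AWS.
# Assumption: the dicts are later passed as keyword arguments to the boto3
# "docdb" client, for example client.create_db_cluster(**cluster_request(...)).

def cluster_request(cluster_id, username, password, encrypted=True):
    """Parameters for creating the cluster (the storage layer and endpoint)."""
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "docdb",
        "MasterUsername": username,
        "MasterUserPassword": password,  # placeholder; load from a secret store
        "StorageEncrypted": encrypted,
    }


def instance_requests(cluster_id, instance_class, count):
    """Parameters for each compute instance that joins the cluster."""
    return [
        {
            # Hypothetical naming scheme: "<cluster>-1", "<cluster>-2", ...
            "DBInstanceIdentifier": f"{cluster_id}-{n}",
            "DBInstanceClass": instance_class,
            "Engine": "docdb",
            "DBClusterIdentifier": cluster_id,
        }
        for n in range(1, count + 1)
    ]
```

For the example in this post, that would be cluster_request("myBlogCluster", ...) followed by instance_requests("myBlogCluster", "db.r4.large", 3).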

Amazon DocumentDB provisions a storage layer and three instances in your virtual private cloud (VPC) that together form your cluster. The Amazon DocumentDB architecture separates storage and compute layers (as illustrated in the following architecture diagram), which lets each layer scale independently. This design also provides a highly available and scalable architecture for the cloud.

The storage layer is a unique, distributed, fault-tolerant, self-healing storage system that scales automatically up to 64 TB of data per cluster without any configuration needed. To provide high durability, the storage layer replicates six copies of your data across three AWS Availability Zones (two copies in each of three Availability Zones = six copies). You can encrypt your data at rest in the storage layer by using AWS Key Management Service (AWS KMS) and your own customer managed keys.

Scaling your cluster

The instances in your cluster are your compute power and handle the processing of queries. With the three-instance cluster that I previously mentioned, you have a primary instance for both reads and writes and two replica instances that provide read scaling and high availability. You can scale your read capacity to millions of requests per second by adding up to 15 low-latency read replicas. Because instances are all about compute capacity and are not data bearing, adding a new replica only takes minutes regardless of the size of your data.
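To take advantage of the replicas, an application usually connects in replica set mode and asks the driver to prefer secondaries for reads. Here is a minimal sketch that builds such a connection string. The endpoint and credentials are hypothetical, and the option values (replicaSet=rs0, readPreference=secondaryPreferred) reflect the commonly documented connection style, so verify them against the current Amazon DocumentDB documentation before relying on them.

```python
from urllib.parse import quote_plus, urlencode


def read_scaling_uri(username, password, cluster_endpoint, port=27017):
    """Build a MongoDB URI that routes reads to replicas when available."""
    options = urlencode({
        "replicaSet": "rs0",                     # connect in replica set mode
        "readPreference": "secondaryPreferred",  # reads go to replicas first
    })
    # Percent-encode credentials so characters such as "@" or "/" are safe.
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"mongodb://{user}:{secret}@{cluster_endpoint}:{port}/?{options}"
```

Writes still go to the primary instance. Only reads are spread across the replicas.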

Further, because of the separate storage and compute layers, you can provision a highly durable cluster with a single instance. Durability is controlled by the storage layer, so whether your cluster has 1 or 16 instances, your data is always replicated six ways across three AWS Availability Zones (AZs): two copies of your data are stored in each of three different AZs to ensure that your data is highly durable and available. Single-instance clusters can be useful for development and test scenarios at a lower cost (Scenario 1 in the following illustration). You can add instances in minutes, regardless of data size, for increased availability and read scale. To learn more about architecting highly available applications, see the AWS Reliability Pillar.
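The point that durability does not depend on instance count can be shown with a few lines of arithmetic. This is only an illustrative model of the numbers in this post, not a description of how the storage layer is implemented.

```python
COPIES_PER_AZ = 2        # copies kept in each Availability Zone
AVAILABILITY_ZONES = 3   # Availability Zones the storage layer spans
MAX_INSTANCES = 16       # one primary plus up to 15 replicas


def storage_copies(instance_count):
    """Copies of the data for a cluster with the given number of instances."""
    if not 1 <= instance_count <= MAX_INSTANCES:
        raise ValueError("a cluster has between 1 and 16 instances")
    # The instance count is validated but never used in the result:
    # replication is a property of the storage layer alone.
    return COPIES_PER_AZ * AVAILABILITY_ZONES
```

Calling storage_copies(1) and storage_copies(16) gives the same answer, which is the reason a single-instance cluster is as durable as a large one.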

Within a cluster, you can scale instances up and down within minutes. For example, suppose that you start with a three-instance cluster of db.r4.large instances. Here, you can scale up your instances to db.r4.xlarge with a few clicks in the console or the AWS CLI. For read scaling, you can quickly provision an additional replica to handle sudden increases in traffic (Scenario 2 in the preceding illustration). If you run analytics queries, you can provision a larger instance to handle the workload (Scenario 3 in the preceding illustration). The Amazon DocumentDB architecture gives you flexibility to configure a cluster that fits your specific use case and scale quickly to meet the demand of your application.
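As a rough sketch of what those scaling actions look like outside the console, the helpers below build the request parameters for changing an instance class and for adding a replica. They assume the boto3 "docdb" client (modify_db_instance and create_db_instance). The instance identifiers are hypothetical.

```python
# Builds request parameters only; nothing here calls AWS.
# Assumption: each dict is passed as keyword arguments to the boto3 "docdb"
# client, for example client.modify_db_instance(**request).

def scale_up_requests(instance_ids, new_class):
    """One modify request per instance to change its instance class."""
    return [
        {
            "DBInstanceIdentifier": instance_id,
            "DBInstanceClass": new_class,
            "ApplyImmediately": True,  # do not wait for a maintenance window
        }
        for instance_id in instance_ids
    ]


def add_replica_request(cluster_id, instance_id, instance_class):
    """Create request for one more instance in an existing cluster."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,
        "Engine": "docdb",
        "DBClusterIdentifier": cluster_id,
    }
```

A larger instance for analytics (Scenario 3) is the same add_replica_request call with a bigger instance class.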

A fully managed service

With Amazon DocumentDB, you don’t need to worry about database management tasks, such as hardware provisioning, patching, setup, configuration, or backups. In addition, Amazon DocumentDB is integrated deeply with services such as Amazon VPC, AWS Identity and Access Management, Amazon CloudWatch, AWS CloudFormation, and AWS CloudTrail.

Amazon DocumentDB automatically and continuously monitors and backs up your database to Amazon S3, enabling point-in-time recovery (restore your cluster to any point in the preceding 35 days). Additionally, you can create consistent snapshot backups of your entire cluster at any time and keep those snapshots for as long as you’d like. You can share snapshot backups across accounts within the same AWS Region. Because the storage and compute layers are separated in the Amazon DocumentDB architecture, backups don’t affect the performance of your instances or your application. Instead, backups are handled by the storage layer, which continually streams your changes to Amazon S3 (see the following illustration).
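Here is a minimal sketch of how a point-in-time restore request might be prepared. It checks the target time against the 35-day window mentioned above and returns parameters in the shape I would expect to pass to the boto3 "docdb" client's restore_db_cluster_to_point_in_time call. The cluster identifiers are hypothetical, and the window check is my own illustration rather than something the service requires you to do client-side.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35  # longest recovery window described in this post


def point_in_time_restore_request(source_cluster_id, new_cluster_id,
                                  restore_to, now=None):
    """Validate the target time and build the restore parameters."""
    now = now or datetime.now(timezone.utc)
    if restore_to > now:
        raise ValueError("restore time is in the future")
    if now - restore_to > timedelta(days=MAX_RETENTION_DAYS):
        raise ValueError("restore time is outside the 35-day window")
    # A restore creates a new cluster; the source cluster is left untouched.
    return {
        "DBClusterIdentifier": new_cluster_id,
        "SourceDBClusterIdentifier": source_cluster_id,
        "RestoreToTime": restore_to,
    }
```

Pass timezone-aware datetime values for restore_to and now so the comparison is unambiguous.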

As a fully managed service, Amazon DocumentDB provides safe defaults and multiple levels of security for your cluster. These include network isolation using Amazon VPC, encryption at rest using keys you create and control through AWS KMS, and encryption in transit using Transport Layer Security (TLS).

Migrating to Amazon DocumentDB

If you are migrating to Amazon DocumentDB, you can use AWS Database Migration Service (AWS DMS) free for six months. This enables you to easily migrate your on-premises or Amazon EC2 MongoDB databases (either replica sets or sharded clusters) with virtually no downtime. To get started migrating to Amazon DocumentDB, see Migrating to Amazon DocumentDB and Walkthrough: Migrating from MongoDB to Amazon DocumentDB.


About the Author

Joseph Idziorek is a Principal Product Manager at Amazon Web Services.