AWS Database Blog

Backing up data with Amazon DocumentDB (with MongoDB compatibility)

Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB 3.6 application code, drivers, and tools to run, manage, and scale workloads on Amazon DocumentDB without worrying about managing the underlying infrastructure. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data. This post introduces you to Amazon DocumentDB backup capabilities that enable you to optimize your backup strategy for your specific needs.

AWS built Amazon DocumentDB to solve your challenges around availability, reliability, durability, scalability, backup, and more. Along the way, we built several capabilities that remove undifferentiated heavy lifting and help reduce costs. To mark the recent launch of the capability to copy snapshots across AWS Regions (for more information, see Copying a Cluster Snapshot), this post introduces Amazon DocumentDB backup capabilities more broadly.

A challenge that many developers face when self-managing a database is dealing with the heavy lifting of backing up their database. This often involves writing scripts, testing workflows, scheduling to minimize impact on production systems, and implementing archiving solutions. As a fully managed database service, Amazon DocumentDB makes backup simple and takes away the pain and complication so you can focus on building functionality for your customers.

Architecture

Amazon DocumentDB was built from the ground up to be a cloud-native database service. A key component of its architecture is the separation of storage and compute. The storage layer is responsible for automatically streaming backups to Amazon Simple Storage Service (Amazon S3), which is designed for 99.999999999% durability. Because the backups happen at the storage layer, the compute instances in your cluster don’t participate in backups. The outcome is that the Amazon DocumentDB architecture frees up your instances to do more work for your customers, and backups don’t affect database performance. The following diagram illustrates this architecture.

On by default

Because Amazon DocumentDB is a fully managed service, backups are turned on by default for all clusters. By default, you get a daily snapshot during your backup window and a 24-hour retention period in which you can perform a point-in-time recovery (PITR). Optionally, you can extend the retention period up to 35 days. To offset the cost of the default 24-hour retention period, each month you get backup storage up to the size of your cluster volume at no additional charge. For example, if you have an 8 TB cluster, you receive 8 TB of backup storage for free. Because Amazon DocumentDB backups are on by default and provide at least one day of PITR, you can rest assured that you won't deploy an application to production without backup enabled.
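To confirm these defaults on an existing cluster, you could read the settings back with an AWS SDK. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and its `docdb` client; `backup_settings` is a hypothetical helper name, and the client is passed in so the function can be tried without AWS credentials.

```python
from typing import Any, Dict


def backup_settings(client: Any, cluster_id: str) -> Dict[str, Any]:
    """Read the backup settings of one Amazon DocumentDB cluster.

    Hypothetical helper. `client` is assumed to behave like boto3.client("docdb").
    """
    response = client.describe_db_clusters(DBClusterIdentifier=cluster_id)
    cluster = response["DBClusters"][0]
    return {
        # Days of continuous backup kept for point-in-time recovery.
        "retention_days": cluster["BackupRetentionPeriod"],
        # Daily time range (UTC) in which the automatic snapshot is taken.
        "backup_window": cluster["PreferredBackupWindow"],
    }
```

For example, `backup_settings(boto3.client("docdb"), "my-cluster")`, where `my-cluster` is a placeholder, returns the retention period in days and the backup window of that cluster.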

No impact backups

Backups have no impact on cluster performance and consume no I/O or compute resources on your instances. The Amazon DocumentDB cloud-native architecture separates storage and compute, and backups are streamed continuously from the storage volume. Because backups don't use your compute instances, they don't affect instance performance, so you don't need to over-provision instances to absorb backup load. Similarly, you can take a manual snapshot at any time without worrying about impacting your application.

Point-in-time recovery up to 35 days

Amazon DocumentDB enables you to set your cluster's backup retention period to anywhere from 1 to 35 days. The backup retention period is the span of time during which all modifications to the database are recorded, so that you can perform a PITR, down to the second, to a new cluster. Although the default retention period is 24 hours, you can increase it at any time, up to 35 days, using the Amazon DocumentDB console or the AWS Command Line Interface (AWS CLI). You can also reduce the retention period at any time. For more information, see Restoring to a Point in Time.
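Both operations are also available through the AWS SDKs. The sketch below assumes boto3's `docdb` client, whose `modify_db_cluster` and `restore_db_cluster_to_point_in_time` calls correspond to the ModifyDBCluster and RestoreDBClusterToPointInTime APIs. The helper names are hypothetical, and `when` must be a timezone-aware `datetime` so it can be compared with the restorable-time fields that boto3 returns.

```python
from datetime import datetime, timezone
from typing import Any


def set_retention_period(client: Any, cluster_id: str, days: int) -> None:
    """Change how many days of continuous backup are kept (1 to 35). Hypothetical helper."""
    if not 1 <= days <= 35:
        raise ValueError("backup retention period must be between 1 and 35 days")
    client.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        BackupRetentionPeriod=days,
        ApplyImmediately=True,
    )


def restore_to_time(client: Any, source_id: str, new_id: str, when: datetime) -> None:
    """Restore `source_id` as of `when` into a new cluster named `new_id`. Hypothetical helper."""
    if when.tzinfo is None:
        raise ValueError("`when` must be timezone-aware")
    when = when.astimezone(timezone.utc)
    cluster = client.describe_db_clusters(DBClusterIdentifier=source_id)["DBClusters"][0]
    earliest = cluster["EarliestRestorableTime"]
    latest = cluster["LatestRestorableTime"]
    # Only times inside the current retention window can be restored.
    if not earliest <= when <= latest:
        raise ValueError(f"{when.isoformat()} is outside {earliest.isoformat()} to {latest.isoformat()}")
    client.restore_db_cluster_to_point_in_time(
        DBClusterIdentifier=new_id,
        SourceDBClusterIdentifier=source_id,
        RestoreToTime=when,
    )
```

The restore writes to a new cluster and leaves the source untouched. When you restore through the API, you likely need to add instances to the new cluster (for example, with `create_db_instance`) before you can connect to it.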

Snapshots for long-term retention

In addition to performing PITRs, you can take manual snapshots and archive them for as long as you want. Manual snapshots that are within your backup retention period are free. For a manual snapshot that falls outside the backup retention period, you pay for the full size of the snapshot at $0.02 per GB per month (prices vary by Region; see Amazon DocumentDB pricing). You can also share snapshots with other accounts within a Region, and copy snapshots within the same account across Regions for disaster recovery or to seed a development environment. For more information, see Copying a Cluster Snapshot.
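Taking a manual snapshot, and sharing it with another account, can be sketched with the same boto3 `docdb` client. This is a minimal sketch under assumptions: the helper names and the snapshot naming scheme are hypothetical, and any account ID you pass is a placeholder. Sharing relies on the snapshot's `restore` attribute, and the snapshot should be in the `available` state before you share it.

```python
from datetime import datetime, timezone
from typing import Any


def take_manual_snapshot(client: Any, cluster_id: str) -> str:
    """Start a manual cluster snapshot and return its identifier. Hypothetical helper."""
    # Hypothetical naming scheme: <cluster>-manual-<UTC timestamp>.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    snapshot_id = f"{cluster_id}-manual-{stamp}"
    client.create_db_cluster_snapshot(
        DBClusterSnapshotIdentifier=snapshot_id,
        DBClusterIdentifier=cluster_id,
    )
    return snapshot_id


def share_snapshot(client: Any, snapshot_id: str, account_id: str) -> None:
    """Allow another AWS account in the same Region to restore the snapshot. Hypothetical helper."""
    # Call this only after the snapshot status is "available".
    client.modify_db_cluster_snapshot_attribute(
        DBClusterSnapshotIdentifier=snapshot_id,
        AttributeName="restore",
        ValuesToAdd=[account_id],
    )
```

Keeping creation and sharing as separate steps leaves room to wait for the snapshot to finish between them.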

Encrypting your backups

When you encrypt your cluster using an AWS Key Management Service (AWS KMS) key, Amazon DocumentDB automatically encrypts your continuous backups and snapshots with the same KMS key that encrypts your cluster. You can also restore an unencrypted snapshot as an encrypted cluster.
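Restoring an unencrypted snapshot as an encrypted cluster comes down to supplying a KMS key at restore time. A minimal sketch, assuming boto3's `docdb` client and its `restore_db_cluster_from_snapshot` call; the helper name, identifiers, and key alias are hypothetical placeholders.

```python
from typing import Any


def restore_encrypted(client: Any, snapshot_id: str, new_cluster_id: str, kms_key_id: str) -> None:
    """Restore a snapshot into a new cluster encrypted with `kms_key_id`. Hypothetical helper."""
    client.restore_db_cluster_from_snapshot(
        DBClusterIdentifier=new_cluster_id,
        SnapshotIdentifier=snapshot_id,
        Engine="docdb",
        # ARN or alias of the AWS KMS key used to encrypt the new cluster.
        KmsKeyId=kms_key_id,
    )
```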

Monitoring and optimizing for cost

You can use the Amazon CloudWatch metrics TotalBackupStorageBilled, SnapshotStorageUsed, and BackupRetentionPeriodStorageUsed to review and monitor the amount of storage your Amazon DocumentDB backups use. The example in this section illustrates how backup pricing works and how to optimize for cost. For more information, see Monitoring Amazon DocumentDB with CloudWatch.
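You can also pull these metrics programmatically. The sketch below assumes boto3's CloudWatch client, the `AWS/DocDB` namespace with a `DBClusterIdentifier` dimension, and that the three backup metrics are reported in bytes; verify the units in the CloudWatch console for your cluster. The helper name and the 6-hour lookback are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

BACKUP_METRICS = (
    "TotalBackupStorageBilled",
    "SnapshotStorageUsed",
    "BackupRetentionPeriodStorageUsed",
)
GIB = 1024 ** 3  # assumes the metrics are reported in bytes


def latest_backup_metrics_gib(cw: Any, cluster_id: str) -> Dict[str, float]:
    """Return the most recent hourly value of each backup metric, in GiB. Hypothetical helper."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=6)
    result: Dict[str, float] = {}
    for name in BACKUP_METRICS:
        stats = cw.get_metric_statistics(
            Namespace="AWS/DocDB",
            MetricName=name,
            Dimensions=[{"Name": "DBClusterIdentifier", "Value": cluster_id}],
            StartTime=start,
            EndTime=end,
            Period=3600,  # the metrics are computed hourly
            Statistics=["Average"],
        )
        # Datapoints are not guaranteed to be ordered, so sort by timestamp.
        points = sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])
        if points:
            result[name] = points[-1]["Average"] / GIB
    return result
```

Metrics with no datapoints in the lookback window are simply omitted from the result.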

BackupRetentionPeriodStorageUsed represents the amount of backup storage currently used to store continuous backups. Its value depends on the size of the cluster volume and the amount of change you make during the retention period. However, for billing purposes, the changes counted during the retention period don't exceed the size of the cluster volume. For example, if your cluster size is 100 GiB and your retention period is 4 days, the maximum value for BackupRetentionPeriodStorageUsed is 200 GiB (100 GiB + 100 GiB). Automatic snapshots don't count against your backup storage usage. How much backup storage within your retention window you're billed for depends on how much your cluster changes, for example through updates and deletes. In this example, the billable backup storage in the retention period could be as little as 0 or as high as 100 GiB, even if the total size of the changes to the cluster exceeds 100 GiB. The following diagram illustrates the retention periods of automatic snapshots.
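The arithmetic in this example can be written as a small function. It is a simplified model of this post's example only, not an official billing formula: one copy of the cluster volume, plus the data changed during the retention period, counted up to one more cluster volume. The function name is hypothetical.

```python
def retention_storage_used_gib(cluster_gib: float, changed_gib: float) -> float:
    """Simplified BackupRetentionPeriodStorageUsed from this post's example.

    One full copy of the cluster volume, plus changed data capped at one
    additional cluster volume. Not an official billing formula.
    """
    if cluster_gib < 0 or changed_gib < 0:
        raise ValueError("sizes must be non-negative")
    return cluster_gib + min(changed_gib, cluster_gib)
```

With the 100 GiB cluster, `retention_storage_used_gib(100, 250)` returns 200, the maximum described above, and `retention_storage_used_gib(100, 0)` returns 100.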

SnapshotStorageUsed represents the amount of backup storage used for storing manual snapshots beyond the backup retention period. Manual snapshots taken within the retention period don’t count against your backup storage. The size of each snapshot is the size of the cluster volume at the time you take the snapshot.

The SnapshotStorageUsed value depends on the number of snapshots you keep and the size of each snapshot. For example, suppose that you have one snapshot outside the retention period and the cluster volume size was 100 GiB when that snapshot was taken. The amount of SnapshotStorageUsed is 100 GiB. The following diagram compares the retention periods for automatic and manual snapshots.
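Computing SnapshotStorageUsed amounts to summing the sizes of manual snapshots taken before the current retention window began. A minimal sketch; the function name and the `(taken_at, size_gib)` input format are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def snapshot_storage_used_gib(
    snapshots: Iterable[Tuple[datetime, float]],
    now: datetime,
    retention_days: int,
) -> float:
    """Sum the sizes of manual snapshots that fall outside the retention period."""
    window_start = now - timedelta(days=retention_days)
    # Snapshots taken inside the retention period are not counted.
    return sum(size_gib for taken_at, size_gib in snapshots if taken_at < window_start)
```

For the example above, a single 100 GiB snapshot taken before the retention window yields 100 GiB.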

To tie the previous two examples together, TotalBackupStorageBilled is the sum of BackupRetentionPeriodStorageUsed and SnapshotStorageUsed, minus an amount of free backup storage equal to the size of the cluster volume for 1 day. For example, if your cluster size is 100 GiB, you have a 4-day retention period, and you have one snapshot outside the retention period, the maximum TotalBackupStorageBilled is 200 GiB: up to 200 GiB of BackupRetentionPeriodStorageUsed (100 GiB for the cluster volume plus 0–100 GiB of changes), plus 100 GiB of SnapshotStorageUsed, minus 100 GiB of free storage (the size of the cluster). These metrics are computed independently for each Amazon DocumentDB cluster on an hourly basis.
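The same relationship can be expressed as a function of the two usage metrics and the cluster size, all in GiB. A minimal sketch: the function name is hypothetical, and clamping the result at zero when usage is below the free allowance is an assumption.

```python
def total_backup_storage_billed_gib(
    retention_used_gib: float, snapshot_used_gib: float, cluster_gib: float
) -> float:
    """TotalBackupStorageBilled = retention usage + snapshot usage - free allowance."""
    # Free allowance: backup storage equal to one cluster volume.
    billed = retention_used_gib + snapshot_used_gib - cluster_gib
    # Assumption: usage below the free allowance is billed as zero, not as a credit.
    return max(0.0, billed)
```

`total_backup_storage_billed_gib(200, 100, 100)` returns 200, matching the maximum in the example.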

To see how to optimize for cost, consider a scenario where you take a manual snapshot of the same 100 GiB cluster every day over a 30-day period. If your retention period is 1 day, your SnapshotStorageUsed is 2,900 GiB (29 × 100 GiB; the manual snapshot for the current day falls within the retention period and isn't counted).

In this scenario, you can optimize for cost by increasing your retention period to 30 days instead. With a 30-day retention period, your billable backup storage is at most 100 GiB. If the data in your cluster doesn't change over the 30-day retention period, you don't pay anything for backup. Furthermore, instead of daily manual snapshots, which in this example use 29 times as much billable storage (2,900 GiB versus at most 100 GiB), you can now perform a PITR to any second within your 30-day retention period.
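A rough comparison of the two strategies for the 100 GiB cluster, using this post's example price of $0.02 per GB-month (prices vary by Region) and treating GB and GiB alike for simplicity. The function name is hypothetical, and the extended-retention figure is the upper bound from the scenario above, not a measured bill.

```python
# Example price from this post; actual prices vary by Region.
PRICE_PER_GIB_MONTH = 0.02


def compare_strategies(cluster_gib: float, days: int = 30) -> dict:
    """Compare daily manual snapshots with an extended retention period. Hypothetical helper."""
    # 1-day retention plus one manual snapshot per day: every snapshot except
    # the current day's falls outside the retention period.
    snapshots_gib = (days - 1) * cluster_gib
    # Retention period of `days`: in this post's scenario, billable
    # continuous-backup storage is at most one cluster volume.
    retention_max_gib = cluster_gib
    return {
        "daily_snapshots_gib": snapshots_gib,
        "daily_snapshots_usd": round(snapshots_gib * PRICE_PER_GIB_MONTH, 2),
        "extended_retention_max_gib": retention_max_gib,
        "extended_retention_max_usd": round(retention_max_gib * PRICE_PER_GIB_MONTH, 2),
    }
```

For `compare_strategies(100)`, this gives 2,900 GiB (about $58 per month) for daily snapshots versus at most 100 GiB (about $2 per month) with a 30-day retention period.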

To control your costs, you can monitor the amount of storage consumed by continuous backups and by manual snapshots that persist beyond the retention period. You can shorten the backup retention period and remove manual snapshots when they're no longer needed.
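Removing manual snapshots you no longer need can be automated. A minimal sketch, assuming boto3's `docdb` client; the helper name and the age-based policy are hypothetical. It reads only the first page of results (a production job would follow `Marker` to paginate), and it defaults to a dry run that only lists what it would delete.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List


def delete_old_manual_snapshots(
    client: Any, cluster_id: str, keep_days: int, dry_run: bool = True
) -> List[str]:
    """Delete (or, in dry-run mode, list) manual snapshots older than `keep_days`."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=keep_days)
    response = client.describe_db_cluster_snapshots(
        DBClusterIdentifier=cluster_id, SnapshotType="manual"
    )
    expired = [
        s["DBClusterSnapshotIdentifier"]
        for s in response["DBClusterSnapshots"]
        # Skip snapshots that have no creation time yet (for example, still creating).
        if s.get("SnapshotCreateTime") is not None and s["SnapshotCreateTime"] < cutoff
    ]
    if not dry_run:
        for snapshot_id in expired:
            client.delete_db_cluster_snapshot(DBClusterSnapshotIdentifier=snapshot_id)
    return expired
```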

Dumping, restoring, importing, and exporting data

Apart from the fully managed capabilities that Amazon DocumentDB provides, you can also use the mongodump, mongorestore, mongoexport, and mongoimport utilities to back up or move data in and out of your Amazon DocumentDB cluster. For more information, see Dumping, Restoring, Importing, and Exporting Data.
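If you script these utilities, building the argument list in code keeps the connection options in one place. A minimal sketch, assuming the `mongodump` flags shown (`--ssl`, `--host`, `--sslCAFile`, `--username`, `--password`, `--out`, `--db`) are supported by your tool version; newer MongoDB Database Tools releases favor `--tls` equivalents, so check `mongodump --help`. The helper name, host, and CA bundle path are hypothetical.

```python
from typing import List


def mongodump_command(
    host: str, username: str, password: str, ca_file: str, out_dir: str, db: str = ""
) -> List[str]:
    """Build a mongodump argument list for a TLS-enabled cluster. Hypothetical helper."""
    cmd = [
        "mongodump",
        "--ssl",  # connect over TLS using the CA bundle below
        f"--host={host}",
        f"--sslCAFile={ca_file}",
        f"--username={username}",
        f"--password={password}",
        f"--out={out_dir}",
    ]
    if db:
        # Dump a single database instead of all databases.
        cmd.append(f"--db={db}")
    return cmd
```

You might run the result with `subprocess.run(mongodump_command(...), check=True)`. Passing the password as an argument can expose it in the process list, so consider a safer way to supply credentials on shared hosts.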

Summary

Amazon DocumentDB provides you with a number of capabilities that help you back up and restore your data based on your use case. For more information, see Best Practices for Amazon DocumentDB. If you're new to Amazon DocumentDB, see Getting Started with Amazon DocumentDB. If you're planning to migrate to Amazon DocumentDB, see Migrating to Amazon DocumentDB.

About the Author

Joseph Idziorek is a Principal Product Manager at Amazon Web Services.