AWS Database Blog

Monitoring metrics and setting up alarms on your Amazon DocumentDB (with MongoDB compatibility) clusters

Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB 4.0 application code, drivers, and tools to run, manage, and scale workloads on Amazon DocumentDB without having to worry about managing the underlying infrastructure. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data.

Amazon DocumentDB gives you the ability to monitor over 50 Amazon CloudWatch metrics, including CPU utilization, BufferCacheHitRatio, read and write IOPS, MongoDB opcounters, and more. For a detailed list of the CloudWatch metrics, see Amazon DocumentDB Metrics. You can monitor these metrics at the cluster, instance, or role level (for more information about roles, see Managing Amazon DocumentDB Users). You can also create CloudWatch alarms on these metrics that use Amazon Simple Notification Service (Amazon SNS) to send a notification when a metric breaches a predefined threshold.

This post walks through how to use CloudWatch metrics to monitor your Amazon DocumentDB cluster on the AWS Management Console. We also create an alarm and set up a notification through Amazon SNS to send an email when a metric breaches a predetermined threshold.

Prerequisites

To use the CloudWatch metrics, you must first provision an Amazon DocumentDB cluster. For instructions, see Getting Started with Amazon DocumentDB.

Monitoring a cluster’s status with Amazon DocumentDB

To view the status of your clusters, do the following:

  1. On the Amazon DocumentDB console, in the navigation pane, choose Clusters.
  2. In the Cluster identifier column, find the name of the cluster that you’re interested in.
  3. To find the status of the cluster, read across that row to the Status column, as shown in the following screenshot.


In addition to being available, the cluster status can be in a number of different states, including deleting, failing-over, and upgrading. For a complete list of status values and their descriptions, see Cluster Status Values.
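The same status is also available programmatically. The following is a minimal sketch that assumes the AWS SDK for Python (Boto3), whose `docdb` client's `describe_db_clusters` call returns a `DBClusters` list with `DBClusterIdentifier` and `Status` fields. The helper name `cluster_statuses` and the sample response are hypothetical, so the parsing logic can run without AWS credentials.

```python
from typing import Any, Dict


def cluster_statuses(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each cluster identifier to its current status.

    Assumes `response` has the shape returned by Boto3's
    docdb.describe_db_clusters(): {"DBClusters": [{...}, ...]}.
    """
    return {
        cluster["DBClusterIdentifier"]: cluster["Status"]
        for cluster in response.get("DBClusters", [])
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real API response.
    sample = {
        "DBClusters": [
            {"DBClusterIdentifier": "my-docdb-cluster", "Status": "available"},
            {"DBClusterIdentifier": "old-docdb-cluster", "Status": "deleting"},
        ]
    }
    for name, status in cluster_statuses(sample).items():
        print(f"{name}: {status}")
```

Against a real account, you would pass in the result of `boto3.client("docdb").describe_db_clusters()`, optionally with `Filters=[{"Name": "engine", "Values": ["docdb"]}]`, the engine filter the Amazon DocumentDB CLI examples use to exclude other cluster types.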

Next, we look at how to check the status for individual instances within the cluster.

Monitoring an instance’s status with Amazon DocumentDB

To view the status of the individual instances in your cluster, do the following:

  1. On the Amazon DocumentDB console, in the navigation pane, choose Instances.
  2. In the Instance identifier column, find the name of the instance that you’re interested in.
  3. To find the status of the instance, read across that row to the Status column.


In addition to being available, the instance status can be in a number of different states, including backing-up, creating, and deleting. For a complete list of status values and their descriptions, see Instance Status Values.

You can view metrics at the cluster or instance level, and the instance must be in the available state before metrics can be viewed.
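Because metrics are viewable only once an instance is available, a script might first check which instances are not yet ready. This minimal sketch assumes the response shape of Boto3's `docdb.describe_db_instances` call (a `DBInstances` list with `DBInstanceIdentifier`, `DBClusterIdentifier`, and `DBInstanceStatus` fields); the function name `instances_not_available` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional


def instances_not_available(
    response: Dict[str, Any], cluster_id: Optional[str] = None
) -> List[str]:
    """Return identifiers of instances whose status is not "available".

    Assumes `response` has the shape returned by Boto3's
    docdb.describe_db_instances(): {"DBInstances": [{...}, ...]}.
    If `cluster_id` is given, only that cluster's instances are checked.
    """
    pending: List[str] = []
    for instance in response.get("DBInstances", []):
        if cluster_id is not None and instance.get("DBClusterIdentifier") != cluster_id:
            continue  # instance belongs to a different cluster
        if instance.get("DBInstanceStatus") != "available":
            pending.append(instance["DBInstanceIdentifier"])
    return pending


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real API response.
    sample = {
        "DBInstances": [
            {"DBInstanceIdentifier": "docdb-1", "DBClusterIdentifier": "my-docdb-cluster",
             "DBInstanceStatus": "available"},
            {"DBInstanceIdentifier": "docdb-2", "DBClusterIdentifier": "my-docdb-cluster",
             "DBInstanceStatus": "backing-up"},
        ]
    }
    print(instances_not_available(sample, "my-docdb-cluster"))  # ['docdb-2']
```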

You can view metrics in two places: the Amazon DocumentDB console or the CloudWatch console. The CloudWatch console also lets you set alarms and send notifications when metrics cross predetermined values, so you don’t have to watch the graphs continuously.

Viewing metrics on the Amazon DocumentDB console

The easiest way to view cluster- and instance-level metrics is the Monitoring tab on the Amazon DocumentDB console. To view these metrics, complete the following steps:

  1. On the Amazon DocumentDB console, in the navigation pane, choose Instances.
  2. From the list of instances, choose the name of the instance that you want metrics for. (To view cluster-level metrics, choose Clusters instead, and then choose the name of the cluster that you want metrics for.)
  3. On the resulting summary page, choose the Monitoring tab to view graphs of your Amazon DocumentDB instance’s metrics.

Because a graph must be generated for each metric, it might take a few minutes for the CloudWatch graphs to populate.

To help with organization and readability, the metrics are grouped into the following categories: Resource Utilization, Throughput, Latency, Operations, and System.

The following image shows a screenshot of the VolumeBytesUsed and CPUUtilization metrics on the Amazon DocumentDB console.


Next, let’s look at how to use CloudWatch to monitor metrics and set alarms.

Viewing metrics on the CloudWatch console

The CloudWatch console is a more powerful way to view metrics. CloudWatch is a monitoring service for virtually all AWS services, not just Amazon DocumentDB. You can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly.
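Outside the console, the same data can be retrieved through the CloudWatch API. The sketch below only builds the keyword arguments for Boto3's `cloudwatch.get_metric_statistics` call. It assumes the `AWS/DocDB` namespace and the `DBClusterIdentifier` dimension behind the cluster-level metrics; the function name `cluster_metric_request` and its defaults (3 hours of data in 5-minute periods) are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def cluster_metric_request(
    cluster_id: str,
    metric_name: str,
    hours: int = 3,
    period_seconds: int = 300,
    statistic: str = "Average",
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Keyword arguments for cloudwatch.get_metric_statistics.

    Covers the last `hours` hours, aggregated into `period_seconds` buckets.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/DocDB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


if __name__ == "__main__":
    print(cluster_metric_request("my-docdb-cluster", "VolumeBytesUsed"))
```

With credentials configured, `boto3.client("cloudwatch").get_metric_statistics(**cluster_metric_request("my-docdb-cluster", "VolumeBytesUsed"))` returns a `Datapoints` list, one entry per period.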

To view CloudWatch metrics on the CloudWatch console, we set up a CloudWatch dashboard and populate it with metrics for storage and CPU utilization. The dashboard enables you to monitor resources in a single view and customize the display with graphs and components beyond the metrics we covered in the previous method. For more information, see Using Amazon CloudWatch Dashboards.

To create the dashboard, complete the following steps:

  1. On the CloudWatch console, in the navigation pane, choose Dashboards.
  2. Choose Create dashboard.
  3. Enter a name for the dashboard.
  4. Choose Create dashboard.
  5. Select a widget type to configure (widgets are components in the dashboard and are needed to display the metric values). For this post, choose Line.
  6. Choose Next.
  7. When prompted to choose a data source, choose Metrics.
  8. Choose Configure.
  9. On the All metrics tab, select DocDB.
  10. Choose a metric dimension (for example, Cluster Metrics).
  11. Choose the VolumeBytesUsed metric. 

VolumeBytesUsed is the amount of storage used by your cluster, in bytes. The metric list includes every cluster in the Region, so be sure to select the row for the appropriate cluster.

  12. Choose Create widget.

You should see the VolumeBytesUsed metric on your dashboard.

Now we add the CPUUtilization metric to the dashboard. CPUUtilization tells us the percentage of CPU used by the cluster.

  13. Choose Add widget and repeat the previous steps, replacing the VolumeBytesUsed metric with CPUUtilization.

When you’re finished, your dashboard should look similar to the following screenshot.


  14. Choose Save dashboard.

Congratulations! You now have a dashboard to quickly view your cluster’s storage capacity and CPU utilization.
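The console steps above can also be scripted. This minimal sketch builds a dashboard body with two line-graph widgets (VolumeBytesUsed and CPUUtilization) in the JSON format that CloudWatch's `put_dashboard` call accepts. The layout values, the `Average` statistic, the 5-minute period, and the function name `docdb_dashboard_body` are assumptions for illustration.

```python
import json
from typing import Any, Dict, List


def docdb_dashboard_body(cluster_id: str, region: str) -> str:
    """Return a dashboard body (JSON string) with one line widget per metric."""
    widgets: List[Dict[str, Any]] = []
    for index, metric in enumerate(["VolumeBytesUsed", "CPUUtilization"]):
        widgets.append(
            {
                "type": "metric",
                "x": 12 * index,  # side by side on the 24-column grid
                "y": 0,
                "width": 12,
                "height": 6,
                "properties": {
                    "view": "timeSeries",  # a line graph
                    "metrics": [["AWS/DocDB", metric, "DBClusterIdentifier", cluster_id]],
                    "region": region,
                    "stat": "Average",
                    "period": 300,
                    "title": metric,
                },
            }
        )
    return json.dumps({"widgets": widgets})


if __name__ == "__main__":
    print(docdb_dashboard_body("my-docdb-cluster", "us-east-1"))
```

With Boto3, `boto3.client("cloudwatch").put_dashboard(DashboardName="docdb-dashboard", DashboardBody=docdb_dashboard_body("my-docdb-cluster", "us-east-1"))` would create or overwrite the dashboard; the dashboard and cluster names here are placeholders.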

You can also use AWS CloudFormation to provision resources quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.

From this link, you can download an example CloudFormation template that builds a CloudWatch dashboard for an Amazon DocumentDB cluster.
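To show the general shape of such a template, here is a minimal sketch (not the template from the link) that generates a CloudFormation template with a single `AWS::CloudWatch::Dashboard` resource. It assumes a hypothetical `ClusterId` parameter and uses `Fn::Sub` to insert that parameter and the `AWS::Region` pseudo parameter into the dashboard body, which CloudFormation expects as a JSON string.

```python
import json
from typing import Any, Dict


def dashboard_template() -> Dict[str, Any]:
    """Build a CloudFormation template (as a dict) for a one-widget dashboard."""
    body = {
        "widgets": [
            {
                "type": "metric",
                "x": 0,
                "y": 0,
                "width": 12,
                "height": 6,
                "properties": {
                    "view": "timeSeries",
                    # ${ClusterId} and ${AWS::Region} are resolved by Fn::Sub.
                    "metrics": [["AWS/DocDB", "CPUUtilization", "DBClusterIdentifier", "${ClusterId}"]],
                    "region": "${AWS::Region}",
                    "stat": "Average",
                    "period": 300,
                    "title": "CPUUtilization",
                },
            }
        ]
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ClusterId": {"Type": "String", "Description": "Amazon DocumentDB cluster identifier"}
        },
        "Resources": {
            "DocDBDashboard": {
                "Type": "AWS::CloudWatch::Dashboard",
                "Properties": {
                    "DashboardName": {"Fn::Sub": "docdb-${ClusterId}"},
                    # DashboardBody must be a string, so serialize the dict first.
                    "DashboardBody": {"Fn::Sub": json.dumps(body)},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(dashboard_template(), indent=2))
```

Writing the printed JSON to a file gives a template you could deploy with the CloudFormation console or CLI, supplying the cluster identifier as the `ClusterId` parameter.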

In the next section, we show how to create an alarm when one of the metrics crosses a threshold.

Creating an alarm notification with CloudWatch

In this section, we show how to create an alarm when the CPUUtilization metric goes above a certain threshold (for this post, 80%). We configure the alarm to email a message when this threshold is breached.

To set up the alarm, do the following:

  1. On the CloudWatch console, in the navigation pane, choose Alarms.
  2. Choose Create alarm.
  3. Choose Select metric.
  4. On the All metrics tab, select DocDB.
  5. Choose a metric dimension (for example, Cluster Metrics).
  6. Find the CPUUtilization metric and choose Select metric.
  7. Enter a meaningful metric name (for this post, we keep the name as CPUUtilization).
  8. For Statistic, keep the default value (Average).

Depending on what you want to measure, you could also choose Sum, Maximum, or Minimum.

  9. Under Conditions, keep the Static threshold type, choose Greater, and enter 80 as the threshold value.

This is our numerical threshold, which triggers the alarm whenever the CPUUtilization metric crosses 80%.

  10. Choose Next.
  11. Choose Create new topic.
  12. Enter a unique topic name (for example, email-test).
  13. For Email endpoints that will receive the notification, enter the destination email address.
  14. Choose Create topic.

To subscribe to the notification, check your destination email for an automated message from Amazon SNS. The email contains a link to confirm your subscription to the topic. After you confirm your subscription, you receive the automated email alerts. This confirmation step prevents people from receiving alerts they never signed up for.

  15. Choose Next.
  16. Enter a name and optional description.
  17. Choose Next.
  18. Review all the parameters to ensure they are correct.
  19. If you need to change anything, choose Edit.
  20. When the parameters are to your liking, choose Create alarm.

Your console should look similar to the following screenshot, with a list of alarms and their status and condition.


You now have a functioning alarm. Whenever the average CPUUtilization over a 5-minute period exceeds 80%, subscribed users receive a notification email.
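The same alarm can be defined in code. This minimal sketch builds the keyword arguments for Boto3's `cloudwatch.put_metric_alarm` and `sns.subscribe` calls, mirroring the console settings above (Average statistic, 5-minute period, threshold 80, email notification). The alarm name format, the single evaluation period, and the function names are assumptions.

```python
from typing import Any, Dict


def cpu_alarm_request(
    cluster_id: str, topic_arn: str, threshold: float = 80.0
) -> Dict[str, Any]:
    """Keyword arguments for cloudwatch.put_metric_alarm (CPU above threshold)."""
    return {
        "AlarmName": f"{cluster_id}-CPUUtilization-high",
        "Namespace": "AWS/DocDB",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,           # one 5-minute evaluation window
        "EvaluationPeriods": 1,  # alarm after a single breaching window
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # notify the SNS topic's subscribers
    }


def email_subscription_request(topic_arn: str, email: str) -> Dict[str, str]:
    """Keyword arguments for sns.subscribe using the email protocol."""
    return {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": email}


if __name__ == "__main__":
    # Hypothetical ARN and address for illustration only.
    arn = "arn:aws:sns:us-east-1:123456789012:email-test"
    print(email_subscription_request(arn, "ops@example.com"))
    print(cpu_alarm_request("my-docdb-cluster", arn))
```

With Boto3, `boto3.client("sns").create_topic(Name="email-test")["TopicArn"]` returns the topic ARN, `subscribe(**email_subscription_request(...))` sends the confirmation email, and `boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_request(...))` creates the alarm.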

Additional metrics to monitor

As a best practice, we recommend setting alarms on your overall AWS bill.

  1. On the CloudWatch console, choose Alarms.
  2. Choose Billing.
  3. Choose Create alarm.
  4. Follow the same process outlined in the preceding section to send a notification when the overall bill reaches 50% and 75% of your expected monthly spend.
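The two billing alarms can also be expressed as `put_metric_alarm` arguments. This sketch assumes the `AWS/Billing` namespace, the `EstimatedCharges` metric with a `Currency` dimension, and a 6-hour period (a commonly used value, assumed here). Billing metrics are published only in the US East (N. Virginia) Region and only after billing alerts are enabled in your billing preferences. The function name `billing_alarm_requests` is hypothetical.

```python
from typing import Any, Dict, List, Sequence


def billing_alarm_requests(
    expected_monthly_spend: float,
    topic_arn: str,
    fractions: Sequence[float] = (0.5, 0.75),
    currency: str = "USD",
) -> List[Dict[str, Any]]:
    """Build one put_metric_alarm request per fraction of expected spend."""
    requests: List[Dict[str, Any]] = []
    for fraction in fractions:
        percent = round(fraction * 100)
        requests.append(
            {
                "AlarmName": f"billing-{percent}pct-of-expected-spend",
                "Namespace": "AWS/Billing",
                "MetricName": "EstimatedCharges",
                "Dimensions": [{"Name": "Currency", "Value": currency}],
                # EstimatedCharges is a month-to-date total, so the maximum
                # within each period is the latest estimate.
                "Statistic": "Maximum",
                "Period": 21600,  # 6 hours (assumed)
                "EvaluationPeriods": 1,
                "Threshold": expected_monthly_spend * fraction,
                "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                "AlarmActions": [topic_arn],
            }
        )
    return requests


if __name__ == "__main__":
    for req in billing_alarm_requests(1000.0, "arn:aws:sns:us-east-1:123456789012:email-test"):
        print(req["AlarmName"], req["Threshold"])
```

For an expected monthly spend of 1,000 USD, this produces alarms at 500.0 and 750.0. Each request would be sent with `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**req)`.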

The following table summarizes several other metrics that might be useful to track.

| Namespace (level) | Metric | Description | Recommended alarm | Action to take when triggered |
|---|---|---|---|---|
| AWS/Billing | EstimatedCharges | Total estimated charges for all AWS services. | Set thresholds at 50% and 75% of expected monthly spend. | Determine which services are driving the charges and reduce consumption where applicable. |
| AWS/DocDB (instance) | BufferCacheHitRatio | The percentage of requests that are served by the buffer cache. | Alarm when below 95% for more than 5 minutes. | Consider scaling the instance vertically by replacing it with an instance that has more RAM. |
| AWS/DocDB (instance) | IndexBufferCacheHitRatio | The percentage of index requests that are served by the buffer cache. | Alarm when below 95% for more than 5 minutes. | Consider scaling the instance vertically by replacing it with an instance that has more RAM. |
| AWS/DocDB (instance) | DatabaseConnections | The number of connections open on an instance, sampled at a 1-minute frequency. | Alarm at 25,000, safely below the instance's maximum (for example, 30,000; the maximum varies by instance size). | Check the application and deployment to understand why there are so many connections. Check driver configurations. Check whether an unexpected number of containers or AWS Lambda functions have been created. |
| AWS/DocDB (instance) | DatabaseCursors | The maximum number of open cursors on an instance in a 1-minute period. | Alarm at 4,000, safely below the instance's maximum (for example, 4,560; the maximum varies by instance size). | Check that applications explicitly close cursors when they are finished with them. |
| AWS/DocDB (instance) | FreeableMemory | The amount of available random access memory, in bytes. | Alarm when below 10% of the instance's memory for more than 5 minutes. | Consider scaling the instance vertically by replacing it with an instance that has more RAM. |
| AWS/DocDB (instance) | CPUUtilization | The percentage of CPU used by an instance. | Alarm when above 80% for more than 5 minutes. | For read-heavy workloads, consider scaling the cluster horizontally to spread the work among more read replicas. For write-heavy workloads, consider scaling the primary instance vertically. |
| AWS/DocDB (cluster) | DBClusterReplicaLagMaximum | The maximum lag, in milliseconds, between the primary instance and each Amazon DocumentDB instance in the cluster. | Alarm when replica lag exceeds 5 seconds (5,000 ms). | Check whether the replica instances are under stress (for example, high CPU, low available memory, or high read latency). |
| AWS/DocDB (cluster) | DatabaseCursorsTimedOut | The number of cursors that timed out in a 1-minute period. | No alarm; informational. | Check that applications explicitly close cursors when they are finished with them. |
| AWS/DocDB (cluster) | VolumeBytesUsed | The amount of storage used by your cluster, in bytes. | No alarm; informational. | N/A |
| AWS/DocDB (cluster) | VolumeWriteIOPs | The average number of billed write I/O operations from a cluster volume, reported at 5-minute intervals. | Alarm on a value above your typical maximum. | N/A |
| AWS/DocDB (cluster) | VolumeReadIOPs | The average number of billed read I/O operations from a cluster volume, reported at 5-minute intervals. | No alarm; informational. | N/A |
| AWS/DocDB (cluster) | Opcounters | See Monitoring Amazon DocumentDB with CloudWatch for a complete list of opcounters metrics. | No alarm; informational. | N/A |
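One way to apply these recommendations consistently is to encode the alarm thresholds once and generate every alarm from that list. In this sketch, the thresholds come from the table, while the statistics, periods, comparison operators, alarm names, and function names are assumptions. FreeableMemory is left out because its threshold (10% of RAM, in bytes) depends on the instance size.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AlarmRule:
    """One table row expressed as CloudWatch alarm settings."""

    metric: str
    level: str         # "instance" or "cluster"
    comparison: str    # a CloudWatch ComparisonOperator value
    threshold: float   # in the metric's own unit (percent, count, or ms)
    statistic: str = "Average"
    period: int = 300  # seconds


RULES: List[AlarmRule] = [
    AlarmRule("BufferCacheHitRatio", "instance", "LessThanThreshold", 95),
    AlarmRule("IndexBufferCacheHitRatio", "instance", "LessThanThreshold", 95),
    AlarmRule("DatabaseConnections", "instance", "GreaterThanOrEqualToThreshold", 25_000, "Maximum", 60),
    AlarmRule("DatabaseCursors", "instance", "GreaterThanOrEqualToThreshold", 4_000, "Maximum", 60),
    AlarmRule("CPUUtilization", "instance", "GreaterThanThreshold", 80),
    AlarmRule("DBClusterReplicaLagMaximum", "cluster", "GreaterThanThreshold", 5_000, "Maximum", 60),
]


def alarm_requests(instance_id: str, cluster_id: str, topic_arn: str) -> List[Dict[str, Any]]:
    """Build put_metric_alarm keyword arguments for every rule."""
    requests: List[Dict[str, Any]] = []
    for rule in RULES:
        if rule.level == "instance":
            name, resource = "DBInstanceIdentifier", instance_id
        else:
            name, resource = "DBClusterIdentifier", cluster_id
        requests.append(
            {
                "AlarmName": f"{resource}-{rule.metric}",
                "Namespace": "AWS/DocDB",
                "MetricName": rule.metric,
                "Dimensions": [{"Name": name, "Value": resource}],
                "Statistic": rule.statistic,
                "Period": rule.period,
                "EvaluationPeriods": 1,
                "Threshold": float(rule.threshold),
                "ComparisonOperator": rule.comparison,
                "AlarmActions": [topic_arn],
            }
        )
    return requests


if __name__ == "__main__":
    arn = "arn:aws:sns:us-east-1:123456789012:email-test"  # hypothetical
    for req in alarm_requests("my-docdb-instance", "my-docdb-cluster", arn):
        print(req["AlarmName"], req["ComparisonOperator"], req["Threshold"])
```

Keeping the thresholds in one list makes it easy to review them against the table and to apply the same set to every instance in a cluster.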

Conclusion

This post walked through two methods to view cluster- and instance-level metrics. The first method uses the Monitoring tab on the Amazon DocumentDB console. It’s the easiest and fastest way to view metrics directly on the console, and supports over 50 instance- and cluster-level metrics. The other method uses CloudWatch. This method is slightly more involved, but gives you more insight into the metrics, including the ability to set alarms and notifications.

In future posts, we look at how to use AWS CloudTrail to log auditing events. We also dive deeper into the Amazon DocumentDB profiler, which allows you to explore and debug slow-running queries.

If you have any questions or comments about this post, please use the comments section. If you have any feature requests for Amazon DocumentDB, email us at documentdb-feature-request@amazon.com.


About the Author

Ryan Thurston is a Senior Go-To-Market Specialist at AWS. He has been in the Information Technology space for over 20 years and enjoys helping customers solve real-world business problems using technology. Ryan began his career as a software developer and wrote his first program in the BASIC programming language.