AWS Cloud Operations Blog
Monitor EBS Detailed Performance Statistics with Amazon Managed Service for Prometheus
Today we are excited to announce that you can now easily ingest Amazon EBS detailed performance statistics from your Amazon Elastic Kubernetes Service (Amazon EKS) workloads into an Amazon Managed Service for Prometheus workspace. We recently announced the availability of EBS detailed performance statistics, which gives you real-time visibility into the performance of your EBS storage volumes. Prior to this release, customers who needed granular volume-level observability had to use a patchwork of system-level tools. EBS detailed performance statistics gives you access to 11 high-performance metrics at sub-minute granularity. You can use these statistics to better understand the health and performance of your Kubernetes storage. In this blog post, we will demonstrate how you can enable detailed performance statistics on EBS volumes in your EKS clusters and send this telemetry data to your Prometheus workspace.
Getting started
Collecting the new EBS detailed performance statistics requires the following prerequisites.
- Install the AWS CLI and eksctl command line tools.
- You must have an Amazon EKS cluster. If you don’t have an EKS cluster, you can follow this guide to get started. You can also create a cluster via the CLI, where <clusterName> is the name of the cluster you want to create.
- The cluster must have OIDC enabled so that you can map IAM roles to Kubernetes service accounts. The easiest way to do this is via eksctl, which can enable the IAM OIDC Provider on an existing EKS cluster, where <clusterName> is the name of your EKS cluster.
- Create an IAM role so that the CSI driver for Amazon EBS has the correct permissions for Kubernetes to access storage volumes. You can do this from the command line, where <clusterName> is the name of the EKS cluster.
- Install the Amazon EBS CSI Driver. This can be done either via the EKS console or the AWS CLI. With the AWS CLI, <clusterName> is the name of the EKS cluster and <roleArn> is the ARN of the AmazonEKS_EBS_CSI_DriverRole created in the previous step.
- Ensure you have an Amazon Managed Service for Prometheus workspace. If you don’t already have a workspace, you can follow this guide to get started. You can also create a workspace via the CLI, where <clusterWorkspaceName> is the name of the workspace you want to create.
- Ensure you have an AWS managed collector configured as part of your EKS cluster. You can create a scraper as part of your EKS cluster creation, or you can create a scraper via the AWS API or AWS CLI for existing clusters. The blog Amazon Managed Service for Prometheus collector provides agentless metric collection for Amazon EKS provides more details on how to configure the agentless metric collector for new and existing clusters. You can view existing scrapers and create new ones from the Observability tab of the EKS cluster (see figure 1).
Figure 1: Agentless Prometheus scraper in the EKS cluster
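The command-line steps in the prerequisites above can be sketched as a handful of shell functions. This is a minimal sketch, not necessarily the exact commands from the original post: the function names are hypothetical, and the service account name `ebs-csi-controller-sa`, the `kube-system` namespace, and the managed policy ARN are assumptions based on the usual EBS CSI driver setup. Check each flag against the current eksctl and AWS CLI documentation before running anything.

```shell
#!/usr/bin/env bash
# Sketch of the prerequisite steps as shell functions (function names are
# hypothetical). Nothing runs until you call a function.

# Create a new EKS cluster with eksctl. $1 = <clusterName>
create_cluster() {
  eksctl create cluster --name "$1"
}

# Associate an IAM OIDC provider with an existing cluster. $1 = <clusterName>
enable_oidc() {
  eksctl utils associate-iam-oidc-provider --cluster "$1" --approve
}

# Create the IAM role for the EBS CSI driver. $1 = <clusterName>
# Assumption: the driver's controller uses the ebs-csi-controller-sa service
# account in kube-system and the AWS managed AmazonEBSCSIDriverPolicy.
create_csi_role() {
  eksctl create iamserviceaccount \
    --name ebs-csi-controller-sa \
    --namespace kube-system \
    --cluster "$1" \
    --role-name AmazonEKS_EBS_CSI_DriverRole \
    --role-only \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --approve
}

# Install the EBS CSI driver add-on. $1 = <clusterName>, $2 = <roleArn>
install_csi_addon() {
  aws eks create-addon \
    --cluster-name "$1" \
    --addon-name aws-ebs-csi-driver \
    --service-account-role-arn "$2"
}

# Create an Amazon Managed Service for Prometheus workspace.
# $1 = <clusterWorkspaceName>
create_workspace() {
  aws amp create-workspace --alias "$1"
}
```

For example, after sourcing the file, `enable_oidc my-cluster` followed by `create_csi_role my-cluster` would cover the OIDC and IAM role steps for a cluster named `my-cluster` (a placeholder).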
Enabling EBS detailed performance statistics in EKS
Once you have these prerequisites configured, enabling collection of the EBS detailed performance statistics consists of updating the Amazon EBS CSI Driver and enabling metrics for the node plugin.
First, check your current add-on version:
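A minimal sketch of this check, assuming the AWS CLI is configured for the cluster's account and Region. The function names are hypothetical, and `version_ge` relies on `sort -V`, which GNU coreutils provides.

```shell
# Print the add-on version currently installed on a cluster. $1 = <clusterName>
installed_addon_version() {
  aws eks describe-addon \
    --cluster-name "$1" \
    --addon-name aws-ebs-csi-driver \
    --query 'addon.addonVersion' \
    --output text
}

# Print every aws-ebs-csi-driver version that EKS offers as an add-on.
available_addon_versions() {
  aws eks describe-addon-versions \
    --addon-name aws-ebs-csi-driver \
    --query 'addons[].addonVersions[].addonVersion' \
    --output text
}

# Succeed when version $1 is at least version $2. Any build suffix such as
# "-eksbuild.1" is stripped before comparing.
version_ge() {
  have="${1%%-*}"
  want="${2%%-*}"
  lowest="$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}
```

A call such as `version_ge "$(installed_addon_version my-cluster)" v1.37.0` then tells you whether the installed add-on already meets the minimum version, with `my-cluster` standing in for your cluster name.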
The command should show that v1.37.0 or later is available for the aws-ebs-csi-driver add-on.
To enable metrics collection, you will need to update the add-on with advanced configuration. Create a file named values.yaml with the following content:
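Based on the `node.enableMetrics: true` setting that this post names as the key configuration, the file might look like the following. This is a sketch under that assumption: the original file may have contained other keys, and the heredoc is only one convenient way to write it.

```shell
# Write a minimal values.yaml that turns on metrics for the node plugin.
# Assumption: node.enableMetrics is the only setting that needs to change.
cat > values.yaml <<'EOF'
node:
  enableMetrics: true
EOF
```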
Then update the add-on using the AWS CLI:
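A minimal sketch of the update, assuming `values.yaml` sits in the current directory and the add-on is already installed on the cluster. The function name is hypothetical.

```shell
# Apply values.yaml to the installed EBS CSI driver add-on. $1 = <clusterName>
# Assumption: values.yaml is in the current working directory.
enable_node_metrics() {
  aws eks update-addon \
    --cluster-name "$1" \
    --addon-name aws-ebs-csi-driver \
    --configuration-values file://values.yaml
}
```

If the installed add-on is older than v1.37.0, you would likely also need to pass `--addon-version` with a newer version in the same call.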
The key configuration here is node.enableMetrics: true, which enables the collection of the detailed performance statistics. For more information about EKS add-on advanced configuration options, see the Amazon EKS Add-ons: Advanced configuration blog.
After updating the add-on with metrics enabled, the EBS detailed performance statistics will be automatically scraped and sent to your Prometheus workspace. You can verify this by checking the metrics endpoint.
Validating the scraped data
You can validate that the metrics are being exposed by checking the metrics endpoint of the node plugin. First, port-forward a CSI node pod (ebs-csi-node-4cm75 is the name of our CSI node pod; yours will be different):
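A minimal sketch of the port-forward, with several assumptions called out: the add-on runs in the `kube-system` namespace, the node pods carry the label `app=ebs-csi-node`, and the node plugin serves metrics on port 3302. The function names are hypothetical, so verify the label and the container port on your own pods first.

```shell
# List the CSI node pods so you can pick one.
# Assumption: node pods are labelled app=ebs-csi-node in kube-system.
list_csi_node_pods() {
  kubectl get pods --namespace kube-system --selector app=ebs-csi-node
}

# Forward a local port to the metrics port of one CSI node pod.
# $1 = pod name, $2 = metrics port (default 3302, an assumption).
forward_csi_metrics() {
  kubectl port-forward --namespace kube-system "pod/$1" "${2:-3302}:${2:-3302}"
}
```

With the pod name used in this post, the call would be `forward_csi_metrics ebs-csi-node-4cm75`, left running in its own terminal.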
Then curl the endpoint:
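A minimal sketch of the request, assuming the port-forward is listening on local port 3302 and the endpoint path is `/metrics` (both assumptions). The function names are hypothetical. The small filter helper narrows the page to one metric family.

```shell
# Fetch the metrics page through the port-forward.
# $1 = local port (default 3302, the same assumed port as the port-forward).
fetch_csi_metrics() {
  curl --silent --fail "http://localhost:${1:-3302}/metrics"
}

# Keep only the sample lines whose metric name starts with a given prefix,
# dropping the "# HELP" and "# TYPE" comment lines. $1 = metric name prefix
filter_metrics() {
  grep -E "^$1"
}
```

For example, `fetch_csi_metrics | filter_metrics nvme_` would show only metrics whose names start with `nvme_`, such as the `nvme_read_bytes_total` metric visualized later in this post.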
This will yield the node plugin's metrics in the Prometheus text format, including the EBS detailed performance statistics.
You can also use CloudWatch metrics to monitor your workspace to validate that new data is being scraped.
To visualize the scraped data, set up an Amazon Managed Grafana workspace. Add the Prometheus workspace created previously as a data source in Amazon Managed Grafana. See figure 2.
Figure 2: Selecting Amazon Managed Service for Prometheus as a data source in Amazon Managed Grafana
To validate that the EBS detailed performance statistics are being sent to the Prometheus workspace, log in to the Grafana workspace created previously to visualize the data. Create a dashboard and, in the metrics browser, search for the relevant volume metrics. See figure 3.
Figure 3: Visualizing the nvme_read_bytes_total metric
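Outside Grafana, the same check can be sketched from the command line by querying the workspace's Prometheus-compatible API. This assumes the open source `awscurl` tool is installed to sign requests with SigV4, and that you know your workspace ID and Region. The function names and the example values are hypothetical.

```shell
# Build the instant-query URL for a workspace. $1 = Region, $2 = workspace ID
amp_query_endpoint() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/query\n' "$1" "$2"
}

# Run an instant query against the workspace, signed by awscurl.
# $1 = Region, $2 = workspace ID, $3 = PromQL expression
# Assumption: the expression needs no URL encoding (a bare metric name).
query_amp() {
  awscurl -X POST --service aps --region "$1" \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    -d "query=$3" \
    "$(amp_query_endpoint "$1" "$2")"
}
```

A call such as `query_amp us-east-1 ws-EXAMPLE nvme_read_bytes_total` (placeholder Region and workspace ID) should return series once the scraper has delivered samples.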
Next steps
Once the Amazon EBS CSI driver in your EKS cluster has been updated to v1.37.0 or later with node metrics enabled, these metrics will become available in your cluster. The AWS managed collector built into Amazon EKS makes it straightforward to begin collecting these detailed performance statistics. While these metrics are available free of charge, your Prometheus workspace is billed on the number of metrics ingested and the amount of storage used.
In this blog post, we demonstrated how to begin to take advantage of EBS detailed performance statistics within your EKS clusters. Using these new metrics, you can better understand the performance of your latency-sensitive EKS workloads. These metrics give you real-time visibility into your volume’s I/O performance and can inform you when performance is impacted by a volume or when throughput limits are being exceeded. To get started, upgrade the EBS CSI Driver in your EKS clusters so you can begin to take advantage of these new metrics today.