AWS Cloud Operations Blog

Monitor EBS Detailed Performance Statistics with Amazon Managed Service for Prometheus

Today we are excited to announce that you can now easily ingest Amazon EBS detailed performance statistics from your Amazon Elastic Kubernetes Service (Amazon EKS) workloads into an Amazon Managed Service for Prometheus workspace. We recently announced the availability of EBS detailed performance statistics, which gives you real-time visibility into the performance of your EBS storage volumes. Prior to this release, customers who needed granular volume-level observability had to use a patchwork of system-level tools. EBS detailed performance statistics gives you access to 11 high-performance metrics at sub-minute granularity. You can use these statistics to better understand the health and performance of your Kubernetes storage. In this blog post, we will demonstrate how you can enable detailed performance statistics on EBS volumes in your EKS clusters and send this telemetry data to your Prometheus workspace.

Getting started

There are several pre-requisites needed for collecting the new EBS detailed performance statistics.

  1. Install the AWS CLI and eksctl command line tools.
  2. You must have an Amazon EKS cluster. If you don’t have an EKS cluster, you can follow this guide to get started. You can also create a cluster via the CLI as follows (where <clusterName> is the name of the cluster you want to create):
    eksctl create cluster --name <clusterName>
  3. The cluster must have OIDC enabled so that you can map IAM roles to Kubernetes service accounts. The easiest way to do this is via eksctl. You can run the following command on an EKS cluster to enable the IAM OIDC Provider, where <clusterName> is the name of your EKS cluster:
    eksctl utils associate-iam-oidc-provider --cluster=<clusterName>
  4. Create an IAM role so the CSI driver for Amazon EBS has the correct permissions for Kubernetes to access storage volumes. You can do this by running the following command, where <clusterName> is the name of the EKS cluster:
    eksctl create iamserviceaccount \
    	--name ebs-csi-controller-sa \
    	--namespace kube-system \
    	--cluster <clusterName> \
    	--role-name AmazonEKS_EBS_CSI_DriverRole \
    	--role-only \
    	--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    	--approve
  5. Install the Amazon EBS CSI Driver. This can be done either via the EKS console or the AWS CLI. You can use the following command, where <clusterName> is the name of the EKS cluster and <roleArn> is the ARN of the AmazonEKS_EBS_CSI_DriverRole created in the previous step.
    eksctl create addon --name aws-ebs-csi-driver --cluster <clusterName> --service-account-role-arn <roleArn>
  6. Ensure you have an Amazon Managed Service for Prometheus workspace. If you don’t already have a workspace, you can follow this guide to get started. You can also create a workspace via the CLI as follows (where <clusterWorkspaceName> is the alias of the workspace you want to create):
    aws amp create-workspace --alias <clusterWorkspaceName>

  7. Ensure you have an AWS managed collector configured as part of your EKS cluster. You can create a scraper as part of your EKS cluster creation, or you can create a scraper via the AWS API or AWS CLI for existing clusters. The blog post "Amazon Managed Service for Prometheus collector provides agentless metric collection for Amazon EKS" explains in more detail how to configure the agentless metric collector for new and existing clusters. You can view existing scrapers and create new ones from the Observability tab of the EKS cluster (see figure 1).

The EKS cluster information page, where the Observability tab is selected. In the details, an Agentless Prometheus scraper is in the creating stage.

Figure 1: Agentless Prometheus scraper in the EKS cluster
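Step 5 needs the ARN of the role created in step 4. The following is a minimal sketch of one way to assemble it, assuming the role lives in the standard `aws` partition with no IAM path. The helper name `role_arn` is hypothetical and not part of any AWS tool. You can also read the ARN directly with `aws iam get-role --role-name AmazonEKS_EBS_CSI_DriverRole --query 'Role.Arn' --output text`.

```shell
# Build the ARN of an IAM role in the standard "aws" partition.
# $1: 12-digit AWS account ID, $2: role name (assumed to have no IAM path).
# role_arn is an illustrative helper, not an AWS-provided command.
role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Usage sketch (needs AWS credentials, so it is left commented out):
# account_id=$(aws sts get-caller-identity --query Account --output text)
# eksctl create addon --name aws-ebs-csi-driver --cluster <clusterName> \
#   --service-account-role-arn "$(role_arn "$account_id" AmazonEKS_EBS_CSI_DriverRole)"
```

If your account is in a different partition (for example, AWS GovCloud), the `arn:aws` prefix will differ, so reading the ARN from IAM is the safer option.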

Enabling EBS detailed performance statistics in EKS

Once you have these pre-requisites configured, enabling collection of the EBS detailed performance statistics consists of updating the Amazon EBS CSI Driver and enabling metrics for the node plugin.

First, check your current add-on version:

eksctl get addon --cluster <clusterName>

The command should show that v1.37.0 or later is available for the aws-ebs-csi-driver add-on. To enable metrics collection, you will need to update the add-on with advanced configuration. Create a file named values.yaml with the following content:

node:
  enableMetrics: true

Then update the add-on using the AWS CLI:

aws eks update-addon \
--cluster-name <clusterName> \
--addon-name aws-ebs-csi-driver \
--resolve-conflicts OVERWRITE \
--configuration-values file://values.yaml

The key configuration here is node.enableMetrics: true, which enables the collection of the detailed performance statistics. For more information about EKS add-on advanced configuration options, see the blog post "Amazon EKS Add-ons: Advanced configuration".
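If you script the update, it may help to check the v1.37.0 minimum mentioned earlier before applying the configuration. The sketch below assumes the add-on version string looks like `v1.37.0-eksbuild.1` (a leading `v` and an optional build suffix after a hyphen). The function name `version_ge` is hypothetical.

```shell
# Succeed (exit status 0) when version $1 is at least version $2.
# A leading "v" and any "-suffix" (for example "-eksbuild.1") are ignored.
version_ge() {
  have=${1#v}; have=${have%%-*}
  want=${2#v}; want=${want%%-*}
  # Split each version into major, minor, and patch fields.
  IFS=. read -r h1 h2 h3 <<EOF
$have
EOF
  IFS=. read -r w1 w2 w3 <<EOF
$want
EOF
  [ "${h1:-0}" -gt "${w1:-0}" ] && return 0
  [ "${h1:-0}" -lt "${w1:-0}" ] && return 1
  [ "${h2:-0}" -gt "${w2:-0}" ] && return 0
  [ "${h2:-0}" -lt "${w2:-0}" ] && return 1
  [ "${h3:-0}" -ge "${w3:-0}" ]
}

# Usage sketch (needs AWS credentials, so it is left commented out):
# installed=$(aws eks describe-addon --cluster-name <clusterName> \
#   --addon-name aws-ebs-csi-driver --query 'addon.addonVersion' --output text)
# version_ge "$installed" v1.37.0 || echo "Update the add-on before enabling metrics"
```

The comparison is purely numeric on the three version fields, so it does not try to order build suffixes against each other.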

After updating the add-on with metrics enabled, the EBS detailed performance statistics will be automatically scraped and sent to your Prometheus workspace. You can verify this by checking the metrics endpoint.

Validating the scraped data

You can validate that the metrics are being exposed by checking the metrics endpoint of a CSI node pod. First, port-forward the pod's metrics port. In the following command, <ebs-csi-node-4cm75> is the name of our CSI node pod; yours will be different:

kubectl port-forward <ebs-csi-node-4cm75> 3302:3302 -n kube-system

Then curl the endpoint:

curl 127.0.0.1:3302/metrics

This will yield output similar to the following sample:

# HELP nvme_collector_scrapes_total Total number of NVMe collector scrapes
# TYPE nvme_collector_scrapes_total counter
nvme_collector_scrapes_total{instance_id="i-XXXXXXXXXXXXXXXXX"} 2
...
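The raw endpoint output can be long. As a rough sketch, the following filter reduces Prometheus text-format output to a list of unique metric names, which makes it easier to confirm that the `nvme_` series are present. The function name `metric_names` is hypothetical, and the filter assumes one sample per line, as in the sample above.

```shell
# Print the unique metric names found in Prometheus text-format input on stdin.
# Comment lines (# HELP, # TYPE) and blank lines are skipped; labels and values
# are stripped by cutting each line at the first "{" or space.
metric_names() {
  grep -v -e '^#' -e '^[[:space:]]*$' | sed -e 's/[{ ].*$//' | sort -u
}

# Usage sketch (needs the port-forward from the previous step to be running):
# curl -s 127.0.0.1:3302/metrics | metric_names | grep '^nvme_'
```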

You can also use CloudWatch metrics to monitor your workspace to validate that new data is being scraped.

To visualize the scraped data, set up an Amazon Managed Grafana workspace. Add the Prometheus workspace created previously as a data source in Amazon Managed Grafana. See figure 2.

The AWS Data Sources tab of the Amazon Managed Grafana workspace displays a list of supported AWS data sources. At the bottom of the list is Amazon Managed Service for Prometheus.

Figure 2: Selecting Amazon Managed Service for Prometheus as a data source in Amazon Managed Grafana

To validate that the EBS detailed performance statistics are being sent to the Prometheus workspace, log in to the Grafana workspace created previously. Create a dashboard and, in the metrics browser, search for the relevant volume metrics. See figure 3.

A visualization within Amazon Managed Grafana which shows the metric nvme_read_bytes_total. The graph shows a flat line to the 7 second mark, then a jump, followed by another flat line to the end of the visualization.

Figure 3: Visualizing the nvme_read_bytes_total metric

Next steps

Once the Amazon EBS CSI driver in your EKS cluster has been updated to v1.37.0 or later with node metrics enabled, these metrics become available in your cluster. The AWS managed collector built into Amazon EKS makes it straightforward to begin collecting these detailed performance statistics. While these metrics are available free of charge, your Prometheus workspace is billed based on the number of metrics ingested and the amount of storage used.

In this blog post, we demonstrated how to begin to take advantage of EBS detailed performance statistics within your EKS clusters. Using these new metrics, you can better understand the performance of your latency-sensitive EKS workloads. These metrics give you real-time visibility into your volume’s I/O performance and can inform you when performance is impacted by a volume or when throughput limits are being exceeded. To get started, upgrade the EBS CSI Driver in your EKS clusters so you can begin to take advantage of these new metrics today.

About the authors

Mike George

Mike George is a Principal Solutions Architect at Amazon Web Services (AWS) based in Salt Lake City, Utah. He enjoys helping customers solve their technology problems. His interests include software engineering, security, artificial intelligence (AI), and machine learning (ML).

Eddie Torres

Eddie is a member of the Amazon EBS team and a maintainer of the AWS EBS CSI driver. He is an active contributor to the Kubernetes community, where he participates as a member of the Storage Special Interest Group (SIG-Storage).

Girish B

Girish B is a Senior Solutions Architect at Amazon Web Services India Pvt Ltd based in Bengaluru. Girish works with many independent software vendor (ISV) customers and helps them design and architect innovative solutions on AWS. Outside work, he enjoys long-distance running, reading, and spending time with his family.