AWS Open Source Blog

Amazon EKS Control Plane Metrics with Prometheus

Kubernetes core components provide a rich set of metrics you can use to observe what is happening in the Control Plane. You can see how many watchers are on each resource in the API Server, the number of audit trail events, the latency of the requests to the API Server, and much more. These metrics come from the Kubernetes API Server, Kubelet, Cloud Controller Manager, and the Scheduler. Each of these components exposes an HTTP endpoint at /metrics that responds with a text/plain content type. This post will walk you through how to get the API Server metrics from an Amazon Elastic Container Service for Kubernetes (EKS) cluster.

Prerequisites

You’ll first need to set up an Amazon EKS cluster. For this demo, we’ll use eksctl with the Cluster config file mechanism. Start by installing the tools used in this post: eksctl, kubectl, and helm.

With all the necessary tools installed, you can get started launching your EKS cluster. In this example, we’re deploying the cluster in us-east-2, AWS’ Ohio region; you can replace the AWS_REGION with any region that supports Amazon EKS.

Deploy Cluster

export AWS_REGION=us-east-2

Once you’ve exported the region, you can create the ClusterConfig as follows:

cat >cluster.yaml <<EOF
apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig
metadata:
  name: control-plane-metrics
  region: ${AWS_REGION}

nodeGroups:
  - name: ng-1
    desiredCapacity: 2
EOF

After the file has been created, create the cluster using the eksctl create cluster command:

eksctl create cluster -f cluster.yaml

This will take roughly 10 – 15 minutes to complete, then you’ll have an Amazon EKS cluster ready to go.

Raw metrics

Before you visualize, monitor, and alert on your metrics, take a look at the raw output of the API Server's metrics endpoint:

kubectl get --raw /metrics

These metrics are output in a Prometheus format. Prometheus is a Cloud Native Computing Foundation (CNCF) graduated project. It can scan and scrape metrics endpoints within your cluster, and will even scan its own endpoint. The syntax for a Prometheus metric is:

metric_name {[ "tag" = "value" ]*} value

This allows you to set a metric_name, define tags on the metric (Prometheus calls these labels) which can be used for querying, and set a value. An example of this for the apiserver_request_count would be:

apiserver_request_count{client="kube-apiserver/v1.11.8 (linux/amd64) kubernetes/7c34c0d",code="200",contentType="application/vnd.kubernetes.protobuf",resource="pods",scope="cluster",subresource="",verb="LIST"} 7

This tells us that this client has made 7 LIST requests for the pods resource that returned a 200 status code.
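To make the format concrete, here is a minimal sketch that totals the samples of one metric per verb label using awk. The function name sum_by_verb is hypothetical, and the sketch assumes each sample line ends with its value (no trailing timestamp), which matches the output shown above.

```shell
# Sum the sample values of one metric, grouped by its "verb" label.
# Reads Prometheus text-format lines on stdin; $1 is the metric name.
# Assumption: the value is the last field on the line (no timestamp).
sum_by_verb() {
  awk -v metric="$1" '
    # Only consider labelled sample lines for the requested metric;
    # comment lines (# HELP, # TYPE) and other metrics are skipped.
    index($0, metric "{") == 1 {
      verb = "unknown"
      if (match($0, /verb="[^"]*"/)) {
        # Drop the 6-character verb=" prefix and the closing quote.
        verb = substr($0, RSTART + 6, RLENGTH - 7)
      }
      total[verb] += $NF
    }
    END { for (v in total) printf "%s %.0f\n", v, total[v] }
  ' | sort
}
```

Against a live cluster you could run it as: kubectl get --raw /metrics | sum_by_verb apiserver_request_count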

Next, we’ll set up Prometheus using helm.

Configuring Helm

Once the cluster is created, you can set up helm locally so that you don’t need to have tiller running within your cluster. Follow the steps in the post Using Helm with Amazon EKS.

After you have completed those steps, you can deploy Prometheus.

Deploy Prometheus

First, create a Kubernetes namespace and use helm to deploy the stable/prometheus package:

kubectl create namespace prometheus
helm install stable/prometheus \
             --name prometheus \
             --namespace prometheus \
             --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2",server.service.type=LoadBalancer

Once that is installed, you can get the Load Balancer’s address by listing services:

kubectl get svc -o wide --namespace prometheus
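If you would rather not copy the address out of the table by hand, the sketch below reads it with a jsonpath query. It assumes the chart named the service prometheus-server (the release name plus the server component) and that the service listens on port 80; the function name prometheus_url is hypothetical, so adjust both to whatever the listing shows.

```shell
# Print the Prometheus UI URL, or fail if the Load Balancer has no
# hostname yet (provisioning can take a minute or two).
# Assumptions: service name "prometheus-server", plain HTTP on port 80.
prometheus_url() {
  host=$(kubectl get svc prometheus-server --namespace prometheus \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') || return 1
  [ -n "$host" ] || return 1
  printf 'http://%s\n' "$host"
}
```

Calling prometheus_url after the install prints a URL you can paste into your browser; a non-zero exit status means the address is not available yet.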

With this Load Balancer address, you can navigate to it in your browser, which will load the Prometheus UI. From here you can go to Status > Targets, which will show you the Control Plane nodes.

If you can see your nodes, you can go inspect some of the metrics. Navigate to Graph and in the drop-down – insert metric at cursor – select any metric starting with apiserver_ and click Execute. This will load the last-synced data from the API Server.
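The same data is available outside the UI through the Prometheus HTTP API. The following is a minimal sketch, not part of the original walkthrough: prom_query is a hypothetical helper, and the base URL you pass it is whatever Load Balancer address you found earlier.

```shell
# Run an instant PromQL query against a Prometheus server.
# $1: base URL of the server (for example http://<load-balancer-address>)
# $2: PromQL expression
prom_query() {
  # -G sends the url-encoded data as a query string on a GET request.
  curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
}
```

For example, prom_query "http://<load-balancer-address>" 'sum by (verb) (rate(apiserver_request_count[5m]))' returns a JSON document with the per-second request rate for each verb over the last five minutes.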

Now that you can see your metrics in the Console view, you can switch over to the Graph view and visualize this data over time.

Teardown

If you deployed a cluster specifically to run this test and you’d like to tear it down, you can do so by first deleting the prometheus namespace, and then deleting the cluster:

kubectl delete namespace prometheus
eksctl delete cluster -f cluster.yaml

Using Prometheus, you can see what is happening within the Kubernetes API Server, and you can graph those metrics over time. You can also use Prometheus to set alerting rules, which will populate the Alerts tab. With this helm chart, you can also deploy Alertmanager, which routes notifications for the alerts that your rules fire. Try setting some rules on your own by modifying the prometheus-server configmap:

kubectl get configmap -n prometheus prometheus-server -o yaml
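As a starting point for your own rules, here is a minimal sketch of an alerting rule file in the Prometheus 2.x rule format. The alert name, threshold, durations, and file name are hypothetical placeholders rather than recommendations, and how rules get into the prometheus-server configmap depends on the chart version you deployed, so check the chart's values before applying anything.

```shell
# Write a sample rule file locally; nothing is applied to the cluster.
cat > apiserver-rules.yaml <<'EOF'
groups:
  - name: apiserver
    rules:
      # Fires when the API Server returns more than one 5xx response
      # per second, averaged over 5 minutes, for 10 minutes in a row.
      # The threshold and durations are illustrative assumptions.
      - alert: APIServerHighErrorRate
        expr: sum(rate(apiserver_request_count{code=~"5.."}[5m])) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: API Server is returning 5xx responses
EOF
```

If you have promtool (it ships with Prometheus) installed locally, promtool check rules apiserver-rules.yaml can validate the syntax before you load the rules.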

If you want to learn about using metrics in your own applications the same way you can in the Kubernetes API, check out the talk from KubeCon CloudNativeCon North America 2018, Monitor the World: Meaningful Metrics for Containerized Apps & Clusters, by Nicholas Turner and Nic Cope.

Chris Hein

Chris Hein is a Sr. Developer Advocate for Kubernetes/EKS at Amazon Web Services. Before Amazon, Chris worked for a number of large and small companies like GoPro, Sproutling, & Mattel. Read More from Chris here https://aws.amazon.com/blogs/opensource/author/heichris/ and follow him at @christopherhein