AWS Open Source Blog

CNI Metrics Helper

Amazon Elastic Container Service for Kubernetes (EKS) uses the VPC CNI plugin for pod networking. The plugin runs as a DaemonSet and is responsible for assigning an IP address to each pod.

When managing an EKS cluster, it may be important to know how many IP addresses have been assigned and how many are available. The CNI metrics helper can help you track these metrics over time, troubleshoot and diagnose issues related to IP assignment and reclamation, and provide insights for capacity planning. The metrics helper also serves as an example of how to extract data from worker nodes, transform it into the proper format, and load it into CloudWatch (as custom metrics) where it can be visualized in a dashboard.

When a worker node is provisioned, the plugin automatically allocates a pool of secondary IP addresses from the node’s subnet to the primary ENI (eth0). This pool of IPs is known as the warm pool, and its size is determined by the worker node’s instance type. For example, a c4.large instance can support three ENIs and ten IP addresses per ENI; the number of IPs available to pods on each ENI is nine, one less than the maximum, because one of the IPs is reserved for the ENI itself. You can learn more about the maximum number of ENIs and secondary IP addresses that an instance can support.
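To make the arithmetic concrete, here is a minimal sketch that works out the pod IP capacity of a single node from the c4.large figures quoted above. The variable names are our own, and the calculation only counts secondary IPs; it does not account for anything else that might limit the number of pods on a node.

```shell
# Limits for a c4.large, taken from the example above.
max_enis=3        # ENIs the instance can support
ips_per_eni=10    # IP addresses per ENI, including the ENI's own address

# One address on each ENI is reserved for the ENI itself.
pod_ips_per_eni=$(( ips_per_eni - 1 ))
pod_ips=$(( max_enis * pod_ips_per_eni ))

echo "pod IPs per ENI: $pod_ips_per_eni, per node: $pod_ips"
```

With these inputs the script reports 9 pod IPs per ENI and 27 per node.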

Starting with v1.1 of the CNI plugin, you can configure the initial size of the warm pool. You can also configure how many ENIs get attached to the worker node when it’s provisioned. For further information, see Add WARM_IP_TARGET support and Add config option for number of ENIs get preallocated.
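As a rough sketch of what that configuration might look like, the snippet below builds a kubectl command that sets WARM_IP_TARGET on the plugin's DaemonSet. It assumes the plugin is deployed as the aws-node DaemonSet in the kube-system namespace and reads the setting from its environment; the value 5 is purely illustrative. The command is only printed so that you can review it before running it yourself.

```shell
# Hypothetical target value; choose one that suits your workload.
warm_ip_target=5

# Assumption: the VPC CNI plugin runs as the aws-node DaemonSet in kube-system.
cmd="kubectl set env daemonset/aws-node -n kube-system WARM_IP_TARGET=${warm_ip_target}"

# Print the command for review instead of executing it.
echo "$cmd"
```

Changing an environment variable on a DaemonSet triggers a rolling restart of its pods, so you may prefer to apply a change like this during a quiet period.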

As the pool of IP addresses is depleted, the plugin will automatically attach another ENI to the instance and allocate another set of secondary IP addresses to that ENI. This continues until the node is no longer capable of supporting additional ENIs.
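The behavior described above can be approximated with a little shell arithmetic. This is a simplified model rather than the plugin's actual logic: it assumes each ENI offers one fewer pod IP than its address limit, ignores any warm-pool targets, and uses a helper name (enis_needed) that we made up for illustration.

```shell
# Estimate how many ENIs a node needs for a given number of pods.
# Arguments: <pods> <ips_per_eni> <max_enis>
enis_needed() {
  pods=$1
  per_eni=$(( $2 - 1 ))                         # one IP is reserved for the ENI
  max_enis=$3
  needed=$(( (pods + per_eni - 1) / per_eni ))  # ceiling division
  if [ "$needed" -lt 1 ]; then
    needed=1                                    # the primary ENI is always attached
  fi
  if [ "$needed" -gt "$max_enis" ]; then
    needed=$max_enis                            # the node cannot attach more ENIs
  fi
  echo "$needed"
}

# c4.large limits: ten IPs per ENI, three ENIs.
enis_needed 9 10 3
enis_needed 10 10 3
enis_needed 40 10 3
```

The three calls print 1, 2, and 3: nine pods fit on the primary ENI, the tenth pod requires a second ENI, and forty pods exceed what the node can hold, so the result is capped at the instance limit.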

As part of operating the cluster, it may be important to monitor:

  • The maximum number of ENIs the cluster can support
  • How many ENIs have been allocated to pods
  • The number of IP addresses currently assigned to pods
  • The total and maximum number of IP addresses available
  • The number of ipamD errors

Included in the v1.1 release of the CNI plugin is a manifest for a metrics helper that can output this information to a log file or as a series of CloudWatch metrics. The rest of this post describes how to configure the metrics helper to output the metrics to CloudWatch, and how to create a dashboard from those metrics.

Installing and Configuring the Metrics Helper

Get the source code for the metrics helper.

The metrics helper is programmed to write metrics to CloudWatch every 30 seconds. To decrease the frequency at which metrics are written to CloudWatch, increase the pullInterval variable in the cni-metrics-helper.go file. Keep in mind that any change to the source code requires you to recompile the code, rebuild and push the container image to a registry, and update the metrics helper manifest with the new image. Alternatively, you can set the value of the USE_CLOUDWATCH environment variable to “no” in the manifest below and pull the container logs periodically, e.g. kubectl logs -n kube-system cni-metrics-xxxx.

Applying the Metrics Helper Manifest

To install the metrics helper, copy and paste the following into a terminal:

cat > cni_metrics_helper.yaml << EOF
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: cni-metrics-helper
rules:
- apiGroups: [""]
  resources:
  - nodes
  - pods
  - pods/proxy
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cni-metrics-helper
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: cni-metrics-helper
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cni-metrics-helper
subjects:
- kind: ServiceAccount
  name: cni-metrics-helper
  namespace: kube-system
---
kind: Deployment
apiVersion: apps/v1
# kubernetes versions before 1.9.0 should use extensions/v1beta1
metadata:
  name: cni-metrics-helper
  namespace: kube-system
  labels:
    k8s-app: cni-metrics-helper
spec:
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  template:
    metadata:
      labels:
        k8s-app: cni-metrics-helper
    spec:
      serviceAccountName: cni-metrics-helper
      containers:
      - image: 694065802095.dkr.ecr.us-west-2.amazonaws.com/cni-metrics-helper:0.1.1
        imagePullPolicy: Always
        name: cni-metrics-helper
        env:
          - name: USE_CLOUDWATCH
            value: "yes"
EOF

To apply this manifest to your cluster, copy and paste the following into your terminal window:

kubectl apply -f cni_metrics_helper.yaml

IAM Policy for Metrics Helper Calls to CloudWatch

After you have applied the manifest, you will need to create an IAM policy that grants the metrics helper permission to call the PutMetricData API.

This policy can be added to the role assigned to the worker nodes or to a role that’s assigned to the pod by kube2iam.

Copy and paste the following into a terminal:

cat > allow_put_metrics_data.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*"
    }
  ]
}
EOF

To add this policy to IAM, copy and paste the following into your terminal (requires jq):

POLICY_ARN=$(aws iam create-policy --policy-name CNIMetricsHelperPolicy --description "Grants permission to write metrics to CloudWatch" --policy-document file://allow_put_metrics_data.json | jq -r '.Policy.Arn')

Next, get the ARN of the instance profile associated with your worker nodes. The quickest way to do this is to run the following commands:

INSTANCE_ID=$(kubectl get nodes -o jsonpath='{.items[0].spec.externalID}')
INSTANCE_PROFILE=$(aws ec2 describe-iam-instance-profile-associations --filters Name=instance-id,Values=$INSTANCE_ID --query 'IamInstanceProfileAssociations[0].IamInstanceProfile.Arn' --output text --region [region] | cut -d "/" -f 2)
ROLE_NAME=$(aws iam get-instance-profile --instance-profile-name $INSTANCE_PROFILE --region [region] | jq -r '.InstanceProfile.Roles[0].RoleName')
echo $ROLE_NAME

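On Kubernetes version 1.12 and above, change the value of INSTANCE_ID to $(kubectl get node -o jsonpath='{.items[0].metadata.labels.alpha\.eksctl\.io/instance-id}'). The dots in the label key must be escaped with backslashes so that jsonpath treats the key as a single field.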
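If your nodes do not carry that label, one possible alternative is to derive the instance ID from the node's spec.providerID field, which on AWS typically has the form aws:///<availability-zone>/<instance-id>. The sketch below uses a made-up providerID value to show the string handling; in practice you would populate provider_id from the kubectl command shown in the comment.

```shell
# Hypothetical providerID, as might be returned by:
#   kubectl get nodes -o jsonpath='{.items[0].spec.providerID}'
provider_id="aws:///us-west-2a/i-0123456789abcdef0"

# Strip everything up to and including the last "/" to leave the instance ID.
INSTANCE_ID=${provider_id##*/}
echo "$INSTANCE_ID"
```

This relies only on POSIX parameter expansion, so it needs no extra tools beyond the shell itself.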

To attach the policy you created in the previous step to the role assigned to your workers, run the following command:

aws iam attach-role-policy --policy-arn $POLICY_ARN --role-name $ROLE_NAME

Now that we’ve finished installing the metrics helper, we’ll move on to configuring CloudWatch to display our EKS worker metrics.

Configuring CloudWatch for EKS Worker Metrics

From the Services dropdown in the AWS web console, select CloudWatch to bring up the CloudWatch console. From within the CloudWatch console, select Metrics on the left menu panel.

If the metrics helper was set up and configured correctly in the previous steps, you should see a separate namespace for Kubernetes metrics in the CloudWatch metrics console. Clicking on the Kubernetes namespace reveals the CLUSTER_ID dimension and the seven metrics associated with it.

Creating a CloudWatch Dashboard

Follow these instructions to create a CloudWatch dashboard from the helper metrics:

  • Click the CLUSTER_ID checkbox
  • Click the Actions button and select Add to Dashboard from the drop-down menu

  • Under Select a Dashboard, click the Create New link
  • Enter a name for the dashboard in the Dashboard Name field and then click the checkmark directly to the right of the field to accept the name
  • Under Select a Widget Type, select Number
  • Click Add to Dashboard when finished

When you click on the Dashboard link in the left-hand navigation, you will see a link to your newly created dashboard.

To get CNI metrics from a particular node, use SSM Run Command to execute /opt/cni/bin/aws-cni-support.sh against EKS-optimized instances, or SSH to the instance and run the script manually.

Participate!

As always, we welcome your ideas and feedback about how we can improve the optics that are available for an EKS cluster. And special thanks to Liwen Wu, author of the CNI metrics helper.

Jeremy Cowan

Jeremy Cowan is a Specialist Solutions Architect for containers at AWS, although his family thinks he sells "cloud space". Prior to joining AWS, Jeremy worked for several large software vendors, including VMware, Microsoft, and IBM. When he's not working, you can usually find him on a trail in the wilderness, far away from technology.