How do I enable Container Insights metrics on an EKS cluster?

Last updated: 2022-05-13

I want to configure Amazon CloudWatch Container Insights and see my Amazon Elastic Kubernetes Service (Amazon EKS) cluster metrics. How can I do this?

Short description

When used with Amazon EKS, Container Insights uses a containerized version of either the CloudWatch agent or the AWS Distro for OpenTelemetry (ADOT) Collector to find all the containers running in a cluster. It then collects performance data at every layer of the performance stack. Container Insights collects this data as performance log events using embedded metric format, and sends it to CloudWatch Logs under the /aws/containerinsights/cluster-name/performance log group. From this data, CloudWatch creates aggregated metrics at the cluster, node, and pod levels. Container Insights also supports collecting metrics from clusters deployed on AWS Fargate for Amazon EKS. For more information on Container Insights, see Using Container Insights.
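As a minimal sketch of the log group naming described above, the following helper builds the log group name for a cluster and, optionally, looks it up with the AWS CLI. The function names are hypothetical and not part of Container Insights. The lookup assumes that the AWS CLI is installed and has credentials.

```shell
#!/bin/sh
# Hypothetical helper: build the Container Insights performance log group
# name for a given cluster name.
ci_log_group() {
  printf '/aws/containerinsights/%s/performance\n' "$1"
}

# Hypothetical helper: list matching log groups with the AWS CLI.
# Requires AWS CLI credentials, so it is defined here but not called.
ci_log_group_lookup() {
  aws logs describe-log-groups \
    --log-group-name-prefix "$(ci_log_group "$1")" \
    --query 'logGroups[].logGroupName' \
    --output text
}

# Example: print the log group name for a cluster named my-cluster.
ci_log_group my-cluster
```

If ci_log_group_lookup returns nothing after the agent has run for a few minutes, that usually points to a permissions or deployment problem rather than a naming problem.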

Note: Container Insights is supported only on Linux instances. Amazon provides a CloudWatch agent container image on Amazon Elastic Container Registry (Amazon ECR). For more information, see cloudwatch-agent on Amazon ECR.

Resolution

Prerequisites

Before following these steps, review the prerequisites:

  • Your EKS cluster is running with nodes in the Ready state, and the kubectl command is installed and running.
  • The AWS Identity and Access Management (IAM) managed policy CloudWatchAgentServerPolicy is in place so that your Amazon EKS worker nodes can send metrics and logs to CloudWatch. To do this, attach the policy to the IAM role of your worker nodes. Or, use an IAM role for service accounts for the cluster, and attach the policy to that role. For more information, see IAM roles for service accounts.
  • For Container Insights on EKS Fargate, your cluster runs Kubernetes version 1.18 or higher, and you have defined a Fargate profile to schedule pods on Fargate.
  • For EKS Fargate, the EKS pod execution role is in place so that components that run on Fargate infrastructure can make calls to AWS APIs on your behalf, such as pulling container images from Amazon ECR.
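To check the first prerequisite, you can count the nodes that are not in the Ready state. This is a rough sketch: the function name is hypothetical, and it assumes the default column layout of kubectl get nodes, where the second column is STATUS.

```shell
#!/bin/sh
# Hypothetical helper: read "kubectl get nodes --no-headers" output on stdin
# and print the number of nodes whose STATUS does not start with "Ready".
# A status such as "Ready,SchedulingDisabled" still counts as Ready here.
count_not_ready_nodes() {
  awk '$2 !~ /^Ready/ { n++ } END { print n + 0 }'
}

# Example usage against a live cluster (not run here):
#   kubectl get nodes --no-headers | count_not_ready_nodes
```

A result of 0 means that every node reports Ready. Any other value is the number of nodes to investigate before you continue.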

Set up Container Insights metrics on your EKS EC2 cluster using the CloudWatch agent

The CloudWatch agent or AWS Distro for OpenTelemetry creates a log group named /aws/containerinsights/Cluster_Name/performance, and then sends the performance log events to this log group.

To set up Container Insights to collect metrics, follow these steps to deploy the CloudWatch agent container image as a DaemonSet. By default, the image is pulled from Docker Hub as an anonymous user. This pull might be subject to a rate limit.

1.    Create a namespace called amazon-cloudwatch if you don't have one already:

kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml

2.    Create a service account for the CloudWatch agent named cloudwatch-agent:

kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-serviceaccount.yaml

3.    Create a configmap as a configuration file for the CloudWatch agent:

ClusterName=<my-cluster-name>
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-configmap.yaml | sed "s/{{cluster_name}}/${ClusterName}/" | kubectl apply -f -

Note: Replace my-cluster-name with the name of your EKS cluster. To further customize the CloudWatch agent configuration, see Create a ConfigMap for the CloudWatch agent.

4.    Optional: To pull the CloudWatch agent image from Amazon Elastic Container Registry (Amazon ECR) instead of Docker Hub, patch the cloudwatch-agent DaemonSet:

kubectl patch ds cloudwatch-agent -n amazon-cloudwatch -p \
 '{"spec":{"template":{"spec":{"containers":[{"name":"cloudwatch-agent","image":"public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest"}]}}}}'

Note: The cloudwatch-agent Docker image on Amazon ECR supports the ARM and AMD64 architectures. Replace the latest image tag based on the image version and architecture. For more information, see the image tags for cloudwatch-agent on Amazon ECR.

5.    For IAM roles for service accounts, create an OIDC provider and an IAM role and policy. Then, associate the IAM role with the cloudwatch-agent service account. Replace ACCOUNT_ID and IAM_ROLE_NAME with your AWS account ID and the IAM role used for the service accounts.

kubectl annotate serviceaccounts cloudwatch-agent -n amazon-cloudwatch "eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT_ID:role/IAM_ROLE_NAME"
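The values that you substitute in steps 3 through 5 are easy to mistype. The following sketch shows one way to check and assemble them before you run the kubectl commands. All function names are hypothetical. The allowed character set for the cluster name (letters, digits, hyphens, and underscores) is an assumption chosen to keep the sed expression in step 3 safe.

```shell
#!/bin/sh
# Hypothetical helpers for steps 3-5. None of them contacts the cluster.

# Step 3: reject an empty name or characters that could break the sed
# expression. The allowed character set is an assumption.
is_safe_cluster_name() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Step 4: build the DaemonSet patch document for a given image tag.
cwagent_image_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"cloudwatch-agent","image":"public.ecr.aws/cloudwatch-agent/cloudwatch-agent:%s"}]}}}}\n' "$1"
}

# Step 5: build the service account annotation from a 12-digit AWS account ID
# and an IAM role name.
irsa_annotation() {
  case "$1" in
    ''|*[!0-9]*) echo "account ID must contain only digits" >&2; return 1 ;;
  esac
  [ "${#1}" -eq 12 ] || { echo "account ID must have 12 digits" >&2; return 1; }
  printf 'eks.amazonaws.com/role-arn=arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Example usage against a live cluster (not run here; replace the
# ACCOUNT_ID and IAM_ROLE_NAME placeholders first):
#   is_safe_cluster_name "$ClusterName" || echo "check ClusterName" >&2
#   kubectl patch ds cloudwatch-agent -n amazon-cloudwatch -p "$(cwagent_image_patch latest)"
#   kubectl annotate serviceaccounts cloudwatch-agent -n amazon-cloudwatch "$(irsa_annotation ACCOUNT_ID IAM_ROLE_NAME)"
```

Building the strings in functions keeps the quoting in one place, which matters most for the JSON patch document.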

Troubleshoot CloudWatch agent

1.    Run this command to get the list of pods:

kubectl get pods -n amazon-cloudwatch

2.    Run this command and check the events at the bottom of the output:

kubectl describe pod pod-name -n amazon-cloudwatch

3.    Run this command to check the logs:

kubectl logs pod-name -n amazon-cloudwatch

4.    If you see a CrashLoopBackOff error for the CloudWatch agent, then make sure that your IAM permissions are set correctly.

For more information, see Verify prerequisites.
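To narrow down which pods to describe, you can filter the pod list for anything that is not Running. This is a sketch under the assumption that kubectl get pods prints its default columns (NAME, READY, STATUS, RESTARTS, AGE). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "kubectl get pods --no-headers" output on stdin
# and print the name and status of every pod whose STATUS is not "Running".
list_unhealthy_pods() {
  awk '$3 != "Running" { print $1 " " $3 }'
}

# Example usage against a live cluster (not run here):
#   kubectl get pods -n amazon-cloudwatch --no-headers | list_unhealthy_pods
```

Each line of output gives a pod name that you can pass to the kubectl describe pod and kubectl logs commands from steps 2 and 3.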

Delete the CloudWatch agent

Use these commands to delete the CloudWatch agent:

kubectl delete -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml
kubectl delete -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-serviceaccount.yaml
ClusterName=<my-cluster-name>
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-configmap.yaml | sed "s/{{cluster_name}}/${ClusterName}/" | kubectl delete -f -

Set up Container Insights metrics on your EKS EC2 cluster using ADOT

1.    Run this command to deploy the ADOT collector as a DaemonSet. For more customizations, see Container Insights EKS infrastructure metrics.

curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/eks/otel-container-insights-infra.yaml | kubectl apply -f -

2.    Confirm that the collector is running:

kubectl get pods -l name=aws-otel-eks-ci -n aws-otel-eks

3.    Optional: By default, the aws-otel-collector image is pulled from Docker Hub as an anonymous user. This pull might be subject to a rate limit. To pull the aws-otel-collector Docker image from Amazon ECR instead, patch the aws-otel-eks-ci DaemonSet:

kubectl patch ds aws-otel-eks-ci -n aws-otel-eks -p \
'{"spec":{"template":{"spec":{"containers":[{"name":"aws-otel-collector","image":"public.ecr.aws/aws-observability/aws-otel-collector:latest"}]}}}}'

Note: The aws-otel-collector Docker image on Amazon ECR supports the ARM and AMD64 architectures. Replace the latest image tag based on the image version and architecture. For more information, see the image tags for aws-otel-collector on Amazon ECR.

4.    Optional: For IAM roles for service accounts, create an OIDC provider and an IAM role and policy. Then, associate the IAM role with the aws-otel-sa service account. Replace ACCOUNT_ID and IAM_ROLE_NAME with your AWS account ID and the IAM role used for service accounts.

kubectl annotate serviceaccounts aws-otel-sa -n aws-otel-eks "eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT_ID:role/IAM_ROLE_NAME"

Delete ADOT

To delete AWS Distro for OpenTelemetry, run this command:

curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/eks/otel-container-insights-infra.yaml |
kubectl delete -f -

Set up Container Insights metrics on an EKS Fargate cluster using ADOT

For applications that run on Amazon EKS and AWS Fargate, you can use ADOT to set up Container Insights. The EKS Fargate networking architecture doesn't allow pods to directly reach the kubelet on the worker node to retrieve resource metrics. Instead, the ADOT Collector calls the Kubernetes API server to proxy the connection to the kubelet on a worker node. It then collects the kubelet's cAdvisor metrics for workloads on that node.

Note: A single instance of ADOT Collector is not sufficient to collect resource metrics from all of the nodes in a cluster.

The ADOT Collector sends these metrics to CloudWatch for every workload that runs on EKS Fargate:

  • pod_cpu_utilization_over_pod_limit
  • pod_cpu_usage_total
  • pod_cpu_limit
  • pod_memory_utilization_over_pod_limit
  • pod_memory_working_set
  • pod_memory_limit
  • pod_network_rx_bytes
  • pod_network_tx_bytes

Each metric is associated with the following dimension sets, and collected under the CloudWatch namespace named ContainerInsights.

  • ClusterName, LaunchType
  • ClusterName, Namespace, LaunchType
  • ClusterName, Namespace, PodName, LaunchType

For more details, visit the Container Insights EKS Fargate page.

Follow these steps to deploy ADOT on your EKS Fargate cluster:

1.    Associate a Kubernetes service account with an IAM role. This helper script creates an IAM role named EKS-Fargate-ADOT-ServiceAccount-Role and associates it with a Kubernetes service account named adot-collector. Be sure to change the CLUSTER_NAME and REGION variables. The script requires eksctl.

#!/bin/bash
CLUSTER_NAME=YOUR-EKS-CLUSTER-NAME
REGION=YOUR-EKS-CLUSTER-REGION
SERVICE_ACCOUNT_NAMESPACE=fargate-container-insights
SERVICE_ACCOUNT_NAME=adot-collector
SERVICE_ACCOUNT_IAM_ROLE=EKS-Fargate-ADOT-ServiceAccount-Role
SERVICE_ACCOUNT_IAM_POLICY=arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy

eksctl utils associate-iam-oidc-provider \
--cluster=$CLUSTER_NAME \
--region=$REGION \
--approve

eksctl create iamserviceaccount \
--cluster=$CLUSTER_NAME \
--region=$REGION \
--name=$SERVICE_ACCOUNT_NAME \
--namespace=$SERVICE_ACCOUNT_NAMESPACE \
--role-name=$SERVICE_ACCOUNT_IAM_ROLE \
--attach-policy-arn=$SERVICE_ACCOUNT_IAM_POLICY \
--approve

2.    Deploy the ADOT Collector as a Kubernetes StatefulSet using this command. Replace my-cluster-name with the name of your EKS cluster, and my-cluster-region with the name of the Region.

ClusterName=<my-cluster-name>
Region=<my-cluster-region>
curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/eks/otel-fargate-container-insights.yaml | sed 's/YOUR-EKS-CLUSTER-NAME/'${ClusterName}'/;s/us-east-1/'${Region}'/' | kubectl apply -f -

3.    Verify that the ADOT Collector pod is running:

kubectl get pods -n fargate-container-insights

4.    Optional: By default, the aws-otel-collector image is pulled from Docker Hub as an anonymous user. This pull might be subject to a rate limit. To pull the aws-otel-collector Docker image from Amazon ECR instead, patch the adot-collector StatefulSet:

kubectl patch sts adot-collector -n fargate-container-insights -p \
'{"spec":{"template":{"spec":{"containers":[{"name":"aws-otel-collector","image":"public.ecr.aws/aws-observability/aws-otel-collector:latest"}]}}}}'
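Before you apply the manifest in step 2, it can help to confirm that the substitutions produce what you expect. The following sketch wraps the same two sed replacements from step 2 in a function so that you can inspect or dry-run the result. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the manifest template on stdin and apply the
# same two substitutions as step 2.
# Arguments: cluster-name region
render_fargate_manifest() {
  sed "s/YOUR-EKS-CLUSTER-NAME/$1/;s/us-east-1/$2/"
}

# Example usage (not run here). MANIFEST_URL stands for the template URL
# from step 2. The client-side dry run validates the rendered manifest
# without creating any resources:
#   curl -s "$MANIFEST_URL" | render_fargate_manifest "$ClusterName" "$Region" \
#     | kubectl apply --dry-run=client -f -
```

As in step 2, each sed expression replaces only the first match on a line, and the cluster name and Region must not contain a forward slash.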

Delete ADOT

Run these commands to delete ADOT:

eksctl delete iamserviceaccount --cluster CLUSTER_NAME --name adot-collector --namespace fargate-container-insights
ClusterName=<my-cluster-name>
Region=<my-cluster-region>
curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/eks/otel-fargate-container-insights.yaml | sed 's/YOUR-EKS-CLUSTER-NAME/'${ClusterName}'/;s/us-east-1/'${Region}'/' | kubectl delete -f -
