AWS Cloud Operations & Migrations Blog

Monitoring CoreDNS for DNS throttling issues using AWS open source monitoring services

Monitoring infrastructure and applications is essential today, as it gives operations engineers the information they need to keep the technology stack healthy and achieve business outcomes. In a microservices environment built on a container orchestration tool like Kubernetes, which is designed to increase flexibility and agility, there are many distributed parts that have to be monitored. CoreDNS is one such critical component of the Kubernetes ecosystem. Tracking CoreDNS performance and throttling issues is particularly important because name resolution is one of the first steps a microservice performs when connecting to another microservice.

CoreDNS is a flexible, extensible Domain Name System (DNS) server that can serve as the Kubernetes cluster DNS. CoreDNS and DNS throttling issues can be challenging to identify and troubleshoot. While most of us check CoreDNS logs and metrics in the name of monitoring, customers often forget about the hard limit of 1024 packets per second (PPS) set at the ENI level. To understand how this hard limit can result in throttling issues, let’s walk through a typical Kubernetes pod’s DNS resolution flow. A Kubernetes pod has to resolve the domain names of both internal and external services for successful communication, and it uses the CoreDNS pod for this purpose. For external endpoint resolution, the CoreDNS pod routes the DNS queries through the ENI of the worker node on which the CoreDNS pod is running. For internal endpoints, the packets still have to use the worker node’s ENI if the CoreDNS pod is not on the same worker node as the pod making the DNS query.

Let’s now assume that there is a sudden influx of DNS queries and the PPS approaches the hard limit of 1024. At this point, we start seeing DNS throttling, and we have little visibility into the root cause, because the natural troubleshooting intuition is to focus on the CoreDNS pods rather than on the ENI metrics. This ENI-level hard limit can impact all of the microservices running on that specific worker node, so it is important to constantly monitor this metric to avoid outages. In this blog post, we walk you through a solution that helps you monitor packet drops that might occur at the ENI level and determine whether there are DNS throttling issues.

An easy way to identify DNS throttling issues on worker nodes is by capturing specific network performance metrics. The Elastic Network Adapter (ENA) driver publishes network performance metrics from the instances where they are enabled. You can troubleshoot DNS throttling using the linklocal_allowance_exceeded metric, which is the number of packets dropped because the PPS of traffic to local proxy services exceeded the maximum for the network interface. In this blog post, we capture the linklocal_allowance_exceeded metric using the AWS Distro for OpenTelemetry (ADOT) Collector. The metrics are then stored in an Amazon Managed Service for Prometheus workspace and visualized using Amazon Managed Grafana.
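
As a quick manual check, you can read this counter directly on a worker node with ethtool; a minimal sketch, assuming the ENA driver is in use and eth0 is the node’s primary interface:

# Show the ENA allowance counters, including linklocal_allowance_exceeded
ethtool -S eth0 | grep allowance_exceeded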

Solution Overview

The diagram in Figure 1 shows the environment setup that we will walk through in this blog post:

Figure 1. Architecture diagram: CoreDNS Monitoring.

Prerequisites

You will need the following to complete the steps in this post:

  1. An Amazon EKS cluster
  2. The AWS CLI, kubectl, eksctl, Helm, and jq installed and configured to access the cluster
  3. An Amazon Managed Grafana workspace

Step 1: Create an Amazon Managed Service for Prometheus workspace

In this step we will create a workspace for Amazon Managed Service for Prometheus.

You start by setting a few environment variables:

export AWS_REGION=<Your AWS Region>
export EKS_CLUSTER_NAME=<Your EKS Cluster Name>
export SERVICE=prometheusservice
export ACK_SYSTEM_NAMESPACE=ack-system
export RELEASE_VERSION=`curl -sL https://api.github.com/repos/aws-controllers-k8s/$SERVICE-controller/releases/latest | grep '"tag_name":' | cut -d'"' -f4`

Use the AWS CLI to create the workspace with the following command:

aws amp create-workspace --alias blog-workspace --region $AWS_REGION

The Amazon Managed Service for Prometheus workspace shown in Figure 2 should be created in just a few seconds. Once created, you will be able to see the workspace as shown below:

Figure 2. Amazon Managed Prometheus workspace.
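
Later steps need the workspace ID and the remote write endpoint, so you can capture them now. The following is a sketch: the endpoint URL follows the standard aps-workspaces format and is also shown on the workspace details page, so verify it against the console.

export WORKSPACE_ID=$(aws amp list-workspaces --alias blog-workspace --query 'workspaces[0].workspaceId' --output text)
export AMP_REMOTE_WRITE_ENDPOINT=https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write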

Step 2: Deploying Prometheus ethtool exporter

ethtool is a Linux networking utility used to configure Ethernet devices and to retrieve detailed information about the Ethernet devices attached to your worker nodes. We will use ethtool’s output to detect whether there is any packet loss, and we will use the Prometheus ethtool exporter utility to expose the output of the ethtool command in Prometheus format.

In this step, you will deploy the Prometheus ethtool exporter. In the code snippet below, notice that the pod spec contains the annotation “prometheus.io/scrape: ‘true’”, which is discovered by the ADOT collector for scraping the metrics exposed by ethtool.

cat << EOF > ethtool-exporter.yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ethtool-exporter
  labels:
    app: ethtool-exporter
spec:
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 100%
  selector:
    matchLabels:
      app: ethtool-exporter
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9417'      
      labels:
        app: ethtool-exporter
    spec:
      hostNetwork: true
      terminationGracePeriodSeconds: 0
      containers:
      - name: ethtool-exporter
        env:
        - name: IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP      
        image: drdivano/ethtool-exporter@sha256:39e0916b16de07f62c2becb917c94cbb3a6e124a577e1325505e4d0cdd550d7b
        command:
          - "sh"
          - "-exc"
          - "python3 /ethtool-exporter.py -l \$(IP):9417 -I '(eth|em|eno|ens|enp)[0-9s]+'"
        ports:
        - containerPort: 9417
          hostPort: 9417
          name: http
          
        resources:
          limits:
            cpu: 250m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 50Mi

      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ethtool-exporter
  name: ethtool-exporter
spec:
  clusterIP: None
  ports:
    - name: http
      port: 9417
  selector:
    app: ethtool-exporter
EOF
kubectl apply -f ethtool-exporter.yaml
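
You can verify that the DaemonSet is running on every node and that the exporter is publishing the metric. The following is a quick sketch; the pod name is a placeholder, and because the pods use hostNetwork, the port-forward simply targets port 9417 on the node:

kubectl get daemonset ethtool-exporter
kubectl get pods -l app=ethtool-exporter -o wide
# forward one exporter pod locally and confirm the metric is exposed
kubectl port-forward pod/<ethtool-exporter-pod> 9417:9417 &
curl -s http://localhost:9417/metrics | grep linklocal_allowance_exceeded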

Step 3: Deploying AWS Distro for OpenTelemetry (ADOT) Collector to scrape the ENI metrics

In this step, we will deploy the ADOT collector and configure it to ingest metrics into Amazon Managed Service for Prometheus. We will use the Amazon EKS add-on for the ADOT operator to send the “linklocal_allowance_exceeded” metric to Amazon Managed Service for Prometheus for monitoring CoreDNS.

Before installing the AWS Distro for OpenTelemetry (ADOT) add-on, review the following prerequisites and considerations.

  1. Meet the TLS certificate requirement to ensure end-to-end encryption.
  2. If installing an add-on version that is v0.62.1 or earlier, grant permissions to Amazon EKS add-ons to install the ADOT operator.
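
For item 2 above, the Amazon EKS documentation provides a manifest that grants these permissions; a sketch, assuming the manifest location from the EKS documentation:

kubectl apply -f https://amazon-eks.s3.amazonaws.com/docs/addons-otel-permissions.yaml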

Installing cert-manager

Install cert-manager using the following command. This creates the necessary cert-manager objects that allow end-to-end encryption. This must be done for each cluster that will have the ADOT collector installed.
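
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml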

Verify that cert-manager is ready using the following command.

kubectl get pod -w -n cert-manager

The example output is as follows:

NAME READY STATUS RESTARTS AGE
cert-manager-6dd9658548-fds5l 1/1 Running 0 10d
cert-manager-cainjector-5987875fc7-6t5lw 1/1 Running 0 10d
cert-manager-webhook-7b4c5f579b-rtp25 1/1 Running 0 10d

Create an IAM role and Amazon EKS Service Account

We will deploy the ADOT collector to run under the identity of the Kubernetes service account “adot-collector”. IAM roles for service accounts (IRSA) lets you associate an IAM role that has the AmazonPrometheusRemoteWriteAccess policy attached with a Kubernetes service account, thereby providing IAM permissions to any pod that uses the service account to ingest metrics into Amazon Managed Service for Prometheus.

You need the kubectl and eksctl CLI tools to run the following command. They must be configured to access your Amazon EKS cluster.

eksctl create iamserviceaccount \
--name adot-collector \
--namespace default \
--region $AWS_REGION \
--cluster $EKS_CLUSTER_NAME \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
--approve \
--override-existing-serviceaccounts
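
Optionally, verify that the service account was created and annotated with the IAM role; a quick sketch using the standard IRSA annotation key:

kubectl describe serviceaccount adot-collector -n default | grep eks.amazonaws.com/role-arn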

Install ADOT add-on

Determine which ADOT add-on versions are available and supported by your cluster’s Kubernetes version using the following command:

aws eks describe-addon-versions --addon-name adot --kubernetes-version 1.24 \
  --query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" --output text

Run the following command to install the ADOT add-on, replacing the --addon-version value based on your Amazon EKS cluster version as shown in the step above.

aws eks create-addon --addon-name adot --addon-version v0.66.0-eksbuild.1 --cluster-name $EKS_CLUSTER_NAME
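
Optionally, you can also confirm that the add-on reaches the ACTIVE state from the EKS side; a sketch using the AWS CLI:

aws eks describe-addon --cluster-name $EKS_CLUSTER_NAME --addon-name adot --query 'addon.status' --output text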

Verify that the ADOT add-on is ready using the following command.

kubectl get po -n opentelemetry-operator-system

NAME READY STATUS RESTARTS AGE
opentelemetry-operator-controller-manager-5b89b7df46-4z96l 2/2 Running 0 10d

Configure the ADOT Collector

To configure the ADOT collector, let’s create a collector-config-amp.yaml file. It contains an OpenTelemetryCollector resource named my-collector-amp; the ADOT operator renders its config into a ConfigMap named “my-collector-amp-collector”, which defines the scrape configuration used to collect metrics. The config scrapes metrics from pods that have the annotation “prometheus.io/scrape: ‘true’”.

Note: We will scrape this metric every 15 seconds to continuously monitor for packet drops and send alert notifications as soon as they occur. Setting the scrape_interval this low has cost implications, so adjust it accordingly if you are concerned about cost.

export AMP_REMOTE_WRITE_ENDPOINT=<AMP_REMOTE_WRITE_ENDPOINT>
cat > collector-config-amp.yaml <<EOF
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector-amp
spec:
  mode: deployment
  serviceAccount: adot-collector
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8888'
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  config: |
    extensions:
      sigv4auth:
        region: $AWS_REGION
        service: "aps"

    receivers:
      #
      # Scrape configuration for the Prometheus Receiver
      # This is the same configuration used when Prometheus is installed using the community Helm chart
      # 
      prometheus:
        config:
          global:
            scrape_interval: 60s
            scrape_timeout: 30s
            external_labels:
              cluster: $EKS_CLUSTER_NAME

          scrape_configs:
          - job_name: kubernetes-pods
            scrape_interval: 15s
            scrape_timeout: 15s
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: \$\$1:\$\$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
              replacement: __param_\$\$1
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed|Completed
              source_labels:
              - __meta_kubernetes_pod_phase
                                
    processors:
      batch/metrics:
        timeout: 60s         

    exporters:
      prometheusremotewrite:
        endpoint: $AMP_REMOTE_WRITE_ENDPOINT
        auth:
          authenticator: sigv4auth

    service:
      extensions: [sigv4auth]
      pipelines:   
        metrics:
          receivers: [prometheus]
          processors: [batch/metrics]
          exporters: [prometheusremotewrite]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-prometheus-role
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-prometheus-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-prometheus-role
subjects:
  - kind: ServiceAccount
    name: adot-collector
    namespace: default

EOF

Apply the YAML file to your cluster to deploy the ADOT Collector:

kubectl apply -f collector-config-amp.yaml
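
You can verify that the operator reconciled the resource and that the collector starts cleanly. The following is a sketch; the Deployment name assumes the operator’s usual <name>-collector naming convention:

kubectl get opentelemetrycollector my-collector-amp
kubectl get pods | grep my-collector-amp
kubectl logs deployment/my-collector-amp-collector --tail=20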

Step 4: Visualize ethtool metrics in Amazon Managed Grafana

You already configured an Amazon Managed Grafana workspace as part of the prerequisites; let’s now visualize the linklocal_allowance_exceeded metric in Amazon Managed Grafana and build a dashboard. Configure the Amazon Managed Service for Prometheus workspace created in Step 1 as a data source in the Amazon Managed Grafana console.

Let’s explore the metrics in Amazon Managed Grafana. Choose Explore, and search for ethtool:

Figure 3. Exploring the metric in Amazon Managed Grafana.

Let’s build a dashboard for the linklocal_allowance_exceeded metric by using the following query:

rate(node_net_ethtool{device="eth0",type="linklocal_allowance_exceeded"}[30s])

Figure 4. Creating the panel for “linklocal_allowance_exceeded” metric in Amazon Managed Grafana.

We can see that no packets were dropped, as the value is zero. You can further extend this by configuring alert manager in Amazon Managed Service for Prometheus to send notifications.

Step 5: Configure alert manager in Amazon Managed Service for Prometheus to send notifications.

Let’s configure recording and alerting rules for the captured “linklocal_allowance_exceeded” metric. Configuring alert manager rules in Amazon Managed Service for Prometheus ensures that notifications are sent to the appropriate teams in a timely manner when packet drops happen.

We will use the ACK Controller for Amazon Managed Service for Prometheus to provision the alerting and recording rules.

Let’s use Helm to install an ACK service controller on your cluster. Make sure the SERVICE and AWS_REGION environment variables set in Step 1 are still set, so that the Prometheus controller is installed for the current region.

aws ecr-public get-login-password --region us-east-1 | helm registry login --username AWS --password-stdin public.ecr.aws
helm install --create-namespace -n $ACK_SYSTEM_NAMESPACE ack-$SERVICE-controller \
oci://public.ecr.aws/aws-controllers-k8s/$SERVICE-chart --version=$RELEASE_VERSION --set=aws.region=$AWS_REGION

You can also verify the installation by running the following command:

helm list --namespace $ACK_SYSTEM_NAMESPACE -o yaml

This returns output similar to the following:

- app_version: v0.1.2
 chart: prometheusservice-chart-v0.1.2
 name: ack-prometheusservice-controller
 namespace: ack-system
 revision: "1"
 status: deployed
 updated: 2022-12-20 21:31:06.656433 -0500 EST

Note: The IAM role that you use to provision Amazon Managed Service for Prometheus resources via ACK must have the arn:aws:iam::aws:policy/AmazonPrometheusConsoleFullAccess policy attached.

Let’s now create YAML files to provision the rule groups and the alert manager definition. First, save the following file as rulegroup.yaml. See the RuleGroupsNamespaceData structure for the format of this file.

In the spec file below, you can see that we are creating an alert named LinkLocalAllowanceExceeded based on a PromQL expression over the node_net_ethtool metric. We do not specify an evaluation period (a for clause) for the alert, because packet drops do not happen frequently, and you want to be notified the moment one occurs to avoid CoreDNS performance issues.

apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: RuleGroupsNamespace
metadata:
   name: default-rule
spec:
   workspaceID: WORKSPACE-ID
   name: default-rule
   configuration: |
     groups:
     - name: example
       rules:
       - record: metric:linklocal_allowance_exceeded
         expr: rate(node_net_ethtool{device="eth0",type="linklocal_allowance_exceeded"}[30s]) 
       - alert: LinkLocalAllowanceExceeded
         expr: rate(node_net_ethtool{device="eth0",type="linklocal_allowance_exceeded"} [30s]) > 0
         labels:
           severity: critical

         annotations:
           summary: Packets dropped due to PPS rate allowance exceeded for local services  (instance {{ $labels.instance }}) on cluster {{ $labels.cluster }}
           description: "LinkLocalAllowanceExceeded is greater than 0"

In the above YAML file, replace WORKSPACE-ID with the ID of the Amazon Managed Service for Prometheus workspace that was created in Step 1.
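
One way to substitute the placeholder from the shell; a sketch that assumes GNU sed and the blog-workspace alias from Step 1:

export WORKSPACE_ID=$(aws amp list-workspaces --alias blog-workspace --query 'workspaces[0].workspaceId' --output text)
sed -i "s/WORKSPACE-ID/$WORKSPACE_ID/g" rulegroup.yaml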

Let’s now configure the alert manager definition. Save the below file as alertmanager.yaml.

See AlertManagerDefinitionData structure for the format of this file.

apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: AlertManagerDefinition
metadata:
  name: alert-manager
spec:
  workspaceID: WORKSPACE-ID
  configuration: |
    alertmanager_config: |
      route:
         receiver: default_receiver
      receivers:
        - name: default_receiver
          sns_configs:
          - topic_arn: TOPIC-ARN
            sigv4:
              region: REGION
            message: |
              alert_type: {{ .CommonLabels.alertname }}
              event_type: {{ .CommonLabels.event_type }}    

Replace WORKSPACE-ID with the Amazon Managed Service for Prometheus workspace Id that was created as part of step 1. Replace TOPIC-ARN with the ARN of an Amazon Simple Notification Service (Amazon SNS) topic where you want to send the alerts, and REGION with the current region of the workload. Make sure that your workspace has permissions to send messages to Amazon SNS.
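
If you do not already have an SNS topic, the following sketch creates one and substitutes all three placeholders. The topic name coredns-throttling-alerts is only an example, and the sketch assumes GNU sed and the WORKSPACE_ID variable captured earlier:

export TOPIC_ARN=$(aws sns create-topic --name coredns-throttling-alerts --query 'TopicArn' --output text)
sed -i "s|WORKSPACE-ID|$WORKSPACE_ID|g; s|TOPIC-ARN|$TOPIC_ARN|g; s|REGION|$AWS_REGION|g" alertmanager.yaml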

Apply these changes by issuing the following commands:

kubectl apply -f rulegroup.yaml -n $ACK_SYSTEM_NAMESPACE
kubectl apply -f alertmanager.yaml -n $ACK_SYSTEM_NAMESPACE

It may take a few seconds for the recording rules and alert manager to be created in the Amazon Managed Service for Prometheus workspace.
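
You can confirm that both resources were accepted by the workspace; a sketch using the AWS CLI:

aws amp list-rule-groups-namespaces --workspace-id $WORKSPACE_ID --query 'ruleGroupsNamespaces[].[name,status.statusCode]' --output text
aws amp describe-alert-manager-definition --workspace-id $WORKSPACE_ID --query 'alertManagerDefinition.status.statusCode'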

Clean up

To delete the resources provisioned in this blog, execute the following commands. Make sure the WORKSPACE variable contains the workspace ID of the “blog-workspace” workspace.

export WORKSPACE=$(aws amp list-workspaces | jq -r '.workspaces[] | select(.alias=="blog-workspace").workspaceId')
echo $WORKSPACE
aws amp delete-workspace --workspace-id $WORKSPACE
kubectl delete -f ethtool-exporter.yaml
kubectl delete -f collector-config-amp.yaml
kubectl delete -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml

Conclusion

In this post, we demonstrated how to monitor and create alerts for CoreDNS throttling issues utilizing AWS Distro for OpenTelemetry (ADOT) and Amazon Managed Service for Prometheus, and how to visualize the metrics using Amazon Managed Grafana. By monitoring the “linklocal_allowance_exceeded” metric, customers can proactively detect packet drops and take preventive action. Customers can capture additional CoreDNS metrics by following steps similar to those in this post to monitor the health of CoreDNS.

Further Reading

About the authors:

Vikram Venkataraman

Vikram Venkataraman is a Principal Specialist Solutions Architect at Amazon Web Services. He helps customers modernize, scale, and adopt best practices for their containerized workloads. He is passionate about observability and focuses on open source AWS observability services like Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry.

Manish Pandey

Manish Pandey is a Senior Technical Account Manager at Amazon Web Services. He is a platform engineering leader with roughly 16 years of experience building and operating large-scale, mission-critical platforms. His current areas of interest are containers, observability, resiliency, and AI/ML. In his spare time, he loves to read books, play with his kids, and follow cricket.

Sathish Arumugam

Sathish Arumugam is a Partner Solution Architect at Amazon Web Services. Sathish is a Containers and CloudOps TFC AoD. He helps partners and customers apply AWS Well-Architected best practices in their cloud transformation journey and to the business-critical workloads hosted on the AWS Cloud. In his spare time, he loves to spend time with his family and pursue his passion for cricket.