Autoscaling EKS on Fargate with custom metrics

NOTICE: October 04, 2024 – This post no longer reflects the best guidance for configuring a service mesh with Amazon EKS and its examples no longer work as shown. Please refer to newer content on Amazon VPC Lattice.

——–

This is a guest post by Stefan Prodan of Weaveworks.

Autoscaling automatically scales workloads up or down based on resource usage. In Kubernetes, the Horizontal Pod Autoscaler (HPA) can scale pods based on observed CPU utilization and memory usage. Starting with Kubernetes 1.7, an aggregation layer was introduced that allows third-party applications to extend the Kubernetes API by registering themselves as API add-ons. Such an add-on can implement the Custom Metrics API and give the HPA access to arbitrary metrics.

What follows is a step-by-step guide on configuring HPA with metrics provided by Prometheus to automatically scale pods running on Amazon EKS on AWS Fargate.

Prerequisites

Install eksctl and fluxctl for macOS with Homebrew like so:

brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
brew install fluxctl

And for Windows you can use Chocolatey:

choco install eksctl
choco install fluxctl

Last but not least, for Linux you can download the eksctl and fluxctl binaries from GitHub.

Create an EKS cluster

First, create an EKS cluster with two EC2 managed nodes and a Fargate profile using the following config file:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-fargate-hpa
  region: eu-west-1

managedNodeGroups:
  - name: default
    instanceType: m5.large
    desiredCapacity: 2
    volumeSize: 120
    iam:
      withAddonPolicies:
        appMesh: true
        albIngress: true

fargateProfiles:
  - name: default
    selectors:
      - namespace: demo
        labels:
          scheduler: fargate

Save the above cluster config in a file called cluster-config.yaml and then run:

eksctl create cluster -f cluster-config.yaml

You’ll use the managed nodes for the cluster add-ons (CoreDNS, kube-proxy) and for the following HPA metrics add-ons:

Prometheus: scrapes pods and stores metrics
Prometheus metrics adapter: queries Prometheus and exposes metrics for the Kubernetes custom metrics API
Metrics server: collects pod CPU and memory usage and exposes metrics for the Kubernetes resource metrics API

For the demo, you’ll be using the application podinfo running on Fargate. Note that only the pods deployed in the demo namespace with the scheduler: fargate label will be running on Fargate, as defined in the Fargate profile, above.
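
The selector rule above can be sketched in a few lines of Python. This is only an illustrative model of the matching logic, not the actual EKS scheduling code; the function names and the sample pod labels are made up for the example.

```python
from typing import Dict, List

# Selectors copied from the fargateProfiles section of cluster-config.yaml.
SELECTORS = [{"namespace": "demo", "labels": {"scheduler": "fargate"}}]


def matches_selector(namespace: str, labels: Dict[str, str], selector: dict) -> bool:
    """A pod matches when the namespace is equal and every selector label is present."""
    if namespace != selector["namespace"]:
        return False
    required = selector.get("labels", {})
    return all(labels.get(key) == value for key, value in required.items())


def runs_on_fargate(namespace: str, labels: Dict[str, str], selectors: List[dict]) -> bool:
    """A pod lands on Fargate if it matches at least one selector in the profile."""
    return any(matches_selector(namespace, labels, s) for s in selectors)


if __name__ == "__main__":
    # Hypothetical pods: only the first one satisfies both conditions.
    print(runs_on_fargate("demo", {"app": "podinfo", "scheduler": "fargate"}, SELECTORS))
    print(runs_on_fargate("demo", {"app": "loadtester"}, SELECTORS))
```

A pod in the demo namespace without the label, or a labelled pod in another namespace, would stay on the managed nodes under this model.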

Create a GitHub repository

To configure HPA for Fargate you’ll be using an eksctl GitOps profile. This profile allows you to create a Kubernetes application platform tailored for a specific use case. Start by creating a GitHub repository. Then replace GH_USER and GH_REPO below with your own GitHub username and the name of the repository you just created. Use these variables to clone your repo and set up GitOps for your cluster:

export GH_USER=YOUR_GITHUB_USERNAME
export GH_REPO=YOUR_GITHUB_REPOSITORY

git clone https://github.com/${GH_USER}/${GH_REPO}
cd ${GH_REPO}

Now, run the following eksctl command to set up the pipeline:

export EKSCTL_EXPERIMENTAL=true

eksctl enable repo \
       --cluster=eks-fargate-hpa \
       --region=eu-west-1 \
       --git-url=git@github.com:${GH_USER}/${GH_REPO} \
       --git-user=fluxcd \
       --git-email=${GH_USER}@users.noreply.github.com

The above command takes an existing EKS cluster and an empty repository and sets up a GitOps pipeline. After it finishes installing FluxCD and the Helm Operator, you will be asked to add Flux’s deploy key to your GitHub repository. Copy the public key and create a deploy key with write access on your GitHub repository as follows:

Go to “Settings > Deploy keys” and click on “Add deploy key”.
Make sure to also check “Allow write access”.
Then, paste the Flux public key and click “Add key”.

Once that is done, Flux will be able to pick up changes in the repository and deploy them to the cluster.

Install the metrics add-ons

Next, to install the metrics add-ons, run the following command:

eksctl enable profile \
       --name=https://github.com/stefanprodan/eks-hpa-profile \
       --cluster=eks-fargate-hpa \
       --region=eu-west-1 \
       --git-url=git@github.com:${GH_USER}/${GH_REPO} \
       --git-user=fluxcd \
       --git-email=${GH_USER}@users.noreply.github.com

The above command adds the HPA metrics add-ons and the demo app manifests to the configured repository.

Now sync your local repository with the GitHub repo, using:

git pull origin master

And now run the following command to apply the manifests to your EKS cluster:

fluxctl sync --k8s-fwd-ns flux

Flux reconciles your GitHub repo with the EKS cluster every five minutes; the above command can be used to speed up the synchronization.

Now list the installed components and compare your output with the following:

$ kubectl -n monitoring-system get helmreleases

NAME                 RELEASE              STATUS
metrics-server       metrics-server       DEPLOYED
prometheus           prometheus           DEPLOYED
prometheus-adapter   prometheus-adapter   DEPLOYED

Install sample app

You’ll use a Go web sample app named podinfo to test HPA. The app is instrumented with Prometheus and exposes a counter called http_requests_total. The HPA controller, part of the Kubernetes control plane, will scale the pods running on Fargate based on the number of HTTP requests per second as represented by said Prometheus counter.
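
To make the step from a counter to a per-second rate concrete, here is a minimal Python sketch. It assumes a list of (timestamp, value) samples and is much simpler than the Prometheus rate() function, which also handles counter resets and extrapolation; the sample values are invented.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Average increase per second between the first and last sample.

    samples holds (timestamp_in_seconds, counter_value) pairs, oldest first.
    """
    if len(samples) < 2:
        return 0.0
    (first_ts, first_value), (last_ts, last_value) = samples[0], samples[-1]
    elapsed = last_ts - first_ts
    if elapsed <= 0:
        return 0.0
    # A counter should only grow; treat a decrease (a reset) as no traffic.
    return max(last_value - first_value, 0.0) / elapsed


if __name__ == "__main__":
    # Hypothetical http_requests_total samples taken over one minute.
    print(per_second_rate([(0, 100), (30, 250), (60, 700)]))
```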

So, install podinfo by setting fluxcd.io/ignore to false in demo/namespace.yaml:

cat << EOF | tee demo/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  annotations:
    fluxcd.io/ignore: "false"
EOF

And apply changes via git like so:

git add -A && \
git commit -m "init demo" && \
git push origin master && \
fluxctl sync --k8s-fwd-ns flux

Now, wait for EKS on Fargate to schedule and launch the podinfo app using watch kubectl -n demo get po -l scheduler=fargate. When podinfo starts, Prometheus will scrape the metrics endpoint and the Prometheus adapter will export the HTTP requests per second metrics to the Kubernetes custom metrics API:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "namespaces/http_requests_per_second",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/http_requests_per_second",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
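
If you want to check the response programmatically, a short Python sketch like the following could list the metrics exposed for a given resource type. It works on a trimmed copy of the JSON above; the helper name metrics_for is an assumption for this example.

```python
import json

# Trimmed copy of the APIResourceList returned by the custom metrics API.
API_RESOURCE_LIST = """
{
  "kind": "APIResourceList",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "namespaces/http_requests_per_second", "namespaced": false,
     "kind": "MetricValueList", "verbs": ["get"]},
    {"name": "pods/http_requests_per_second", "namespaced": true,
     "kind": "MetricValueList", "verbs": ["get"]}
  ]
}
"""


def metrics_for(resource_list: dict, resource_type: str) -> list:
    """Return the metric names registered for one resource type, such as 'pods'."""
    names = []
    for resource in resource_list["resources"]:
        kind, _, metric = resource["name"].partition("/")
        if kind == resource_type:
            names.append(metric)
    return names


parsed = json.loads(API_RESOURCE_LIST)

if __name__ == "__main__":
    print(metrics_for(parsed, "pods"))
```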

Configure autoscaling based on HTTP traffic

To configure autoscaling you can set up an HPA definition that uses the http_requests_per_second metric. The HPA manifest is in demo/podinfo/hpa.yaml and it’s set to scale up podinfo when the average req/sec per pod exceeds 10:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metricName: http_requests_per_second
        targetAverageValue: 10

Note that the podinfo deployment manifest has no replicas defined in deployment.spec.replicas, since the HPA controller updates the number of replicas based on the metric average value.
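
The replica count the controller aims for can be approximated with the formula from the Kubernetes documentation: ceil(currentReplicas * currentValue / targetValue). The Python sketch below applies it with the limits from the manifest above. The 0.1 tolerance is the documented default, and the input values are illustrative; the real controller also accounts for pod readiness and missing metrics, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA calculation for an average-value metric."""
    ratio = current_value / target_value
    # Small deviations from the target do not trigger a rescale.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds of the HPA.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # One pod receiving an average of 25 req/sec against a target of 10.
    print(desired_replicas(1, 25, 10))
    # Three pods that are almost idle scale back to the minimum.
    print(desired_replicas(3, 0.2, 10))
```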

Once the metric is available to the metrics API, the HPA controller will display the current value:

$ watch kubectl -n demo get hpa

NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   200m/10   1         10        1          8m58s

The m in 200m above represents milli-units, that is, 200m means 0.2 req/sec. This baseline traffic comes from Prometheus, which scrapes the /metrics endpoint every five seconds.
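
As a quick illustration of the milli-unit notation, this Python sketch converts the values shown in the TARGETS column into plain numbers. It only handles the m suffix and unsuffixed values, which is an assumption that covers the output above but not every Kubernetes quantity format.

```python
def parse_metric_value(text: str) -> float:
    """Convert a value such as '200m' or '10' into a plain number."""
    if text.endswith("m"):
        # The m suffix means thousandths of a unit.
        return int(text[:-1]) / 1000
    return float(text)


if __name__ == "__main__":
    current, target = "200m/10".split("/")
    print(parse_metric_value(current), parse_metric_value(target))
```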

You can exec into the loadtester pod with the following command:

kubectl -n demo exec -it loadtester-xxxx-xxxx -- sh

And generate additional traffic using hey as shown below:

hey -z 10m -c 5 -q 5 -disable-keepalive http://podinfo.demo

After a few minutes the HPA begins to scale up the deployment:

$ kubectl -n demo describe hpa podinfo

Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 3; reason: pods metric http_requests_per_second above target

When the load test finishes, the HPA scales the deployment back down to its initial replica count:

...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  21s   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

You may have noticed that the autoscaler doesn’t react immediately to usage spikes. The metrics sync happens every 30 seconds and scaling up or down can only happen if there was no rescaling within the last three to five minutes. In this way, the HPA prevents rapid execution of conflicting decisions.
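
The cooldown described above can be modelled roughly as follows. This sketch assumes the three-minute delay applies to scaling up and the five-minute delay to scaling down; that split is an assumption made for illustration, and newer Kubernetes releases use configurable stabilization windows instead.

```python
from typing import Optional

# Assumed delays, taken from the "three to five minutes" mentioned above.
UPSCALE_DELAY_SECONDS = 3 * 60
DOWNSCALE_DELAY_SECONDS = 5 * 60


def may_rescale(now: float, last_rescale: Optional[float],
                current_replicas: int, desired_replicas: int) -> bool:
    """Return True if enough time has passed since the previous rescale."""
    if desired_replicas == current_replicas:
        return False
    if last_rescale is None:
        return True
    scaling_up = desired_replicas > current_replicas
    delay = UPSCALE_DELAY_SECONDS if scaling_up else DOWNSCALE_DELAY_SECONDS
    return now - last_rescale >= delay


if __name__ == "__main__":
    # A scale-up requested 100 seconds after the last rescale has to wait.
    print(may_rescale(now=200, last_rescale=100, current_replicas=1, desired_replicas=3))
```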

Configure autoscaling based on CPU usage

For workloads that aren’t instrumented with Prometheus, you can use the Kubernetes metrics server and configure auto-scaling based on CPU and/or memory usage.

Update the HPA manifest by replacing the HTTP metric with CPU average utilization:

cat << EOF | tee demo/podinfo/hpa.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 90
EOF

And apply the changes, as usual, via git:

git add -A && \
git commit -m "cpu hpa" && \
git push origin master && \
fluxctl sync --k8s-fwd-ns flux

Now, run a load test to increase CPU usage above 90% to trigger the HPA (again, execing into the respective pod first):

hey -z 10m -c 5 -q 5 -m POST -d 'test' -disable-keepalive http://podinfo.demo/token

The Kubernetes Metrics Server is a cluster-wide aggregator of resource usage data. It collects CPU and memory usage for nodes and pods by polling the kubernetes.summary_api. The summary API is a memory-efficient API for passing data from the kubelet to the metrics server.
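
For resource metrics, utilization is expressed as a percentage of the CPU the pods request. The Python sketch below shows how the 90% target could translate into a replica count. The 100m CPU request and the usage figures are hypothetical, and the controller's tolerance and readiness handling are left out.

```python
import math
from typing import List

CPU_REQUEST_MILLICORES = 100  # hypothetical per-pod CPU request
TARGET_UTILIZATION = 90       # targetAverageUtilization from the manifest


def average_utilization(usage_millicores: List[float],
                        request_millicores: float = CPU_REQUEST_MILLICORES) -> float:
    """Total CPU usage across pods as a percentage of the total requested CPU."""
    if not usage_millicores:
        raise ValueError("no pod metrics available")
    total_requested = request_millicores * len(usage_millicores)
    return 100 * sum(usage_millicores) / total_requested


def replicas_for(usage_millicores: List[float], target: float = TARGET_UTILIZATION) -> int:
    """Replica count that would bring average utilization back to the target."""
    utilization = average_utilization(usage_millicores)
    return math.ceil(len(usage_millicores) * utilization / target)


if __name__ == "__main__":
    # Two pods using 180m and 200m of CPU against a 100m request.
    print(average_utilization([180, 200]), replicas_for([180, 200]))
```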

Configure autoscaling based on App Mesh traffic

One of the advantages of using a service mesh like AWS App Mesh is the built-in monitoring capability. You don’t have to instrument your web apps in order to monitor the L7 traffic.

The Envoy sidecar used by App Mesh exposes a counter called envoy_cluster_upstream_rq. You can configure the Prometheus adapter to transform this metric into a req/sec rate with the following Helm configuration:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: prometheus-adapter
  namespace: monitoring-system
spec:
  releaseName: prometheus-adapter
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: prometheus-adapter
    version: 1.4.0
  values:
    prometheus:
      url: http://prometheus.monitoring-system
      port: 9090
    rules:
      default: false
      custom:
        - seriesQuery: 'envoy_cluster_upstream_rq{kubernetes_namespace!="",kubernetes_pod_name!=""}'
          resources:
            overrides:
              kubernetes_namespace: {resource: "namespace"}
              kubernetes_pod_name: {resource: "pod"}
          name:
            matches: "envoy_cluster_upstream_rq"
            as: "appmesh_requests_per_second"
          metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)'
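
To see what the metricsQuery template in this rule turns into, the following Python sketch fills in the three placeholders by plain string substitution. The real adapter uses Go templates and builds the label matchers itself, so treat this as an approximation; the pod name podinfo-abc is hypothetical.

```python
from typing import Dict, List

METRICS_QUERY = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)"


def render_metrics_query(template: str, series: str,
                         label_matchers: Dict[str, str], group_by: List[str]) -> str:
    """Substitute the adapter placeholders to produce a PromQL query string."""
    matchers = ",".join(f'{label}="{value}"' for label, value in label_matchers.items())
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", matchers)
            .replace("<<.GroupBy>>", ",".join(group_by)))


if __name__ == "__main__":
    print(render_metrics_query(
        METRICS_QUERY,
        series="envoy_cluster_upstream_rq",
        label_matchers={"kubernetes_namespace": "demo",
                        "kubernetes_pod_name": "podinfo-abc"},
        group_by=["kubernetes_pod_name"],
    ))
```

The rendered query sums the one-minute request rate per pod, which is the value the adapter reports as appmesh_requests_per_second.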

Now you can use the appmesh_requests_per_second metric in the HPA definition with the following HPA resource:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metricName: appmesh_requests_per_second
        targetAverageValue: 10

Wrapping up

Not all systems can meet their SLAs by relying on CPU and memory usage metrics alone. Most web and mobile backends require autoscaling based on requests per second to handle traffic bursts. For ETL apps, autoscaling could be triggered by the job queue length exceeding some threshold, and so on. By instrumenting your applications with Prometheus and exposing the right metrics for autoscaling, you can fine-tune your apps to better handle bursts and ensure high availability.

Stefan Prodan

Stefan is a Developer Experience engineer at Weaveworks and an open source contributor to cloud-native projects like Flagger, FluxCD, Helm Operator, OpenFaaS and others. He worked as a software architect and a DevOps consultant, helping companies embrace DevOps and the SRE movement. You can find him on Twitter at @stefanprodan.
