AWS Cloud Operations & Migrations Blog

Proactive autoscaling of Kubernetes workloads with KEDA and Amazon CloudWatch

Container orchestration platforms, such as Amazon Elastic Kubernetes Service (Amazon EKS), have simplified the process of building, securing, operating, and maintaining container-based applications. As a result, organizations can focus on building applications. Customers have also started adopting event-driven deployment, which lets Kubernetes deployments scale automatically in response to metrics from various sources.

By implementing event-driven deployment and autoscaling, customers can save costs by provisioning compute on demand and scaling efficiently based on their own requirements. KEDA (Kubernetes-based Event Driven Autoscaler) lets you drive the autoscaling of Kubernetes workloads based on events, such as a scraped custom metric breaching a specified threshold, or a message arriving in an Amazon Managed Streaming for Apache Kafka (Amazon MSK) topic.

Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), IT managers, and product owners. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. You get a unified view of operational health, and you gain complete visibility of your AWS resources, applications, and services running on AWS and on-premises.

This post will show you how to use KEDA to autoscale Amazon EKS pods by querying the metrics stored in CloudWatch.

Solution Overview

The following diagram shows the complete setup that we will walk through in this post.


You will need the following to complete the steps in this post:

Create an Amazon EKS Cluster

You start by setting a few environment variables:

export CW_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
export CW_AWS_REGION=us-west-2 
export CW_HO11Y_ECR=$CW_ACCOUNT_ID.dkr.ecr.$
export CW_HO11Y_IMAGE=$CW_ACCOUNT_ID.dkr.ecr.$
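Later steps also use `$CW_KEDA_CLUSTER` and call the aws, eksctl, kubectl, helm, and git CLIs. The following is a minimal preflight sketch, not part of the post's repository; the helper names `require_vars` and `require_tools` are hypothetical. It fails fast when a variable is empty or a tool is missing from PATH, instead of failing halfway through cluster creation.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not part of the post's repository.

# require_vars NAME...: fail if any named shell variable is unset or empty.
require_vars() {
  rc=0
  for var in "$@"; do
    eval "value=\${$var:-}"      # indirect variable lookup that works in POSIX sh
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      rc=1
    fi
  done
  return "$rc"
}

# require_tools CMD...: fail if any command is not found on PATH.
require_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `require_vars CW_ACCOUNT_ID CW_AWS_REGION CW_KEDA_CLUSTER && require_tools aws eksctl kubectl helm git`. Both helpers report every missing item before returning, so one run lists everything that needs fixing.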

Next, use a shell script from this GitHub repository to prepare the required Kubernetes manifests, and then create an Amazon EKS cluster using eksctl:

git clone
cd ./containers-blog-maelstrom/scaling-with-KEDA
mkdir build
# Run a shell script to prepare the Kubernetes manifests for this demo.
chmod +x

eksctl create cluster -f ./build/eks-cluster-config.yaml

Creating a cluster can take up to 10 minutes. When the creation completes, proceed to the next steps.

Deploying a KEDA Operator

Next, install the KEDA operator in the keda namespace of your Amazon EKS cluster by using the following commands:

helm repo add kedacore
# Helm install for the KEDA operator
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  -f ./keda-values.yaml

Now check that the KEDA operator pods are running:

$ kubectl get pods -n keda

NAME                                              READY   STATUS    RESTARTS   AGE
keda-operator-68b7cbdc78-g9lqv                    1/1     Running   0          32s
keda-operator-metrics-apiserver-5d95f8799-dl8kp   1/1     Running   0          32s
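If you script this check rather than eyeballing it, a small filter over the same `kubectl get pods` output works. The following is a sketch with a hypothetical helper name, `all_pods_ready`; it assumes the header row is present (no `--no-headers`) and succeeds only if at least one pod is listed and every pod is Running with all containers ready.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` output (with its header row) on
# stdin and succeed only if every pod is Running with all containers ready.
all_pods_ready() {
  awk 'NR > 1 && NF > 0 {
         seen++
         split($2, ready, "/")          # READY column, e.g. "1/1"
         if ($3 != "Running" || ready[1] != ready[2]) bad++
       }
       END { if (seen > 0 && bad == 0) exit 0; exit 1 }'
}
```

Usage: `kubectl get pods -n keda | all_pods_ready && echo "KEDA pods are ready"`. kubectl also has a built-in alternative that blocks until pods are ready: `kubectl wait --for=condition=Ready pod --all -n keda --timeout=120s`.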

Deploy sample application

You will use a sample application called ho11y, a synthetic signal generator that lets you test observability solutions for microservices. It emits logs, metrics, and traces in a configurable manner. For more information, see the AWS O11y Recipes repository.

# Run a shell script to build and push a Docker image for the ho11y app to Amazon ECR.
chmod +x
kubectl apply -f ./build/ho11y-app.yaml

This creates the Kubernetes deployments and services shown in the following output:

$ kubectl get all -n ho11y

NAME                               READY   STATUS    RESTARTS   AGE
pod/downstream0-c6859bf6d-sk656    1/1     Running   0          3m14s
pod/downstream1-56f74998d5-7w2hw   1/1     Running   0          3m14s
pod/frontend-8796bd84b-7kndl       1/1     Running   0          3m15s

NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/downstream0   ClusterIP   <none>        80/TCP    3m14s
service/downstream1   ClusterIP    <none>        80/TCP    3m13s
service/frontend      ClusterIP    <none>        80/TCP    3m14s

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/downstream0   1/1     1            1           3m16s
deployment.apps/downstream1   1/1     1            1           3m16s
deployment.apps/frontend      1/1     1            1           3m17s

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/downstream0-c6859bf6d    1         1         1       3m16s
replicaset.apps/downstream1-56f74998d5   1         1         1       3m16s
replicaset.apps/frontend-8796bd84b       1         1         1       3m17s

Scrape metrics using AWS Distro for OpenTelemetry (ADOT)

Next, you will deploy an AWS Distro for OpenTelemetry (ADOT) collector to scrape the Prometheus metrics emitted by the ho11y application and send them to CloudWatch:

eksctl create iamserviceaccount \
  --name adot-collector \
  --namespace ho11y \
  --cluster $CW_KEDA_CLUSTER \
  --region $CW_AWS_REGION \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --approve

kubectl apply -f ./build/cw-eks-adot-prometheus-deployment.yaml

After the ADOT collector is deployed, it will collect the metrics and ingest them into the specified CloudWatch namespace. The scrape configuration is similar to that of a Prometheus server. We have added the necessary configuration for scraping metrics from the ho11y application.
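Before checking CloudWatch, it can help to confirm that the application exposes the counter at all. The hypothetical `sum_metric` helper below is a rough sketch, not a replacement for the collector's scrape configuration. It reads Prometheus text exposition format on stdin and sums every sample of one metric name across label sets, skipping `# HELP` and `# TYPE` comments. It assumes label values contain no `}` character.

```shell
#!/bin/sh
# Hypothetical helper: sum all samples of one metric in Prometheus text format.
# Assumes label values contain no "}" character.
# sum_metric METRIC_NAME < exposition.txt
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                     # skip HELP/TYPE comment lines
    {
      line = $0
      sub(/[{][^}]*[}]/, "", line)    # drop the label set, if any
      split(line, f, " ")             # f[1] = metric name, f[2] = sample value
      if (f[1] == name) { total += f[2]; found = 1 }
    }
    END { if (!found) exit 1; print total + 0 }'
}
```

For example, assuming ho11y serves Prometheus metrics at /metrics on the port you forward (both the path and the port are assumptions; check the ho11y documentation), you could run `kubectl port-forward -n ho11y svc/frontend 8080:80 &` followed by `curl -s http://localhost:8080/metrics | sum_metric ho11y_total`. The helper exits non-zero if the metric is absent.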

Navigate to the CloudWatch console and look at the ho11y_total metric. The deep link opens in the Oregon (us-west-2) Region; you can select a different Region in the top-right corner of the console.

Configure sigv4 authentication for querying custom metrics from CloudWatch 

AWS Signature Version 4 (SigV4) is the process of adding authentication information to AWS API requests sent over HTTP. The AWS Command Line Interface (AWS CLI) and AWS SDKs use this protocol to call AWS APIs, and CloudWatch API calls require SigV4 authentication. Because KEDA doesn't support SigV4, you deploy a SigV4 proxy as a Kubernetes Service that acts as a gateway for KEDA to query the CloudWatch API endpoints.
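To make "adding authentication information" more concrete, the following illustrative sketch shows one step of SigV4: deriving the signing key by chaining HMAC-SHA256 over the date, the Region, the service name, and the literal string aws4_request. It assumes the `openssl` CLI is installed, and the helper names `hmac_hex` and `sigv4_signing_key` are hypothetical. A full signature also needs a canonical request and a string to sign; the AWS SDKs and the sigv4 proxy handle all of that for you.

```shell
#!/bin/sh
# Illustrative sketch of SigV4 signing-key derivation (not the proxy's code).
# Assumes the openssl CLI is available.

# hmac_hex KEY_HEX DATA: print HMAC-SHA256(key, data) as lowercase hex.
hmac_hex() {
  printf '%s' "$2" |
    openssl dgst -sha256 -mac HMAC -macopt "hexkey:$1" |
    awk '{ print $NF }'   # openssl prints "<label>= <hex>"; keep only the hex
}

# sigv4_signing_key SECRET DATE(YYYYMMDD) REGION SERVICE
sigv4_signing_key() {
  # The initial key is the string "AWS4" followed by the secret access key,
  # hex-encoded so it can be passed to hmac_hex.
  k_secret=$(printf 'AWS4%s' "$1" | od -A n -t x1 | tr -d ' \n')
  k_date=$(hmac_hex "$k_secret" "$2")
  k_region=$(hmac_hex "$k_date" "$3")
  k_service=$(hmac_hex "$k_region" "$4")
  hmac_hex "$k_service" "aws4_request"
}
```

For CloudWatch the service name used in signing is `monitoring`, so a key for this post's Region would come from `sigv4_signing_key "$SECRET" 20240101 us-west-2 monitoring`. Because the key is scoped to a date, Region, and service, a leaked signing key is far less useful than the secret itself.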

Run the following command to deploy the sigv4 proxy:

kubectl apply -f ./build/keda-sigv4.yaml

Set up autoscaling using a KEDA ScaledObject

Next, you create the ScaledObject that scales the deployment by querying the metrics stored in CloudWatch. A ScaledObject represents the desired mapping between an event source, such as a Prometheus metric, and a Kubernetes Deployment, StatefulSet, or any custom resource that defines the /scale subresource.

Behind the scenes, KEDA monitors the event source and feeds that data to Kubernetes and the Horizontal Pod Autoscaler (HPA) to drive the scaling of the specified Kubernetes resource. Each replica of a resource actively pulls items from the event source.
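The HPA turns the metric KEDA exposes into a replica count with the standard Kubernetes formula desiredReplicas = ceil(currentReplicas × currentValue / targetValue). KEDA exposes triggers as external metrics with an AverageValue target by default (the `(avg)` in the HPA output later in this post), so with a target of 1 the result is roughly ceil(CloudWatch value / 1), bounded by the minimum and maximum replica counts. The hypothetical `desired_replicas` helper below sketches that arithmetic, including the HPA's default 10% tolerance; it ignores stabilization windows and scaling policies.

```shell
#!/bin/sh
# Hypothetical sketch of the HPA calculation for an AverageValue external metric.
# desired_replicas CURRENT METRIC TARGET MIN MAX
desired_replicas() {
  awk -v cur="$1" -v metric="$2" -v target="$3" -v min="$4" -v max="$5" 'BEGIN {
    # Usage ratio: per-replica metric value relative to the target.
    ratio = metric / (target * cur)
    if (ratio >= 0.9 && ratio <= 1.1) {
      want = cur                      # within the default 10% tolerance: no change
    } else {
      want = metric / target          # ceil(cur * ratio) simplifies to this
      if (want != int(want)) want = int(want) + 1   # round up to a whole replica
    }
    if (want < min) want = min        # clamp to minReplicaCount
    if (want > max) want = max        # clamp to maxReplicaCount
    print want
  }'
}
```

For example, `desired_replicas 4 7 1 1 10` prints 7, while a metric of 0 still leaves one replica because of the minimum. Using a per-replica average is what lets the replica count grow in proportion to the metric instead of doubling blindly.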

The following ScaledObject manifest (./build/scaledobject.yaml), named ho11y-hpa, queries the CloudWatch endpoint for a custom metric called ho11y_total. The ho11y_total metric represents the number of application invocations, and the target value is one. Depending on the metric value over a period of one minute, KEDA scales the downstream0 deployment in or out between 1 and 10 pods.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ho11y-hpa
  namespace: ho11y
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: downstream0       # Name of the deployment to autoscale; must be in the same namespace as the ScaledObject
  pollingInterval:  30      # Optional. Default: 30; the interval (in seconds) at which each trigger is checked
  cooldownPeriod:   300     # Optional. Default: 300; the period to wait after the last trigger reported active before scaling to 0 (applies only when minReplicaCount is 0)
  fallback:                 # Replicas to fall back to if the scaler fails failureThreshold times in a row
    failureThreshold: 3
    replicas: 2
  minReplicaCount: 1        # Optional. Default: 0; minimum number of replicas that KEDA scales the deployment down to
  maxReplicaCount: 10       # Optional. Default: 100; maximum number of replicas that KEDA scales the deployment out to
  triggers:                 # Triggers that activate scaling of the deployment
    - type: aws-cloudwatch
      metadata:
        namespace: AWSObservability/Keda/PrometheusMetrics
        dimensionName: k8s_namespace;
        dimensionValue: ho11y;ho11y
        metricName: ho11y_total
        targetMetricValue: '1'
        minMetricValue: '0'
        awsRegion: "{{CW_AWS_REGION}}"
        identityOwner: operator
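The aws-cloudwatch trigger accepts several dimensions in dimensionName and dimensionValue as ;-separated lists that are paired by position. To cross-check the same series outside KEDA, the hypothetical helper below turns such a pair of lists into the Name=...,Value=... pairs that the AWS CLI `--dimensions` option accepts. It is a sketch, and it rejects lists of different lengths or with empty entries.

```shell
#!/bin/sh
# Hypothetical helper: convert KEDA-style ";"-separated dimension lists
# into AWS CLI --dimensions shorthand (Name=...,Value=... pairs).
# keda_dims_to_cli NAMES VALUES
keda_dims_to_cli() {
  awk -v names="$1" -v values="$2" 'BEGIN {
    n = split(names, name, ";")
    m = split(values, value, ";")
    if (n != m) {
      printf "dimension count mismatch: %d names, %d values\n", n, m > "/dev/stderr"
      exit 1
    }
    for (i = 1; i <= n; i++) {
      if (name[i] == "" || value[i] == "") {
        printf "empty dimension name or value at position %d\n", i > "/dev/stderr"
        exit 1
      }
      sep = (i > 1) ? " " : ""        # space-separate pairs; each pair is one dimension
      printf "%sName=%s,Value=%s", sep, name[i], value[i]
    }
    print ""
  }'
}
```

For example, `aws cloudwatch get-metric-statistics --namespace AWSObservability/Keda/PrometheusMetrics --metric-name ho11y_total --dimensions $(keda_dims_to_cli k8s_namespace ho11y) --statistics Sum --period 60 --start-time "$START" --end-time "$END" --region $CW_AWS_REGION`, where `$START` and `$END` are ISO 8601 timestamps you choose. CloudWatch matches the full dimension set exactly, so pass every dimension the metric actually carries.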

KEDA also supports the scaling behavior that you can configure in the Horizontal Pod Autoscaler. To fine-tune scaling further, you can adjust the pollingInterval and cooldownPeriod settings. See the KEDA documentation for more details on the CloudWatch trigger and the ScaledObject. KEDA also supports many other scalers; a current list is available on the KEDA home page.

After you deploy the ScaledObject, KEDA also creates an HPA object in the ho11y namespace with the configuration specified in scaledobject.yaml:

kubectl apply -f ./build/scaledobject.yaml
kubectl get hpa -n ho11y

NAME                 REFERENCE                TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-ho11y-hpa   Deployment/downstream0   0/1 (avg)   1         10        1          2d10h

Next, take a quick look at the downstream0 deployment:

$ kubectl get deploy downstream0 -n ho11y
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
downstream0   1/1     1            1           8d

Loading the ho11y application

Generate load on the application by running the following commands:

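# Send about 5,000 requests from the frontend pod to the downstream0 service.
frontend_pod=$(kubectl get pod -n ho11y -l app=frontend -o jsonpath='{.items[0].metadata.name}')
loop_counter=0
while [ $loop_counter -le 5000 ]; do
	kubectl exec -n ho11y $frontend_pod -- curl -s downstream0.ho11y.svc.cluster.local
	echo
	loop_counter=$((loop_counter + 1))
done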

Next, check whether the downstream0 deployment scales out to more pods in response to the load. The increased load drives the ho11y_total custom metric in CloudWatch to one or higher, which triggers the deployment to scale.

Note that it can take a few minutes before you observe the deployment scale out.

$ kubectl get deploy downstream0 -n ho11y -w
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
downstream0   1/1     1            1           8d
downstream0   1/4     1            1           8d
downstream0   1/4     1            1           8d
downstream0   1/4     1            1           8d
downstream0   1/4     4            1           8d
downstream0   2/4     4            2           8d
downstream0   3/4     4            3           8d
downstream0   4/4     4            4           8d
downstream0   4/7     4            4           8d
downstream0   4/7     4            4           8d
downstream0   4/7     4            4           8d
downstream0   4/7     7            4           8d
downstream0   5/7     7            5           8d
downstream0   6/7     7            6           8d
downstream0   7/7     7            7           8d
downstream0   6/7     7            6           8d
downstream0   7/7     7            7           8d 
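The watch above is interactive. For a scripted check, for example in a smoke test, a hypothetical polling helper can run a probe command until it prints a number at or above a target, or give up after a timeout. This is a sketch; the target and timeouts in the usage line are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical polling helper: run a probe command until it prints an integer
# >= WANT, or give up after TIMEOUT seconds.
# wait_for_count WANT TIMEOUT INTERVAL COMMAND [ARGS...]
wait_for_count() {
  want=$1
  timeout=$2
  interval=$3
  shift 3
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    have=$("$@" 2>/dev/null)
    # An empty result (for example, no ready replicas yet) counts as 0.
    if [ "${have:-0}" -ge "$want" ] 2>/dev/null; then
      echo "$have"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s waiting for $want" >&2
  return 1
}
```

Usage: `wait_for_count 4 600 15 kubectl get deploy downstream0 -n ho11y -o jsonpath='{.status.readyReplicas}'` waits up to 10 minutes, checking every 15 seconds, for at least four ready replicas. Passing the probe as a command keeps the helper independent of kubectl, which also makes it easy to test.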

Describe the HPA using the following command; its events should show SuccessfulRescale entries from horizontal-pod-autoscaler:

kubectl describe hpa -n ho11y

This concludes the walkthrough of using KEDA to autoscale the application based on metrics ingested into CloudWatch.

Cleanup

You will continue to incur costs until you delete the infrastructure that you created for this post. Delete the cluster resources using the following commands:

kubectl delete -f ./build/scaledobject.yaml
kubectl delete -f ./build/keda-sigv4.yaml
kubectl delete -f ./build/cw-eks-adot-prometheus-deployment.yaml
kubectl delete -f ./build/ho11y-app.yaml
helm uninstall keda --namespace keda
eksctl delete cluster --name $CW_KEDA_CLUSTER --region $CW_AWS_REGION

Conclusion

This post demonstrated the detailed steps for using the KEDA operator to autoscale deployments based on custom metrics that the instrumented application emits and the ADOT collector pushes to CloudWatch. This capability helps customers scale compute capacity on demand by provisioning pods only when they are needed to serve bursts of traffic. CloudWatch stores the metrics reliably, and KEDA monitors them and scales the workloads out or in efficiently as events occur.

Also check out the Proactive autoscaling of Kubernetes workloads with KEDA post if you are curious about autoscaling your Kubernetes workloads using metrics ingested into Amazon Managed Service for Prometheus.


Elamaran Shanmugam

Elamaran (Ela) Shanmugam is a Sr. Cloud Architect with Amazon Web Services Professional Services. Ela is a Container, Observability and Multi-Account Architecture SME and helps AWS customers design and build scalable, secure, and optimized container workloads on AWS. His passion is building and automating infrastructure to allow customers to focus more on their business. He is based out of Tampa, Florida, and you can reach him on Twitter.

Munish Dabra

Munish Dabra is a Sr. Solutions Architect at Amazon Web Services. He is a software technology leader with ~20 years of experience in building scalable and distributed software systems. His current areas of interest are containers, observability, and AI/ML. He has an educational background in Computer Engineering and an M.B.A. from The University of Texas. He is based out of Houston, and in his spare time he loves to play with his two kids and follows tennis and cricket.

Vikram Venkataramanan

Vikram Venkataraman is a Senior Technical Account Manager at Amazon Web Services and also a container enthusiast. He helps organizations with best practices for running workloads on AWS. In his spare time, he loves to play with his two kids and follows cricket.