AWS Cloud Operations & Migrations Blog

Autoscaling Kubernetes workloads with KEDA using Amazon Managed Service for Prometheus metrics

Introduction

With the rising popularity of applications hosted on Amazon Elastic Kubernetes Service (Amazon EKS), a key challenge is handling increases in traffic and load efficiently. Traditionally, you would have to manually scale out your applications by adding more instances – an approach that is time-consuming, inefficient, and prone to over- or under-provisioning. A better solution is to leverage autoscaling to automatically scale application instances based on real-time demand signals. Kubernetes Event-driven Autoscaling (KEDA) is an autoscaling tool for Kubernetes that scales workloads based on metrics, events, and custom triggers from sources like queues, databases, or monitoring systems. This enables precise matching of allocated resources to application load.

Amazon Managed Service for Prometheus provides a Prometheus-compatible metrics monitoring solution for Amazon EKS clusters, where key metrics are stored securely. You can leverage Amazon Managed Service for Prometheus to expose select application and business metrics that drive KEDA autoscaling decisions. In this blog post, we demonstrate an automated scaling solution with KEDA and Amazon Managed Service for Prometheus on Amazon EKS. Specifically, we configure KEDA to scale out a sample application deployment based on Requests Per Second (RPS) metrics from Amazon Managed Service for Prometheus, delivering automated scaling sized to handle production workload demands. By following this guide, you can add flexible autoscaling to your own Amazon EKS workloads by leveraging KEDA’s integration with Amazon Managed Service for Prometheus. Amazon Managed Grafana is used alongside Amazon Managed Service for Prometheus for monitoring and analytics: you can visualize the application metrics to gain insights into auto-scaling patterns and correlate them with business events.

Prerequisites

You will need the following resources and tools for this walkthrough:

  1. An AWS account with permissions to create Amazon EKS, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and IAM resources
  2. The AWS CLI configured with credentials for that account
  3. eksctl, kubectl, Helm, git, and jq installed on your workstation

Solution Overview

This solution demonstrates the integration of several AWS services with open source software to create an automated scaling pipeline. An Amazon EKS cluster provides the managed Kubernetes environment for deployment and orchestration. AWS Distro for OpenTelemetry (ADOT) is used to scrape custom application metrics and send them to Amazon Managed Service for Prometheus. To enable event-driven autoscaling, we install KEDA on the Amazon EKS cluster. KEDA allows you to create scaling rules based on the metric streams from Prometheus. Finally, Amazon Managed Grafana provides dashboards to visualize the scaling metrics from Amazon Managed Service for Prometheus and confirm that KEDA is properly autoscaling the microservice as load increases.

The following architecture diagram shows how these components come together:

  1. KEDA components are deployed on the Amazon EKS cluster to configure the platform for event-driven, metrics-based scaling.
  2. ADOT is configured to scrape metrics such as request rates and latency from our sample app and send them to Amazon Managed Service for Prometheus.
  3. A KEDA ScaledObject configuration defines the autoscaling rules for the microservice deployment based on Prometheus metric streams.
  4. We generate load against our app and, as traffic increases, observe the metrics in the Amazon Managed Grafana dashboards.
  5. KEDA leverages the Prometheus metrics to automatically scale out the pods to match demand.
  6. Amazon Managed Grafana can be used to monitor the various metrics from Amazon Managed Service for Prometheus.

Architecture diagram
Figure 1. Architecture diagram

As shown in the following sequence diagram, auto-scaling of the microservice deployment works as follows. The end user sends requests to the microservice application running on Kubernetes; this workload can fluctuate over time. The AWS Distro for OpenTelemetry (ADOT) agent is configured as a sidecar on each microservice pod. ADOT collects metrics like request rate, latency, and error rate, and sends them to Amazon Managed Service for Prometheus, a managed time-series database optimized for large-scale, high-cardinality metrics that offers high availability and elastic capacity. KEDA queries the custom application metrics from Prometheus at a regular polling interval defined in its configuration.

Based on the metrics data, KEDA determines whether the pods need to be scaled up or down and interacts with the Horizontal Pod Autoscaler (HPA) to invoke auto-scaling actions. The HPA controller ensures that the desired number of pod replicas specified by KEDA are spun up or terminated gracefully to match the workload. This provides automated scale-up and scale-down capabilities. In summary, ADOT, Amazon Managed Service for Prometheus, KEDA, and HPA work together to enable metrics-driven autoscaling with monitoring for Kubernetes microservices.

Sequence Diagram
Figure 2. Sequence Diagram

By combining managed services like Amazon EKS, Amazon Managed Service for Prometheus, and Amazon Managed Grafana with open source components like KEDA and HPA, we’ve achieved automated application scaling governed by real-time metrics. This provides a flexible, cloud-native architecture that can scale production applications based on utilization indicators. The same model can drive scaling using metrics like CPU, memory, or application-specific metrics through Prometheus and KEDA’s integration.

Step 1: Setup the environment variables and artifacts

export EKS_CLUSTER=KEDA-AMP
export AWS_REGION=us-east-1
export ACCOUNT_ID=`aws sts get-caller-identity |jq -r ".Account"`

Get the required artifacts from this repository as follows:

git clone https://github.com/aws-samples/keda-amp-scaling.git
cd keda-amp-scaling

Step 2: Create an Amazon EKS Cluster

An Amazon EKS cluster can be created using the eksctl command line tool, which provides a simple way to get started with sensible defaults, as follows:

eksctl create cluster -f eks-cluster-config.yaml

Alternatively, the AWS Management Console provides a graphical interface to guide you through the process of creating EKS clusters.

Infrastructure as Code (IaC) tools like AWS CloudFormation or Terraform allow automating and managing Kubernetes clusters and all associated resources through code. This enables version control, reuse, and integration with CI/CD pipelines for production grade deployments. IaC best practices are recommended for manageability and repeatability across environments and accounts.
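The repository includes eks-cluster-config.yaml, which the eksctl command above consumes. For reference, a minimal sketch of what such an eksctl ClusterConfig could contain is shown below; the node group sizing and instance type here are hypothetical, and the file in the repository is authoritative.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: KEDA-AMP          # matches the EKS_CLUSTER variable from Step 1
  region: us-east-1       # matches AWS_REGION from Step 1
iam:
  withOIDC: true          # enables IAM roles for service accounts, used later for ADOT
managedNodeGroups:
  - name: default-ng
    instanceType: m5.large    # hypothetical instance type
    desiredCapacity: 2
    minSize: 2
    maxSize: 4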

Step 3: Deploy KEDA

The ‘helm upgrade --install’ command is used to install the Kubernetes Event-driven Autoscaling (KEDA) component on the Kubernetes cluster.

helm repo add kedacore https://kedacore.github.io/charts
helm upgrade --install keda kedacore/keda --version 2.13.1 -n keda --create-namespace

The installation uses the official KEDA Helm chart (version 2.13.1) from the kedacore repo, installed with release name ‘keda’ in namespace ‘keda’ (created automatically if needed). The Helm chart contains manifests to deploy KEDA’s custom resource definitions, RBAC components, Prometheus metrics adapter, and the KEDA controller pod. It runs with defaults suitable for most use cases. Configuration can be customized as needed. Using Helm manages the KEDA deployment lifecycle including deployment, versioning, upgrades, and rollbacks.
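To confirm the installation, you can verify that the KEDA operator and metrics API server pods are running in the keda namespace (pod names will vary in your environment):

kubectl get pods -n keda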

Step 4: Create Amazon Managed Service for Prometheus Workspace

The ‘aws amp create-workspace’ command creates an Amazon Managed Service for Prometheus workspace with the alias ‘AMP-KEDA’ in the specified AWS region. The workspaces provide isolated environments for storing Prometheus metrics and dashboards. The workspace is created with default settings which can be further customized if needed. The call returns the ID of the newly created workspace like ws-1f649a35-5f44-4744-885e-95a4844cba68. This ID is required for sending metrics data to the workspace from applications as well as for allowing other services to access the data.

aws amp create-workspace --alias AMP-KEDA --region $AWS_REGION
AMP_WS_ID=`aws amp list-workspaces | jq -r ".workspaces[0].workspaceId"`
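Optionally, you can confirm the workspace has reached the ACTIVE state before sending metrics to it:

aws amp describe-workspace --workspace-id $AMP_WS_ID --region $AWS_REGION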

Step 5: Create Amazon Managed Grafana workspace

When creating an Amazon Managed Grafana workspace using the AWS CLI, you must first create an IAM Role. This role will be assigned to the workspace and the IAM permissions will be used to access any AWS data sources.

aws iam create-role --role-name grafana-role \
    --assume-role-policy-document file://grafana_trust_policy.json
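The grafana_trust_policy.json file comes from the cloned repository. For reference, a minimal trust policy that allows the Amazon Managed Grafana service to assume the role would look roughly like the following; check the repository file for the exact content.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "grafana.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}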

Next, create the Amazon Managed Grafana workspace configured to use AWS Single Sign-On (AWS SSO) for authentication and the IAM role created previously. The following commands create the workspace:

RESULT=$(aws grafana create-workspace \
    --account-access-type="CURRENT_ACCOUNT" \
    --authentication-providers "SSO" \
    --permission-type "CUSTOMER_MANAGED" \
    --workspace-name "AMP-KEDA" \
    --workspace-role-arn "arn:aws:iam::$ACCOUNT_ID:role/grafana-role")

export AMG_WS_ID=$(jq -r .workspace.id <<< $RESULT)
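Workspace creation takes a few minutes; you can poll its status until it becomes ACTIVE:

aws grafana describe-workspace --workspace-id $AMG_WS_ID

Once the workspace is active, add the Amazon Managed Service for Prometheus workspace as a data source from the Grafana console so you can build the dashboards used later in this post.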

Step 6: Synthetic testing of KEDA scaling with Amazon Managed Service for Prometheus

Now that the setup is complete, we will perform synthetic testing. For this, we deploy a Kubernetes deployment of the nginx image, which runs a single pod:

kubectl create deploy nginx-scaledobj --image=nginxinc/nginx-unprivileged

After this, we deploy the KEDA scaled object. As seen in the configuration snippet, the scaled object uses the query vector(100), which returns a constant value of 100. The threshold is set to 25, so KEDA scales the deployment to query result / threshold = 100 / 25 = 4 pods:

triggers:
  - type: prometheus
    metadata:
      serverAddress: https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$AMP_WS_ID
      metricName: s0-prometheus
      threshold: "25"
      query: vector(100)
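For context, this trigger is part of a ScaledObject resource targeting the nginx deployment. A minimal sketch of such a resource is shown below; the replica bounds of 1 and 30 match the HPA output that follows, and the manifest in the repository is authoritative.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-scaledobj
spec:
  scaleTargetRef:
    name: nginx-scaledobj      # the nginx deployment created above
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$AMP_WS_ID
        metricName: s0-prometheus
        threshold: "25"
        query: vector(100)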

When checking the HPA, it shows the target metric at 25/25 (avg) and the deployment scaled to 4 replicas:

kubectl get hpa 
NAME                       REFERENCE                    TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-nginx-scaledobj   Deployment/nginx-scaledobj   25/25 (avg)    1         30        4          4m18s

This scaling event is also evident from the KEDA controller logs:

2024-02-14T01:46:25Z    INFO    Initializing Scaling logic according to ScaledObject Specification    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"nginx-scaledobj","namespace":"default"}, "namespace": "default", "name": "nginx-scaledobj", "reconcileID": "04b8918a-8c7e-4212-aa15-b339bc6ce667"}
2024-02-14T01:46:25Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"nginx-scaledobj","namespace":"default"}, "namespace": "default", "name": "nginx-scaledobj", "reconcileID": "70e17ffb-8d85-43b3-8df0-70e502b28719"}
2024-02-14T01:46:25Z    INFO    Detected resource targeted for scaling    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"nginx-scaledobj","namespace":"default"}, "namespace": "default", "name": "nginx-scaledobj", "reconcileID": "70e17ffb-8d85-43b3-8df0-70e502b28719", "resource": "apps/v1.Deployment", "name": "nginx-scaledobj"}

Finally, checking the pod status confirms there are now 4 nginx pods running:

kubectl get pods
nginx-scaledobj-dd6c4d5f6-hzkt8   1/1     Running   0               10s
nginx-scaledobj-dd6c4d5f6-k5z7h   1/1     Running   0               10s
nginx-scaledobj-dd6c4d5f6-tfngt   1/1     Running   0               10s
nginx-scaledobj-dd6c4d5f6-whxh7   1/1     Running   0               3m43s

Step 7: Application scale testing with KEDA and Amazon Managed Service for Prometheus

Let’s revisit application scaling using KEDA based on the architecture diagram. First, deploy the ADOT collector configuration:

eksctl create iamserviceaccount --name adot-collector --cluster $EKS_CLUSTER --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess --approve 
export REMOTE_WRITE_URL="https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$AMP_WS_ID/api/v1/remote_write"

Now use the manifest file amp-eks-adot-prometheus-daemonset.yaml, which contains the scrape configuration for the application metrics, to deploy the ADOT collector. This creates an ADOT deployment that collects metrics from the pods:

sed -i.bak -e "s|AWSREGION|${AWS_REGION}|g" amp-eks-adot-prometheus-daemonset.yaml
sed -i.bak -e "s|REMOTE_WRITE_URL|${REMOTE_WRITE_URL}|g" amp-eks-adot-prometheus-daemonset.yaml
kubectl apply -f amp-eks-adot-prometheus-daemonset.yaml
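For reference, the placeholders that the sed commands replace live inside the OpenTelemetry collector configuration embedded in the manifest. A hedged excerpt of what that section typically looks like when using the prometheusremotewrite exporter with SigV4 authentication is shown below; the manifest in the repository is authoritative.

extensions:
  sigv4auth:
    region: AWSREGION              # replaced by sed with your AWS Region
exporters:
  prometheusremotewrite:
    endpoint: REMOTE_WRITE_URL     # replaced by sed with the AMP remote write URL
    auth:
      authenticator: sigv4auth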

With the setup in place, deploy the sample application:

kubectl apply -f keda-app.yaml

This deploys a frontend application, a checkout application, and a downstream application, as shown in the following output. This sample app is based on a public AWS observability sample repository. It also creates the associated services:

kubectl get deployments
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
downstream   1/1     1            1           77s
frontend     1/1     1            1           77s
checkout     0/0     1            1           77s
kubectl get service
NAME          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
downstream    ClusterIP   10.100.214.180   <none>        80/TCP    118s
frontend      ClusterIP   10.100.207.248   <none>        80/TCP    118s
kubernetes    ClusterIP   10.100.0.1       <none>        443/TCP   21d

The preceding setup sends metrics, including ho11y_total (which increments on frontend invocations), to Amazon Managed Service for Prometheus. Let’s verify event-driven scaling with KEDA by creating a scaled object:

kubectl apply -f keda-so-hpa.yaml

As shown in the following scaled object snippet, the trigger uses the per-second rate of the ho11y_total metric over a 30-second window, with a sample threshold of 0.25:

triggers:
  - type: prometheus
    metadata:
      serverAddress: https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$AMP_WS_ID
      metricName: requests_rate
      identityOwner: operator
      threshold: '0.25'
      query: rate(ho11y_total[30s])
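If you want to confirm that the ho11y_total metric is actually arriving in the workspace before or during the test, a SigV4-signing HTTP client such as awscurl (not part of this post’s setup, so treat this as an optional aside) can query the workspace directly:

awscurl --service aps --region $AWS_REGION \
  "https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$AMP_WS_ID/api/v1/query?query=ho11y_total"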

Add load to the frontend application as follows. The premise: as the frontend receives more requests, KEDA scales the checkout application pods based on the ho11y_total metric, and it scales them back down once load testing completes.

frontend_pod=`kubectl get pod --no-headers -l app=frontend -o jsonpath='{.items[*].metadata.name}'`
loop_counter=0
while [ $loop_counter -le 300 ]; do
  kubectl exec -it $frontend_pod -- curl downstream.default.svc.cluster.local
  echo
  loop_counter=$[$loop_counter+1]
done

After some time, the checkout pods scale up based on RPS load testing. After load testing completes, pods scale down. Verifying HPA events shows the scaling actions:

kubectl get hpa -w
NAME               REFERENCE       TARGETS MINPODS MAXPODS REPLICAS
keda-hpa-ho11y-hpa Deployment/checkout 445/250m (avg) 1 100 3 
keda-hpa-ho11y-hpa Deployment/checkout 640/250m (avg) 1 100 5 
keda-hpa-ho11y-hpa Deployment/checkout 500/250m (avg) 1 100 4 
keda-hpa-ho11y-hpa Deployment/checkout 320/250m (avg) 1 100 2 
keda-hpa-ho11y-hpa Deployment/checkout 100/250m (avg) 1 100 1 

The scaling events can also be visualized in Amazon Managed Grafana by charting the rate(ho11y_total[30s]) query against the 0.25 threshold, showing pods scaling from 0 to 5 and back down.

Grafana Visualization
Figure 3. Grafana Visualization

In summary, we have demonstrated and validated event-driven auto-scaling with KEDA, driven by application metrics streamed to Prometheus.

Cleanup

Use the following commands to delete resources created during this post:

aws grafana delete-workspace --workspace-id $AMG_WS_ID
aws iam delete-role --role-name grafana-role
aws amp delete-workspace --workspace-id $AMP_WS_ID
eksctl delete cluster --name $EKS_CLUSTER --region $AWS_REGION
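If you prefer to keep the cluster and only remove the workloads created in this walkthrough, you can delete the Kubernetes resources directly instead; this assumes the manifests from the cloned repository:

kubectl delete -f keda-so-hpa.yaml -f keda-app.yaml
kubectl delete scaledobject nginx-scaledobj
kubectl delete deploy nginx-scaledobj
helm uninstall keda -n keda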

Conclusion

In this post, we walked through an application autoscaling solution on Amazon EKS utilizing KEDA and Amazon Managed Service for Prometheus. By integrating AWS managed services with open source software, we implemented an automated scaling solution tailored to the application’s performance metrics. The key components included Amazon EKS for the Kubernetes infrastructure, KEDA for metrics-driven scaling logic, Amazon Managed Service for Prometheus as the metrics backend, ADOT for ingesting metrics, and Amazon Managed Grafana for visualization. Together, they formed a closed-loop system where application workload drove dynamic resource allocation.

We generated load against a sample microservice and observed KEDA automatically scaling pods up and down in response, as visible in the Grafana charts. This demonstrated a metrics-based approach to right-sizing resources to application needs. While our example focused on request rates, the framework works with any custom metric, such as CPU, memory, latency, or error rates, through Prometheus. As organizations aim to optimize cloud costs and performance, automating resource provisioning based on usage signals is key. KEDA’s integration with Amazon Managed Service for Prometheus provides a way to achieve event-driven autoscaling on Amazon EKS using Prometheus metrics.

To learn more about AWS Observability, see the following references:
AWS Observability Best Practices Guide
One Observability Workshop
Terraform AWS Observability Accelerator
CDK AWS Observability Accelerator

About the authors

Siva Guruvareddiar

Siva Guruvareddiar is a Senior Solutions Architect for AWS Observability at AWS, where he is passionate about helping customers architect highly available systems. He helps speed cloud-native adoption journeys by modernizing platform infrastructure and internal architecture using microservices, containerization, observability, service mesh, and cloud migration. Connect on LinkedIn at: linkedin.com/in/sguruvar.

Imaya Kumar Jagannathan

Imaya is a Principal Solutions Architect focused on AWS Observability tools including Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C#, working with containers, and serverless technologies. LinkedIn: /imaya.