AWS Cloud Operations Blog
Autoscaling Kubernetes workloads with KEDA using Amazon Managed Service for Prometheus metrics
Introduction
With the rising popularity of applications hosted on Amazon Elastic Kubernetes Service (Amazon EKS), a key challenge is handling increases in traffic and load efficiently. Traditionally, you would have to manually scale out your applications by adding more instances – an approach that’s time-consuming, inefficient, and prone to over- or under-provisioning. A better solution is to use autoscaling to automatically scale application instances based on real-time demand signals. Kubernetes Event-driven Autoscaling (KEDA) is an autoscaling tool for Kubernetes that scales workloads based on metrics and events from sources like queues, databases, or monitoring systems. This enables precise matching of allocated resources to application load.
Amazon Managed Service for Prometheus provides a Prometheus-compatible metric monitoring solution for Amazon EKS clusters where key metrics are stored securely. You can leverage Amazon Managed Service for Prometheus and expose select application and business metrics to drive KEDA autoscaling decisions. In this blog post, we will demonstrate an automated scaling solution with KEDA and Amazon Managed Service for Prometheus on Amazon EKS. Specifically, we will configure KEDA to scale out a sample application deployment based on Requests Per Second (RPS) metrics from Amazon Managed Service for Prometheus. This delivers automated scaling sized to handle production workload demands. By following this guide, you can add flexible autoscaling to your own Amazon EKS workloads by leveraging KEDA’s integration with Amazon Managed Service for Prometheus. Amazon Managed Grafana is used here alongside Amazon Managed Service for Prometheus for monitoring and analytics: you can visualize the application metrics to gain insights into the autoscaling patterns and correlate them with business events.
Prerequisites
You will need the following resources and tools for this walkthrough:
- AWS Command Line Interface (AWS CLI) version 2
- eksctl
- kubectl
- helm
- jq
- git
- Amazon Managed Service for Prometheus
- Amazon Managed Grafana
Solution Overview
This solution demonstrates the integration of several AWS services with open source software to create an automated scaling pipeline. An Amazon EKS cluster provides the managed Kubernetes environment for deployment and orchestration. The AWS Distro for OpenTelemetry (ADOT) collector scrapes custom application metrics and sends them to Amazon Managed Service for Prometheus. To enable event-driven autoscaling, we install KEDA on the Amazon EKS cluster. KEDA allows you to create scaling rules based on the metric streams from Prometheus. Finally, Amazon Managed Grafana provides pre-built dashboards to visualize the scaling metrics from Amazon Managed Service for Prometheus and confirm that KEDA is properly autoscaling the microservice as load increases.
These components come together as shown in the following architecture diagram:
- KEDA components are deployed on the Amazon EKS cluster to configure the platform for event-driven, metrics-based scaling.
- ADOT is configured to scrape metrics from our sample app, such as request rates and latency, and send them to Amazon Managed Service for Prometheus.
- KEDA ScaledObject configuration defines the autoscaling rules based on Prometheus metrics streams for the microservice deployment.
- Load traffic is generated against our app. As traffic increases, we use the Amazon Managed Grafana dashboards to observe the metrics.
- KEDA leverages the Prometheus metrics to automatically scale out the pods to match demand.
- Amazon Managed Grafana can be used to monitor the various metrics from Amazon Managed Service for Prometheus.
Figure 1. Architecture diagram
The auto-scaling flow for the microservice deployment, shown in the following sequence diagram, works as follows. The end user sends requests to the microservice application running on Kubernetes, and this application workload can fluctuate over time. The AWS Distro for OpenTelemetry (ADOT) agent is configured as a sidecar on each microservice pod. ADOT collects metrics like request rate, latency, and error rate, and sends them to Amazon Managed Service for Prometheus, a managed time-series database optimized for large-scale, high-cardinality metrics that offers high availability and elastic capacity. KEDA queries the custom application metrics from Amazon Managed Service for Prometheus at a regular polling interval defined in its configuration.
Based on the metrics data, KEDA determines whether the pods need to be scaled up or down and interacts with the Horizontal Pod Autoscaler (HPA) to invoke auto-scaling actions. The HPA controller ensures that the desired number of pod replicas specified by KEDA are spun up or terminated gracefully to match the workload. This provides automated scale up/down capabilities. In summary, ADOT, Amazon Managed Service for Prometheus, KEDA, and HPA work together to enable metrics-driven autoscaling with monitoring for Kubernetes microservices.
Figure 2. Sequence Diagram
By combining managed services like Amazon EKS, Amazon Managed Service for Prometheus, and Amazon Managed Grafana with open source components like KEDA and HPA, we’ve achieved automated application scaling governed by real-time metrics. This provides a flexible, cloud-native architecture that can scale production applications based on utilization indicators. The same model can drive scaling using metrics like CPU, memory, or application-specific metrics through Prometheus and KEDA’s integration.
Step 1: Set up the environment variables and artifacts
Get the required artifacts from this repository as follows:
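The repository URL is not reproduced here; a minimal sketch of this step, assuming a placeholder repository and illustrative region and cluster names, looks like the following:

```bash
# Clone the repository containing the manifests used in this walkthrough
# (placeholder URL - substitute the repository referenced by this post)
git clone https://github.com/<your-org>/<keda-amp-artifacts>.git
cd <keda-amp-artifacts>

# Environment variables reused in the remaining steps (names are illustrative)
export AWS_REGION=us-west-2
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export EKS_CLUSTER_NAME=keda-amp-demo
```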
Step 2: Create an Amazon EKS Cluster
An Amazon EKS cluster can be created using the eksctl command line tool, which provides a simple way to create a cluster with sensible defaults, as follows.
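A minimal eksctl sketch, assuming the cluster name and region exported in Step 1; adjust the node count and instance types to your needs:

```bash
# Create an EKS cluster with a small managed node group and an OIDC provider
# (OIDC is used later for IAM roles for service accounts)
eksctl create cluster \
  --name $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --nodes 2 \
  --with-oidc
```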
Alternatively, the AWS Management Console provides a graphical interface to guide you through the process of creating EKS clusters.
Infrastructure as Code (IaC) tools like AWS CloudFormation or Terraform allow automating and managing Kubernetes clusters and all associated resources through code. This enables version control, reuse, and integration with CI/CD pipelines for production grade deployments. IaC best practices are recommended for manageability and repeatability across environments and accounts.
Step 3: Deploy KEDA
The ‘helm upgrade --install’ command is used to install the Kubernetes Event-driven Autoscaling (KEDA) component on the Kubernetes cluster.
The installation uses the official KEDA Helm chart (version 2.13.1) from the kedacore repo, installed with release name ‘keda’ in namespace ‘keda’ (created automatically if needed). The Helm chart contains manifests to deploy KEDA’s custom resource definitions, RBAC components, Prometheus metrics adapter, and the KEDA controller pod. It runs with defaults suitable for most use cases. Configuration can be customized as needed. Using Helm manages the KEDA deployment lifecycle including deployment, versioning, upgrades, and rollbacks.
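Based on the chart version and names described above, the installation looks like the following:

```bash
# Add the official KEDA chart repository and refresh the local chart index
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install (or upgrade) KEDA 2.13.1 into the 'keda' namespace
helm upgrade --install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.1
```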
Step 4: Create Amazon Managed Service for Prometheus Workspace
The ‘aws amp create-workspace’ command creates an Amazon Managed Service for Prometheus workspace with the alias ‘AMP-KEDA’ in the specified AWS region. The workspace provides an isolated environment for storing Prometheus metrics. It is created with default settings, which can be further customized if needed. The call returns the ID of the newly created workspace, such as ws-1f649a35-5f44-4744-885e-95a4844cba68. This ID is required for sending metrics data to the workspace from applications as well as for allowing other services to access the data.
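A sketch of the command, together with capturing the workspace ID and endpoint for later steps (the variable names are illustrative):

```bash
# Create the Amazon Managed Service for Prometheus workspace
aws amp create-workspace --alias AMP-KEDA --region $AWS_REGION

# Capture the workspace ID and build the workspace endpoint for later use
export WORKSPACE_ID=$(aws amp list-workspaces --alias AMP-KEDA \
  --region $AWS_REGION --query 'workspaces[0].workspaceId' --output text)
export AMP_ENDPOINT=https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$WORKSPACE_ID
```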
Step 5: Create Amazon Managed Grafana workspace
When creating an Amazon Managed Grafana workspace using the AWS CLI, you must first create an IAM Role. This role will be assigned to the workspace and the IAM permissions will be used to access any AWS data sources.
Next, create the Amazon Managed Grafana workspace configured to use AWS Single Sign-On (AWS SSO) for authentication and the IAM role we created earlier. The following commands create the workspace:
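A hedged sketch of both commands; the role name, workspace name, and attached policy below are illustrative, and your organization’s requirements may differ:

```bash
# Trust policy that lets Amazon Managed Grafana assume the role
cat > grafana-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "grafana.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Create the role and allow it to query Amazon Managed Service for Prometheus
aws iam create-role --role-name grafana-amp-role \
  --assume-role-policy-document file://grafana-trust-policy.json
aws iam attach-role-policy --role-name grafana-amp-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess

# Create the Amazon Managed Grafana workspace using AWS SSO authentication
aws grafana create-workspace \
  --workspace-name AMG-KEDA \
  --account-access-type CURRENT_ACCOUNT \
  --authentication-providers AWS_SSO \
  --permission-type CUSTOMER_MANAGED \
  --workspace-role-arn arn:aws:iam::$ACCOUNT_ID:role/grafana-amp-role \
  --region $AWS_REGION
```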
Step 6: Synthetic testing of KEDA scaling with Amazon Managed Service for Prometheus
Now that the setup is complete, we will perform synthetic testing. For this, we deploy a Kubernetes deployment of the nginx image, which will run a single pod:
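A minimal sketch of such a deployment (the name and labels are illustrative):

```bash
# Deploy a single-replica nginx deployment that KEDA will scale
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
```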
After this, we deploy the KEDA scaled object. As seen in the scaled object configuration, it uses the PromQL expression vector(100), which returns a constant value of 100. The threshold is set to 25, so KEDA scales the deployment to query result / threshold = 100 / 25 = 4 pods.
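A sketch of the scaled object, assuming the nginx deployment above and the workspace endpoint exported earlier; the SigV4 authentication reference is a placeholder whose TriggerAuthentication definition is omitted here:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: nginx-deployment           # the deployment created above
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: $AMP_ENDPOINT   # Amazon Managed Service for Prometheus workspace
      query: vector(100)             # constant query result of 100
      threshold: "25"                # 100 / 25 = 4 desired replicas
      awsRegion: $AWS_REGION
    authenticationRef:
      name: keda-trigger-auth-aws    # assumed TriggerAuthentication for SigV4 signing
EOF
```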
Checking the HPA shows the target metric at 25/25 (avg) and the deployment scaled to 4 replicas:
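For example, listing the HPA that KEDA manages (KEDA names it keda-hpa-<scaledobject-name>):

```bash
kubectl get hpa
```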
This scaling event is also evident from the KEDA controller logs:
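For example (the deployment name below is the default used by the KEDA Helm chart):

```bash
# Tail the KEDA operator logs to see the scaling decision
kubectl logs -n keda deployment/keda-operator --tail=50
```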
Finally, checking the pod status confirms there are now 4 nginx pods running:
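For example, assuming the app=nginx label from the deployment sketch above:

```bash
kubectl get pods -l app=nginx
```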
Step 7: Application scale testing with KEDA and Amazon Managed Service for Prometheus
Let’s revisit application scaling using KEDA based on the architecture diagram. First, deploy the ADOT collector configuration:
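What this configuration step contains depends on the repository used; a common prerequisite is an IAM role for the collector’s service account so it can remote-write to the workspace. A hedged sketch, with illustrative service account and namespace names:

```bash
# Create an IAM role for service accounts (IRSA) that the ADOT collector uses
# to remote-write metrics to Amazon Managed Service for Prometheus
eksctl create iamserviceaccount \
  --name amp-iamproxy-ingest-service-account \
  --namespace adot-col \
  --cluster $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
  --approve
```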
Now use the manifest file amp-eks-adot-prometheus-daemonset.yaml, which contains the scrape configuration to extract Envoy metrics, to deploy the ADOT collector. This creates an ADOT deployment that collects metrics from the pods:
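A sketch of the deployment step, assuming the manifest comes from the repository cloned in Step 1:

```bash
# Deploy the ADOT collector with the Prometheus scrape configuration
kubectl apply -f amp-eks-adot-prometheus-daemonset.yaml

# Verify that the collector pods are running
kubectl get pods -A | grep -i adot
```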
With the setup in place, deploy the sample application:
This deploys a frontend application, a checkout application, and a downstream application, along with their associated services. This sample app is based on this public AWS observability repository.
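A sketch of the deployment, assuming the sample application manifest from the repository cloned in Step 1 (the file name is illustrative):

```bash
# Deploy the frontend, checkout, and downstream sample applications
kubectl apply -f ho11y-sample-app.yaml

# Confirm the application pods and services are up
kubectl get pods,svc
```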
The preceding setup sends metrics, including ho11y_total, to Amazon Managed Service for Prometheus; ho11y_total increments on each frontend invocation. Let’s verify event-driven scaling with KEDA by creating a scaled object:
As shown in the following scaled object snippet, KEDA queries the ho11y_total metric every 30 seconds, with a sample threshold of 0.25.
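A sketch of the scaled object; the checkout deployment name, the exact query form, the replica bounds, and the SigV4 authentication reference are illustrative:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: checkout-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: checkoutservice            # illustrative deployment name
  minReplicaCount: 0
  maxReplicaCount: 5
  pollingInterval: 30                # query Prometheus every 30 seconds
  triggers:
  - type: prometheus
    metadata:
      serverAddress: $AMP_ENDPOINT
      query: rate(ho11y_total[30s])  # per-second invocation rate over a 30s window
      threshold: "0.25"
      awsRegion: $AWS_REGION
    authenticationRef:
      name: keda-trigger-auth-aws    # assumed TriggerAuthentication for SigV4 signing
EOF
```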
Add load to the frontend application as follows. The premise is that as the frontend receives more requests, KEDA scales the checkout application pods based on the ho11y_total metric, and scales them back down once the load testing completes.
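One simple way to generate load, assuming the frontend is exposed through a Kubernetes service named frontend on port 80 (the service name and port are illustrative):

```bash
# Run a temporary pod that continuously sends requests to the frontend service
kubectl run load-generator --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O /dev/null http://frontend:80; done"
```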
After some time, the checkout pods scale up based on RPS load testing. After load testing completes, pods scale down. Verifying HPA events shows the scaling actions:
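For example, assuming the scaled object name used above:

```bash
# Inspect the HPA that KEDA created for the checkout scaled object
kubectl get hpa
kubectl describe hpa keda-hpa-checkout-scaledobject
```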
The scaling events can also be visualized in Amazon Managed Grafana for the ho11y_total metric (queried over 30-second intervals against the 0.25 threshold), showing pods scaled from 0 to 5 and back down.
Figure 3. Grafana Visualization
In summary, we have demonstrated and validated event-driven auto-scaling with KEDA, driven by application metrics streamed to Prometheus.
Cleanup
Use the following commands to delete resources created during this post:
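A sketch of the cleanup, assuming the names used earlier in this post (replace the Grafana workspace ID placeholder with your own):

```bash
# Remove the Kubernetes resources and KEDA
kubectl delete scaledobject --all
kubectl delete pod load-generator
kubectl delete deployment nginx-deployment
helm uninstall keda -n keda

# Delete the managed workspaces and the EKS cluster
aws amp delete-workspace --workspace-id $WORKSPACE_ID --region $AWS_REGION
aws grafana delete-workspace --workspace-id <grafana-workspace-id> --region $AWS_REGION
eksctl delete cluster --name $EKS_CLUSTER_NAME --region $AWS_REGION
```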
Conclusion
In this post, we walked through an application autoscaling solution on Amazon EKS utilizing KEDA and Amazon Managed Service for Prometheus. By integrating AWS managed services with open source software, we implemented an automated scaling solution tailored to the application’s performance metrics. The key components included Amazon EKS for the Kubernetes infrastructure, KEDA for scaling logic driven by metrics, Amazon Managed Service for Prometheus as the metrics backend, ADOT for ingesting metrics, and Amazon Managed Grafana for visualization. Together, they formed a closed-loop system where application workload drove dynamic resource allocation.
We generated load against a sample microservice and observed KEDA automatically scaling pods up and down in response, as visible in Grafana charts. This demonstrated a metrics-based approach to right-sizing resources to application needs. While our example focused on request rates, the framework can work with any custom metrics like CPU, latency, error codes, and so on. As organizations aim to optimize cloud costs and performance, automating resource provisioning based on usage signals is key. KEDA’s integration with Amazon Managed Service for Prometheus provides a way to achieve event-driven autoscaling on EKS using Prometheus metrics.
To learn more about AWS Observability, see the following references:
• AWS Observability Best Practices Guide
• One Observability Workshop
• Terraform AWS Observability Accelerator
• CDK AWS Observability Accelerator