AWS Cloud Operations Blog
Using Prometheus Adapter to autoscale applications running on Amazon EKS
Automated scaling is an approach to automatically scaling workloads up or down based on resource usage. In Kubernetes, the Horizontal Pod Autoscaler (HPA) can scale pods based on observed CPU utilization and memory usage. In more complex scenarios, other metrics need to be taken into account before making a scaling decision. For example, most web and mobile backends require automated scaling based on requests per second in order to handle traffic bursts, while for ETL apps, automated scaling could be triggered by the job queue length exceeding a particular threshold. Instrumenting your applications with Prometheus and exposing the right metrics for autoscaling lets you fine-tune your apps to handle bursts better and ensure high availability.
Prometheus is an open-source monitoring and alerting toolkit that collects and stores its metrics as time series data. In other words, its metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels. Prometheus Adapter queries the custom metrics collected by Prometheus and exposes them through a Kubernetes API service, where they can be readily used by the Horizontal Pod Autoscaler object to make scaling decisions.
Managing long-term Prometheus storage infrastructure is challenging. To remove this heavy lifting, AWS launched Amazon Managed Service for Prometheus, a Prometheus-compatible monitoring service for container infrastructure and application metrics that makes it easy to securely monitor container environments at scale. Amazon Managed Service for Prometheus automatically scales the ingestion, storage, alerting, and querying of operational metrics as workloads scale up and down.
This post will show how to use Prometheus Adapter to autoscale Amazon EKS pods running an AWS App Mesh workload. AWS App Mesh is a service mesh that makes it easy to monitor and control services. A service mesh is an infrastructure layer dedicated to handling service-to-service communication, usually through an array of lightweight network proxies deployed alongside the application code. We will register the custom metric via a Kubernetes API service that the HPA will then use to make scaling decisions.
Prerequisites
You will need the following to complete the steps in this blog post:
- AWS CLI version 2
- eksctl
- kubectl
- jq
- helm
- An Amazon Managed Service for Prometheus workspace configured in your AWS account. For instructions, see Create a workspace in the Amazon Managed Service for Prometheus User Guide.
Step 1: Create an Amazon EKS cluster
Figure 1: Architecture diagram
We will create a custom metric for the counter exposed by Envoy, "envoy_cluster_upstream_rq". This approach can be extended to any custom metric that the application emits.
First, create an Amazon EKS cluster enabled with AWS App Mesh for running the sample application. The eksctl CLI tool will deploy the cluster using the eks-cluster-config.yaml file.
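The exact contents of this file depend on your environment. The following is a minimal sketch; the region and node group sizing are illustrative assumptions, while the cluster and service account names match those used later in this post:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: AMP-EKS-CLUSTER
  region: us-east-1            # assumption: use your own region
managedNodeGroups:
  - name: default-ng
    instanceType: m5.large     # assumption: size for your workload
    desiredCapacity: 2
iam:
  withOIDC: true
  serviceAccounts:
    # Service account for the AWS App Mesh controller
    - metadata:
        name: appmesh-controller
        namespace: appmesh-system
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AWSAppMeshFullAccess
        - arn:aws:iam::aws:policy/AWSCloudMapFullAccess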
Execute the following command to create the EKS cluster:
eksctl create cluster -f eks-cluster-config.yaml
This creates an Amazon EKS cluster named AMP-EKS-CLUSTER and a service account named appmesh-controller that the AWS App Mesh controller for EKS will use.
Next, install the AWS App Mesh controller. First, get the Custom Resource Definitions (CRDs) in place, and then install the controller with Helm.
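The following is a sketch of a typical installation using the eks-charts Helm repository; the chart values shown assume the appmesh-controller service account created by eksctl above:

# Install the App Mesh CRDs
kubectl apply -k "github.com/aws/eks-charts/stable/appmesh-controller/crds?ref=master"

# Add the eks-charts repository and install the controller
helm repo add eks https://aws.github.io/eks-charts

helm upgrade -i appmesh-controller eks/appmesh-controller \
  --namespace appmesh-system \
  --set region=$AWS_REGION \
  --set serviceAccount.create=false \
  --set serviceAccount.name=appmesh-controller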
Step 2: Deploy sample application and enable AWS App Mesh
To install an application and inject an Envoy container, use the AWS App Mesh Controller for Kubernetes that you installed earlier. The controller manages App Mesh resources in your Kubernetes clusters and is accompanied by CRDs that allow you to define AWS App Mesh components, such as meshes and virtual nodes, via the Kubernetes API, just as you define native Kubernetes objects such as deployments and services. These custom resources map to AWS App Mesh API objects that the controller manages for you. The controller watches these custom resources for changes and reflects them into the AWS App Mesh API.
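As a hypothetical example (the mesh name, namespace, and manifest file are placeholders), sidecar injection is enabled by labeling the application namespace before applying the application and its App Mesh custom resources:

# Enable Envoy sidecar injection for the application namespace
kubectl label namespace default mesh=my-mesh
kubectl label namespace default appmesh.k8s.aws/sidecarInjectorWebhook=enabled

# sample-app.yaml is a placeholder for your application Deployment and Service
# plus the App Mesh resources (Mesh, VirtualNode, VirtualService, VirtualRouter)
kubectl apply -f sample-app.yaml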
Step 3: Create an Amazon Managed Service for Prometheus workspace
The Amazon Managed Service for Prometheus workspace ingests the Prometheus metrics collected from envoy. A workspace is a logical and isolated Prometheus server dedicated to Prometheus resources such as metrics. A workspace supports fine-grained access control for authorizing its management, such as update, list, describe, and delete, as well as ingesting and querying metrics.
aws amp create-workspace --alias AMP-APPMESH --region $AWS_REGION
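Note the workspace ID returned by the command; later steps need it to build the remote write and query URLs. One way to capture it, assuming AWS_REGION is already exported:

# Look up the workspace ID by its alias
export WORKSPACE=$(aws amp list-workspaces --alias AMP-APPMESH \
  --region $AWS_REGION --query 'workspaces[0].workspaceId' --output text)
echo $WORKSPACE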
Next, optionally create an interface VPC endpoint in order to securely access the managed service from resources deployed in your VPC; this ensures that data ingested by the managed service does not leave your AWS account's VPC. (An Amazon Managed Service for Prometheus public endpoint is also available.) Use the AWS CLI as shown here, replacing the placeholder strings, such as VPC_ID and AWS_REGION, with your values.
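A sketch of the command; the subnet and security group placeholders are yours to fill in:

aws ec2 create-vpc-endpoint \
  --vpc-id <VPC_ID> \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.<AWS_REGION>.aps-workspaces \
  --subnet-ids <SUBNET_IDS> \
  --security-group-ids <SECURITY_GROUP_IDS> \
  --region <AWS_REGION>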
Step 4: Scrape the metrics using AWS Distro for OpenTelemetry
Amazon Managed Service for Prometheus does not directly scrape operational metrics from containerized workloads in a Kubernetes cluster. You must deploy and manage a Prometheus server or an OpenTelemetry agent, such as the AWS Distro for OpenTelemetry Collector or the Grafana Agent, to perform this task. This post will walk you through configuring AWS Distro for OpenTelemetry (ADOT) to scrape the Envoy metrics. The ADOT-AMP pipeline lets us use the ADOT Collector to scrape a Prometheus-instrumented application, and then send the scraped metrics to Amazon Managed Service for Prometheus.
This post will also walk you through the steps to configure an IAM role to send Prometheus metrics to Amazon Managed Service for Prometheus. We install the ADOT collector on the Amazon EKS cluster and forward metrics to Amazon Managed Service for Prometheus.
Configure permissions
We will deploy the ADOT Collector to run under the identity of a Kubernetes service account named amp-iamproxy-service-account. With IAM roles for service accounts (IRSA), you can associate an IAM role carrying the AmazonPrometheusRemoteWriteAccess policy with a Kubernetes service account, thereby granting IAM permissions to any pod that uses the service account to ingest metrics into Amazon Managed Service for Prometheus.
You need the kubectl and eksctl CLI tools for this step, and they must be configured to access your Amazon EKS cluster.
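One way to create the service account with the policy attached is the eksctl command below; the prometheus namespace is an assumption and should match wherever you deploy the collector:

eksctl create iamserviceaccount \
  --cluster AMP-EKS-CLUSTER \
  --name amp-iamproxy-service-account \
  --namespace prometheus \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
  --override-existing-serviceaccounts \
  --approve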
Now create a manifest file, amp-eks-adot-prometheus-daemonset.yaml, with the scrape configuration in order to extract envoy metrics and deploy the ADOT collector. This example deploys a DaemonSet named adot-collector. The adot-collector DaemonSet collects metrics from pods on the cluster.
The scrape configuration is similar to that of a Prometheus server; we add the configuration needed to scrape the Envoy metrics. After the ADOT Collector is deployed, it will collect the metrics and ingest them into the specified Amazon Managed Service for Prometheus workspace.
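The following is an abbreviated sketch of the collector configuration contained in that manifest; the job name, relabeling, and environment variable references are assumptions that you would adapt to your setup:

receivers:
  prometheus:
    config:
      scrape_configs:
        # Scrape the Envoy admin endpoint on every App Mesh sidecar
        - job_name: appmesh-envoy
          metrics_path: /stats/prometheus
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_container_name]
              regex: ^envoy$
              action: keep

exporters:
  # Remote write to the Amazon Managed Service for Prometheus workspace
  awsprometheusremotewrite:
    endpoint: "https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${WORKSPACE}/api/v1/remote_write"
    aws_auth:
      region: ${AWS_REGION}
      service: aps

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [awsprometheusremotewrite]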
Step 5: Deploy the Prometheus Adapter to register a custom metric
We will create a service account named monitoring to run the Prometheus Adapter and assign it the AmazonPrometheusQueryAccess policy using IRSA.
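A sketch of the corresponding IRSA command; the custom-metrics namespace is an assumption:

eksctl create iamserviceaccount \
  --cluster AMP-EKS-CLUSTER \
  --name monitoring \
  --namespace custom-metrics \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess \
  --override-existing-serviceaccounts \
  --approve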
The Envoy sidecar used by AWS App Mesh exposes a counter, envoy_cluster_upstream_rq_total. You can configure the Prometheus Adapter to transform this metric into a requests-per-second rate. The adapter connects to the Amazon Managed Service for Prometheus query endpoint through a SigV4 proxy.
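The following is a sketch of the adapter rules; the series label names and the one-minute rate window are assumptions that depend on your scrape configuration:

rules:
  default: false
  custom:
    # Turn the Envoy request counter into a per-pod requests-per-second metric
    - seriesQuery: 'envoy_cluster_upstream_rq_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
      resources:
        overrides:
          kubernetes_namespace: {resource: "namespace"}
          kubernetes_pod_name: {resource: "pod"}
      name:
        matches: "^(.*)_total"
        as: "appmesh_requests_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)'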
We will now deploy the Prometheus adapter to create the custom metric:
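One way to deploy it is with the community Helm chart; prometheus-adapter-values.yaml is a placeholder values file that would carry the rules above, the monitoring service account, and the SigV4 proxy settings for the Amazon Managed Service for Prometheus query endpoint:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace custom-metrics \
  -f prometheus-adapter-values.yaml   # placeholder values file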
We will also create an APIService so that the Prometheus Adapter is reachable through the Kubernetes API and its metrics can be fetched by the Horizontal Pod Autoscaler. We can then query the custom metrics API to verify that the metric has been registered.
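If your deployment method does not register the APIService for you, it looks roughly like the sketch below; the service name and namespace are assumptions matching the adapter deployment above:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  service:
    name: prometheus-adapter
    namespace: custom-metrics
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

You can then query the custom metrics API (the default namespace is an assumption for where the sample application runs):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/appmesh_requests_per_second" | jq .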
Now you can use the appmesh_requests_per_second metric in the HPA definition with the following HPA resource:
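A sketch of such an HPA; the Deployment name and namespace are placeholders, and the target value matches the threshold described below:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app          # placeholder: your application Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: appmesh_requests_per_second
        target:
          type: AverageValue
          averageValue: "10"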
With this configuration, the HPA will scale out the pods when the appmesh_requests_per_second metric exceeds the threshold of 10.
Let us add some load to see the autoscaling in action:
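As a hypothetical example (the service name and namespace are placeholders for your application's service):

kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c \
  "while true; do wget -q -O- http://sample-app.default.svc.cluster.local; done"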
Describing the HPA will show the scaling actions resulting from the load we introduced.
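For example, using the placeholder HPA name from earlier:

kubectl describe hpa sample-app -n default

# Watch the replica count change as the load increases
kubectl get hpa sample-app -n default --watch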
Clean-up
Use the following commands to delete resources created during this post:
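A sketch of the cleanup, assuming the resource names and namespaces used earlier in this post:

helm uninstall prometheus-adapter --namespace custom-metrics
helm uninstall appmesh-controller --namespace appmesh-system

aws amp delete-workspace --workspace-id $WORKSPACE --region $AWS_REGION

eksctl delete cluster --name AMP-EKS-CLUSTER --region $AWS_REGION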
Conclusion
This post demonstrated how to use Prometheus Adapter to autoscale deployments based on custom metrics. For the sake of simplicity, we fetched only one metric from Amazon Managed Service for Prometheus. The Adapter ConfigMap can, however, be extended to fetch some or all of the available metrics and use them for autoscaling.
Further Reading
- Getting Started with Amazon Managed Service for Prometheus
- Set up cross-region metrics collection for Amazon Managed Service for Prometheus workspaces
- Metrics collection from Amazon ECS using Amazon Managed Service for Prometheus
- AWS One Observability Demo Workshop: What’s new with Prometheus, Grafana, and OpenTelemetry