Autoscaling Amazon EKS services based on custom Prometheus metrics using CloudWatch Container Insights

Introduction

In a Kubernetes cluster, the Horizontal Pod Autoscaler can automatically scale the number of Pods in a Deployment based on observed CPU utilization and memory usage. The autoscaler depends on the Kubernetes metrics server, which collects resource metrics from Kubelets and exposes them through the Metrics API in the Kubernetes API server. The metrics server has to be installed as a separate add-on to a Kubernetes cluster. Kubernetes 1.6 added support for using application-provided custom metrics in the Horizontal Pod Autoscaler. Collecting custom metrics and exposing them to the autoscaler requires installing additional components that are provided by metrics solution vendors. Prometheus is an open source monitoring solution that is widely used for collecting metrics from microservices running in Kubernetes. The custom metrics gathered by Prometheus can be exposed to the autoscaler using a Prometheus Adapter, as outlined in the blog post titled Autoscaling EKS on Fargate with custom metrics.

Amazon CloudWatch monitors resources and applications that run on AWS in real time. CloudWatch can collect and track metrics, and can create alarms that watch metrics and send notifications or automatically make changes to the resources it is monitoring when a threshold is breached. Many AWS customers that are running containerized workloads on Amazon EKS want to use Amazon CloudWatch as a single pane of glass for all their monitoring, alerting, and autoscaling needs. CloudWatch Container Insights monitoring for Prometheus is a feature that was recently announced. It automates the discovery, collection, and aggregation of Prometheus metrics from applications running on Amazon EKS.

This blog post leverages this new feature and presents a solution that will enable customers to automatically scale their microservices deployed to an Amazon EKS cluster or a self-managed Kubernetes cluster on AWS, based on custom Prometheus metrics collected from the workloads. It employs a custom Kubernetes controller to manage Amazon CloudWatch metric alarms that watch custom metrics data and trigger scaling actions. AWS Lambda is used to autoscale the microservices. This approach can also be used for autoscaling based on metrics sent to CloudWatch by other services such as Amazon SQS.

Architecture

The architecture used to implement this autoscaling solution comprises the following elements:

A Kubernetes Operator implemented using the Kubernetes Java SDK. The operator packages three things: a custom resource named K8sMetricAlarm, defined by a CustomResourceDefinition; a custom controller, deployed as a Deployment, which responds to add/update/delete events on K8sMetricAlarm custom resources in the cluster; and Role/RoleBinding resources that grant the custom controller the permissions it needs. The custom controller runs under the identity of a Kubernetes service account that is associated with an IAM role with permissions to manage resources in CloudWatch.
CloudWatch agent for Prometheus metrics collection, which is installed as a Deployment with a single replica in the Amazon EKS cluster.
Amazon CloudWatch metric alarms, which are managed by the custom controller in conjunction with the K8sMetricAlarm custom resource.
Amazon SNS topic, which is configured to receive notifications when a CloudWatch alarm breaches a specified threshold.
AWS Lambda function whose execution is triggered when a notification is sent to the Amazon SNS topic. The Lambda function acts as a Kubernetes client and performs the autoscaling operation on the target resource.
One or more microservices deployed to the cluster, which are the targets of autoscaling. These services have been instrumented with a Prometheus client library to collect application-specific metrics and expose them over HTTP, to be read by the CloudWatch agent for Prometheus.

Configuring CloudWatch Container Insights monitoring for Prometheus

CloudWatch Container Insights monitoring for Prometheus enables the collection and aggregation of Prometheus metrics from containerized microservices. Users must install the CloudWatch agent in their Kubernetes cluster to collect the metrics. The agent scrapes Prometheus metrics from containerized workloads and sends them to CloudWatch Logs as performance log events using the embedded metric format. From these log events, CloudWatch Container Insights can aggregate data at the cluster, node, pod, and service level of a Kubernetes cluster and create CloudWatch metrics. Metrics collected this way are charged as custom metrics. Refer to the documentation for Amazon CloudWatch pricing.

The YAML manifest for deploying the CloudWatch agent to a Kubernetes cluster contains two ConfigMaps. The first, prometheus-config, contains standard Prometheus configuration that determines the set of microservices from which the agent scrapes metrics. The second, prometheus-cwagentconfig, configures how the Prometheus metrics collected by the agent are converted into performance log events.

Prometheus stores all metrics data as time series. Every time series is uniquely identified by its name and an optional set of key-value pairs called labels. The prometheus-config ConfigMap used by the current implementation is shown below. It configures the CloudWatch agent to scrape Prometheus metrics from Pods in the java namespace. It also sets rules for adding the label job=web-services as well as relabeling certain meta labels in every time series scraped from these Pods.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: cloudwatch
data:
  prometheus.yaml: |
    global:
      scrape_interval: 10s
      scrape_timeout: 10s
    scrape_configs:
    - job_name: 'web-services'
      sample_limit: 10000
      metrics_path: /metrics
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - java
      relabel_configs:
      - action: keep
        regex: true        
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__        
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: EKS_Namespace
      - action: replace
        source_labels:
        - app
        target_label: EKS_Deployment        
      - action: replace
        source_labels: 
        - __meta_kubernetes_pod_name
        target_label: EKS_Pod
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        target_label: EKS_Container
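
The second relabeling rule above rewrites the scrape address so that it uses the port declared in the Pod's prometheus.io/port annotation. The following is a minimal sketch of that rewrite in plain Java, assuming Prometheus's documented behavior of joining the source labels with ';' and anchoring the regex at both ends. The class and method names are hypothetical and are not part of the CloudWatch agent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressRelabelSketch {
    // Same pattern as the relabel rule. matches() requires the whole input
    // to match, which mirrors the implicit anchoring of relabel regexes.
    private static final Pattern RULE = Pattern.compile("([^:]+)(?::\\d+)?;(\\d+)");

    /**
     * Joins the two source labels with ';' and applies the replacement "$1:$2".
     * When the regex does not match (for example, no port annotation),
     * the original address is returned unchanged.
     */
    static String rewriteAddress(String address, String portAnnotation) {
        String joined = address + ";" + (portAnnotation == null ? "" : portAnnotation);
        Matcher m = RULE.matcher(joined);
        return m.matches() ? m.group(1) + ":" + m.group(2) : address;
    }

    public static void main(String[] args) {
        // Address with a discovered port: the annotated port replaces it.
        System.out.println(rewriteAddress("10.0.11.21:9000", "8080"));
        // Address without a port: the annotated port is appended.
        System.out.println(rewriteAddress("10.0.11.21", "8080"));
        // No annotation: the address is left as discovered.
        System.out.println(rewriteAddress("10.0.11.21:9000", null));
    }
}
```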

The prometheus-cwagentconfig ConfigMap used by the current implementation is shown below. It configures the CloudWatch agent to generate performance log events from metrics data in a Prometheus time series named http_requests_total with labels job=web-services and app=recommender-app. It also specifies the list of labels, namely, ClusterName, EKS_Namespace, and EKS_Deployment to be used as CloudWatch dimensions for each selected metric. Note that with the exception of ClusterName, which is implicitly defined, all other dimensions listed for a CloudWatch metric must correspond to a label in the Prometheus time series.

---
# configmap for prometheus cloudwatch agent
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-cwagentconfig
  namespace: cloudwatch
data:
  cwagentconfig.json: |
    {
      "logs": {
        "metrics_collected": {
          "prometheus": {
            "prometheus_config_path": "/etc/prometheusconfig/prometheus.yaml",
            "emf_processor": {
              "metric_declaration": [
                {
                  "source_labels": ["job", "app"],
                  "label_matcher": "^web-services;recommender-app$",
                  "dimensions": [["ClusterName","EKS_Namespace","EKS_Deployment"]],
                  "metric_selectors": [
                    "^http_requests_total$"
                  ]
                }
              ]
            }
          }
        },
        "force_flush_interval": 5
      }
    }

With the above configuration settings, the agent sends performance log events such as the one shown below to CloudWatch Logs. This log event is used to generate data for a custom CloudWatch metric named http_requests_total in the CloudWatch namespace named ContainerInsights/Prometheus with the three dimensions listed in the log event. The current implementation has chosen these dimensions so that CloudWatch can aggregate metrics data collected from all Pods that belong to a specific Deployment within a Kubernetes namespace.

{
   "CloudWatchMetrics":[
      {
         "Metrics":[
            {
               "Name":"http_requests_total"
            }
         ],
         "Dimensions":[
            [
               "ClusterName",
               "EKS_Deployment",
               "EKS_Namespace"
            ]
         ],
         "Namespace":"ContainerInsights/Prometheus"
      }
   ],
   "ClusterName":"k8s-sarathy-cluster",
   "EKS_Container":"jvm8",
   "EKS_Deployment":"recommender-app",
   "EKS_Namespace":"java",
   "EKS_Pod":"recommender-app-7ff98db478-qx8qc",
   "Timestamp":"1600965772981",
   "Version":"0",
   "app":"recommender-app",
   "exported_job":"recommender",
   "http_requests_total":20,
   "instance":"10.0.11.21:8080",
   "job":"web-services",
   "path":"/popular/category",
   "pod_template_hash":"7ff98db478",
   "prom_metric_type":"counter",
   "role":"backend-service"
}
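
To make the selection rules in prometheus-cwagentconfig more concrete, the sketch below mimics how a metric declaration could decide whether a scraped time series produces a log event like the one above, and which labels become CloudWatch dimensions. It assumes the values of the source labels are joined with ';' before being tested against label_matcher. The class and method names are hypothetical; this is an illustration, not the agent's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class MetricDeclarationSketch {
    // Values copied from the metric_declaration in prometheus-cwagentconfig.
    static final List<String> SOURCE_LABELS = List.of("job", "app");
    static final Pattern LABEL_MATCHER = Pattern.compile("^web-services;recommender-app$");
    static final Pattern METRIC_SELECTOR = Pattern.compile("^http_requests_total$");
    static final List<String> DIMENSIONS =
            List.of("ClusterName", "EKS_Namespace", "EKS_Deployment");

    /** True when both the joined source labels and the metric name match. */
    static boolean matches(String metricName, Map<String, String> labels) {
        StringBuilder joined = new StringBuilder();
        for (String label : SOURCE_LABELS) {
            if (joined.length() > 0) {
                joined.append(';'); // assumed separator between label values
            }
            joined.append(labels.getOrDefault(label, ""));
        }
        return LABEL_MATCHER.matcher(joined).find()
                && METRIC_SELECTOR.matcher(metricName).find();
    }

    /** Extracts the dimension values from the labels, in declaration order. */
    static Map<String, String> dimensionsOf(Map<String, String> labels) {
        Map<String, String> dims = new LinkedHashMap<>();
        for (String name : DIMENSIONS) {
            dims.put(name, labels.get(name));
        }
        return dims;
    }

    public static void main(String[] args) {
        // A subset of the labels carried by the log event shown above.
        Map<String, String> labels = Map.of(
                "job", "web-services",
                "app", "recommender-app",
                "ClusterName", "k8s-sarathy-cluster",
                "EKS_Namespace", "java",
                "EKS_Deployment", "recommender-app");
        System.out.println(matches("http_requests_total", labels));
        System.out.println(dimensionsOf(labels));
    }
}
```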

Custom Kubernetes controller for managing CloudWatch alarms

A custom controller implemented using Kubernetes Java SDK is used to manage the creation/deletion of CloudWatch metric alarms in conjunction with a Kubernetes custom resource named K8sMetricAlarm. The CustomResourceDefinition, which defines the schema for this resource, is shown below.

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: k8smetricalarms.containerinsights.eks.com
spec:
  group: containerinsights.eks.com
  version: v1
  versions:
    - name: v1
      served: true
      storage: true
  scope: Namespaced
  names:
    kind: K8sMetricAlarm
    plural: k8smetricalarms
    singular: k8smetricalarm
    shortNames:
      - k8sma
  preserveUnknownFields: false
  validation:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          properties:
            minReplicas:
              type: integer
            maxReplicas:
              type: integer
            scaleUpBehavior:
              type: object
              properties:
                coolDown:
                  type: integer                
                policies:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      value:
                        type: integer
            scaleDownBehavior:
              type: object
              properties:
                coolDown:
                  type: integer
                policies:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      value:
                        type: integer
            deployment:
              type: string
            scaleUpAlarmConfig:
              type: string
            scaleDownAlarmConfig:
              type: string

Configuring CloudWatch alarms and scaling behavior

The K8sMetricAlarm custom resource contains the following fields:

Fields named scaleUpAlarmConfig and scaleDownAlarmConfig, which contain JSON data that configures the CloudWatch alarms that trigger scaling actions in each direction. The custom controller uses this data to create alarms using the PutMetricAlarm API. The structure of this JSON data is based on the output of the AWS CLI command aws cloudwatch put-metric-alarm --generate-cli-skeleton. The JSON data must contain a set of tags that specify the name and namespace of the K8sMetricAlarm resource, thus establishing a link between a CloudWatch alarm and the corresponding Kubernetes custom resource.
Fields named scaleUpBehavior and scaleDownBehavior, which allow scaling behavior to be configured. This concept is borrowed from Kubernetes API v1.18, which adds scale velocity configuration parameters to the Horizontal Pod Autoscaler to control the rate of scaling. The settings allow autoscaling to be performed based either on a specified number of Pods or on a percentage of the number of Pods currently deployed.
Fields named maxReplicas and minReplicas, which specify the upper and lower bounds for the number of Pods after a scaling operation.
A field named deployment, which is the name of the Deployment resource within the same Kubernetes namespace that is the target of autoscaling.

The custom controller enables users to declaratively manage a CloudWatch metric alarm, which is an AWS native resource, using the same client tools such as kubectl or helm used for deploying microservices to a Kubernetes cluster. A representative YAML manifest for K8sMetricAlarm is shown below.

---
apiVersion: containerinsights.eks.com/v1
kind: K8sMetricAlarm
metadata:
  namespace: java
  name: http-request-rate
spec:
  minReplicas: 4
  maxReplicas: 10
  deployment: recommender-app
  scaleUpBehavior:
    coolDown: 300
    policies:
      - type: Pods
        value: 2
      - type: Percent
        value: 50
  scaleDownBehavior:
    coolDown: 300
    policies:
      - type: Pods
        value: 1
      - type: Percent
        value: 25
  scaleUpAlarmConfig: |-
    {
        "AlarmName":"HTTP-Request-Rate-High-Alarm",
        "AlarmDescription":"Alarm triggered when the rate of HTTP requests exceeds 10 requests/second",
        "ActionsEnabled": true,
        "OKActions": [],
        "AlarmActions": [
            "arn:aws:sns:us-east-1:937351930975:CloudWatchAlarmTopic"
        ],
        "InsufficientDataActions": [],
        "EvaluationPeriods":5,
        "DatapointsToAlarm":2,
        "Threshold":10,
        "ComparisonOperator":"GreaterThanOrEqualToThreshold",
        "Metrics": [
            {
                "Id": "m1",
                "Label": "sum_http_requests_total_1m",
                "ReturnData": false,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "ContainerInsights/Prometheus",
                        "MetricName": "http_requests_total",
                        "Dimensions": [
                            {
                                "Name": "ClusterName",
                                "Value": "k8s-sarathy-cluster"
                            },
                            {
                                "Name": "EKS_Namespace",
                                "Value": "java"
                            },
                            {
                                "Name": "EKS_Deployment",
                                "Value": "recommender-app"
                            }                            
                        ]
                    },
                    "Period": 60,
                    "Stat": "Sum"
                }
            },
            {
                "Id": "m2",
                "Expression": "m1/60",
                "Label": "rate_http_requests_total_1m",
                "ReturnData": true,
                "Period": 60
            }        
        ],
        "Tags": [
            {
                "Key": "kubernetes-name",
                "Value": "http-request-rate"
            },
            {
                "Key": "kubernetes-namespace",
                "Value": "java"
            }        
        ]        
     }      
  scaleDownAlarmConfig: |-
    {
        "AlarmName":"HTTP-Request-Rate-Low-Alarm",
        "AlarmDescription":"Alarm triggered when the rate of HTTP requests falls to 5 requests/second or below",
        "ActionsEnabled": true,
        "OKActions": [],
        "AlarmActions": [
            "arn:aws:sns:us-east-1:937351930975:CloudWatchAlarmTopic"
        ],
        "InsufficientDataActions": [],
        "EvaluationPeriods":5,
        "DatapointsToAlarm":2,
        "Threshold":5,
        "ComparisonOperator":"LessThanOrEqualToThreshold",
        "Metrics": [
            {
                "Id": "m1",
                "Label": "sum_http_requests_total_1m",
                "ReturnData": false,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "ContainerInsights/Prometheus",
                        "MetricName": "http_requests_total",
                        "Dimensions": [
                            {
                                "Name": "ClusterName",
                                "Value": "k8s-sarathy-cluster"
                            },
                            {
                                "Name": "EKS_Namespace",
                                "Value": "java"
                            },
                            {
                                "Name": "EKS_Deployment",
                                "Value": "recommender-app"
                            }                            
                        ]
                    },
                    "Period": 60,
                    "Stat": "Sum"
                }
            },
            {
                "Id": "m2",
                "Expression": "m1/60",
                "Label": "rate_http_requests_total_1m",
                "ReturnData": true,
                "Period": 60
            }        
        ],
        "Tags": [
            {
                "Key": "kubernetes-name",
                "Value": "http-request-rate"
            },
            {
                "Key": "kubernetes-namespace",
                "Value": "java"
            }        
        ]        
     }

Amazon CloudWatch allows the use of metric math expressions to query multiple CloudWatch metrics and to create new time series based on these metrics. This enables users to set up alarms based on metrics that are derived from the custom Prometheus metrics scraped from their microservices.

The JSON configuration shown above creates CloudWatch alarms based on the values in a time series named rate_http_requests_total_1m, which represents the number of HTTP requests per second over a trailing 1-minute period. It is derived from another time series named sum_http_requests_total_1m, which in turn is calculated as the sum of all values in a custom CloudWatch metric named http_requests_total over a trailing 1-minute period. In Prometheus Query Language, this is equivalent to the query rate(http_requests_total{Deployment="recommender-app", Namespace="java", ClusterName="k8s-sarathy-cluster"}[1m]).

With the above scale-up configuration, if the metric rate_http_requests_total_1m reaches or exceeds the threshold value of 10 in at least 2 of the 1-minute periods within a 5-minute window, a notification is sent to the SNS topic named CloudWatchAlarmTopic. The scale-down configuration results in similar behavior when the metric falls to or below the threshold value of 5.
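
The evaluation described above can be sketched in a few lines of Java. This is a simplified illustration that assumes one datapoint per 1-minute period and no missing data; CloudWatch's actual evaluation also handles missing datapoints, which is not modeled here. The class and method names are hypothetical.

```java
class AlarmEvaluationSketch {
    /** Metric math from the alarm configuration: m2 = m1 / 60 (requests per second). */
    static double ratePerSecond(double sumOverPeriod, int periodSeconds) {
        return sumOverPeriod / periodSeconds;
    }

    /**
     * Returns true when at least datapointsToAlarm of the most recent
     * evaluationPeriods datapoints are at or above the threshold,
     * matching the GreaterThanOrEqualToThreshold comparison operator.
     */
    static boolean scaleUpAlarmFires(double[] sumsPerPeriod, int periodSeconds,
                                     int evaluationPeriods, int datapointsToAlarm,
                                     double threshold) {
        int start = Math.max(0, sumsPerPeriod.length - evaluationPeriods);
        int breaching = 0;
        for (int i = start; i < sumsPerPeriod.length; i++) {
            if (ratePerSecond(sumsPerPeriod[i], periodSeconds) >= threshold) {
                breaching++;
            }
        }
        return breaching >= datapointsToAlarm;
    }

    public static void main(String[] args) {
        // Illustrative per-minute sums of http_requests_total (made-up values).
        // The corresponding rates are 5, 12, 8, 11 and 2 requests/second.
        double[] sums = {300, 720, 480, 660, 120};
        // Period=60, EvaluationPeriods=5, DatapointsToAlarm=2, Threshold=10.
        System.out.println(scaleUpAlarmFires(sums, 60, 5, 2, 10.0));
    }
}
```

In the sample data two of the five datapoints reach the threshold, so the alarm condition is met even though the most recent datapoint is low.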

Scaling based on metrics from other AWS services

This approach to autoscaling is not limited to service metrics collected from microservices using CloudWatch agent for Prometheus. It can be leveraged for autoscaling based on metrics sent to CloudWatch by other AWS services as long as the CloudWatch metric alarms are set up using the declarative semantics outlined above. For example, the following definition for a K8sMetricAlarm resource enables autoscaling a microservice using a metric named ApproximateNumberOfMessagesVisible, which is one of the available CloudWatch metrics for Amazon SQS.

---
apiVersion: containerinsights.eks.com/v1
kind: K8sMetricAlarm
metadata:
  namespace: java
  name: sqs-messages-visible
spec:
  minReplicas: 4
  maxReplicas: 10
  deployment: recommender-app
  scaleUpBehavior:
    coolDown: 150
    policies:
      - type: Pods
        value: 3
      - type: Percent
        value: 50
  scaleUpAlarmConfig: |-
    {
        "AlarmName":"SQS-Messages-High-Alarm",
        "AlarmDescription":"Alarm triggered when the approximate number of messages in an SQS queue reaches or exceeds 100",
        "ActionsEnabled": true,
        "OKActions": [],
        "AlarmActions": [
            "arn:aws:sns:us-east-1:937351930975:CloudWatchAlarmTopic"
        ],
        "InsufficientDataActions": [],
        "EvaluationPeriods":5,
        "DatapointsToAlarm":2,
        "Threshold":100,
        "ComparisonOperator":"GreaterThanOrEqualToThreshold",
        "Metrics": [
            {
                "Id": "m1",
                "Label": "avg_messages_visible",
                "ReturnData": true,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/SQS",
                        "MetricName": "ApproximateNumberOfMessagesVisible",
                        "Dimensions": [
                            {
                                "Name": "QueueName",
                                "Value": "TestQueue"
                            }
                        ]
                    },
                    "Period": 60,
                    "Stat": "Average"
                }
            }
        ],
        "Tags": [
            {
                "Key": "kubernetes-name",
                "Value": "sqs-messages-visible"
            },
            {
                "Key": "kubernetes-namespace",
                "Value": "java"
            }        
        ]        
     }

Configuring scaling cooldown

The scaling behavior settings in the K8sMetricAlarm custom resource contain a coolDown parameter that specifies the minimum duration between two successive scaling events. The JSON configuration data for the metric alarm contains three parameters, Period, EvaluationPeriods, and DatapointsToAlarm, that determine when CloudWatch changes the alarm state. Together, these parameters help mitigate frequent fluctuations in the number of replicas caused by the dynamic nature of the metrics being evaluated. Because the scaling operation is performed by a stateless AWS Lambda function, the function records when the last scaling event occurred by saving the relevant information as annotations on the Deployment resource, as shown below.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    cloudwatch.alarm.name: HTTP-Request-Rate-Low-Alarm
    cloudwatch.alarm.trigger.time: "2020-10-04T19:54:10.483Z"
spec:
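
A cooldown check based on the annotation above could look like the following sketch. It assumes that coolDown is expressed in seconds and that the cloudwatch.alarm.trigger.time annotation holds an ISO-8601 timestamp, as in the example; the class and method names are hypothetical and the actual Lambda function may implement this differently.

```java
import java.time.Duration;
import java.time.Instant;

class CooldownSketch {
    /**
     * Returns true when less than coolDownSeconds have elapsed since the
     * time recorded in the cloudwatch.alarm.trigger.time annotation.
     * A missing annotation means no scaling event has been recorded yet.
     */
    static boolean inCooldown(String lastTriggerTime, int coolDownSeconds, Instant now) {
        if (lastTriggerTime == null) {
            return false;
        }
        Instant last = Instant.parse(lastTriggerTime);
        Duration elapsed = Duration.between(last, now);
        return elapsed.compareTo(Duration.ofSeconds(coolDownSeconds)) < 0;
    }

    public static void main(String[] args) {
        String annotation = "2020-10-04T19:54:10.483Z";
        // About 230 seconds after the last event: still cooling down.
        System.out.println(inCooldown(annotation, 300, Instant.parse("2020-10-04T19:58:00Z")));
        // About 350 seconds after the last event: scaling is allowed again.
        System.out.println(inCooldown(annotation, 300, Instant.parse("2020-10-04T20:00:00Z")));
    }
}
```

Passing the current time in as a parameter, rather than calling Instant.now() inside the method, keeps the check easy to test.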

The code snippet below shows how the JSON configuration data is used by the custom controller to create the CloudWatch metric alarms in response to the creation of K8sMetricAlarm custom resource. Refer to the controller source code from the Git repository for complete implementation details.

@KubernetesReconciler(
		value = "k8sMetricAlarmController", 
		workerCount = 2,
		watches = @KubernetesReconcilerWatches({
			@KubernetesReconcilerWatch(
					workQueueKeyFunc = WorkQueueKeyFunFactory.KubernetesMetricAlarmCustomObjectWorkQueueKeyFunc.class,
					apiTypeClass = K8sMetricAlarmCustomObject.class, 
					resyncPeriodMillis = 60*1000L)
			}))
public class K8sMetricAlarmReconciler implements Reconciler {	
	final AmazonCloudWatch cloudWatchClient = AmazonCloudWatchClientBuilder.defaultClient();
	private void createCloudWatchAlarm (JsonObject config) throws Exception {		
		List<Tag> tags = new ArrayList<Tag> ();
		if (config.containsKey("Tags")) {
			JsonArray tagsArray = config.getJsonArray("Tags");
			for (int i = 0; i < tagsArray.size(); i++) {
				JsonObject tagObject = tagsArray.getJsonObject(i);
				tags.add(new Tag()
						.withKey(tagObject.getString("Key"))
						.withValue(tagObject.getString("Value")));
			}
		}
		else {
			throw new Exception ("Cannot create CloudWatch Alarm without specifying tags");
		}
		
		if (config.containsKey("Metrics")) {
			List<MetricDataQuery> metricsDataQueryCollection = new ArrayList<MetricDataQuery>();
			JsonArray metricsArray = config.getJsonArray("Metrics");
			for (int i = 0; i < metricsArray.size(); i++) {
				
				JsonObject metricsDataQueryObject = metricsArray.getJsonObject(i);
				
				if (metricsDataQueryObject.containsKey("MetricStat")) {
					JsonObject metricStatObject = metricsDataQueryObject.getJsonObject("MetricStat");
					JsonObject metricObject = metricStatObject.getJsonObject("Metric");
					JsonArray dimensionsArray = metricObject.getJsonArray("Dimensions");
					
					List<Dimension> dimensions = new ArrayList<Dimension> ();
					for (int d = 0; d < dimensionsArray.size(); d++) {
						JsonObject dimensionObject = dimensionsArray.getJsonObject(d);
						dimensions.add(new Dimension()
								.withName(dimensionObject.getString("Name"))
								.withValue(dimensionObject.getString("Value")));
					}

					Metric metric = new Metric()
							.withMetricName(metricObject.getString("MetricName"))
							.withNamespace(metricObject.getString("Namespace"))
							.withDimensions(dimensions);
					
					MetricStat metricStat = new MetricStat()
							.withMetric(metric)
							.withPeriod(metricStatObject.getInteger("Period"))
							.withStat(metricStatObject.getString("Stat"));
					
					MetricDataQuery metricDataQuery = new MetricDataQuery()
							.withId(metricsDataQueryObject.getString("Id"))
							.withLabel(metricsDataQueryObject.getString("Label"))
							.withMetricStat(metricStat)				
							.withReturnData(metricsDataQueryObject.getBoolean("ReturnData"));
					
					metricsDataQueryCollection.add(metricDataQuery);
				}
				else if (metricsDataQueryObject.containsKey("Expression")) {
					
					MetricDataQuery metricDataQuery = new MetricDataQuery()
							.withId(metricsDataQueryObject.getString("Id"))
							.withLabel(metricsDataQueryObject.getString("Label"))
							.withPeriod(metricsDataQueryObject.getInteger("Period"))
							.withExpression(metricsDataQueryObject.getString("Expression"))
							.withReturnData(metricsDataQueryObject.getBoolean("ReturnData"));
					
					metricsDataQueryCollection.add(metricDataQuery);
				}
			}
			
			JsonArray alarmActionsArray = config.getJsonArray("AlarmActions");
			List<String> alarmActions = new ArrayList<String>();
			for (int j = 0; j < alarmActionsArray.size(); j++) {
				alarmActions.add(alarmActionsArray.getString(j));
			}
			
			PutMetricAlarmRequest request = new PutMetricAlarmRequest()
				    .withAlarmName(config.getString("AlarmName"))
				    .withAlarmDescription(config.getString("AlarmDescription"))
				    .withActionsEnabled(config.getBoolean("ActionsEnabled"))
				    .withAlarmActions(alarmActions)
				    .withEvaluationPeriods(config.getInteger("EvaluationPeriods"))
				    .withDatapointsToAlarm(config.getInteger("DatapointsToAlarm"))
				    .withThreshold(config.getDouble("Threshold"))
				    .withComparisonOperator(config.getString("ComparisonOperator"))
				    .withMetrics(metricsDataQueryCollection)
				    .withTags(tags);
				  
			
			PutMetricAlarmResult response = cloudWatchClient.putMetricAlarm(request);
			logger.info(String.format("Successfully created CloudWatch Metric Alarm '%s'", config.getString("AlarmName")));
		}
	}
}

Autoscaling Amazon EKS resources with AWS Lambda

A Lambda function is invoked when a CloudWatch alarm sends a notification to the SNS topic. It acts as a Kubernetes client and performs autoscaling operations by calling the Kubernetes API server. It authenticates with the API server using a token generated with the AWS Signature Version 4 algorithm, adopting the same scheme used by the AWS IAM Authenticator for Kubernetes to construct the authentication token.
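
As a rough illustration of the token format used by the AWS IAM Authenticator for Kubernetes, the sketch below shows only the final encoding step: the token is the prefix k8s-aws-v1. followed by the base64url encoding, without padding, of a presigned STS GetCallerIdentity URL. The SigV4 presigning itself is omitted, and the URL used here is a non-functional placeholder. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class EksTokenSketch {
    static final String TOKEN_PREFIX = "k8s-aws-v1.";

    /** Wraps an already presigned STS URL into a bearer token string. */
    static String toBearerToken(String presignedStsUrl) {
        String encoded = Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(presignedStsUrl.getBytes(StandardCharsets.UTF_8));
        return TOKEN_PREFIX + encoded;
    }

    /** Reverses toBearerToken; handy for inspecting a token while debugging. */
    static String presignedUrlOf(String token) {
        String encoded = token.substring(TOKEN_PREFIX.length());
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder only: a real URL carries SigV4 query parameters
        // produced by signing the request with the IAM role's credentials.
        String placeholderUrl =
                "https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15";
        String token = toBearerToken(placeholderUrl);
        System.out.println(token.startsWith(TOKEN_PREFIX));
        System.out.println(presignedUrlOf(token).equals(placeholderUrl));
    }
}
```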

The Message field in the SNS notification contains JSON data that describes the alarm. The Lambda function determines whether to scale the corresponding microservice up or down based on the value of the Trigger.ComparisonOperator field.
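
One plausible way to turn the comparison operator into a scaling direction is sketched below. The four operator names are standard CloudWatch values; the mapping itself is an assumption about how the handler might interpret them, and the class, enum, and method names are hypothetical.

```java
class ScalingDirectionSketch {
    enum Direction { SCALE_UP, SCALE_DOWN }

    /**
     * Maps the value of Trigger.ComparisonOperator to a scaling direction.
     * Alarms that fire when a metric rises trigger a scale-up; alarms that
     * fire when it falls trigger a scale-down.
     */
    static Direction directionFor(String comparisonOperator) {
        switch (comparisonOperator) {
            case "GreaterThanOrEqualToThreshold":
            case "GreaterThanThreshold":
                return Direction.SCALE_UP;
            case "LessThanOrEqualToThreshold":
            case "LessThanThreshold":
                return Direction.SCALE_DOWN;
            default:
                throw new IllegalArgumentException(
                        "Unsupported comparison operator: " + comparisonOperator);
        }
    }

    public static void main(String[] args) {
        // Operators used by the two alarms in the K8sMetricAlarm example.
        System.out.println(directionFor("GreaterThanOrEqualToThreshold"));
        System.out.println(directionFor("LessThanOrEqualToThreshold"));
    }
}
```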

Upon receiving the notification, the Lambda function executes the following operations.

Retrieve the tags associated with the CloudWatch alarm using the ListTagsForResource API. The tags contain the name and namespace of the corresponding K8sMetricAlarm custom resource.
Retrieve details of the K8sMetricAlarm custom resource from the Kubernetes API server. The deployment field of this custom resource has the name of the Kubernetes Deployment to be autoscaled.
Retrieve details of the Deployment resource from the Kubernetes API server.
Determine the number of Pods that the Deployment should be scaled up or scaled down to. The scaling behaviors configured in the custom resource are used in determining this target value based on the logic outlined in the Kubernetes documentation under Support for configurable scaling behavior.
Execute the scaling operation by invoking the Kubernetes API server. After completing the scaling operation, the Lambda function resets the alarm status of the CloudWatch alarm so that it can trigger subsequent scaling events if necessary.
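
The target calculation in the fourth step can be sketched as follows. This is a simplified model that assumes the Horizontal Pod Autoscaler's default behavior of picking the policy that allows the largest change, rounds percentage results up, and ignores policy time windows. All class and method names are hypothetical; the Lambda function's actual logic is in the Git repository.

```java
import java.util.List;

class TargetReplicasSketch {
    enum PolicyType { PODS, PERCENT }

    static final class Policy {
        final PolicyType type;
        final int value;

        Policy(PolicyType type, int value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Largest replica count any policy allows, capped at maxReplicas. */
    static int scaleUpTarget(int current, List<Policy> policies, int maxReplicas) {
        int target = current;
        for (Policy p : policies) {
            int proposed = (p.type == PolicyType.PODS)
                    ? current + p.value
                    : (current * (100 + p.value) + 99) / 100; // integer ceiling
            target = Math.max(target, proposed);
        }
        return Math.min(target, maxReplicas);
    }

    /** Smallest replica count any policy allows, floored at minReplicas. */
    static int scaleDownTarget(int current, List<Policy> policies, int minReplicas) {
        int target = current;
        for (Policy p : policies) {
            int proposed = (p.type == PolicyType.PODS)
                    ? current - p.value
                    : (current * (100 - p.value) + 99) / 100; // integer ceiling
            target = Math.min(target, proposed);
        }
        return Math.max(target, minReplicas);
    }

    public static void main(String[] args) {
        // Policies from the http-request-rate example: min 4, max 10 replicas.
        List<Policy> up = List.of(new Policy(PolicyType.PODS, 2), new Policy(PolicyType.PERCENT, 50));
        List<Policy> down = List.of(new Policy(PolicyType.PODS, 1), new Policy(PolicyType.PERCENT, 25));
        System.out.println(scaleUpTarget(4, up, 10));     // 6
        System.out.println(scaleUpTarget(8, up, 10));     // 10 (12 capped at maxReplicas)
        System.out.println(scaleDownTarget(10, down, 4)); // 8
        System.out.println(scaleDownTarget(4, down, 4));  // 4 (3 raised to minReplicas)
    }
}
```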

In order for the Lambda function to interact with the Kubernetes API server and perform the above operations, the following configuration settings are required. Refer to the configuration files from the Git repository for complete details.

API server endpoint URL and Amazon EKS cluster certificate authority data.
An IAM role mapped to a Kubernetes group (e.g. lambda-client) in the mapRoles section of the aws-auth ConfigMap in the Amazon EKS cluster. The Lambda function generates the EKS authentication token using the temporary credentials granted to this IAM role. This IAM role does not need any IAM permission policies attached to it.
A Role and RoleBinding definition in the Amazon EKS cluster that grants the lambda-client Kubernetes group permission to list K8sMetricAlarm custom resources as well as list/update Deployment resources.
The credentials of an IAM user who, adhering to the least privilege security guidelines, is granted only the permission to assume the above IAM role.
An IAM role assigned as the function’s execution role which is granted permissions to list resources in CloudWatch.

Shown below is a code snippet from the request handler that implements autoscaling.

public class CloudWatchAlarmHandler implements RequestHandler<SNSEvent, Object> {
	final AmazonCloudWatch cloudWatchClient = AmazonCloudWatchClientBuilder.defaultClient();
	private ApiClient apiClient = null;
	private GenericKubernetesApi<K8sMetricAlarmCustomObject, K8sMetricAlarmCustomObjectList> apiCloudWatchAlarm = null;
	private GenericKubernetesApi<V1Deployment, V1DeploymentList> apiDeployment = null;
	
	public void initialize () {
		try {
			logger.info("Initializing the API client");
			apiClient = CustomClientBuilder.custom();
			
			this.apiCloudWatchAlarm = new GenericKubernetesApi<K8sMetricAlarmCustomObject, K8sMetricAlarmCustomObjectList>(
					K8sMetricAlarmCustomObject.class, 
					K8sMetricAlarmCustomObjectList.class,
					"containerinsights.eks.com", 
					"v1", 
					"k8smetricalarms", 
					apiClient);
			
			this.apiDeployment = new GenericKubernetesApi<V1Deployment, V1DeploymentList>(
					V1Deployment.class, 
					V1DeploymentList.class,
					"apps", 
					"v1", 
					"deployments", 
					apiClient);
			}
		catch (Exception ex) {
			logger.error("Exception initializing the Kubernetes API client", ex);
		}
	}

	private void processCloudWatchAlarmMessage (JsonObject alarmMessageObject) {
		String alarmName = alarmMessageObject.getString("AlarmName");
		String accountID = alarmMessageObject.getString("AWSAccountId");
		String alarmTriggerReason = alarmMessageObject.getString("NewStateReason");
		String alarmArn = String.format("arn:aws:cloudwatch:%s:%s:alarm:%s", AWS_REGION, accountID, alarmName);
		ComparisonOperator operator = Enum.valueOf(ComparisonOperator.class, alarmMessageObject.getJsonObject("Trigger").getString("ComparisonOperator"));
		
		logger.info(String.format("Alarm ARN = %s", alarmArn));
		logger.info(String.format("Reason for Trigger = %s", alarmTriggerReason));
		
		//
		// Get the name/namespace of the K8sMetricAlarm custom resource from the tags associated with the CloudWatch alarm
		//
		ListTagsForResourceRequest request = new ListTagsForResourceRequest().withResourceARN(alarmArn);
		ListTagsForResourceResult response = cloudWatchClient.listTagsForResource(request);
		List<Tag> tags = response.getTags();
		String resourceName = null;
		String resourceNamespace = null;
		for (Tag t : tags) {
			switch (t.getKey()) {
				case K8S_NAME:
					resourceName = t.getValue();
					break;
				case K8S_NAMESPACE:
					resourceNamespace = t.getValue();
					break;
				default:
					break;
			}
		}
		if (resourceName == null || resourceNamespace == null) {
			logger.error(String.format("Unable to identify the Kubernetes name and namespace of the K8sMetricAlarm custom resource for alarm '%s'", alarmName));
			return;
		}
		
		//
		// Fetch the K8sMetricAlarm custom resource from the API server
		// The custom resource contains the name of the Deployment resource to be scaled
		//
		logger.info(String.format("Retrieving K8sMetricAlarm custom resource '%s.%s'", resourceNamespace, resourceName));
		K8sMetricAlarmCustomObject cloudWatchAlarm = apiCloudWatchAlarm.get(resourceNamespace, resourceName).getObject();
		String alarmStateResetReason;
		if (cloudWatchAlarm != null) {
			K8sMetricAlarmCustomObjectSpec cloudWatchAlarmSpec = cloudWatchAlarm.getSpec();
			int minReplicas = cloudWatchAlarmSpec.getMinReplicas();
			int maxReplicas = cloudWatchAlarmSpec.getMaxReplicas();
			ScalingBehavior scaleUpBehavior = cloudWatchAlarmSpec.getScaleUpBehavior();
			ScalingBehavior scaleDownBehavior = cloudWatchAlarmSpec.getScaleDownBehavior();
			String deploymentName = cloudWatchAlarmSpec.getDeployment();

			//
			// Fetch the Deployment resource from the API server
			// Compute the number of replicas to be scaled up or down based on scaling policies
			// Update the Deployment resource with the new number of replicas.
			//
			logger.info(String.format("Retrieving Deployment resource '%s.%s'", resourceNamespace, deploymentName));
			V1Deployment deployment = apiDeployment.get(resourceNamespace, deploymentName).getObject();
			if (deployment == null) {
				// As with the custom resource above, getObject() may return null if the request fails
				alarmStateResetReason = String.format("Unable to retrieve Deployment resource '%s.%s'", resourceNamespace, deploymentName);
				logger.error(alarmStateResetReason);
			}
			else {
				V1ObjectMeta metadata = deployment.getMetadata();
				boolean isCoolingDown = isResourceCoolingDown (metadata, operator, scaleUpBehavior, scaleDownBehavior);
				if (isCoolingDown) {
					alarmStateResetReason = String.format("Deployment '%s.%s' is still cooling down. Suspending further scaling", resourceNamespace, deploymentName);
					logger.info(alarmStateResetReason);
				}
				else {
					int replicas = deployment.getSpec().getReplicas();
					int scaledReplicas = computeScaling(operator, minReplicas, maxReplicas, replicas, scaleUpBehavior, scaleDownBehavior);
					updateDeployment(deployment, metadata, replicas, scaledReplicas, alarmName, alarmTriggerReason);
					alarmStateResetReason = String.format("Scaled Deployment '%s.%s' from %d to %d replicas", resourceNamespace, deploymentName, replicas, scaledReplicas);
				}
			}
		} else {
			alarmStateResetReason = String.format("Unable to retrieve K8sMetricAlarm custom resource '%s.%s'", resourceNamespace, resourceName);
			logger.error(alarmStateResetReason);
		}
		
		//
		// After the scaling activity is completed/suspended, set the alarm status to OK
		//
		SetAlarmStateRequest setStateRequest = new SetAlarmStateRequest()
				.withAlarmName(alarmName)
				.withStateReason(alarmStateResetReason)
				.withStateValue(StateValue.OK);
		cloudWatchClient.setAlarmState(setStateRequest);
		logger.info(String.format("State of alarm '%s' set to %s", alarmName, StateValue.OK.toString()));
	}
}
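
The handler above delegates to isResourceCoolingDown and computeScaling, which are not shown in the snippet. The following is a minimal, self-contained sketch of what such helpers might look like; it is not the code from the repository. The Policy class, its fields (pods, percent, cooldown), and the rule of taking the larger of a fixed Pod count and a percentage of the current replicas are all assumptions made for illustration, as is passing the last scaling time in directly rather than reading it from the Deployment metadata.

```java
import java.time.Duration;
import java.time.Instant;

public class ScalingSketch {

    /** Hypothetical scaling policy: a fixed Pod step, a percentage step, and a cooldown period. */
    public static final class Policy {
        final int pods;
        final int percent;
        final Duration cooldown;

        public Policy(int pods, int percent, Duration cooldown) {
            this.pods = pods;
            this.percent = percent;
            this.cooldown = cooldown;
        }
    }

    /** Returns true while less time than the cooldown has passed since the last scaling event. */
    public static boolean isCoolingDown(Instant lastScaled, Instant now, Policy policy) {
        if (lastScaled == null) {
            return false; // never scaled before, so nothing to wait for
        }
        return Duration.between(lastScaled, now).compareTo(policy.cooldown) < 0;
    }

    /** Applies the larger of the fixed and percentage steps, then clamps to [minReplicas, maxReplicas]. */
    public static int computeScaling(boolean scaleUp, int minReplicas, int maxReplicas,
                                     int replicas, Policy policy) {
        int percentStep = (int) Math.ceil(replicas * policy.percent / 100.0);
        int step = Math.max(policy.pods, percentStep);
        int target = scaleUp ? replicas + step : replicas - step;
        return Math.max(minReplicas, Math.min(maxReplicas, target));
    }

    public static void main(String[] args) {
        // Assumed values: step of at least 2 Pods or 50% of current replicas, 150-second cooldown
        Policy scaleUp = new Policy(2, 50, Duration.ofSeconds(150));
        int replicas = 2;
        while (replicas < 10) {
            int next = computeScaling(true, 4, 10, replicas, scaleUp);
            System.out.printf("Scaled from %d to %d replicas%n", replicas, next);
            replicas = next;
        }
    }
}
```

With these assumed values, the sketch steps from 2 to 4, 6, 9, and finally 10 replicas, where the upper bound caps the last step. The actual policy semantics are defined by the scaling behavior in the K8sMetricAlarm custom resource and may differ.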

Autoscaling in action

Let’s use this solution to autoscale a microservice deployed to an Amazon EKS cluster. The application has been instrumented with the Prometheus client library. It tracks the number of incoming HTTP requests using a Prometheus Counter named http_requests_total and exposes this data over HTTP at the endpoint /metrics. Invoking this endpoint gives the following output, which can be read by the Prometheus server or by Prometheus-compatible scrapers such as the CloudWatch agent for Prometheus.

# HELP http_requests_total Total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{job="recommender",path="/live",} 278.0
http_requests_total{job="recommender",path="/popular/product",} 159.0
http_requests_total{job="recommender",path="/popular/category",} 173.0
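
To make the structure of this output concrete, here is a small sketch of how a scraper might read it. It is a simplified illustration, not how the CloudWatch agent is implemented: the class and method names are hypothetical, and it assumes each sample line ends with the value (the optional trailing timestamp allowed by the exposition format is not handled).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpositionSketch {

    /** Parses sample lines into a map of "name{labels}" to value, skipping comments and blank lines. */
    public static Map<String, Double> parse(String text) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // HELP and TYPE lines carry metadata, not samples
            }
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                continue; // not a "series value" pair
            }
            String series = line.substring(0, split).trim();
            double value = Double.parseDouble(line.substring(split + 1));
            samples.put(series, value);
        }
        return samples;
    }

    /** Sums the values of every series that belongs to the given metric name. */
    public static double total(String text, String metricName) {
        double sum = 0;
        for (Map.Entry<String, Double> entry : parse(text).entrySet()) {
            String series = entry.getKey();
            if (series.equals(metricName) || series.startsWith(metricName + "{")) {
                sum += entry.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
            "# HELP http_requests_total Total number of HTTP requests.",
            "# TYPE http_requests_total counter",
            "http_requests_total{job=\"recommender\",path=\"/live\",} 278.0",
            "http_requests_total{job=\"recommender\",path=\"/popular/product\",} 159.0",
            "http_requests_total{job=\"recommender\",path=\"/popular/category\",} 173.0");
        System.out.println(total(output, "http_requests_total")); // 278 + 159 + 173
    }
}
```

Because a Counter only ever increases, a request rate is obtained by comparing the counter's value across successive scrapes rather than by reading a single value.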

To begin with, the service is deployed with 2 replicas. The current implementation employs Postman Collection Runner as the load generator. It is used to produce an initial load of around 8 HTTP requests/second. This is seen by graphing the CloudWatch custom metric generated from the above Prometheus metrics data.

The load on the service is increased significantly to push the request rate above the threshold value of 10 requests/second. The figure below shows a snapshot of the CloudWatch alarm named HTTP-Request-Rate-High-Alarm just around the time it breaches the threshold.

When the alarm breaches the threshold, the sample output logs of the Lambda function, shown below, record the first scale-up event. As the request rate continues to hover above the threshold, as seen in the figure above, additional scale-up events take place and the service is ultimately scaled up to 10 replicas, which is the upper bound set by the scaling behavior. The logs also show how the cooldown configuration spreads out successive scaling events and how the velocity of scaling is governed by the scaling behavior configured in the K8sMetricAlarm custom resource.

2020-10-04T14:31:47.247-04:00	3679 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-High-Alarm
2020-10-04T14:31:47.247-04:00	3679 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 2 out of the last 5 datapoints [14.883333333333333 (04/10/20 18:30:00), 11.2 (04/10/20 18:29:00)] were greater than or equal to the threshold (10.0) (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:31:48.701-04:00	5133 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 2 to 4 replicas

2020-10-04T14:32:40.210-04:00	56642 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling
2020-10-04T14:33:40.212-04:00	116644 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling

2020-10-04T14:34:40.138-04:00	176570 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-High-Alarm
2020-10-04T14:34:40.138-04:00	176570 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 5 out of the last 5 datapoints were greater than or equal to the threshold (10.0). The most recent datapoints which crossed the threshold: [14.6 (04/10/20 18:33:00), 12.433333333333334 (04/10/20 18:32:00), 14.75 (04/10/20 18:31:00), 14.883333333333333 (04/10/20 18:30:00), 11.2 (04/10/20 18:29:00)] (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:34:40.444-04:00	176876 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 4 to 6 replicas

2020-10-04T14:35:40.176-04:00	236608 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling
2020-10-04T14:36:40.333-04:00	296765 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling

2020-10-04T14:37:40.112-04:00	356544 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-High-Alarm
2020-10-04T14:37:40.113-04:00	356544 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 5 out of the last 5 datapoints were greater than or equal to the threshold (10.0). The most recent datapoints which crossed the threshold: [14.116666666666667 (04/10/20 18:36:00), 13.266666666666667 (04/10/20 18:35:00), 14.583333333333334 (04/10/20 18:34:00), 14.6 (04/10/20 18:33:00), 12.433333333333334 (04/10/20 18:32:00)] (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:37:40.295-04:00	356727 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 6 to 9 replicas

2020-10-04T14:38:40.296-04:00	416728 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling
2020-10-04T14:39:40.166-04:00	476597 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling

2020-10-04T14:40:40.119-04:00	536551 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-High-Alarm
2020-10-04T14:40:40.119-04:00	536551 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 5 out of the last 5 datapoints were greater than or equal to the threshold (10.0). The most recent datapoints which crossed the threshold: [13.933333333333334 (04/10/20 18:39:00), 12.866666666666667 (04/10/20 18:38:00), 13.366666666666667 (04/10/20 18:37:00), 14.116666666666667 (04/10/20 18:36:00), 13.266666666666667 (04/10/20 18:35:00)] (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:40:40.266-04:00	536698 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 9 to 10 replicas
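
The "Reason for Trigger" entries above reflect an alarm that fires when at least 2 of the last 5 datapoints are greater than or equal to the threshold. The sketch below illustrates that rule in isolation. It is a simplification written for this post, not CloudWatch's implementation: the class and method names are hypothetical, and the handling of missing datapoints is ignored.

```java
public class AlarmEvaluationSketch {

    /** Counts how many of the most recent evaluationPeriods datapoints are at or above the threshold. */
    public static int breachingCount(double[] datapoints, int evaluationPeriods, double threshold) {
        int count = 0;
        int start = Math.max(0, datapoints.length - evaluationPeriods);
        for (int i = start; i < datapoints.length; i++) {
            if (datapoints[i] >= threshold) {
                count++;
            }
        }
        return count;
    }

    /** The alarm fires when at least datapointsToAlarm of the evaluated datapoints breach. */
    public static boolean inAlarm(double[] datapoints, int datapointsToAlarm,
                                  int evaluationPeriods, double threshold) {
        return breachingCount(datapoints, evaluationPeriods, threshold) >= datapointsToAlarm;
    }

    public static void main(String[] args) {
        // Oldest first. The first three values are assumed (roughly the initial load of
        // 8 requests/second); the last two are the breaching datapoints from the first log entry.
        double[] firstTrigger = {8.0, 8.1, 7.9, 11.2, 14.883333333333333};
        System.out.println(inAlarm(firstTrigger, 2, 5, 10.0));

        // The five datapoints reported in the second trigger, oldest first.
        double[] secondTrigger = {11.2, 14.883333333333333, 14.75, 12.433333333333334, 14.6};
        System.out.println(breachingCount(secondTrigger, 5, 10.0));
    }
}
```

Requiring only 2 breaching datapoints out of 5 lets the alarm react within about two minutes of a sustained increase while still ignoring a single one-minute spike.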

The load is now decreased significantly to lower the request rate below the threshold of 5 requests/second. The figure below shows a snapshot of the CloudWatch alarm named HTTP-Request-Rate-Low-Alarm when it breaches the lower threshold.

As the load on the service continues to decrease, a series of scale-down events occurs, as seen in the logs shown below, and the number of replicas is brought down to 4, which is the lower bound set by the scaling behavior.

2020-10-04T14:46:11.641-04:00	868072 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-Low-Alarm
2020-10-04T14:46:11.641-04:00	868073 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 3 out of the last 5 datapoints [3.45 (04/10/20 18:45:00), 3.8666666666666667 (04/10/20 18:44:00), 3.816666666666667 (04/10/20 18:43:00)] were less than or equal to the threshold (5.0) (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:46:12.025-04:00	868456 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 10 to 8 replicas

2020-10-04T14:48:20.309-04:00	5042 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling

2020-10-04T14:49:11.628-04:00	56361 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-Low-Alarm
2020-10-04T14:49:11.628-04:00	56361 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 5 out of the last 5 datapoints were less than or equal to the threshold (5.0). The most recent datapoints which crossed the threshold: [3.85 (04/10/20 18:48:00), 3.8 (04/10/20 18:47:00), 3.783333333333333 (04/10/20 18:46:00), 3.816666666666667 (04/10/20 18:45:00), 3.8666666666666667 (04/10/20 18:44:00)] (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:49:11.807-04:00	56540 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 8 to 6 replicas

2020-10-04T14:50:11.728-04:00	116460 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling
2020-10-04T14:51:11.871-04:00	176604 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling

2020-10-04T14:52:11.616-04:00	236349 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-Low-Alarm
2020-10-04T14:52:11.616-04:00	236349 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 5 out of the last 5 datapoints were less than or equal to the threshold (5.0). The most recent datapoints which crossed the threshold: [3.85 (04/10/20 18:51:00), 3.85 (04/10/20 18:50:00), 3.7333333333333334 (04/10/20 18:49:00), 3.85 (04/10/20 18:48:00), 3.8 (04/10/20 18:47:00)] (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:52:11.719-04:00	236452 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 6 to 5 replicas

2020-10-04T14:53:11.879-04:00	296612 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling
2020-10-04T14:54:11.693-04:00	356425 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Deployment 'recommender-app.java' is still cooling down. Suspending further scaling

2020-10-04T14:55:11.617-04:00	416350 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Alarm ARN = arn:aws:cloudwatch:us-east-1:937351930975:alarm:HTTP-Request-Rate-Low-Alarm
2020-10-04T14:55:11.617-04:00	416350 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Reason for Trigger = Threshold Crossed: 5 out of the last 5 datapoints were less than or equal to the threshold (5.0). The most recent datapoints which crossed the threshold: [3.75 (04/10/20 18:54:00), 3.85 (04/10/20 18:53:00), 3.8666666666666667 (04/10/20 18:52:00), 3.85 (04/10/20 18:51:00), 3.85 (04/10/20 18:50:00)] (minimum 2 datapoints for OK -> ALARM transition).
2020-10-04T14:55:11.833-04:00	416565 [main] INFO com.amazonwebservices.blogs.containers.CloudWatchAlarmHandler - Scaled Deployment 'recommender-app.java' from 5 to 4 replicas

Source code

The complete source code for the custom controller and the Lambda function can be downloaded from the following links:

https://github.com/aws-samples/k8s-cloudwatch-operator
https://github.com/aws-samples/k8s-cloudwatch-operator/cloudwatch-controller
https://github.com/aws-samples/k8s-cloudwatch-operator/cloudwatch-lambda

Concluding remarks

With the recent announcement of CloudWatch Container Insights monitoring for Prometheus, customers can now track custom Prometheus metrics using CloudWatch in addition to resource metrics such as CPU, memory, disk, and network usage from their containerized microservices. For autoscaling their containerized workloads on Amazon EKS or self-managed Kubernetes on AWS, customers have typically relied on the Horizontal Pod Autoscaler to scale the number of Pods based on observed CPU/memory utilization or on some other application-provided metrics with the support of a custom metrics adapter.

This post presented an alternative approach that employs a custom Kubernetes controller in conjunction with Amazon CloudWatch and AWS Lambda to autoscale containerized workloads deployed to an Amazon EKS or self-managed Kubernetes cluster on AWS. It leverages CloudWatch metric alarms that can watch either a single CloudWatch metric or the result of a math expression over CloudWatch metrics, where those metrics are generated from custom Prometheus metrics scraped from containerized microservices. The custom controller enables these AWS resources to be managed with declarative semantics using a Kubernetes custom resource. This approach can also be used for autoscaling based on metrics sent to CloudWatch by other services such as Amazon SQS.

Many customers that are running diverse workloads on AWS, including containerized microservices on Amazon EKS, want to use Amazon CloudWatch as a single pane of glass for all their monitoring, alerting, and autoscaling needs. The approach outlined in this blog post provides a viable path to address this need.
