Containers
Autoscaling Amazon ECS services based on custom CloudWatch and Prometheus metrics
Introduction
Horizontal scalability is a critical aspect of cloud native applications. Microservices deployed to Amazon ECS can leverage the Application Auto Scaling service to scale automatically based on observed metrics data. Amazon ECS measures service utilization based on the CPU and memory resources consumed by the tasks that belong to a service and publishes this data as the CloudWatch metrics ECSServiceAverageCPUUtilization and ECSServiceAverageMemoryUtilization. Application Auto Scaling can then use these predefined metrics in conjunction with scaling policies to proportionally scale the number of tasks in a service. However, there are several use cases where a service’s average CPU and memory usage alone are not reliable indicators of when, and to what degree, to execute a scaling action. Custom metrics that track other application aspects, such as the number of HTTP requests received, the number of messages retrieved from a queue or topic, or the number of database transactions executed, may in some scenarios be better suited to trigger scaling actions.
Application Auto Scaling also supports scaling a service based on a custom metric specification that represents a CloudWatch metric of our choosing. Customers have long had the ability to publish custom metric data from their applications to CloudWatch using the AWS SDK best suited to their programming language or platform. With the recent announcement of the general availability of Container Insights Prometheus Metrics Monitoring for Amazon ECS, customers can now automate the discovery and collection of custom Prometheus metrics from their containerized applications.
In this post, we will discuss the details of how either one of these metrics collection strategies can be used in conjunction with Application Auto Scaling to scale services deployed to Amazon ECS based on custom metrics data.
Architecture
The autoscaling solution is demonstrated using the application stack shown in the figure below. Streaming data is received via an Application Load Balancer by the Kafka Producer Service, which is deployed to an Amazon ECS cluster. The service publishes the data to a topic in Amazon MSK. Another service, the Kafka Consumer Service, retrieves these messages from the Kafka topic and performs downstream processing. The services use the AWS SDK to publish custom metric data into a CloudWatch namespace. The services have also been instrumented with the Prometheus client library. The CloudWatch agent with Prometheus support is deployed as a service in the Amazon ECS cluster and is configured to collect Prometheus metrics from these services and send them to CloudWatch. An AWS Lambda function that is scheduled to run periodically converts the custom metric data collected from the applications into a service utilization metric that is then used by Application Auto Scaling to proportionally scale the target.
The objective is to scale out/in the number of tasks in the Kafka Consumer Service so that:
- The rate of messages processed by all the tasks that belong to the service keeps pace with the rate of messages produced by the Kafka Producer Service
- The average rate of messages processed by each task in the Kafka Consumer Service stays below a set threshold.
Publishing custom metrics with the CloudWatch SDK
Applications can publish custom metric data to any CloudWatch namespace as long as its name doesn’t begin with “AWS/”. The producer and consumer applications have been instrumented with the AWS SDK for Java, as shown below, so that they can capture metric data pertaining to the number of messages produced and consumed.
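The snippet below is a minimal sketch of this instrumentation for the consumer side (the producer side is analogous). The class name, the scheduling details, and the absence of dimensions are assumptions made for illustration; only the CloudWatch API calls themselves are standard AWS SDK for Java (v1) usage.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.MetricDatum;
import com.amazonaws.services.cloudwatch.model.PutMetricDataRequest;
import com.amazonaws.services.cloudwatch.model.StandardUnit;

// Illustrative sketch (not the original source): publishes a cumulative message
// counter to CloudWatch every 30 seconds using the AWS SDK for Java.
public class ConsumerMetricsPublisher {

    private static final String NAMESPACE = "ECS/CloudWatch/Custom";

    private final AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();
    private final AtomicLong messagesConsumed = new AtomicLong();

    // Invoked from the Kafka consumer loop every time a message is processed.
    public void incMessagesConsumed() {
        messagesConsumed.incrementAndGet();
    }

    // Schedules publication of the current counter value once every 30 seconds.
    public void start() {
        Executors.newSingleThreadScheduledExecutor()
            .scheduleAtFixedRate(this::putMetricData, 30, 30, TimeUnit.SECONDS);
    }

    private void putMetricData() {
        MetricDatum datum = new MetricDatum()
            .withMetricName("messages_consumed_total")
            .withUnit(StandardUnit.Count)
            .withValue((double) messagesConsumed.get());
        cloudWatch.putMetricData(new PutMetricDataRequest()
            .withNamespace(NAMESPACE)
            .withMetricData(datum));
    }
}
```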
A counter is used to keep track of the number of messages sent to/received from the Kafka topic by each task. The incMessagesPublished/incMessagesConsumed methods, invoked from elsewhere in the application code, increment the counter every time a message is handled by a task. The putMetricData method is called once every 30 seconds to publish this metric data to the CloudWatch namespace ECS/CloudWatch/Custom. Refer to the Amazon CloudWatch pricing documentation for the charges incurred on custom metrics.
Publishing custom metrics with Prometheus
If you choose to capture custom metrics using Prometheus, then the applications are instrumented with the Prometheus client library for Java, as shown below.
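The following is a minimal sketch of this instrumentation for the consumer service. The verticle name, the listening port, and the route setup are assumptions; Counter and MetricsHandler come from the Prometheus simpleclient and simpleclient_vertx libraries referenced in the text.

```java
import io.prometheus.client.Counter;
import io.prometheus.client.vertx.MetricsHandler;
import io.vertx.core.AbstractVerticle;
import io.vertx.ext.web.Router;

// Illustrative sketch (not the original source) of Prometheus instrumentation
// for the consumer service using the Vert.x framework.
public class KafkaConsumerVerticle extends AbstractVerticle {

    // Cumulative counter that the CloudWatch agent scrapes from the /metrics endpoint.
    private static final Counter messagesConsumed = Counter.build()
        .name("messages_consumed_total")
        .help("Total number of messages consumed from the Kafka topic")
        .register();

    // Invoked every time a message is retrieved from the Kafka topic.
    public static void incMessagesConsumed() {
        messagesConsumed.inc();
    }

    @Override
    public void start() {
        Router router = Router.router(vertx);
        // Exposes the default Prometheus registry to the metrics collector.
        router.route("/metrics").handler(new MetricsHandler());
        // Port 8080 is an assumption for this sketch.
        vertx.createHttpServer().requestHandler(router).listen(8080);
    }
}
```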
Here, a Prometheus Counter is used to keep track of the number of messages sent to/received from the Kafka topic by each task. The Counter is incremented by calling the incMessagesPublished/incMessagesConsumed methods every time a message is handled by a task. In this implementation, both producer and consumer applications make use of the Vert.x framework and expose these custom metrics to a metrics collector at the /metrics endpoint using the MetricsHandler.
The steps to deploy the CloudWatch agent with Prometheus monitoring to Amazon ECS, and to configure it to scrape targets, are documented here. While Prometheus supports auto-discovery of scraping targets in a Kubernetes cluster, such as nodes, services, and pods, there is no such built-in discovery for Amazon ECS. The mechanism used by the CloudWatch agent for identifying scraping targets in an Amazon ECS cluster leverages Prometheus’ support for file-based service discovery and is well documented here. The agent configuration used for the current implementation is shown below.
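The original configuration file is not reproduced here; the abbreviated sketch below illustrates its shape. The job names, metrics port, scrape frequency, and task definition ARN patterns are assumptions and should be adjusted to match your task definitions.

```json
{
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "prometheus_config_path": "env:PROMETHEUS_CONFIG_CONTENT",
        "ecs_service_discovery": {
          "sd_frequency": "1m",
          "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
          "task_definition_list": [
            {
              "sd_job_name": "kafka-producer",
              "sd_metrics_ports": "8080",
              "sd_task_definition_arn_pattern": ".*:task-definition/KafkaProducerTask:[0-9]+"
            },
            {
              "sd_job_name": "kafka-consumer",
              "sd_metrics_ports": "8080",
              "sd_task_definition_arn_pattern": ".*:task-definition/KafkaConsumerTask:[0-9]+"
            }
          ]
        },
        "emf_processor": {
          "metric_declaration": [
            {
              "source_labels": ["job"],
              "label_matcher": "^kafka-(producer|consumer)$",
              "dimensions": [["ClusterName", "TaskGroup"]],
              "metric_selectors": ["^messages_(produced|consumed)_total$"]
            }
          ]
        }
      }
    }
  }
}
```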
The ecs_service_discovery section helps the CloudWatch agent identify the set of tasks deployed using the KafkaConsumerTask and KafkaProducerTask task definitions as scraping targets. The emf_processor.metric_declaration section configures how Prometheus metrics scraped from these tasks are converted into performance log events using the embedded metric format. With the above configuration settings, a performance log event sent to CloudWatch Logs by the agent looks as shown below. This log event will be used by CloudWatch to generate data for a custom metric named messages_consumed_total in the CloudWatch namespace ECS/ContainerInsights/Prometheus with the dimensions ClusterName and TaskGroup. These dimensions enable the aggregation of metrics data collected from all tasks that belong to a service.
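For reference, an abbreviated, illustrative performance log event in the embedded metric format is sketched below. The field values are placeholders, and the agent’s actual output carries additional Prometheus labels and metadata.

```json
{
  "CloudWatchMetrics": [
    {
      "Namespace": "ECS/ContainerInsights/Prometheus",
      "Dimensions": [["ClusterName", "TaskGroup"]],
      "Metrics": [{ "Name": "messages_consumed_total" }]
    }
  ],
  "ClusterName": "example-ecs-cluster",
  "TaskGroup": "service:KafkaConsumerService",
  "TaskDefinitionFamily": "KafkaConsumerTask",
  "Timestamp": "1613014769229",
  "job": "kafka-consumer",
  "prom_metric_type": "counter",
  "messages_consumed_total": 9729
}
```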
Setting up service scaling with Application Auto Scaling
An Amazon ECS service is autoscaled using Application Auto Scaling with a target tracking policy that selects a CloudWatch metric and sets a target value. A custom metric specification that represents a CloudWatch metric of our choosing can be used to set up such a policy. However, not all metrics are suitable for target tracking. Cumulative metrics such as messages_consumed_total and messages_produced_total, which are monotonically increasing, are not particularly useful for autoscaling. We have to convert them into a utilization or rate metric that increases or decreases in proportion to the capacity of the scalable target, in this case the number of tasks in a service.
This conversion is performed by an AWS Lambda function in this implementation. The Go program shown below is deployed as a Lambda function and, using Amazon EventBridge, is scheduled to execute once every minute. The function is configured to retrieve data for a trailing 60-second period from the CloudWatch namespace ECS/CloudWatch/Custom for the metric messages_produced_total and, using a metric math expression, compute a new metric named rate_messages_produced_average_1m that tracks the rate of messages produced relative to the number of consumer tasks. This metric is then published back into CloudWatch as a custom metric and serves as a valid utilization metric that can be used by Application Auto Scaling.
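The Go program itself is not reproduced here. The sketch below shows the kind of GetMetricData query such a function might issue; it assumes the RATE() metric math function, a dimensionless source metric, and a consumer service currently running three tasks (the task count, obtained via ecs:DescribeServices, is interpolated into the expression). The exact expression and dimensions used by the original implementation may differ.

```json
[
  {
    "Id": "m1",
    "MetricStat": {
      "Metric": {
        "Namespace": "ECS/CloudWatch/Custom",
        "MetricName": "messages_produced_total"
      },
      "Period": 60,
      "Stat": "Sum"
    },
    "ReturnData": false
  },
  {
    "Id": "rate_messages_produced_average_1m",
    "Expression": "RATE(m1) / 3",
    "Label": "Rate of messages produced per consumer task",
    "Period": 60
  }
]
```

The resulting value is then written back to CloudWatch with cloudwatch:PutMetricData as the custom metric rate_messages_produced_average_1m.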
The Lambda function performs its task based on the following JSON configuration data, which is read from AWS Systems Manager Parameter Store. The execution role for this function requires the following IAM permissions: cloudwatch:GetMetricData, cloudwatch:PutMetricData, ecs:DescribeServices, ssm:GetParameter, and the logs: permissions needed to write to its CloudWatch Logs log group.
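The exact schema of this parameter is specific to the implementation and is not reproduced here; a purely hypothetical example illustrating the kind of information the function needs might look like the following (all field names are invented for illustration).

```json
{
  "clusterName": "example-ecs-cluster",
  "consumerServiceName": "KafkaConsumerService",
  "sourceNamespace": "ECS/CloudWatch/Custom",
  "sourceMetricName": "messages_produced_total",
  "destinationNamespace": "ECS/CloudWatch/Custom",
  "destinationMetricName": "rate_messages_produced_average_1m",
  "periodSeconds": 60
}
```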
Next, the DesiredCount dimension of the consumer service is registered as a scalable target with Application Auto Scaling. This allows the service to scale out up to a maximum of 10 tasks, which, in this implementation, is also the number of partitions in the Kafka topic.
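This registration can be done with the AWS CLI along the following lines; the cluster and service names are placeholders.

```bash
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/<cluster-name>/<consumer-service-name> \
  --min-capacity 1 \
  --max-capacity 10
```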
Next, a target tracking policy is created using the configuration parameters defined in config.json.
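For example, using the AWS CLI (the policy name and resource ID are placeholders):

```bash
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/<cluster-name>/<consumer-service-name> \
  --policy-name kafka-consumer-target-tracking-policy \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://config.json
```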
The contents of the config.json file are shown below. This policy tracks the custom metric rate_messages_produced_average_1m with a target value of 30.
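The original file is not reproduced here; the sketch below shows the expected structure of a target tracking policy configuration with a customized metric specification. The cooldown values and the absence of dimensions are assumptions.

```json
{
  "TargetValue": 30.0,
  "CustomizedMetricSpecification": {
    "MetricName": "rate_messages_produced_average_1m",
    "Namespace": "ECS/CloudWatch/Custom",
    "Statistic": "Average"
  },
  "ScaleOutCooldown": 60,
  "ScaleInCooldown": 60
}
```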
With the above policy, Application Auto Scaling creates and manages the CloudWatch metric alarms that trigger the scaling actions. The steps involved in converting a CloudWatch custom metric created from a Prometheus Counter into a utilization or rate metric are essentially the same; the only difference is that the relevant metric data is retrieved from the namespace ECS/ContainerInsights/Prometheus.
Autoscaling in action
We start off with a deployment comprising two tasks of the Kafka Producer Service, one task of the Kafka Consumer Service, as well as the CloudWatch agent with Prometheus support. The Kafka topic is set up with 10 partitions. A steady stream of messages is published to the Kafka topic at a rate of about 65-70 messages/second. Messages from all partitions are processed by the single task in the consumer service at a rate of about 20 messages/second. The per-minute rates of messages produced/consumed are calculated using the metric math expressions SUM(rate_messages_produced_total)/60 and SUM(rate_messages_consumed_total)/60, respectively, and are displayed in Figure 2. Figure 3 shows the custom utilization metric rate_messages_produced_average_1m, which is computed by the Lambda function and used for target tracking by Application Auto Scaling. This metric, which denotes the rate of messages produced relative to the number of consumer tasks, is initially also at about 65-70 messages/second because there is only one consumer task.

Figure 1. List of services deployed to ECS

Figure 2. Initial rate messages produced/consumed shown using metric math expressions

Figure 3. Custom utilization metric computed by Lambda and used for target tracking by Application Auto Scaling
When we create a target tracking scaling policy, Application Auto Scaling creates the two metric alarms shown in the figure below to manage the automatic scaling of the consumer tasks. These two metric alarms, one to handle scale out and the other to handle scale in, together ensure that the average value of the utilization metric rate_messages_produced_average_1m stays in the range of 27-30.

Figure 4. CloudWatch metric alarms managed by Application Auto Scaling
As the initial value of the metric rate_messages_produced_average_1m is about 65-70, the alarm that tracks the upper threshold of this metric executes the first scale out action, during which two additional consumer tasks are launched, bringing this utilization metric below the target of 30. The rate of messages produced is then doubled to about 130-140 messages/second. Subsequently, a second scaling action is executed, which launches two more tasks to bring this metric value back below the target of 30 after it breaches the set threshold. With more consumer tasks launched, we can see that the combined rate of messages processed by the consumer tasks, indicated by rate_messages_consumed_total_1m, picks up and gets closer to that of the producers. The figures below show how the system behaves in response to the two scale out actions.

Figure 5. Behavior of the target metric, rate_messages_produced_average_1m, during the two scale out actions.

Figure 6. Behavior of the per-minute rate of messages produced/consumed (blue/red) during the scale out actions
The rate of messages produced is now decreased to about 50 messages/second, resulting in the scale in action shown in the figures below, which brings the number of consumer tasks down to two. The primary goal of Application Auto Scaling is to prioritize availability in response to an increased load on the system. As seen in these figures, it performs a scale out relatively quickly, after the metric breaches the threshold for three one-minute periods, while the scale in action is executed far more conservatively, only after the threshold is breached for 15 one-minute periods.

Figure 7. Behavior of the target metric, rate_messages_produced_average_1m, during the scale in action.

Figure 8. Behavior of the per-minute rate of messages produced/consumed (blue/red) during the scale in action
The figure below shows the change in the number of tasks in the Kafka Consumer Service as well as its average CPU and memory profile during these autoscaling events. Before the scale out, a single task consumes messages from all 10 partitions; after the scale out, each of the five tasks processes messages from two partitions. However, the nature of the application in this use case is such that the average CPU and memory usage of each task has no correlation to the number of partitions it consumes messages from, and hence these are not reliable indicators of when to perform automatic scaling.

Figure 9. Average CPU and memory usage by consumer tasks
The procedure used for implementing autoscaling with a custom Prometheus metric that was collected from an Amazon ECS service by Container Insights is exactly the same as above. These metrics are ultimately also reported as CloudWatch custom metrics similar to the ones published using CloudWatch SDKs.
Concluding remarks
There are many use cases where custom metrics collected from an application are better indicators for executing an autoscaling action than the application’s CPU and memory usage. Microservices deployed to Amazon ECS have the option of collecting such custom application metrics using either the CloudWatch SDKs or CloudWatch Container Insights monitoring for Prometheus. A counter, which is a monotonically increasing value, is often used to capture cumulative metrics such as the number of HTTP requests processed, the number of messages processed from a queue, and the number of database transactions executed. This post presented an approach for using Application Auto Scaling in conjunction with such custom cumulative metrics to effectively autoscale microservices deployed to an Amazon ECS cluster.