Autoscaling Amazon ECS services based on custom metrics with Application Auto Scaling

Introduction

Application Auto Scaling is a web service for developers and system administrators who need a solution for automatically scaling their scalable resources for AWS services such as Amazon Elastic Container Service (Amazon ECS) services, Amazon DynamoDB tables, AWS Lambda Provisioned Concurrency, and more. Application Auto Scaling now supports scaling such resources with scaling policies that are based on custom Amazon CloudWatch metrics computed by a metric math expression. In this post, we’ll demonstrate how this feature works using a sample scenario: scaling an Amazon ECS service based on the average rate of HTTP requests handled by the service.

Background

Horizontal scalability is a critical aspect of cloud native applications. Application Auto Scaling integrates with several AWS services so that you can add scaling capabilities to meet your application’s demand. It can use one or more of the relevant predefined metrics available in Amazon CloudWatch, in conjunction with either target tracking or step scaling policies, to proportionally scale the resources in a given service. There are several use cases where such predefined metrics alone are not reliable indicators of when to execute a scaling action and by how much. In certain scenarios, custom metrics that track other application aspects, such as the number of HTTP requests received, the number of messages retrieved from a queue or topic, or the number of database transactions executed, may be better suited to trigger scaling actions.
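
For reference, a target tracking policy that relies on one of these predefined metrics needs nothing more than a configuration like the following, which scales an Amazon ECS service on its average CPU utilization (the target value shown is illustrative):

{
   "TargetValue":75.0,
   "PredefinedMetricSpecification":{
      "PredefinedMetricType":"ECSServiceAverageCPUUtilization"
   }
}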

An important consideration when using a target tracking policy with Application Auto Scaling is that the specified metric should represent an average utilization, which describes how busy a scalable target is. Metrics such as the number of HTTP requests received by an application or the number of messages retrieved from a queue/topic in a message broker don’t meet this requirement because they are cumulative in nature and hence monotonically increasing. They have to be converted into a utilization metric, which increases or decreases proportionally to the capacity of the scalable target. Previously, customers needed to write proprietary code that performed this conversion. A representative example of such a use case is presented in this post, which discusses the details of scaling the number of tasks in an Amazon ECS service based on the rate of messages published to a topic in Amazon Managed Streaming for Apache Kafka (Amazon MSK). It employs an AWS Lambda function that periodically fetches custom metric data from Amazon CloudWatch, computes an average utilization metric, and publishes the latter as a new custom metric in Amazon CloudWatch. Subsequently, a target tracking policy is defined using a custom metric specification that references this new, pre-computed utilization metric.
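
To make that concrete, the conversion logic that customers had to run themselves on a schedule boils down to something like the following sketch. It is shown here as AWS CLI calls purely for illustration (the post referenced above packages equivalent logic in a Lambda function) and uses the metric names and dimensions from the walkthrough later in this post; GNU date syntax is assumed.

# Window covering the trailing minute (GNU date syntax)
START=$(date -u -d '1 minute ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Total number of HTTP requests received across all tasks over the last minute
SUM=$(aws cloudwatch get-metric-statistics \
--namespace ECS/ContainerInsights/Prometheus \
--metric-name http_requests_total \
--dimensions Name=ClusterName,Value=ecs-ec2-cluster Name=TaskGroup,Value=service:BackendService \
--statistics Sum --period 60 \
--start-time "$START" --end-time "$END" \
--query 'Datapoints[0].Sum' --output text)

# Average number of running tasks over the same minute
TASKS=$(aws cloudwatch get-metric-statistics \
--namespace ECS/ContainerInsights \
--metric-name RunningTaskCount \
--dimensions Name=ClusterName,Value=ecs-ec2-cluster Name=ServiceName,Value=BackendService \
--statistics Average --period 60 \
--start-time "$START" --end-time "$END" \
--query 'Datapoints[0].Average' --output text)

# Average per-second request rate per task
RATE=$(awk -v s="$SUM" -v t="$TASKS" 'BEGIN { print s / (t * 60) }')

# Publish the pre-computed utilization metric for a target tracking policy to consume
aws cloudwatch put-metric-data \
--namespace ECS/CloudWatch/Custom \
--metric-name http_request_rate_average_1m \
--dimensions ClusterName=ecs-ec2-cluster,TaskGroup=service:BackendService \
--value "$RATE"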

The new feature enables customers to use metric math expressions inline within their custom metric specification to define how such an average utilization metric is computed from one or more Amazon CloudWatch metrics; everything else is taken care of by Application Auto Scaling. It eliminates the need for customers to write code that performs this task, as well as to deploy compute infrastructure to run that code. These additional tasks merely added operational overhead and maintenance cost without adding any differentiation to their applications. Let’s dive into the details of how the new feature works.

Solution overview

We’ll demonstrate how the new feature works using a sample workload deployed to an Amazon ECS cluster. The following illustration (Figure 1) shows the setup employed for this demonstration. It is representative of a microservices architecture where a backend datastore service, which communicates directly with an instance of an Amazon Aurora PostgreSQL database, exposes a set of REST APIs that allows other microservices to perform Create, Read, Update, and Delete (CRUD) operations on the database.

Figure 1. Sample workload used to demonstrate autoscaling

The application has been instrumented with the Prometheus client library and uses a Prometheus Counter named http_requests_total to keep track of the number of HTTP requests sent to the service. To collect Prometheus metrics from Amazon ECS clusters, you can use the CloudWatch agent with Prometheus monitoring or the AWS Distro for OpenTelemetry collector. In this demonstration, we use the CloudWatch agent, which publishes the custom metric to the CloudWatch namespace ECS/ContainerInsights/Prometheus. The goal is to auto scale the datastore service in proportion to the average rate of HTTP requests processed by its running tasks. Note that because the backend service is not registered with a load balancer, metrics published by Elastic Load Balancing to CloudWatch cannot be used as reliable indicators of the load on the service.
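
For context, the part of the CloudWatch agent configuration that maps the Prometheus counter to the CloudWatch metric and dimensions used in this post looks roughly like the following fragment. This is a sketch that assumes a hypothetical scrape job named backend; the exact keys and the surrounding configuration (including ECS service discovery) are covered in the CloudWatch agent documentation.

{
   "logs":{
      "metrics_collected":{
         "prometheus":{
            "prometheus_config_path":"env:PROMETHEUS_CONFIG_CONTENT",
            "emf_processor":{
               "metric_declaration":[
                  {
                     "source_labels":["job"],
                     "label_matcher":"^backend$",
                     "dimensions":[["ClusterName","TaskGroup"]],
                     "metric_selectors":["^http_requests_total$"]
                  }
               ]
            }
         }
      }
   }
}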

Walkthrough

The first step in the setup is to register the scalable dimension of the target resource, the dimension on which the scaling actions are executed. In this case, it’s the DesiredCount of the tasks in the service.

CLUSTER_NAME=ecs-ec2-cluster
SERVICE_NAME=BackendService
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/$CLUSTER_NAME/$SERVICE_NAME \
--min-capacity 2 \
--max-capacity 10
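
You can optionally confirm the registration with the following command, whose output should list the scalable target with the minimum and maximum capacity configured above:

aws application-autoscaling describe-scalable-targets \
--service-namespace ecs \
--resource-ids service/$CLUSTER_NAME/$SERVICE_NAME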

Next, we’ll set up a target tracking policy with the Application Auto Scaling API. The custom metric to be used with a target tracking policy is defined using the CustomizedMetricSpecification field in a policy configuration JSON file, as shown in the following code.

{
   "TargetValue":5.0,
   "ScaleOutCooldown":120,
   "ScaleInCooldown":120,
   "CustomizedMetricSpecification":{
      "MetricName":"http_request_rate_average_1m",
      "Namespace":"ECS/CloudWatch/Custom",
      "Dimensions":[
         {
            "Name":"ClusterName",
            "Value":"ecs-ec2-cluster"
         },
         {
            "Name":"TaskGroup",
            "Value":"service:BackendService"
         }
      ],
      "Statistic":"Average"
   }
}

Previously, the custom metric specified in the MetricName field of the JSON file had to be a utilization metric that was pre-computed and readily available in CloudWatch. With the new feature, the schema of the CustomizedMetricSpecification field has been extended to support the inclusion of a metric math expression using the Metrics field. Metric math enables you to query multiple CloudWatch metrics and use functions and operators to create new time series based on those metrics. This allows customers to dynamically create a custom utilization metric by merely specifying a math expression in the policy configuration JSON file, like the one shown in the example below.

Note that this schema is exactly the same as the one used by the MetricDataQuery parameter of the GetMetricData CloudWatch API, with one exception: the Period field. Application Auto Scaling sets it under the hood to a predefined value of 60 seconds, and it can’t be explicitly defined in the custom metric specification.

In the following example, the math expression computes a metric named http_request_rate_average_1m based on two other CloudWatch metrics. The first one, http_requests_total, is the custom Prometheus metric we discussed above. The second one, RunningTaskCount, is automatically collected by Container Insights for Amazon ECS and published in the namespace ECS/ContainerInsights; you must ensure that Container Insights is enabled for your Amazon ECS clusters, as described in the Amazon ECS documentation. The computed metric is the average per-second rate of HTTP requests processed by the running tasks over a trailing 1-minute period. This utilization metric is used to auto scale the service by setting a target threshold of 5 requests per second using the TargetValue field.

{
   "TargetValue":5.0,
   "ScaleOutCooldown":120,
   "ScaleInCooldown":120,
   "CustomizedMetricSpecification":{
      "Metrics":[
         {
            "Id":"m1",
            "Label":"sum_http_requests_total_1m",
            "ReturnData":false,
            "MetricStat":{
               "Metric":{
                  "Namespace":"ECS/ContainerInsights/Prometheus",
                  "MetricName":"http_requests_total",
                  "Dimensions":[
                     {
                        "Name":"ClusterName",
                        "Value":"ecs-ec2-cluster"
                     },
                     {
                        "Name":"TaskGroup",
                        "Value":"service:BackendService"
                     }
                  ]
               },
               "Stat":"Sum"
            }
         },
         {
            "Id":"m2",
            "Label":"running_task_count_average_1m",
            "ReturnData":false,
            "MetricStat":{
               "Metric":{
                  "Namespace":"ECS/ContainerInsights",
                  "MetricName":"RunningTaskCount",
                  "Dimensions":[
                     {
                        "Name":"ClusterName",
                        "Value":"ecs-ec2-cluster"
                     },
                     {
                        "Name":"ServiceName",
                        "Value":"BackendService"
                     }
                  ]
               },
               "Stat":"Average"
            }
         },
         {
            "Id":"m3",
            "Expression":"m1/(m2*60)",
            "Label":"http_request_rate_average_1m",
            "ReturnData":true
         }
      ]
   }
}
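
Before attaching the policy, you can optionally preview the time series that the expression produces by running the same queries through the GetMetricData API. In the following sketch, queries.json is assumed to contain the Metrics array from the policy file above, with a "Period": 60 field added inside each MetricStat element (GetMetricData requires an explicit period; GNU date syntax is assumed):

aws cloudwatch get-metric-data \
--metric-data-queries file://queries.json \
--start-time "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"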

A target tracking scaling policy is created using the above policy configuration JSON file with the CLI command shown below. When we create a target tracking scaling policy, Application Auto Scaling creates two metric alarms (Figure 2): one to trigger a scale-out action (high-alarm) and the other to trigger a scale-in action (low-alarm).

CLUSTER_NAME=ecs-ec2-cluster
SERVICE_NAME=BackendService
POLICY_NAME=HTTP-Request-Rate-Policy
aws application-autoscaling put-scaling-policy \
--policy-name $POLICY_NAME \
--service-namespace ecs \
--resource-id service/$CLUSTER_NAME/$SERVICE_NAME \
--scalable-dimension ecs:service:DesiredCount \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration file://policy.json
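
The names of the two alarms created for the policy are listed in the Alarms field of the response returned by the following command:

aws application-autoscaling describe-scaling-policies \
--service-namespace ecs \
--resource-id service/$CLUSTER_NAME/$SERVICE_NAME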

Figure 2. High and low alarm created by Application Auto Scaling to handle scale-out and scale-in

Autoscaling in action with Amazon ECS

A steady stream of requests is sent to this workload using Locust as the load generator. Given that the backend service isn’t registered with a load balancer, the service is registered in a service registry in AWS Cloud Map to facilitate testing. This allows the load generator to use the service DNS hostname to direct traffic to the underlying tasks. The load profile for the duration of the test is shown below (Figure 3). We start off with an initial load of about 9 requests/second. The load is doubled after a period to trigger a scale-out action and then decreased to about 4 requests/second to trigger an eventual scale-in action.

Figure 3. Profile of the HTTP request load generated using the Locust load generation tool

Figure 4 below shows the profile of the target utilization metric http_request_rate_average_1m, based on which the autoscaling actions are performed. Figure 5 shows the number of running tasks in the service before scale-out, after scale-out, and after scale-in. Starting with two running tasks, the initial load on the service keeps the target metric at around 4 requests/second/task, which is well under the configured upper threshold of 5. When the load is doubled, the metric breaches this upper threshold, and after about 3 minutes (the evaluation period for the high-alarm) the scale-out action triggered by the high-alarm occurs. Target tracking leads to a proportional increase in the number of tasks to 4 so that the target metric is brought under the upper threshold. Subsequently, when the load is decreased to 4 requests/second with 4 tasks running, the target metric drops to about 1 request/second/task and breaches the lower threshold. After about 15 minutes have elapsed (the evaluation period for the low-alarm), the scale-in action triggered by the low-alarm occurs, bringing the number of tasks back down to 2.

Figure 4. Profile of the target utilization metric http_request_rate_average_1m

Figure 5. Approximate timelines of scale-out and scale-in actions triggered by the metric alarms
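
As a quick sanity check of the scale-out arithmetic: target tracking computes the new desired capacity roughly as ceil(current capacity × current metric value / target value). With 2 tasks each handling about 8 requests/second after the load doubles, against a target of 5, that yields ceil(2 × 8 / 5) = ceil(3.2) = 4 tasks, which matches the observed behavior.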

Conclusion

In this post, we showed you how to auto scale an Amazon ECS service using metric math support for target tracking in Application Auto Scaling. This feature is now generally available and can be used via the AWS CLI or AWS SDKs. Previously, when customers needed to auto scale resources in certain AWS services using Application Auto Scaling, they were limited to using a set of predefined metrics available in CloudWatch. To use other custom application metrics, customers had to write proprietary code that pre-computed a utilization metric and made it available in Amazon CloudWatch. The new feature eliminates the need to maintain such code, which doesn’t add any differentiation to the target application. It greatly simplifies the task of autoscaling scalable resources in AWS services such as Amazon ECS services, AWS Lambda, and more, based on custom metrics. We greatly value our customers’ feedback at Amazon, so please let us know how this new feature is working for you.

Viji Sarathy

Viji Sarathy is a Principal Specialist Solutions Architect at AWS. He provides expert guidance to customers on modernizing their applications using AWS services that leverage serverless and container technologies. He has been at AWS for about 3 years. He has 20+ years of experience in building large-scale, distributed software systems. His professional journey began as a research engineer in high performance computing, specializing in the area of Computational Fluid Dynamics. From CFD to cloud computing, his career has spanned several business verticals, all along with an emphasis on the design and development of applications using scalable architectures. He holds a Ph.D. in Aerospace Engineering from The University of Texas at Austin. He is an avid runner, hiker, and cyclist.

Anoop Singh

Anoop is a Software Development Engineer at Amazon Web Services. He is part of the Application Auto Scaling team, which is responsible for designing, developing, and maintaining the Application Auto Scaling service. He also works closely with other teams within AWS to ensure that the service integrates seamlessly with other AWS services.