Faster Scaling-in for Amazon ECS Cluster Auto Scaling


Amazon Elastic Container Service (Amazon ECS) customers who use cluster auto scaling (CAS) have told us that they would like to scale in more quickly so that they can avoid paying for compute resources they no longer need during scale-in events. To make scaling in more responsive, today we are pleased to introduce an enhancement that increases the scale-in step limit from 5% to 50% of an Auto Scaling group's (ASG's) capacity. Our analysis showed that a 50% step limit was optimal for improving overall cost efficiency without compromising availability. In this post, we take a closer look at this improvement and use an example to demonstrate the scale-in time and cost savings that result from it.


Amazon ECS is a fully managed container orchestration service that makes it easier for you to deploy, manage, and scale containerized applications. Cluster auto scaling is a feature of Amazon ECS that manages the scaling of Amazon Elastic Compute Cloud (Amazon EC2) ASGs registered to the cluster. An ECS capacity provider is the compute interface that links your Amazon ECS cluster with your ASG.

With capacity providers, you can define flexible rules for how containerized workloads run on different types of compute capacity and manage the scaling of that capacity. CAS relies on capacity providers to determine the infrastructure to use for your tasks, allowing you to shift your focus from managing infrastructure scaling to building and deploying your application. Capacity providers manage the scale-in and scale-out actions of the ASG based on the load your tasks place on your cluster infrastructure.
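As a sketch, a capacity provider with managed scaling enabled could be created and attached to a cluster with the AWS CLI. The cluster name, capacity provider name, account ID, and ASG ARN below are placeholders, not resources from this post:

```shell
# Create a capacity provider with managed scaling enabled on an existing ASG.
aws ecs create-capacity-provider \
  --name my-asg-capacity-provider \
  --auto-scaling-group-provider \
    "autoScalingGroupArn=arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:example-uuid:autoScalingGroupName/my-asg,managedScaling={status=ENABLED,targetCapacity=100},managedTerminationProtection=ENABLED"

# Associate the capacity provider with the cluster and make it the default.
aws ecs put-cluster-capacity-providers \
  --cluster my-cluster \
  --capacity-providers my-asg-capacity-provider \
  --default-capacity-provider-strategy capacityProvider=my-asg-capacity-provider,weight=1
```

With this configuration in place, CAS manages scaling of the ASG on your behalf whenever tasks are launched through the capacity provider strategy.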

Customers tell us that they want to optimize costs as well as availability of infrastructure for their applications—especially in cases when short-lived traffic spikes lead to scaling out capacity that may then remain underutilized for a long period. This happens because CAS is a latent process that performs scaling actions over several steps, including the collection and aggregation of Amazon CloudWatch metrics. You can learn more about how CAS works here.

Specifically, to determine a scale-in action, CAS requires an initial 15 minutes' worth of data points from CloudWatch alarms. Thereafter, Amazon ECS reduces capacity by a predetermined scale-in step percent over one or more scale-in events. The scale-in steps ensure that capacity reduces gradually over time and that spare capacity remains available to respond to traffic spikes, but they may also result in higher compute charges. These charges may be insignificant for smaller workloads, but for large-scale workloads the cost can be undesirable.
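The effect of the step limit can be illustrated with a simple simulation. This is only a sketch: it assumes one capped scale-in step per evaluation and ignores alarm evaluation time and instance termination latency, so it does not reproduce the exact timings that CAS produces in practice.

```python
import math

def scale_in_steps(current, target, step_limit):
    """Count the scale-in steps needed to go from `current` to `target`
    instances when each step may remove at most `step_limit` (a fraction)
    of the ASG's current capacity."""
    steps = 0
    while current > target:
        # Each step removes at most step_limit of current capacity,
        # but never fewer than one instance.
        reduction = max(1, math.floor(current * step_limit))
        current = max(target, current - reduction)
        steps += 1
    return steps

# Scaling an ASG from 334 instances down to 34:
old = scale_in_steps(334, 34, 0.05)  # previous 5% step limit
new = scale_in_steps(334, 34, 0.50)  # new 50% step limit
print(f"5% cap: {old} steps, 50% cap: {new} steps")
```

Under the 50% cap, the simulated scale-in completes in just 4 steps, while the 5% cap requires an order of magnitude more, which is the intuition behind the faster scale-in times measured below.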

Demonstrating scale-in time and cost improvements

We tested the scale-in step limit increase from 5% to 50% with an Amazon ECS service scaling from 1,000 tasks on 334 c5.xlarge instances at peak traffic down to 100 tasks on 34 instances. We observed an 8x improvement in scale-in time—from 80 minutes to 10 minutes—and a reduction in compute cost of over 16x—from $7.47 to $0.46—during the scale-in period. As a result, applications will be more responsive to capacity demand, because EC2 instances provisioned by Amazon ECS will be released more quickly.

This enhanced scale-in mechanism aligns directly with the Cost Optimization Pillar of the AWS Well-Architected Framework by adopting a consumption model: paying only for what you use. It also aligns with the Reliability Pillar by monitoring demand and workload utilization to ensure resources are not overprovisioned. You should observe improved overall scale-in performance while using CAS, as well as a reduction in the associated compute costs, leading to a higher return on investment.

Understanding the scale-in step enhancement

Let's look at a scenario that illustrates the value customers get from this scale-in step limit improvement.

Imagine that you're running an Amazon ECS service that serves requests for a website and, during the day, serves a peak load requiring 1,000 tasks. The service is deployed using an ECS capacity provider backed by an ASG, which means the tasks for the service run on Amazon EC2 instances. The service uses a binpack placement strategy, in which tasks are placed on instances so as to leave the least amount of unused CPU or memory, using the fewest instances. Because requests to the website decrease at the end of the day, the service now requires only 100 tasks. As the service scales down, CAS determines that fewer Amazon EC2 instances are needed to run the tasks. We will visualize the scale-in of the EC2 instances in the ASG using Amazon CloudWatch, which monitors AWS resources and applications by collecting metrics in real time.
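A service with a binpack placement strategy like the one described could be created as follows. The cluster, service, task definition, and capacity provider names here are illustrative placeholders:

```shell
# Create a service that binpacks tasks by memory onto as few instances
# as possible, launched through the capacity provider.
aws ecs create-service \
  --cluster my-cluster \
  --service-name website-service \
  --task-definition website:1 \
  --desired-count 1000 \
  --capacity-provider-strategy capacityProvider=my-asg-capacity-provider,weight=1 \
  --placement-strategy type=binpack,field=memory
```

The `field` parameter of the binpack strategy can be either `cpu` or `memory`, depending on which resource you want to pack most tightly.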

For this experiment, we create a service from a BusyBox base image in the public repositories of Amazon Elastic Container Registry (Amazon ECR). The service runs on an ECS capacity provider backed by an ASG of On-Demand c5.xlarge Amazon EC2 instances, with a minimum size of 0 and a maximum size of 1,000. Metrics collection on the ASG is enabled so that we can track the number of instances in service (GroupInServiceInstances), and we set up dashboards in Amazon CloudWatch to view the ASG metrics.
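Enabling the ASG group metrics used in this experiment is a one-line CLI call (the ASG name is a placeholder):

```shell
# Publish the GroupInServiceInstances metric to CloudWatch at 1-minute
# granularity so the scale-in can be graphed.
aws autoscaling enable-metrics-collection \
  --auto-scaling-group-name my-asg \
  --metrics GroupInServiceInstances \
  --granularity "1Minute"
```

Once enabled, the metric appears under the `AWS/AutoScaling` namespace in CloudWatch and can be added to a dashboard.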

The results of the experiment are shown in graph 2, where the ASG scales down from 334 instances to 34 instances in 10 minutes (30 instances per minute). In tests run before the scale-in step limit improvement, the ASG scaled down from 334 to 34 instances in 80 minutes (3.75 instances per minute), as shown in graph 1. The increase in scale-in rate from 3.75 to 30 instances per minute represents an 8x improvement. Note that in both cases it takes 15 minutes for ASG scale-in to initiate after the tasks have finished scaling in, because CloudWatch scale-in alarms require 15 data points (1 data point per minute) before the scale-in process for the Auto Scaling group starts. Here, we only cover the improvement in scale-in duration once ASG scale-in has kicked in.
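The rate improvement follows directly from these numbers:

```python
# Figures from the experiment: the ASG scales from 334 down to 34 instances.
instances_removed = 334 - 34  # 300 instances released

rate_before = instances_removed / 80  # 80-minute scale-in before the change
rate_after = instances_removed / 10   # 10-minute scale-in after the change
speedup = rate_after / rate_before

print(f"{rate_before} -> {rate_after} instances/minute ({speedup:.0f}x faster)")
# prints "3.75 -> 30.0 instances/minute (8x faster)"
```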

Graph 1. This graph represents the scale-in of Amazon EC2 instances prior to the scale-in step limit improvement. The number of Amazon EC2 instances scaled down by the ASG is represented by the GroupInServiceInstances line (green line).

Graph 2. This graph represents the scale-in of Amazon EC2 instances after the scale-in step limit improvement. The number of Amazon EC2 instances scaled down by the ASG is represented by the GroupInServiceInstances line (green line).

To analyze the impact of scale-in on overall cost, we tracked the cost of running the Amazon EC2 instances using the hourly report in AWS Cost Explorer. With the 50% scale-in step improvement, as shown in table 2, the scale-in event lasted only 10 minutes, and the cost of running the instances over that hour was $6.24. Before the improvement, the 80-minute scale-in event spanned the 2-hour period from 12 a.m. to 2 a.m., and the EC2 cost was $13.22 for the first hour and $5.81 for the second, as shown in table 1.

We then calculated the excess cost of Amazon EC2 instances during the scale-in period by taking the cost of all instances running during that period and subtracting the cost of 34 c5.xlarge instances over the same period. The difference gives the cost of compute resources in excess of the 34 instances required for the steady state of 100 tasks. Note that running 34 c5.xlarge instances cost $5.78 per hour in both scenarios. With the 50% scale-in step limit improvement, the excess cost came to $0.46 ($6.24 - $5.78). Before the improvement, it came to $7.47 ($13.22 - $5.78 + $5.81 - $5.78).
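This arithmetic can be reproduced directly from the Cost Explorer figures:

```python
# Hourly cost of the 34 c5.xlarge instances needed at steady state.
steady_state_hourly = 5.78

# Before the improvement: the 80-minute scale-in spanned two billed hours,
# which Cost Explorer reported as $13.22 and $5.81.
excess_before = (13.22 - steady_state_hourly) + (5.81 - steady_state_hourly)

# After the improvement: the 10-minute scale-in fit inside a single $6.24 hour.
excess_after = 6.24 - steady_state_hourly

print(f"before: ${excess_before:.2f}, after: ${excess_after:.2f}")
# prints "before: $7.47, after: $0.46"
```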

The overall cost of compute resources during the scale-in period was reduced by over 16x, from $7.47 to $0.46. Note that your results may vary, since costs are influenced by factors such as the Amazon EC2 instance types configured in the ASG and the Region and Availability Zones where the instances run.

Table 1. This table represents the hourly cost of running Amazon EC2 instances as viewed in AWS Cost Explorer prior to the scale-in step limit improvement.

Table 2. This table represents the hourly cost of running Amazon EC2 instances as viewed in AWS Cost Explorer after the scale-in step limit improvement.


AWS is invested in making scaling and performance improvements to optimize the scaling speed of Amazon ECS. Recent improvements include optimizations in AWS Fargate that scale applications up to 16x faster, and optimizations in capacity providers that deliver a faster CAS experience. To further simplify the scaling experience, starting May 27, 2022, Amazon ECS no longer requires an AWS Auto Scaling plan when managed scaling is enabled on an ASG capacity provider.

In this post, we introduced the enhancement to the scale-in step limit in Amazon ECS CAS and showed how it improves overall scale-in performance and reduces the compute cost associated with large scale-in events. These optimizations are now automatically enabled in all AWS Regions where Amazon ECS is available; no additional action is required on your part.

We hope you enjoy this improvement. You can get started with Amazon ECS today.

Sudhi Bhat


Sudhi Bhat is a Sr. Specialist Solution Architect at Amazon Web Services based in Austin, TX. He is a software technology leader with 15+ years of experience in building large-scale, distributed software systems. He helps customers achieve their strategic business objectives by providing prescriptive guidance for building solutions on AWS. His current interests are in the areas of Compute, Containers and Security.

Abhishek Nautiyal


Abhishek Nautiyal is a Senior Product Manager-Technical on the Amazon ECS team. His areas of focus include compute capacity management, task scheduling, and performance and scalability.

Kevin O'Connor


Kevin O'Connor is a Principal Technical Product Manager on the EC2 team based out of Boston, MA. His areas of focus are compute infrastructure, compute lifecycle management, and scaling compute workloads on EC2. His interests include software engineering, infrastructure, artificial intelligence (AI), and machine learning (ML).

Ananth Raghavendra


Ananth Raghavendra is a Senior Solutions Architect at Amazon Web Services (AWS) with a focus on EC2 Spot, Graviton2, and EC2-related services such as Auto Scaling groups. Ananth is passionate about helping customers on their cloud journey and solving their performance problems cost-efficiently. He has more than 25 years of experience in development, architecture, DevOps, and infrastructure.