AWS Compute Blog

Automatic Scaling with Amazon ECS

My colleague Mayank Thakkar sent a nice guest post that describes how to scale Amazon ECS clusters and services.

You’ve always had the option to scale clusters automatically with Amazon EC2 Container Service (Amazon ECS). Now, with the new Service Auto Scaling feature and Amazon CloudWatch alarms, you can use scaling policies to scale ECS services. With Service Auto Scaling, you can achieve high availability by scaling up when demand is high, and optimize costs by scaling down your service and the cluster when demand is lower, all automatically and in real time.

This post shows how you can use this new feature, along with automatic cluster resizing, to match demand.

Service Auto Scaling overview

Out-of-the-box scaling for ECS services has been a top request and today we are pleased to announce this feature. The process to create services that scale automatically has been made very easy, and is supported by the ECS console, CLI, and SDK. You choose the desired, minimum and maximum number of tasks, create one or more scaling policies, and Service Auto Scaling handles the rest. The service scheduler is also Availability Zone–aware, so you don’t have to worry about distributing your ECS tasks across multiple zones.

In addition to the above, ECS also makes it very easy to run your ECS tasks on a multi-AZ cluster. The Auto Scaling group for the ECS cluster manages the availability of the cluster across multiple zones to give you the resiliency and dependability that you are looking for, and ECS manages the task distribution across these zones, allowing you to focus on your business logic.

The benefits include:

  1. Match deployed capacity to the incoming application load, using scaling policies for both the ECS service and the Auto Scaling group in which the ECS cluster runs. Scaling up cluster instances and service tasks when needed, and safely scaling them down when demand subsides, keeps you out of the capacity guessing game. This gives you high availability at lower cost in the long run.
  2. Multi-AZ clusters make your ECS infrastructure highly available, keeping it safeguarded from potential zone failure. The Availability Zone–aware ECS scheduler manages, scales, and distributes the tasks across the cluster, thus making your architecture highly available.

Service Auto Scaling walkthrough

This post walks you through the process of using these features and creating a truly scalable, highly available, microservices architecture. To achieve these goals, we show how to:

  1. Spin up an ECS cluster, within an Auto Scaling group, spanning 2 (or more) zones.
  2. Set up an ECS service over the cluster and define the desired number of tasks.
  3. Configure an Elastic Load Balancing load balancer in front of the ECS service. This serves as an entry point for the workload.
  4. Set up CloudWatch alarms to scale in and scale out the ECS service.
  5. Set up CloudWatch alarms to scale in and scale out the ECS cluster. (Note that these alarms are separate from the ones created in the previous step.)
  6. Create scaling policies for the ECS service, defining scaling actions while scaling out and scaling in.
  7. Create scaling policies for the Auto Scaling group in which the ECS cluster is running. These policies are used to scale in and scale out the ECS cluster.
  8. Test the highly available, scalable ECS service, along with the scalable cluster, by gradually increasing the load and then decreasing it.

In this post, we walk you through setting up one ECS service on the cluster. However, this pattern can also be applied to multiple ECS services running on the same cluster.

Please note: You are responsible for any AWS costs incurred as a result of running this example.

Conceptual diagram

Set up Service Auto Scaling with ECS

Before you set up the scaling, you should have an ECS service running on a multi-AZ (2 zone) cluster, fronted by a load balancer.

Set up CloudWatch alarms

  1. In the Amazon CloudWatch console, set up a CloudWatch alarm to be used for scaling the ECS service in and out. This walkthrough uses CPUUtilization (from the ECS, ClusterName, ServiceName category), but you can use other metrics if you wish. (Note: Alternatively, you can set up these alarms in the ECS console when configuring scaling policies for your service.)
  2. Name the alarm ECSServiceScaleOutAlarm and set the threshold for CPUUtilization to 75.
  3. Under the Actions section, delete the notifications. For this walkthrough, you’ll configure an action through the ECS and Auto Scaling consoles.
  4. Repeat the two steps above to create the scale in alarm. Name it ECSServiceScaleInAlarm, set the CPUUtilization threshold to 25, and set the operator to '<='.
  5. In the Alarms section, you should see your scale in alarm in the ALARM state. This is expected, as there is currently no load on the ECS service.
  6. Follow the same actions as in the previous steps to set up CloudWatch alarms on the ECS cluster. This time, use CPUReservation as the metric (from ECS, ClusterName). Create two alarms, as before: one to scale out the ECS cluster and the other to scale it in. Name them ECSClusterScaleOutAlarm and ECSClusterScaleInAlarm (or whatever names you like).

Note: CPUReservation is a cluster-level metric (as opposed to a metric scoped to one service in the cluster), which makes the pattern useful even when multiple ECS services run on the cluster. The ECS cluster is always scaled according to the load on the cluster, irrespective of which service it originates from.

Because scaling ECS services is much faster than scaling an ECS cluster, we recommend keeping the ECS cluster scaling alarm more responsive than the ECS service alarm. This ensures that you always have extra cluster capacity available during scaling events, to accommodate instantaneous peak loads. Keep in mind that running this extra EC2 capacity increases your cost, so find the balance between reserve cluster capacity and cost, which will vary from application to application.
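
The four alarms described above can also be expressed as request parameters. The following is a minimal sketch, not a definitive implementation: it only builds the keyword arguments that we would expect to pass to the CloudWatch PutMetricAlarm API (for example, through boto3's put_metric_alarm), and does not call AWS. The cluster and service names, the 60-second period, the single evaluation period, the '>=' operator on the scale out alarms, and the cluster alarm thresholds (reused from the service alarms) are all assumptions for illustration.

```python
# Hypothetical resource names; replace with your own.
CLUSTER = "demo-cluster"
SERVICE = "demo-service"


def build_alarm(name, metric, threshold, operator, cluster, service=None):
    """Return keyword arguments for a CloudWatch PutMetricAlarm request."""
    dimensions = [{"Name": "ClusterName", "Value": cluster}]
    if service is not None:
        # Service-level metrics carry both ClusterName and ServiceName.
        dimensions.append({"Name": "ServiceName", "Value": service})
    return {
        "AlarmName": name,
        "Namespace": "AWS/ECS",
        "MetricName": metric,
        "Dimensions": dimensions,
        "Statistic": "Average",
        "Period": 60,            # assumed: one-minute data points
        "EvaluationPeriods": 1,  # assumed: alarm after a single breach
        "Threshold": float(threshold),
        "ComparisonOperator": operator,
        "AlarmActions": [],      # actions are attached later via scaling policies
    }


alarms = [
    # Service alarms use the thresholds from the walkthrough (75 and 25).
    build_alarm("ECSServiceScaleOutAlarm", "CPUUtilization", 75,
                "GreaterThanOrEqualToThreshold", CLUSTER, SERVICE),
    build_alarm("ECSServiceScaleInAlarm", "CPUUtilization", 25,
                "LessThanOrEqualToThreshold", CLUSTER, SERVICE),
    # Cluster alarm thresholds are placeholders; tune them for your workload.
    build_alarm("ECSClusterScaleOutAlarm", "CPUReservation", 75,
                "GreaterThanOrEqualToThreshold", CLUSTER),
    build_alarm("ECSClusterScaleInAlarm", "CPUReservation", 25,
                "LessThanOrEqualToThreshold", CLUSTER),
]

# With boto3 installed and credentials configured, each entry could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Keeping the alarm definitions as plain dictionaries makes it easy to review thresholds side by side before anything is created in your account.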

Add scaling policies on the ECS service

Add a scale out and a scale in policy on the ECS service created earlier.

  1. Sign in to the ECS console, choose the cluster that your service is running on, choose Services, and select the service.
  2. On the service page, choose Auto Scaling, Update.
  3. Make sure the Number of Tasks is set to 2. This is the default number of tasks that your service will be running.
  4. On the Update Service page, under Optional configurations, choose Configure Service Auto Scaling.
  5. On the Service Auto Scaling (optional) page, under Scaling, choose Configure Service Auto Scaling to adjust your service’s desired count. For both Minimum number of tasks and Desired number of tasks, enter 2. For Maximum number of tasks, enter 10. Because you mapped port 80 of the host (EC2 instance) to port 80 of the ECS container when you created the ECS service, each instance can run only one task of this service, so make sure that you set the same minimum, desired, and maximum numbers for both the Auto Scaling group and the ECS tasks.
  6. Under the Automatic task scaling policies section, choose Add Scaling Policy.
  7. On the Add Policy page, enter a value for Policy Name. For Execute policy when, enter the scale out CloudWatch alarm created earlier (ECSServiceScaleOutAlarm). For Take the action, choose Add 100 percent. Choose Save.
  8. Repeat the two steps above to create the scale in policy, using the scale in CloudWatch alarm created earlier (ECSServiceScaleInAlarm). For Take the action, choose Remove 50 percent. Choose Save.
  9. On the Service Auto Scaling (optional) page, choose Save.
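
The console steps above correspond roughly to two Application Auto Scaling operations: registering the service as a scalable target, and attaching step scaling policies to it. The sketch below only builds the request parameters (as we would expect to pass them to boto3's application-autoscaling client through register_scalable_target and put_scaling_policy) and does not call AWS. The cluster, service, and policy names and the 60-second cooldown are assumptions, and any IAM configuration is omitted.

```python
def service_scaling_requests(cluster, service, min_tasks, max_tasks):
    """Build request parameters for scaling an ECS service's desired count."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    # Bounds within which the desired count may move.
    target = dict(common, MinCapacity=min_tasks, MaxCapacity=max_tasks)

    def policy(name, percent, bound_key):
        # A single step covering every breach beyond the alarm threshold.
        return dict(
            common,
            PolicyName=name,
            PolicyType="StepScaling",
            StepScalingPolicyConfiguration={
                "AdjustmentType": "PercentChangeInCapacity",
                "StepAdjustments": [{bound_key: 0, "ScalingAdjustment": percent}],
                "Cooldown": 60,  # assumed cooldown in seconds
                "MetricAggregationType": "Average",
            },
        )

    # "Add 100 percent" on scale out, "Remove 50 percent" on scale in.
    scale_out = policy("ServiceScaleOut", 100, "MetricIntervalLowerBound")
    scale_in = policy("ServiceScaleIn", -50, "MetricIntervalUpperBound")
    return target, scale_out, scale_in


# Hypothetical names; the 2 and 10 come from the walkthrough.
target, scale_out, scale_in = service_scaling_requests(
    "demo-cluster", "demo-service", min_tasks=2, max_tasks=10
)

# With boto3 available, these could be sent as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   arn = client.put_scaling_policy(**scale_out)["PolicyARN"]
# The returned policy ARN is what you would add to the matching alarm's actions.
```

The scale in policy uses an upper bound of 0 because its alarm fires when the metric is at or below the threshold, so every breach falls in the interval below it.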

Add scaling policies on the ECS cluster

Add a scale out and a scale in policy on the ECS cluster (Auto Scaling group).

  1. Sign in to the Auto Scaling console and select the Auto Scaling Group which was created for this walkthrough.
  2. Choose Details, Edit.
  3. Make sure the Desired and Min are set to 2, and Max is set to 10. Choose Save.
  4. Choose Scaling Policies, Add Policy.
  5. First, create the scale out policy. Enter a value for Name. For Execute policy when, choose the scale out alarm (ECSClusterScaleOutAlarm) created earlier. For Take the action, choose Add 100 percent of group and then choose Create.
  6. Repeat the above step to add the scale in policy, using the scale in alarm (ECSClusterScaleInAlarm) and setting Take the action as Remove 50 percent of group.

You should be able to see the scale in and scale out policies for your Auto Scaling group. Using these policies, the Auto Scaling group can increase or decrease the size of the cluster on which the ECS service is running.
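
The same cluster policies can be sketched as request parameters for the EC2 Auto Scaling API. This example only builds dictionaries (as we would expect to pass them to boto3's autoscaling client through update_auto_scaling_group and put_scaling_policy) and makes no AWS calls. The group name, the policy names, and the 300-second cooldown are assumptions.

```python
def cluster_scaling_requests(group, min_size, max_size, desired):
    """Build request parameters for resizing and scaling an Auto Scaling group."""
    # Equivalent of the Details > Edit step: set Min, Max, and Desired.
    sizes = {
        "AutoScalingGroupName": group,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }

    def policy(name, percent):
        return {
            "AutoScalingGroupName": group,
            "PolicyName": name,
            "PolicyType": "SimpleScaling",
            "AdjustmentType": "PercentChangeInCapacity",
            "ScalingAdjustment": percent,  # negative values remove capacity
            "Cooldown": 300,               # assumed cooldown in seconds
        }

    # "Add 100 percent of group" and "Remove 50 percent of group".
    return sizes, policy("ClusterScaleOut", 100), policy("ClusterScaleIn", -50)


# Hypothetical group name; the sizes come from the walkthrough.
sizes, cluster_out, cluster_in = cluster_scaling_requests(
    "demo-ecs-asg", min_size=2, max_size=10, desired=2
)

# With boto3 available, these could be sent as:
#   client = boto3.client("autoscaling")
#   client.update_auto_scaling_group(**sizes)
#   arn = client.put_scaling_policy(**cluster_out)["PolicyARN"]
# The policy ARN would then be set as the action of ECSClusterScaleOutAlarm.
```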

Note: You may set the cluster scaling policies so that some additional cluster capacity is kept in reserve. This helps your ECS service scale out faster but, depending on your demand, leaves some EC2 instances underutilized.

This completes the Auto Scaling configuration of the ECS service and the Auto Scaling group, each of which is triggered by its own CloudWatch alarms. For more sophisticated scaling behavior, you can always use a different combination of CloudWatch alarms to drive each of these policies.

Now that you have the service running on a cluster that has capacity to scale out, send enough traffic to the load balancer to trigger the scale out alarm.

Load test the ECS service scaling

Now, load test the ECS service using the Apache ab utility and make sure that the scaling configuration is working (see the Create a load-testing instance section). On the CloudWatch console, you can see your service scale up and down. Because the Auto Scaling group is set up with two Availability Zones, you should be able to see five EC2 instances in each zone once the cluster reaches its maximum of 10 instances. Also, because the ECS service scheduler is Availability Zone–aware, the tasks are distributed across those two zones too.
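
To get a feel for what to expect during the test, the following sketch simulates how the "Add 100 percent" and "Remove 50 percent" adjustments move capacity between the configured minimum of 2 and maximum of 10. It is a simplified model under stated assumptions: it applies the rounding rule for percentage changes as we understand it from the EC2 Auto Scaling documentation (fractional changes are truncated toward zero, and a nonzero change smaller than one unit counts as one unit), and it ignores cooldowns, alarm evaluation delays, and instance warm-up. The function names are hypothetical.

```python
import math


def apply_percent(current, percent, minimum, maximum):
    """Apply one percentage adjustment and clamp the result to the bounds."""
    change = current * percent / 100
    if abs(change) >= 1:
        delta = math.trunc(change)  # 2.5 -> 2, -2.5 -> -2
    elif change > 0:
        delta = 1                   # small positive changes add one unit
    elif change < 0:
        delta = -1                  # small negative changes remove one unit
    else:
        delta = 0
    return max(minimum, min(maximum, current + delta))


def trajectory(start, percent, minimum, maximum):
    """Repeat the adjustment until capacity stops changing."""
    sizes = [start]
    while True:
        nxt = apply_percent(sizes[-1], percent, minimum, maximum)
        if nxt == sizes[-1]:
            return sizes
        sizes.append(nxt)


scale_out_steps = trajectory(2, 100, minimum=2, maximum=10)
scale_in_steps = trajectory(10, -50, minimum=2, maximum=10)
per_zone_at_peak = scale_out_steps[-1] // 2  # two Availability Zones

print(scale_out_steps)   # [2, 4, 8, 10]
print(scale_in_steps)    # [10, 5, 3, 2]
print(per_zone_at_peak)  # 5
```

Under these assumptions, sustained load takes three scale out actions to reach the maximum, and three scale in actions to return to the minimum once the load subsides.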

You can further test the high availability by terminating your EC2 instances manually from the EC2 console. The Auto Scaling group and ECS service scheduler should bring up additional EC2 instances, followed by tasks.

Additional considerations

  • Reserve capacity. As discussed before, keeping some additional ECS cluster capacity in reserve helps the ECS service to scale out much faster, without waiting for the cluster’s newly provisioned instances to warm up. This can easily be achieved by either changing the values on which CloudWatch alarms are triggered, or by changing the parameters of the scaling policy itself.
  • Instance termination protection. While scaling in, in some cases, a decrease in available ECS cluster capacity might force some tasks to be terminated or relocated from one host to another. This can be mitigated by either tweaking ECS cluster scale in policies to be less responsive to demand or by gracefully allowing tasks to finish on an EC2 host, before it is terminated. This can easily be achieved by tapping into the Auto Scaling Lifecycle events or instance termination protection, which is a topic for a separate post.

Although we have used the AWS console to create this walkthrough, you can always use the AWS SDK or the CLI to achieve the same result.


When you run a mission-critical microservices architecture, keeping your TCO down is critical, along with having the ability to deploy the workload on multiple zones and to adjust ECS service and cluster capacity to respond to load variations. Using the procedure outlined in this post, which scales in two dimensions (the ECS service and the cluster it runs on), you can achieve all of these goals.