How do I troubleshoot service auto scaling issues in Amazon ECS?

Last updated: 2022-03-30

The service auto scaling configured for my Amazon Elastic Container Service (Amazon ECS) service isn't scaling in or scaling out the desired task count as required.

Short description

You can update the desired count of tasks for your Amazon ECS service automatically by integrating your ECS service with the Application Auto Scaling service and Amazon CloudWatch alarms.

Service auto scaling might not be able to add or remove tasks as expected due to one or more of the following reasons:

  • The scaling policies aren't configured correctly.
  • The CloudWatch alarms that trigger the scaling policies were deleted or edited.
  • The cron expression format is incorrectly specified in the scheduled action.
  • You updated the desired task count manually or through AWS CloudFormation or AWS Cloud Development Kit (AWS CDK) to a value that's less than the minimum or more than the maximum value set in service auto scaling.
  • Your ECS cluster doesn't have enough resources or capacity to run new tasks.

Resolution

Troubleshooting CloudWatch alarms and scaling policies

Scalable target

  • Be sure that the ECS service is registered as a scalable target with Application Auto Scaling. If the service isn't registered, register the service using the following command. Then, configure the scaling policies and CloudWatch alarms accordingly. For more information, see How can I configure Amazon ECS Service Auto Scaling on Fargate?
aws application-autoscaling register-scalable-target --service-namespace ecs --scalable-dimension ecs:service:DesiredCount \
--resource-id service/your-cluster/your-service-name --min-capacity 1 --max-capacity 10 --region example-region
  • Use the following commands to retrieve information about the service auto scaling of your ECS service:
aws application-autoscaling describe-scalable-targets --service-namespace ecs --region example-region 
aws application-autoscaling describe-scaling-policies --service-namespace ecs --region example-region
aws application-autoscaling describe-scaling-activities --service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount --resource-id service/your-cluster/your-service-name --region example-region
  • When you create or update the CloudWatch alarms for your ECS service auto scaling, be sure that the metrics, dimensions, statistics, period, condition, and threshold values are specified correctly. Otherwise, the alarm isn't triggered to update the associated scaling policy.

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.
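To check the registration of a single service instead of listing every scalable target in the Region, you can narrow the output. The following is a minimal sketch, not a definitive implementation. The function name show_scalable_target and its three arguments (cluster, service, Region) are hypothetical names chosen for illustration.

```shell
# Print the resource ID, minimum capacity, and maximum capacity that are
# registered for one ECS service. Empty output suggests that the service
# isn't registered as a scalable target.
# Arguments (hypothetical): cluster name, service name, Region.
show_scalable_target() {
  cluster=$1
  service=$2
  region=$3
  aws application-autoscaling describe-scalable-targets \
    --service-namespace ecs \
    --resource-ids "service/$cluster/$service" \
    --query 'ScalableTargets[].[ResourceId,MinCapacity,MaxCapacity]' \
    --output text \
    --region "$region"
}
```

Example usage: show_scalable_target your-cluster your-service-name example-region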

Step scaling

  • Check whether the CloudWatch alarms associated with the scaling policies are being triggered. Check whether there were any errors by viewing the CloudWatch alarms history.
  • For step scaling policies, check that the breach threshold is set correctly in the CloudWatch alarm and that the step adjustments and adjustment type are set correctly in the scaling policy. If no step adjustment matches the breach delta (the metric value minus the threshold), then the alarm history shows the following error message: Failed to execute AutoScaling action: No step adjustment found for metric value [xx, xx] and breach delta xx
    Therefore, be sure that your step adjustments cover the full range of breach deltas: from -infinity to 0 for scale-in and from 0 to +infinity for scale-out.
    Note: Typically, in the step adjustments for a scale-out policy, only the upper bound can be null (positive infinity). In the step adjustments for a scale-in policy, only the lower bound can be null (negative infinity). For more information, see Step adjustments.
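The following sketch illustrates how a gap in the step adjustments leads to the "No step adjustment found" error. It is a simplified model for scale-out policies with integer breach deltas only, not the logic that Application Auto Scaling runs. The function names, the "lower:upper:adjustment" step notation, and the sample values are assumptions made for illustration. The second function lists the recent actions of an alarm so that you can look for the error message in the alarm history.

```shell
# find_step DELTA STEP...
# Each STEP is "lower:upper:adjustment" (hypothetical notation). An empty
# lower or upper bound stands for null (infinity).
# Assumes a scale-out policy: lower bound inclusive, upper bound exclusive.
find_step() {
  delta=$1
  shift
  for step in "$@"; do
    lower=${step%%:*}
    rest=${step#*:}
    upper=${rest%%:*}
    adjustment=${rest#*:}
    if [ -n "$lower" ] && [ "$delta" -lt "$lower" ]; then continue; fi
    if [ -n "$upper" ] && [ "$delta" -ge "$upper" ]; then continue; fi
    echo "$adjustment"
    return 0
  done
  echo "no step adjustment found for breach delta $delta" >&2
  return 1
}

# List the most recent actions of a CloudWatch alarm.
# Arguments (hypothetical): alarm name, Region.
show_alarm_actions() {
  aws cloudwatch describe-alarm-history \
    --alarm-name "$1" \
    --history-item-type Action \
    --max-records 10 \
    --region "$2"
}
```

For example, with a threshold of 70 and a metric value of 85, the breach delta is 15. The call find_step 15 "0:10:1" "10::2" prints 2, because the second step has a null upper bound. The call find_step 25 "0:10:1" "10:20:2" fails, because no step covers deltas of 20 or more.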

Target tracking scaling

  • Because target tracking scaling policies are AWS managed, Application Auto Scaling creates CloudWatch alarms for scaling in and scaling out based on the target value or threshold that's specified during the creation of these policies. Therefore, don't edit or delete these alarms. Editing or deleting these alarms affects the scaling behavior. If you modified or deleted these alarms, be sure to recreate the target tracking policy.
  • You can have multiple target tracking scaling policies for an ECS service as long as each policy uses a different metric. Application Auto Scaling is designed to prioritize availability. Therefore, the behavior of these policies differs depending on whether the target tracking policies are ready for scaling out or scaling in. Application Auto Scaling scales out the tasks if any of the target tracking policies are ready to scale out, but scales in only if all of the target tracking policies (with the scale-in portion enabled) are ready to scale in.
  • When multiple scaling policies, including target tracking and step scaling policies, are configured for an ECS service, be sure that they don't conflict. These conflicts might cause undesirable behavior, such as having consecutive scale-in and scale-out, resulting in unnecessary oscillation of task count.

For more information, see Target tracking scaling policies for Application Auto Scaling.
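To see which CloudWatch alarms are attached to each scaling policy of a service, you can query the policy descriptions. The following is a minimal sketch. The function name list_policy_alarms and its arguments are hypothetical. If a target tracking policy lists alarms that no longer exist in CloudWatch, that might indicate that the alarms were deleted.

```shell
# Print the name, type, and associated alarm names of every scaling policy
# that's attached to one ECS service.
# Arguments (hypothetical): cluster name, service name, Region.
list_policy_alarms() {
  aws application-autoscaling describe-scaling-policies \
    --service-namespace ecs \
    --resource-id "service/$1/$2" \
    --query 'ScalingPolicies[].[PolicyName,PolicyType,Alarms[].AlarmName]' \
    --region "$3"
}
```

Example usage: list_policy_alarms your-cluster your-service-name example-region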

Troubleshooting incorrect cron expression

Be sure that the cron expression specified in the schedule is correct in the configuration of scheduled actions for Application Auto Scaling. The cron format that's supported by Application Auto Scaling consists of six fields separated by white spaces: [Minutes] [Hours] [Day_of_Month] [Month] [Day_of_Week] [Year].

For more information, see Example scheduled actions for Application Auto Scaling.
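As a quick sanity check, you can count the fields of a cron expression before you use it in a scheduled action. The following sketch assumes that the schedule is written in the form cron(fields) and checks only the number of fields, not their values. The function names, the action name, and the capacity values are hypothetical.

```shell
# Print the number of whitespace-separated fields inside cron(...).
# Application Auto Scaling expects six fields.
cron_field_count() {
  body=${1#"cron("}
  body=${body%")"}
  printf '%s\n' "$body" | awk '{ print NF }'
}

# Create or update a scheduled action only if the expression has six fields.
# Arguments (hypothetical): cluster, service, Region, schedule expression.
put_checked_schedule() {
  if [ "$(cron_field_count "$4")" -ne 6 ]; then
    echo "expected six cron fields in: $4" >&2
    return 1
  fi
  aws application-autoscaling put-scheduled-action \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "service/$1/$2" \
    --scheduled-action-name example-scheduled-action \
    --schedule "$4" \
    --scalable-target-action MinCapacity=2,MaxCapacity=10 \
    --region "$3"
}
```

For example, cron_field_count "cron(0 9 * * ? *)" prints 6, and cron_field_count "cron(0 9 * * *)" prints 5, which indicates that a field is missing.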

Troubleshooting the desired task count update

Keep the following in mind when you either manually update or use CloudFormation or AWS CDK to update the desired task count for your ECS service:

  • If you updated the desired task count for your ECS service to a value that's below the minimum capacity value, and an alarm triggers a scale-out activity, service auto scaling scales the desired count up to the minimum capacity value. Then, service auto scaling continues to scale out as required, based on the scaling policy associated with the alarm. However, a scale-in activity doesn't adjust the desired count, because the desired count is already below the minimum capacity value.
  • If you updated the desired task count for your ECS service to a value that's above the maximum capacity value, and an alarm triggers a scale-in activity, service auto scaling scales the desired count in to the maximum capacity value. Then, service auto scaling continues to scale in as required, based on the scaling policy associated with the alarm. However, a scale-out activity doesn't adjust the desired count, because the desired count is already above the maximum capacity value.
  • If you created your ECS service with CloudFormation or AWS CDK without specifying the DesiredCount field, then the desired count is set to a default value of 1. However, when the same service is updated through CloudFormation or AWS CDK without specifying the DesiredCount field, then the existing desired count in the current deployment is used for the new deployment. Therefore, when the desired count value is specified in the CloudFormation stack or AWS CDK, be sure that the value is between the minimum and maximum values during the service update.
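To compare the current desired count with the registered minimum and maximum capacity, you can use a small helper such as the following sketch. The function names and arguments are hypothetical, and the comparison is only an illustration of the cases described above.

```shell
# Print the current desired count of an ECS service.
# Arguments (hypothetical): cluster name, service name, Region.
get_desired_count() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query 'services[0].desiredCount' \
    --output text \
    --region "$3"
}

# Classify a desired count against the minimum and maximum capacity.
# Arguments (hypothetical): desired count, minimum capacity, maximum capacity.
check_desired_count() {
  if [ "$1" -lt "$2" ]; then
    echo "below-min"      # only a scale-out activity adjusts the count
  elif [ "$1" -gt "$3" ]; then
    echo "above-max"      # only a scale-in activity adjusts the count
  else
    echo "within-range"
  fi
}
```

For example, with a minimum capacity of 1 and a maximum capacity of 10, check_desired_count 12 1 10 prints above-max.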

Troubleshooting cluster capacity issues

When your ECS cluster doesn't have enough resources, such as Amazon Elastic Compute Cloud (Amazon EC2) container instances, to run tasks, the scaling activity initiated by the scaling policies remains unfulfilled. In this case, an error message is logged in the service events. To avoid these Amazon EC2 capacity issues and launch the tasks successfully, use Amazon ECS capacity providers to provision EC2 instances automatically, as required.

Note: During scaling activities, service auto scaling uses the actual running task count in a service as the starting point, as opposed to the desired count. This prevents excessive scaling that might not be satisfied, for example, if there aren't enough container instance resources to place the additional tasks. If the container instance capacity is available later, the pending scaling activity might succeed. Then, further scaling activities continue after the cooldown period.
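To look for capacity-related messages, you can review the most recent service events and compare the desired, running, and pending task counts. The following is a minimal sketch. The function names and arguments are hypothetical.

```shell
# Print the five most recent service events (timestamp and message).
# Arguments (hypothetical): cluster name, service name, Region.
recent_service_events() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query 'services[0].events[:5].[createdAt,message]' \
    --output text \
    --region "$3"
}

# Print the desired, running, and pending task counts of the service.
# Arguments (hypothetical): cluster name, service name, Region.
task_counts() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query 'services[0].[desiredCount,runningCount,pendingCount]' \
    --output text \
    --region "$3"
}
```

A running count that stays below the desired count, together with events that report that tasks couldn't be placed, might indicate that the cluster lacks capacity.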

