How do I troubleshoot Amazon ECS tasks that take a long time to stop when the container instance is set to DRAINING?


My Amazon Elastic Container Service (Amazon ECS) task is taking a long time to move to the STOPPED state. Or, my Amazon ECS task is stuck in the RUNNING state when the container instance is set to DRAINING. How can I resolve this issue?

Short description

When you set an ECS instance to DRAINING, Amazon ECS does the following:

Prevents new tasks from being scheduled for placement on the container instance
Stops tasks on the container instance that are in the RUNNING state

Your tasks can get stuck in the RUNNING state or take a long time to move to the STOPPED state because of issues with configuration parameters or with the tasks themselves. To troubleshoot these issues, consider the following options:

Confirm that your DeploymentConfiguration parameters are set correctly
Confirm that the deregistration delay value is set correctly
Confirm that the ECS_CONTAINER_STOP_TIMEOUT value is set correctly
Look for other task-related issues

Resolution

Confirm that your DeploymentConfiguration parameters are set correctly

Open the Amazon ECS console.
In the navigation pane, choose Clusters, and then choose the cluster where your container instance is draining.
Choose the ECS Instances tab, and then choose DRAINING in the Status section.
Choose your container instance, and then identify the service that owns the tasks that are stuck or taking a long time to drain.
Choose the Services tab, select the service, and then choose Deployments.
Check the values for minimumHealthyPercent and maximumPercent.
Note: Service tasks on the container instance that are in the RUNNING state are stopped and replaced according to the service's deployment configuration parameters: minimumHealthyPercent and maximumPercent.
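To illustrate why these two values matter, here is a minimal Python sketch that estimates how much room the service scheduler has to replace tasks while an instance drains. It assumes that minimumHealthyPercent is applied to the desired count and rounded up, and that maximumPercent is rounded down. The function name drain_headroom and its return keys are hypothetical, and the model ignores other factors such as spare capacity in the cluster.

```python
def drain_headroom(desired_count, minimum_healthy_percent, maximum_percent):
    """Hypothetical helper: estimate the scheduler's room to replace draining tasks."""
    # Lower limit on running tasks (assumed to be rounded up).
    min_running = -((-desired_count * minimum_healthy_percent) // 100)
    # Upper limit on running tasks (assumed to be rounded down).
    max_running = (desired_count * maximum_percent) // 100
    # Tasks that can be stopped before any replacement is healthy.
    can_stop_first = max(desired_count - min_running, 0)
    # Replacement tasks that can be started before any draining task stops.
    can_start_first = max(max_running - desired_count, 0)
    return {
        "min_running": min_running,
        "max_running": max_running,
        "can_stop_first": can_stop_first,
        "can_start_first": can_start_first,
        # With no room in either direction, draining cannot make progress.
        "blocked": can_stop_first == 0 and can_start_first == 0,
    }


# Illustrative values only: 4 desired tasks with both parameters at 100.
print(drain_headroom(4, 100, 100)["blocked"])
```

In this simplified model, a service with both parameters set to 100 can neither stop a draining task first nor start a replacement first. Lowering minimumHealthyPercent or raising maximumPercent gives the scheduler room to proceed.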

Confirm that the deregistration delay value is set correctly

Important: The following steps apply only to services using the Application Load Balancer or Network Load Balancer. If your service is using the Classic Load Balancer, check the connection draining values.

Open the Amazon ECS console.
In the navigation pane, choose Clusters, and then choose the cluster where your container instance is draining.
Choose the Services tab, and then select the service with the task stuck in RUNNING.
Choose Target Group Name.
On the Details tab, scroll down, and then check the value for Deregistration delay.
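If you prefer to read the value outside the console, the following sketch extracts it from a target group attributes response. It assumes a dictionary shaped like the output of the AWS CLI command aws elbv2 describe-target-group-attributes, where the delay is stored under the key deregistration_delay.timeout_seconds and defaults to 300 seconds. The function name and the sample response are hypothetical.

```python
def deregistration_delay_seconds(response, default=300):
    """Hypothetical helper: return the deregistration delay from a parsed response."""
    for attribute in response.get("Attributes", []):
        if attribute.get("Key") == "deregistration_delay.timeout_seconds":
            # Attribute values are returned as strings.
            return int(attribute["Value"])
    # Assumption: fall back to the load balancer default when the key is absent.
    return default


# Illustrative response only; real output contains more attributes.
sample_response = {
    "Attributes": [
        {"Key": "stickiness.enabled", "Value": "false"},
        {"Key": "deregistration_delay.timeout_seconds", "Value": "120"},
    ]
}
print(deregistration_delay_seconds(sample_response))
```

A task behind the load balancer isn't stopped until its target finishes deregistering, so a high value here directly lengthens the time that the task stays in RUNNING.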

Confirm that the ECS_CONTAINER_STOP_TIMEOUT value is set correctly

Connect to your container instance using SSH.
Run the docker inspect ecs-agent --format '{{json .Config.Env}}' command.
Check if there is a value for ECS_CONTAINER_STOP_TIMEOUT.
Note: ECS_CONTAINER_STOP_TIMEOUT is an Amazon ECS container agent parameter that defines how long the agent waits, after a task is stopped, before it forcibly ends containers that don't exit on their own. If the ECS_CONTAINER_STOP_TIMEOUT parameter doesn't appear in the output of the docker inspect command, then Amazon ECS is using the default value of 30s.
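The output of the docker inspect command is a JSON array of KEY=VALUE strings, which can be hard to scan by eye. The following sketch picks out the timeout from that output and falls back to the 30s default when the parameter is missing. The function name and the sample output are hypothetical; the value is returned as the raw string (for example, 2m) without further parsing.

```python
import json


def container_stop_timeout(env_json, default="30s"):
    """Hypothetical helper: find ECS_CONTAINER_STOP_TIMEOUT in docker inspect output."""
    # The command prints "null" when the container has no environment entries.
    entries = json.loads(env_json) or []
    for entry in entries:
        name, _, value = entry.partition("=")
        if name == "ECS_CONTAINER_STOP_TIMEOUT":
            return value
    # The parameter is not set, so the agent default applies.
    return default


# Illustrative output only; a real agent has many more variables.
sample_output = '["ECS_CLUSTER=my-cluster", "ECS_CONTAINER_STOP_TIMEOUT=2m"]'
print(container_stop_timeout(sample_output))
```

To use it on a container instance, pass the text printed by the docker inspect command as env_json.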

Look for other task-related issues

Connect to your container instance using SSH.
Verify that the Docker daemon and the Amazon ECS container agent are running on your container instance, following the steps for either Amazon Linux 1 AMIs or Amazon Linux 2 AMIs.
Check the application logs based on the log driver set by logConfiguration.
Note: For example, if your tasks are using the awslogs log driver, check your Amazon CloudWatch Logs for issues.
