How do I troubleshoot Amazon ECS tasks that take a long time to stop when the container instance is set to DRAINING?

Last updated: 2019-09-04

My Amazon Elastic Container Service (Amazon ECS) task is taking a long time to move to the STOPPED state. Or, my Amazon ECS task is stuck in the RUNNING state when the container instance is set to DRAINING. How can I resolve this issue?

Short Description

When you set an ECS instance to DRAINING, Amazon ECS does the following:

  • Prevents new tasks from being scheduled for placement on the container instance
  • Stops tasks on the container instance that are in the RUNNING state

Your tasks can remain in the RUNNING state, or take a long time to move to the STOPPED state, because of how certain configuration parameters are set or because of problems with the tasks themselves. To troubleshoot these issues, work through the following checks:

Resolution

Confirm that your DeploymentConfiguration parameters are set correctly

  1. Open the Amazon ECS console.
  2. In the navigation pane, choose Clusters, and then choose the cluster where your container instance is draining.
  3. Choose the ECS Instances tab, and then choose DRAINING in the Status section.
  4. Choose your container instance, and then identify the service that owns the tasks that are stuck or taking a long time to drain.
  5. Choose the Services tab, select the service, and then choose Deployments.
  6. Check the values for minimumHealthyPercent and maximumPercent.
    Note: Service tasks on the container instance that are in the RUNNING state are stopped and replaced according to the service's deployment configuration parameters: minimumHealthyPercent and maximumPercent.
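The effect of these two parameters can be sketched with a little arithmetic. The following is a simplified model, not the Amazon ECS scheduler itself: it assumes the service is already at its desired count with every task RUNNING, that the lower limit is rounded up and the upper limit is rounded down, and that other container instances have capacity for replacement tasks. The function names are hypothetical.

```python
def running_task_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (lower, upper) limits on RUNNING tasks while tasks are replaced."""
    # Lower limit rounds up, upper limit rounds down.
    # Integer arithmetic avoids floating-point rounding surprises.
    lower = -(-desired_count * minimum_healthy_percent // 100)
    upper = desired_count * maximum_percent // 100
    return lower, upper


def drain_can_progress(desired_count, minimum_healthy_percent, maximum_percent):
    """Simplified check: can any task be stopped or replaced during draining?"""
    lower, upper = running_task_bounds(
        desired_count, minimum_healthy_percent, maximum_percent
    )
    can_stop_first = lower < desired_count   # room to stop a task before replacing it
    can_start_first = upper > desired_count  # room to start a replacement first
    return can_stop_first or can_start_first


if __name__ == "__main__":
    # Illustrative values only.
    for settings in [(4, 100, 100), (4, 50, 100), (4, 100, 200)]:
        print(settings, running_task_bounds(*settings), drain_can_progress(*settings))
```

In this model, a service with minimumHealthyPercent and maximumPercent both set to 100 leaves no room to stop or start a task, which is one configuration worth ruling out when tasks stay in RUNNING.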

Confirm that the deregistration delay value is set correctly

Important: The following steps apply only to services that use an Application Load Balancer or a Network Load Balancer. If your service uses a Classic Load Balancer, check the connection draining values instead.

  1. Open the Amazon ECS console.
  2. In the navigation pane, choose Clusters, and then choose the cluster where your container instance is draining.
  3. Choose the Services tab, and then select the service with the task stuck in RUNNING.
  4. Choose Target Group Name.
  5. On the Details tab, scroll down, and then check the value for Deregistration delay.
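If you retrieve the target group attributes programmatically instead of through the console, the same value can be read from the response. The following sketch assumes a dictionary shaped like the response of the Elastic Load Balancing DescribeTargetGroupAttributes API (for example, from the boto3 elbv2 client's describe_target_group_attributes call). The sample response is illustrative, and the fallback of 300 seconds reflects the default deregistration delay.

```python
DEFAULT_DEREGISTRATION_DELAY_SECONDS = 300  # default when the attribute is not listed


def deregistration_delay_seconds(response):
    """Read the deregistration delay from a target group attributes response."""
    for attribute in response.get("Attributes", []):
        if attribute.get("Key") == "deregistration_delay.timeout_seconds":
            # Attribute values are returned as strings.
            return int(attribute["Value"])
    return DEFAULT_DEREGISTRATION_DELAY_SECONDS


if __name__ == "__main__":
    # Illustrative response, not real output.
    sample_response = {
        "Attributes": [
            {"Key": "deregistration_delay.timeout_seconds", "Value": "120"},
            {"Key": "stickiness.enabled", "Value": "false"},
        ]
    }
    print(deregistration_delay_seconds(sample_response))
```

A long deregistration delay keeps a draining target registered for that long, so a task behind the load balancer can appear stuck in RUNNING until the delay elapses.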

Confirm that the ECS_CONTAINER_STOP_TIMEOUT value is set correctly

  1. Connect to your container instance using SSH.
  2. Run the docker inspect ecs-agent --format '{{json .Config.Env}}' command.
  3. Check if there is a value for ECS_CONTAINER_STOP_TIMEOUT.
    Note: ECS_CONTAINER_STOP_TIMEOUT is an Amazon ECS container agent parameter that defines how long Amazon ECS waits before forcibly killing a container that doesn't exit normally on its own. The timer starts when the task is stopped. If ECS_CONTAINER_STOP_TIMEOUT doesn't appear in the output of the command in step 2, then Amazon ECS is using the default value of 30s.
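The check in step 3 can also be scripted. The following sketch assumes you have captured the output of the command in step 2, which is a JSON array of NAME=value strings, and falls back to the 30s default described in the note. The function name and the sample output are hypothetical.

```python
import json

DEFAULT_STOP_TIMEOUT = "30s"  # value used when the parameter is not set


def container_stop_timeout(env_json):
    """Return ECS_CONTAINER_STOP_TIMEOUT from the agent's environment, or the default."""
    env = json.loads(env_json) or []  # "null" means the agent has no Env entries
    for entry in env:
        name, _, value = entry.partition("=")
        if name == "ECS_CONTAINER_STOP_TIMEOUT":
            return value
    return DEFAULT_STOP_TIMEOUT


if __name__ == "__main__":
    # Illustrative output from step 2, not copied from a real instance.
    sample_output = '["ECS_CONTAINER_STOP_TIMEOUT=2m","ECS_CLUSTER=example-cluster"]'
    print(container_stop_timeout(sample_output))
```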

Look for other task-related issues

  1. Connect to your container instance using SSH.
  2. Verify that the Docker daemon and the Amazon ECS container agent are both running on the container instance. The way to check depends on whether the instance uses an Amazon Linux 1 AMI or an Amazon Linux 2 AMI.
  3. Check the application logs based on the log driver set by logConfiguration.
    Note: For example, if your tasks are using the awslogs log driver, check your Amazon CloudWatch Logs for issues.
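To find out where each container's logs go, you can read logConfiguration from the task definition. The following sketch assumes a dictionary shaped like a registered task definition (containerDefinitions, each with an optional logConfiguration). The function name, container names, and log group are hypothetical.

```python
def log_destinations(task_definition):
    """List the log driver and, for awslogs, the log group of each container."""
    results = []
    for container in task_definition.get("containerDefinitions", []):
        log_config = container.get("logConfiguration") or {}
        options = log_config.get("options") or {}
        results.append(
            {
                "container": container.get("name"),
                # None means no logConfiguration is set for this container.
                "logDriver": log_config.get("logDriver"),
                "awslogsGroup": options.get("awslogs-group"),
            }
        )
    return results


if __name__ == "__main__":
    # Illustrative task definition fragment.
    sample_task_definition = {
        "containerDefinitions": [
            {
                "name": "web",
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {"awslogs-group": "/ecs/example-web"},
                },
            },
            {"name": "sidecar"},
        ]
    }
    for row in log_destinations(sample_task_definition):
        print(row)
```

For containers that report awslogs, the awslogs-group value is the Amazon CloudWatch Logs log group to inspect.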