I have several ECS container instances with the Amazon EC2 Container Service (Amazon ECS) container agent that are in a disconnected state. Why are they disconnected?

The Amazon ECS container agent associates container instances to your cluster and tells Docker when to start, stop, and query the containers you have specified to run. If the agent is unable to access the service, the container instance is not able to operate as a member of your ECS cluster. For more information about the Amazon ECS container agent, see Amazon ECS Container Agent.

The Amazon ECS container agent may disconnect and reconnect several times an hour as part of its normal operation. Disconnections that last only a few minutes do not indicate a problem with the container agent or your container instance. For more information, see Container Instance State Change Events.

If the Amazon ECS container agent remains in a disconnected state for longer than a few minutes, the container instance will not be able to operate as part of your ECS cluster. The issue can be caused by:

  • Networking issues that prevent communication between the instance and ECS.
  • Missing AWS Identity and Access Management (IAM) permissions that the Amazon ECS container agent requires to communicate with the ECS endpoints.
  • Problems at the host level, or at the Docker daemon level inside the ECS container instance.

The following steps can help identify the cause of the failure. The commands are supported on the Amazon ECS-optimized Amazon Machine Image (AMI) that is provided by AWS.

Note: Changes might be necessary if you are using a different AMI.

Verify that the Docker daemon is running on the ECS container instance

To verify that the Docker daemon is running on the affected ECS container instance, run the following command:

ps aux | grep dockerd

This command output should be similar to the following:

root      2909  7.0  1.2 699128 52284 ?        Sl   13:29   0:04 /usr/bin/dockerd --default-ulimit nofile=1024:4096 --storage-driver devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.fs=ext4 --storage-opt dm.use_deferred_deletion=true

Verify that the Docker containerd daemon is running on the ECS container instance

To verify that the Docker containerd daemon (docker-containerd) is running on the affected ECS container instance, run the following command:

ps aux | grep docker-containerd

This command output should be similar to the following:

root      2916  0.8  0.3 361096 12396 ?        Ssl  13:29   0:02 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc

If either command does not produce the expected output, restart the Docker daemon by running the following command:

sudo service docker stop && sudo service docker start

The output should contain two lines marked [ OK ]: one for stopping and one for starting the Docker service at the host level.
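The two process checks above can also be scripted. One thing to watch for: `ps aux | grep dockerd` usually matches the grep process itself, so a non-empty result alone does not prove the daemon is running. The following is a minimal sketch, not part of the original procedure; the function name `has_process` is hypothetical, and it assumes the standard `ps aux` layout in which the command is the 11th column.

```shell
# has_process NAME: read `ps aux`-style lines on stdin and succeed if the
# executable in the COMMAND column (field 11) has the basename NAME.
# Hypothetical helper; assumes the standard 11-column `ps aux` layout.
has_process() {
  awk -v name="$1" '
    {
      # Split the command path on "/" and compare only the last component,
      # so "/usr/bin/dockerd" matches "dockerd" but "grep dockerd" does not.
      n = split($11, parts, "/")
      if (n > 0 && parts[n] == name) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Intended use on the container instance:
#   ps aux | has_process dockerd           || echo "dockerd is not running"
#   ps aux | has_process docker-containerd || echo "docker-containerd is not running"
```

Because the helper compares only the executable name, the arguments of other processes (including the grep command line) cannot produce a false match.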

Verify that the Amazon ECS container agent is running on the ECS container instance

To verify that the Amazon ECS container agent is running on the affected container instance, run the following command:

docker ps

Note: By default, the Amazon ECS container agent uses ecs-agent as the Docker container name, but it can vary if you run the container yourself.
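To check for the agent container without reading the full `docker ps` table, the container names can be filtered. This is a sketch under assumptions: the function name `agent_listed` is hypothetical, and the default name `ecs-agent` only applies if you have not started the agent container under a different name.

```shell
# agent_listed [NAME]: read container names on stdin, one per line, and
# succeed if NAME (default: ecs-agent) appears as an exact, whole-line match.
# Hypothetical helper; pass a different NAME if you run the agent yourself.
agent_listed() {
  grep -qxF -- "${1:-ecs-agent}"
}

# Intended use on the container instance:
#   docker ps --format '{{.Names}}' | agent_listed || echo "ECS agent container is not running"
```

`docker ps` lists only running containers, so an agent container that exited does not appear in this output.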

If the Amazon ECS container agent isn't running, start it by running the following command:

sudo start ecs

If the container instance is still disconnected after you start the Amazon ECS container agent, verify that the IAM instance profile associated with the ECS container instance has the necessary IAM permissions.

Verify that the IAM instance profile has the necessary permissions

Using the IAM console or the AWS Command Line Interface (AWS CLI), verify that the instance profile associated with your ECS container instances has the required permissions.
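With the AWS CLI, one way to approach this check is to list the managed policies attached to the instance role and look for the AWS managed policy for ECS container instances. The sketch below rests on assumptions: the role name `ecsInstanceRole` is only the common default and may differ in your account, the function name is hypothetical, and a role can also grant the same permissions through a custom or inline policy, so a miss here is a prompt to look closer rather than proof of a problem.

```shell
# ARN of the AWS managed policy commonly attached to ECS container instance roles.
ECS_INSTANCE_POLICY_ARN='arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role'

# has_ecs_instance_policy: read policy ARNs on stdin (tab- or newline-separated,
# as produced by `--output text`) and succeed if the managed policy is present.
# Hypothetical helper; it does not inspect custom or inline policies.
has_ecs_instance_policy() {
  tr '\t' '\n' | grep -qxF -- "$ECS_INSTANCE_POLICY_ARN"
}

# Intended use (the role name ecsInstanceRole is an assumption):
#   aws iam list-attached-role-policies --role-name ecsInstanceRole \
#     --query 'AttachedPolicies[].PolicyArn' --output text | has_ecs_instance_policy \
#     || echo "managed ECS instance policy is not attached"
```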

Then review the Amazon ECS container agent log file on the instance and verify that it contains no credential errors. The agent log is rotated every hour, and the file name suffix reflects the date and hour that the file covers. Run the following command to view the log for a specific hour:

cat /var/log/ecs/ecs-agent.log.2017-10-24-13

Note: Replace the suffix in the command with the date and hour when the issue occurred.
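The suffix follows a year-month-day-hour pattern, so the path of the log file for the current hour can be built with `date`. This is a sketch only: the function name is hypothetical, and it assumes the suffix is in UTC, which is inferred from the example file name matching the `Z` timestamps in the log output rather than stated by the source.

```shell
# current_ecs_agent_log: print the path of the agent log file for the current hour.
# Hypothetical helper; ASSUMES the rotation suffix uses UTC (YYYY-MM-DD-HH).
current_ecs_agent_log() {
  printf '/var/log/ecs/ecs-agent.log.%s\n' "$(date -u +%Y-%m-%d-%H)"
}

# Intended use on the container instance:
#   cat "$(current_ecs_agent_log)"
```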

If the Amazon ECS container agent doesn't have the necessary credentials, the log contains errors similar to the following:

2017-10-24T13:48:59Z [INFO] Registering Instance with ECS
2017-10-24T13:48:59Z [ERROR] Could not register: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2017-10-24T13:48:59Z [CRITICAL] Could not create cluster: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2017-10-24T13:48:59Z [ERROR] Error registering: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
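Rather than reading each hourly file in full, you can filter the logs for the credential error shown above. The following is a minimal sketch; the function name `credential_errors` is hypothetical, and the pattern is based only on the sample messages in this article, so other credential-related messages may not match.

```shell
# credential_errors [FILE...]: print ERROR and CRITICAL log lines that mention
# NoCredentialProviders. Reads the named files, or stdin if none are given.
# Exits non-zero when no matching line is found.
# Hypothetical helper; the pattern comes from the sample log lines above.
credential_errors() {
  cat -- "$@" | grep -E '\[(ERROR|CRITICAL)\].*NoCredentialProviders'
}

# Intended use on the container instance (adjust the date to the incident):
#   credential_errors /var/log/ecs/ecs-agent.log.2017-10-24-*
```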

Published: 2016-08-04

Updated: 2017-12-18