I have several Amazon Elastic Container Service (Amazon ECS) container instances with the ECS container agent, and these container instances are in a disconnected state. Why are they disconnected?

The ECS container agent associates ECS container instances to your cluster and tells Docker when to start, stop, or query the containers you specify to run. If the agent can't access the service, then the container instance can't operate as a member of your ECS cluster. For more information about the container agent, see Amazon ECS container agent.

The container agent might disconnect and reconnect several times an hour as part of its normal operation. Disconnections that last only a few minutes don't indicate a problem with the container agent or your container instance. For more information, see Container Instance State Change Events.

If the container agent remains in a disconnected state for longer than a few minutes, the container instance can't operate as part of your ECS cluster. The issue can be caused by:

  • Networking issues that prevent communication between the instance and ECS.
  • Missing AWS Identity and Access Management (IAM) permissions that the container agent needs to communicate with the ECS endpoints.
  • Problems at the host level, or at the Docker daemon level inside the container instance.

The following steps can help identify the cause of the failure. The commands are supported on the Amazon ECS-optimized Amazon Machine Image (AMI) that is provided by AWS.

Note: If you are using a different AMI, you might need to adapt the commands.

Verify that the Docker daemon is running on the container instance

To verify that the Docker daemon is running on the affected container instance, run the following command:

ps aux | grep '[d]ockerd'

This command output should be similar to the following:

root      2909  7.0  1.2 699128 52284 ?        Sl   13:29   0:04 /usr/bin/dockerd --default-ulimit nofile=1024:4096 --storage-driver devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.fs=ext4 --storage-opt dm.use_deferred_deletion=true

Verify that the Docker containerd daemon is running on the container instance

To verify that the Docker containerd daemon (docker-containerd) is running on the affected container instance, run the following command:

ps aux | grep '[d]ocker-containerd'

This command output should be similar to the following:

root      2916  0.8  0.3 361096 12396 ?        Ssl  13:29   0:02 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc

If these commands don't return the expected output, stop and start the Docker daemon by running the following command:

sudo service docker stop && sudo service docker start

The output should contain two lines, each marked [OK]: one for stopping the Docker service on the host and one for starting it.
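The two daemon checks and the restart can be combined into one script. The following is a minimal sketch, not an AWS-provided tool: the function names process_running and restart_docker_if_needed are hypothetical, and it assumes that pgrep is installed on the instance. Because pgrep -f matches anywhere in a command line, an unrelated command that mentions one of these names can produce a false positive.

```shell
#!/bin/sh
# Return 0 if any process has a command line that matches the pattern.
process_running() {
    pgrep -f "$1" >/dev/null 2>&1
}

# Restart Docker only when one of the two daemons is missing.
# (Hypothetical helper; the restart command is the one shown above.)
restart_docker_if_needed() {
    if process_running dockerd && process_running docker-containerd; then
        echo "Docker daemons are running"
    else
        echo "Docker daemon not found, restarting the Docker service"
        sudo service docker stop && sudo service docker start
    fi
}
```

To use the sketch, source the file on the container instance and call restart_docker_if_needed.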

Verify that the container agent is running on the container instance

To verify that the container agent is running on the affected container instance, run the following command and look for the agent container in the list of running containers:

docker ps

Note: By default, the container agent uses ecs-agent as the Docker container name, but it can vary if you run the container yourself.
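Instead of reading the docker ps output by eye, you can filter it for the agent container. The following is a minimal sketch that assumes the default container name ecs-agent; the function name agent_container_running is hypothetical.

```shell
#!/bin/sh
# Return 0 if a running container is named exactly "ecs-agent".
# The docker name filter matches substrings, so grep -x narrows the
# result to an exact match on the default agent container name.
agent_container_running() {
    docker ps --filter "name=ecs-agent" --format '{{.Names}}' |
        grep -qx 'ecs-agent'
}
```

For example, `agent_container_running || echo "agent container not found"` prints a message only when no running container has that name. Depending on how Docker is configured, the docker command might need to be run with sudo.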

If the ECS container agent isn't running, start it by running the following command:

sudo start ecs

If the container instance still shows a disconnected status after you start the container agent, verify that the IAM instance profile associated with the container instance has the necessary IAM permissions.

Verify that the IAM instance profile has the necessary permissions

Using the IAM console or the AWS Command Line Interface (AWS CLI), verify that the instance profile associated with your container instances has the required permissions.
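With the AWS CLI, the check might look like the following minimal sketch. It assumes that the role uses the AWS managed policy AmazonEC2ContainerServiceforEC2Role; a role that grants the same permissions through a custom or inline policy is not detected. The function names are hypothetical, and the instance ID and role name that you pass in are your own values.

```shell
#!/bin/sh
# Print the ARN of the instance profile attached to an instance.
# Argument: an EC2 instance ID.
instance_profile_arn() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].IamInstanceProfile.Arn' \
        --output text
}

# Return 0 if the managed ECS policy is attached to the given role.
# Argument: the name of the role in the instance profile.
role_has_ecs_policy() {
    aws iam list-attached-role-policies --role-name "$1" \
        --query 'AttachedPolicies[].PolicyName' --output text |
        tr '\t' '\n' |
        grep -qx 'AmazonEC2ContainerServiceforEC2Role'
}
```

For example, `role_has_ecs_policy ecsInstanceRole` succeeds when the managed policy is attached to a role named ecsInstanceRole (an example name). Run these commands from a machine whose credentials allow the ec2:DescribeInstances and iam:ListAttachedRolePolicies actions.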

Analyze the container agent log file at the instance level, and verify that there are no credential errors. The container agent log is rotated every hour, and the file name suffix reflects the date and hour that the log covers. Run the following command to view the agent log for a specific hour:

cat /var/log/ecs/ecs-agent.log.2017-10-24-13

Note: Update the file name suffix in the command to the date and hour when the issue occurred.
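To avoid typing the suffix by hand, you can derive the file name from a timestamp. The following is a minimal sketch that assumes GNU date, the log directory shown above, and a suffix expressed in UTC (which matches the Z timestamps in the sample log); the function name ecs_agent_log_for is hypothetical.

```shell
#!/bin/sh
# Print the path of the agent log that covers the given UTC time.
# Argument: a timestamp that GNU date can parse, such as "2017-10-24 13:48".
ecs_agent_log_for() {
    printf '/var/log/ecs/ecs-agent.log.%s\n' \
        "$(date -u -d "$1" +%Y-%m-%d-%H)"
}
```

For example, `cat "$(ecs_agent_log_for '2017-10-24 13:48')"` displays the same file as the command above.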

If the container agent doesn't have the necessary credentials, you'll see an error similar to the following in the logs:

2017-10-24T13:48:59Z [INFO] Registering Instance with ECS
2017-10-24T13:48:59Z [ERROR] Could not register: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2017-10-24T13:48:59Z [CRITICAL] Could not create cluster: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2017-10-24T13:48:59Z [ERROR] Error registering: NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
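When you don't know which hour the problem started, you can search all rotated agent logs for this error instead of opening them one at a time. The following is a minimal sketch; the function name find_credential_errors is hypothetical, and the default path assumes the log location shown above.

```shell
#!/bin/sh
# Print every agent log line that reports missing credentials,
# prefixed with the name of the file that contains it.
# Arguments: one or more log files; defaults to all rotated agent logs.
find_credential_errors() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/ecs/ecs-agent.log.*
    fi
    grep -H 'NoCredentialProviders' "$@"
}
```

The function returns a nonzero status when no matching line is found, so `find_credential_errors || echo "no credential errors found"` works as a quick check.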

Published: 2016-08-04

Updated: 2018-07-19