Why are my Amazon ECS container instances with Amazon Linux 1 AMIs disconnected?

Last updated: 2019-07-24

My container instances for Amazon Elastic Container Service (Amazon ECS) are disconnected. How can I resolve this issue?

Short Description

Your Amazon ECS container agent might connect and reconnect several times an hour. These change events are normal and aren't a cause for concern.

However, if your container agent remains in a disconnected state, then the container instance can't operate as part of your ECS cluster. Your agent is disconnected when the agentConnected field in the output of the DescribeContainerInstances API returns false. The disconnection can be caused by the following:

  • Networking issues prevent communication between the instance and Amazon ECS
  • The container agent doesn't have the required AWS Identity and Access Management (IAM) permissions to communicate with Amazon ECS endpoints
  • There are problems with the host or Docker service inside the container instance
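To confirm what Amazon ECS currently reports for agentConnected, you can query the container instance with the AWS CLI. The following is a minimal sketch, not an official procedure. It assumes that the AWS CLI is installed and configured with credentials that allow ecs:DescribeContainerInstances. The function names agent_status_message and check_agent_connected are hypothetical, and the cluster name and container instance ID are values that you supply.

```shell
# Hypothetical helper: turn the agentConnected value into a readable message.
# The AWS CLI prints booleans as True/False with --output text, so both
# spellings are accepted here.
agent_status_message() {
  case "$1" in
    true|True)   echo "agent connected" ;;
    false|False) echo "agent disconnected" ;;
    *)           echo "unknown" ;;
  esac
}

# Hypothetical wrapper: ask Amazon ECS for the agentConnected field.
# $1 = cluster name, $2 = container instance ID or ARN (both supplied by you).
check_agent_connected() {
  local value
  value=$(aws ecs describe-container-instances \
    --cluster "$1" \
    --container-instances "$2" \
    --query 'containerInstances[0].agentConnected' \
    --output text) || return 1
  agent_status_message "$value"
}
```

For example, run check_agent_connected exampleCluster followed by your container instance ID from a workstation that has access to the cluster.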

To identify the cause of the disconnection, complete the following steps.

Note: The following resolution applies to Amazon ECS-optimized Amazon Linux 1 AMIs. For a resolution that applies to Amazon ECS-optimized Amazon Linux 2 AMIs, see Why are my Amazon ECS container instances with Amazon Linux 2 AMIs disconnected?

Resolution

Verify that the Docker service is running on the container instance

1.    To verify that the Docker service is running on the affected container instance, run the following command:

sudo service docker status

The command output should be similar to the following:

docker (pid 23013) is running...

If the Docker service isn't running, or if you need to restart the service, run the following command:

sudo service docker restart

The command output should include the following lines:

Stopping docker: [  OK  ]
Starting docker: [  OK  ]

Note: To verify that the Docker service is running after the restart command, run the sudo service docker status command.

2.    After the Docker service is running, run the following command to start the container agent:

sudo start ecs

Verify that the container agent is running on the container instance

To verify that the container agent is running on the affected container instance, run the following command:

sudo status ecs

If the container agent isn't running on your container instance, run the following command to start the agent:

sudo start ecs

The command output should be similar to the following:

ecs start/running, process 23403
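The status check and the conditional start can be combined so that the agent is started only when it isn't already running. The following is a rough sketch that assumes the Upstart status format shown in this article (start/running for a running job, stop/waiting for a stopped job). The function names upstart_is_running and ensure_ecs_agent are hypothetical.

```shell
# Hypothetical helper: succeed when an Upstart status line reports
# that the job is running.
upstart_is_running() {
  case "$1" in
    *"start/running"*) return 0 ;;
    *)                 return 1 ;;
  esac
}

# Hypothetical wrapper: start the container agent only if it is not running.
# Run this on the container instance itself.
ensure_ecs_agent() {
  local status_line
  status_line=$(sudo status ecs 2>&1)
  if upstart_is_running "$status_line"; then
    echo "Container agent already running: $status_line"
  else
    sudo start ecs
  fi
}
```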

Review log files for the container agent and Docker

If your container instances are still disconnected, review the log files on the container host for the container agent and Docker.

To output the log files for the container agent and Docker, run the following commands:

sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**
sudo cat /var/log/docker

Note: To collect log information from the container instance, run the Amazon ECS logs collector.

Verify that the IAM instance profile has the necessary permissions

If the container agent is still disconnected, verify that the IAM instance profile associated with the container instance has the necessary IAM permissions.

1.    Connect to the instance using SSH.

2.    To view the instance metadata about the instance profile that's associated with the instance, run the following command:

curl http://169.254.169.254/latest/meta-data/iam/info

The command output should be similar to the following:

{
  "Code" : "Success",
  "LastUpdated" : "2019-06-29T15:47:03Z",
  "InstanceProfileArn" : "arn:aws:iam::1122334455:instance-profile/ecsInstanceRole",
  "InstanceProfileId" : "AIPAJ5WF3LZVY7PLUHV72"
}

3.    Verify that the IAM role contains the correct permissions for your container instances.
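One way to check the role is to list the policies attached to it. The following is a sketch under several assumptions: the AWS CLI is available, the credentials in use allow iam:GetInstanceProfile and iam:ListAttachedRolePolicies (the instance role itself often doesn't, so you might need to run the IAM calls from a workstation), and the function names profile_name_from_info and list_instance_role_policies are hypothetical. Container instance roles commonly have the AmazonEC2ContainerServiceforEC2Role managed policy attached, but your setup might use a custom policy instead.

```shell
# Hypothetical helper: read the metadata JSON on stdin and print the
# instance profile name (everything after "instance-profile/" in the ARN,
# including a path if the profile has one).
profile_name_from_info() {
  sed -n 's|.*"InstanceProfileArn"[[:space:]]*:[[:space:]]*"[^"]*instance-profile/\([^"]*\)".*|\1|p'
}

# Hypothetical wrapper: resolve the role behind the instance profile and
# list the names of its attached managed policies.
list_instance_role_policies() {
  local profile role
  profile=$(curl -s http://169.254.169.254/latest/meta-data/iam/info \
    | profile_name_from_info)
  role=$(aws iam get-instance-profile \
    --instance-profile-name "$profile" \
    --query 'InstanceProfile.Roles[0].RoleName' \
    --output text) || return 1
  aws iam list-attached-role-policies \
    --role-name "$role" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text
}
```

The instance profile name and the role name can differ, which is why the sketch looks up the role through the instance profile instead of assuming that they match.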

4.    To check for specific credential errors, run the following command to review the container agent log:

sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**

Note: The container agent log is rotated every hour, and the file name suffix changes automatically to reflect the date and time of each log. In the command, replace YYYY-MM-DD-** with the date and time suffix of the log that covers the period when the issue occurred.
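To avoid typing the suffix by hand, a small helper can build the log path. The following is a sketch that assumes the last part of the suffix is a two-digit hour, which matches the hourly rotation described above but isn't stated explicitly in this article. It also assumes that the suffix follows UTC, as the timestamps in the sample log lines do. The function name ecs_agent_log_path is hypothetical.

```shell
# Hypothetical helper: build the agent log path for a given date and hour.
# $1 = date as YYYY-MM-DD, $2 = two-digit hour (assumed suffix format).
ecs_agent_log_path() {
  printf '/var/log/ecs/ecs-agent.log.%s-%s\n' "$1" "$2"
}

# Example: show the log for the current hour (UTC is an assumption).
# sudo cat "$(ecs_agent_log_path "$(date -u +%Y-%m-%d)" "$(date -u +%H)")"
```

If the file for the computed hour doesn't exist, list the directory with ls /var/log/ecs/ to see which suffixes are present.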

If the container agent doesn't have the necessary credentials, you'll see an error similar to the following in the logs:

2019-06-29T16:10:09Z [ERROR] Unable to register as a container instance with ECS: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
    status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
2019-06-29T16:10:09Z [ERROR] Error re-registering: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
    status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
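When the logs are long, it can help to reduce them to the list of API actions that were denied. The following is a minimal sketch that assumes the error text matches the format shown above ("is not authorized to perform: ACTION"). The function name denied_actions is hypothetical.

```shell
# Hypothetical helper: print each distinct action that appears in an
# AccessDeniedException line of the given log files.
denied_actions() {
  grep -h 'AccessDeniedException' "$@" \
    | sed -n 's/.*not authorized to perform: \([^ ]*\).*/\1/p' \
    | sort -u
}

# Example (the hour suffix here is illustrative):
# denied_actions /var/log/ecs/ecs-agent.log.2019-06-29-16
```

Each action in the output, such as ecs:RegisterContainerInstance, points to a permission that is missing from the IAM role for the container instance.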