Why are my Amazon ECS container instances with Amazon Linux 2 AMIs disconnected?

Last updated: 2019-07-24

My container instances for Amazon Elastic Container Service (Amazon ECS) are disconnected. How can I resolve this issue?

Short Description

Your Amazon ECS container agent might connect and reconnect several times an hour. These change events are normal and aren't a cause for concern.

However, if your container agent remains in a disconnected state, then the container instance can't operate as part of your ECS cluster. The agent is disconnected when the agentConnected field that the DescribeContainerInstances API returns for the container instance is false. The following issues can cause the disconnection:

  • Networking issues prevent communication between the instance and Amazon ECS
  • The container agent doesn't have the required AWS Identity and Access Management (IAM) permissions to communicate with Amazon ECS endpoints
  • There are problems with the host or Docker service inside the container instance
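One way to read the agentConnected value is with the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is installed and configured with permission to call ecs:DescribeContainerInstances. The function name agent_state and the values in the usage line are hypothetical placeholders.

```shell
# agent_state is a hypothetical helper name, not part of the AWS CLI.
# Prints "connected", "disconnected", or "unknown" for one container instance.
agent_state() {
  cluster="$1"
  instance="$2"
  # --query selects the agentConnected field of the first instance returned.
  connected=$(aws ecs describe-container-instances \
    --cluster "$cluster" \
    --container-instances "$instance" \
    --query 'containerInstances[0].agentConnected' \
    --output text) || return 1
  # Normalize case, because text output may capitalize the Boolean value.
  case "$(printf '%s' "$connected" | tr '[:upper:]' '[:lower:]')" in
    true)  echo "connected" ;;
    false) echo "disconnected" ;;
    *)     echo "unknown"; return 1 ;;
  esac
}

# Example usage (placeholder values):
# agent_state exampleCluster <container-instance-id-or-ARN>
```

A result of "unknown" in this sketch usually means that the call returned no matching container instance in the cluster.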

To identify the cause of the disconnection, complete the following steps.

Note: The following resolution applies to Amazon ECS-optimized Amazon Linux 2 AMIs. For a resolution that applies to Amazon ECS-optimized Amazon Linux 1 AMIs, see Why are my Amazon ECS container instances with Amazon Linux 1 AMIs disconnected?

Resolution

Verify that the Docker service is running on the container instance

1.    To verify that the Docker service is running on the affected container instance, run the following command:

sudo systemctl status docker

The command output should be similar to the following:

  docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-06-28 03:23:52 UTC; 1 day 12h ago
     Docs: https://docs.docker.com
  Process: 5519 ExecStartPre=/usr/libexec/docker/docker-setup-runtimes.sh (code=exited, status=0/SUCCESS)
  Process: 5509 ExecStartPre=/bin/mkdir -p /run/docker (code=exited, status=0/SUCCESS)
 Main PID: 5531 (dockerd)
    Tasks: 60
   Memory: 55.4M
   CGroup: /system.slice/docker.service
           ├─5531 /usr/bin/dockerd --default-ulimit nofile=1024:4096
           ├─5570 docker-containerd --config /var/run/docker/containerd/containerd.toml
           ├─5782 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/...
           ├─6006 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/...
           └─6284 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/...

If the Docker service is inactive, then run the following command to restart the Docker service:

sudo systemctl restart docker

Note: The command shouldn't return any output, but you can run the sudo systemctl status docker command to verify that the Docker service started.

2.    To start the container agent, run the following command:

sudo systemctl start ecs
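The check-then-restart steps above can be sketched as a small helper. This is an illustration only, assuming a systemd-based instance and a user that can run sudo. The function name ensure_active is hypothetical.

```shell
# ensure_active is a hypothetical helper name.
# Restarts a systemd unit only when systemd doesn't report it as active.
ensure_active() {
  unit="$1"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit: active"
  else
    echo "$unit: not active, restarting"
    sudo systemctl restart "$unit"
  fi
}

# Example usage: check Docker first, then the container agent.
# ensure_active docker && ensure_active ecs
```

The sketch uses systemctl is-active instead of parsing the status output, because is-active reports the unit state through its exit code.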

Verify that the container agent is running on the container instance

To verify that the container agent is running on the affected container instance, run the following command:

sudo systemctl status ecs

The command output should be similar to the following:

ecs.service - Amazon Elastic Container Service - container agent
   Loaded: loaded (/usr/lib/systemd/system/ecs.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-06-29 15:45:57 UTC; 4min 5s ago
     Docs: https://aws.amazon.com/documentation/ecs/
  Process: 18896 ExecStopPost=/usr/libexec/amazon-ecs-init post-stop (code=exited, status=0/SUCCESS)
  Process: 18818 ExecStop=/usr/libexec/amazon-ecs-init stop (code=exited, status=0/SUCCESS)
  Process: 19422 ExecStartPre=/usr/libexec/amazon-ecs-init pre-start (code=exited, status=0/SUCCESS)
 Main PID: 19455 (amazon-ecs-init)
    Tasks: 7
   Memory: 2.7M
   CGroup: /system.slice/ecs.service
           └─19455 /usr/libexec/amazon-ecs-init start

If the command output doesn't show the service as active, run the following command to restart the service:

sudo systemctl restart ecs

Note: The command shouldn't return any output, but you can run the sudo systemctl status ecs command to verify that the container agent started.
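As an additional check that the agent process is responding, you can query the container agent's local introspection endpoint. The following is a minimal sketch, assuming that curl is installed and that the agent listens on its default introspection port (51678). The function name agent_metadata is hypothetical.

```shell
# agent_metadata is a hypothetical helper name.
# Prints the agent's introspection metadata, or fails if there is no response.
agent_metadata() {
  # -s hides progress output; --max-time stops curl from waiting indefinitely.
  if metadata=$(curl -s --max-time 5 http://localhost:51678/v1/metadata) &&
     [ -n "$metadata" ]; then
    echo "$metadata"
  else
    echo "agent introspection endpoint did not respond" >&2
    return 1
  fi
}

# Example usage:
# agent_metadata
```

The response is a JSON document that includes the cluster name, the container instance ARN, and the agent version.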

Review log files for the container agent and Docker

If your container instances are still disconnected, review the log files on the container host for the container agent and Docker.

To output the log files for the container agent and Docker, run the following commands:

 sudo journalctl -u ecs
 sudo journalctl -u docker

Note: To collect log information from the container instance, run the Amazon ECS logs collector.
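The full journal can be long, so it can help to narrow the output to a recent time window and to lines that look like problems. The following is a rough sketch, not an official procedure. The function name recent_unit_errors and the search pattern are assumptions that you can adjust.

```shell
# recent_unit_errors is a hypothetical helper name.
# Prints journal lines from the last hour (or a custom window) that mention
# an error, warning, or failure for the given systemd unit.
# Returns a non-zero status when no line matches.
recent_unit_errors() {
  unit="$1"
  since="${2:-1 hour ago}"
  sudo journalctl -u "$unit" --since "$since" --no-pager |
    grep -iE 'error|warn|fail'
}

# Example usage:
# recent_unit_errors ecs
# recent_unit_errors docker "2 hours ago"
```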

Verify that the IAM instance profile has the necessary permissions

If the container agent is still disconnected, verify that the IAM instance profile associated with the container instance has the necessary IAM permissions.

1.    Connect to the instance using SSH.

2.    To view the instance metadata for the instance profile that's associated with the instance, run the following command:

curl http://169.254.169.254/latest/meta-data/iam/info

The command output should be similar to the following:

{
  "Code" : "Success",
  "LastUpdated" : "2019-06-29T15:47:03Z",
  "InstanceProfileArn" : "arn:aws:iam::1122334455:instance-profile/ecsInstanceRole",
  "InstanceProfileId" : "AIPAJ5WF3LZVY7PLUHV72"
}

3.    Verify that the IAM role contains the correct permissions for your container instances.

4.    To check for specific credential errors from the container agent, run the following command to view the container agent log:

cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**

Note: The container agent log is rotated every hour, and the suffix automatically changes to reflect the current date and time. Replace the suffix in the command with the date and hour of the log file that covers the time when the issue occurred.

If the container agent doesn't have the necessary credentials, you'll see an error similar to the following in the logs:

2019-06-29T16:10:09Z [ERROR] Unable to register as a container instance with ECS: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
    status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
2019-06-29T16:10:09Z [ERROR] Error re-registering: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
    status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
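To see at a glance which IAM actions the agent was denied, you can extract them from the log lines. The following is a minimal sketch that assumes the error lines follow the format shown above. The function name denied_actions is hypothetical, and the date in the usage line is only an example.

```shell
# denied_actions is a hypothetical helper name.
# Prints each distinct IAM action that appears in an AccessDeniedException
# line of the given log files.
denied_actions() {
  grep -h 'AccessDeniedException' "$@" |
    sed -n 's/.*is not authorized to perform: \([A-Za-z0-9]*:[A-Za-z0-9]*\).*/\1/p' |
    sort -u
}

# Example usage (adjust the date to when the issue occurred):
# denied_actions /var/log/ecs/ecs-agent.log.2019-06-29-*
```

Each action that the function prints is one that the IAM role for the instance profile must allow. Depending on the file permissions on the instance, you might need to run the command with sudo.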