Why is my Amazon ECS or Amazon EC2 instance unable to join the cluster?
Last updated: 2022-05-05
I can't register my Amazon Elastic Compute Cloud (Amazon EC2) instance with an Amazon Elastic Container Service (Amazon ECS) cluster.
Your Amazon EC2 instance can't register with or join an ECS cluster because of one or more of the following reasons:
- The ECS endpoint can't access the DNS hostname of the instance publicly.
- Your public subnet configurations are incorrect.
- Your private subnet configurations are incorrect.
- Your VPC endpoints are incorrectly configured.
- Your security groups don't allow network traffic.
- The EC2 instance doesn't have the required AWS Identity and Access Management (IAM) permissions. Or, the ecs:RegisterContainerInstance API call is denied.
- The instance user data for your ECS container isn't configured properly.
- The ECS agent is stopped or not running on the instance.
- The launch configuration of the Auto Scaling group isn't correct (if your instance is part of an Auto Scaling group).
- The Amazon Machine Image (AMI) used for your instance doesn't meet the prerequisites.
Important: Use the AWSSupport-TroubleshootECSContainerInstance AWS Systems Manager runbook to troubleshoot common issues listed in the preceding section. If the runbook's output doesn't provide recommendations, then use the manual troubleshooting approaches explained in subsequent sections.
Use the Systems Manager Automation runbook
With the AWSSupport-TroubleshootECSContainerInstance runbook, you can troubleshoot the EC2 instance that fails to register with the ECS cluster. This automation reviews the following:
- Does the user data for the instance contain the correct cluster information?
- Does the instance profile contain the required permissions?
- Are there any network configuration issues?
Important: Run the AWSSupport-TroubleshootECSContainerInstance runbook in the same AWS Region where your ECS cluster and EC2 instance are located.
- Open the AWS Systems Manager console.
- In the navigation pane, under Change Management, choose Automation.
- Choose Execute automation.
- Choose the Owned by Amazon tab.
- Under Automation document, search for TroubleshootECSContainerInstance.
- Select the AWSSupport-TroubleshootECSContainerInstance card.
Note: Be sure that you select the radio button and not the hyperlinked automation name.
- Choose Next.
- For Execute automation document, be sure that Simple execution is selected.
- In the Input parameters section, for AutomationAssumeRole, enter the Amazon Resource Name (ARN) of the role that allows Systems Manager Automation to perform actions.
Note: If you don't specify an IAM role, then Systems Manager Automation uses the permissions of the IAM user or role that runs the runbook. For more information about creating the assume role for Systems Manager Automation, see Task 1: Create a service role for Automation.
Important: Be sure that either the AutomationAssumeRole or the IAM user or role that runs the runbook has permissions for the following actions: ec2:DescribeIamInstanceProfileAssociations, ec2:DescribeInstanceAttribute, ec2:DescribeInstances, ec2:DescribeNetworkAcls, ec2:DescribeRouteTables, ec2:DescribeSecurityGroups, ec2:DescribeSubnets, ec2:DescribeVpcs, ec2:DescribeVpcEndpoints, iam:GetInstanceProfile, iam:GetRole, iam:SimulateCustomPolicy, and iam:SimulatePrincipalPolicy.
- For ClusterName, enter the cluster name where the EC2 instance failed to register.
- For InstanceId, enter the ID of the EC2 instance that failed to register.
- Choose Execute.
The runbook's output provides troubleshooting steps and recommendations for resolving the issue that caused your EC2 instance to not register in the cluster.
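The same runbook can also be started from the AWS CLI instead of the console. The following is a minimal sketch, assuming that the AWS CLI is installed and configured for the Region of the cluster. The function names runbook_params and start_troubleshooting are hypothetical helpers, not part of any AWS tool.

```shell
#!/bin/sh
# Build the --parameters value for the runbook.
# $1 = cluster name, $2 = instance ID, $3 = optional AutomationAssumeRole ARN.
runbook_params() {
  params="ClusterName=$1,InstanceId=$2"
  if [ -n "${3:-}" ]; then
    params="$params,AutomationAssumeRole=$3"
  fi
  printf '%s\n' "$params"
}

# Start the automation and print its execution ID (hypothetical wrapper).
start_troubleshooting() {
  aws ssm start-automation-execution \
    --document-name "AWSSupport-TroubleshootECSContainerInstance" \
    --parameters "$(runbook_params "$@")" \
    --query 'AutomationExecutionId' --output text
}
```

For example, start_troubleshooting exampleCluster i-00aa11bb22cc33def prints an execution ID. You can then pass that ID to aws ssm get-automation-execution --automation-execution-id to review the output.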
Verify the status of the Amazon ECS agent on the Amazon Linux 2 instance
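Check whether the ECS agent on the instance is running. On Amazon Linux 2, which uses systemd, run the following command:
sudo systemctl status ecs
If the container agent isn't running on your container instance, then run the following command to start the agent:
sudo systemctl start ecs
Note: On the Amazon Linux AMI (Amazon Linux 1), which uses Upstart, the equivalent commands are sudo status ecs and sudo start ecs. A successful start returns output similar to the following:
ecs start/running, process 23403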
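To confirm that a running agent is responding, and to see which cluster it's configured for, you can also query the agent's introspection endpoint on the instance. The following is a sketch that assumes the default introspection port 51678 is reachable from the host. The helper names agent_metadata and metadata_field are hypothetical.

```shell
#!/bin/sh
# Query the local ECS agent introspection endpoint.
# The request fails if the agent isn't running or the port isn't reachable.
agent_metadata() {
  curl -s --max-time 5 http://localhost:51678/v1/metadata
}

# Extract a top-level string field from the metadata JSON on stdin.
# $1 = field name, for example Cluster or ContainerInstanceArn.
metadata_field() {
  sed -n "s/.*\"$1\" *: *\"\([^\"]*\)\".*/\1/p"
}
```

For example, agent_metadata | metadata_field Cluster prints the cluster name that the agent reports. If the value differs from the cluster that you expect, then check the instance user data.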
Check launch configurations
If your instance is launched as part of an Auto Scaling group, then be sure that the launch configuration of the Auto Scaling group is correct. For more information, see Step 5 in Refreshing an Amazon ECS container instance cluster with a new AMI.
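One part of the launch configuration that's worth checking is the user data, which typically writes the cluster name to /etc/ecs/ecs.config. The following sketch decodes the user data and prints the ECS_CLUSTER value that it sets. It assumes that the AWS CLI is configured and that the user data is a plain shell script. The function names are hypothetical.

```shell
#!/bin/sh
# Fetch and decode the user data of a launch configuration.
# $1 = launch configuration name. The API returns the user data base64-encoded.
launch_config_user_data() {
  aws autoscaling describe-launch-configurations \
    --launch-configuration-names "$1" \
    --query 'LaunchConfigurations[0].UserData' --output text | base64 --decode
}

# Print the ECS_CLUSTER value set by a user data script read from stdin.
# Handles lines such as: echo ECS_CLUSTER=my-cluster >> /etc/ecs/ecs.config
user_data_cluster() {
  sed -n 's/.*ECS_CLUSTER="\{0,1\}\([A-Za-z0-9_-]*\).*/\1/p' | tail -n 1
}
```

For example, launch_config_user_data my-launch-config | user_data_cluster prints the configured cluster name, where my-launch-config is a placeholder. Empty output suggests that the user data doesn't set ECS_CLUSTER, in which case the agent falls back to the cluster named default.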
Check the AMI of your instance
If the AMI used for the EC2 instance is a copied AMI or custom AMI, then be sure that the instance has the following components:
- A modern Linux distribution running at least version 3.10 of the Linux kernel.
- Latest version of the Amazon ECS container agent.
- A Docker daemon running at least version 1.9.0, and any Docker runtime dependencies. To view the current Docker version, run the command sudo docker version. For information about installing the latest Docker version on your particular Linux distribution, see Docker documentation for Install Docker engine.
The Amazon ECS-optimized AMIs are preconfigured with these requirements. Therefore, it's a best practice to use them for your container instances unless your application requires a specific operating system or a Docker version that's not yet available in that AMI.
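On a custom or copied AMI, the kernel and Docker minimums listed above can be checked with a short script. The following is a sketch that assumes GNU sort (for the -V option) and a running Docker daemon. The function names are hypothetical.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on sort -V (GNU coreutils) for version-aware ordering.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print one result line. $1 = label, $2 = found version, $3 = minimum version.
report() {
  if version_at_least "$2" "$3"; then
    echo "$1 $2: ok"
  else
    echo "$1 $2: below $3"
  fi
}

# Compare the running kernel and Docker daemon against the minimums.
check_ami_prerequisites() {
  report kernel "$(uname -r | cut -d- -f1)" 3.10
  report docker "$(sudo docker version --format '{{.Server.Version}}')" 1.9.0
}
```

Run check_ami_prerequisites on the instance. A line that ends in "below" points to a component that must be upgraded before the instance can register.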
Verify the logs
If the issue persists, then collect the logs using the Amazon ECS logs collector, and review the logs to find the cause. You can also check the log files on the container host for the container agent and Docker.
To view the log files for the container agent and Docker, run the following commands:
sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**
sudo cat /var/log/docker
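Agent logs can be long, so it often helps to filter them down to error-level lines first. The following sketch assumes that error lines contain either an [ERROR] or [CRITICAL] tag or a level=error or level=critical field, depending on the agent version. The function name agent_errors is hypothetical.

```shell
#!/bin/sh
# Print only error-level lines from agent log text read on stdin.
# Matches both the "[ERROR]" and "level=error" log styles (an assumption
# about the formats in use; adjust the pattern to match your agent version).
agent_errors() {
  grep -E '\[(ERROR|CRITICAL)\]|level=(error|critical)'
}
```

For example, sudo sh -c 'cat /var/log/ecs/ecs-agent.log*' | agent_errors | tail -n 20 shows the 20 most recent error lines across the agent log files.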
Error: Launching a new EC2 instance. Status Reason: This account is currently blocked and not recognized as a valid account. Please contact firstname.lastname@example.org if you have questions. Launching EC2 instance failed.
Contact the email address given in the status reason, and request that your account be unblocked.
Error: re-registering: ClientException: Container instance 12345678910xxxxxxxxxxxx is inactive.\n\tstatus code: 400, request id: 012345678a-012345b-012ab-0a1-9f645f4s5c12" module=agent.go
This error occurs when the ECS agent can't re-register the EC2 container instance with the ECS cluster because the container instance is now inactive. This error is related to the application running on the instance. To understand the cause of the error, check the application. If the error persists, then check the ECS agent logs.
Error: Some instances can join the cluster, but other instances with the same configuration can't join the cluster.
This error might be caused by a ThrottlingException, which results when the rate limit for a specific API call is exceeded. To resolve this error, request an increase to the account-level rate limit. Be sure to check the rate limits for API calls such as RegisterTargets and RegisterContainerInstance.
Error: After changing the instance type, new instances are unable to join the cluster.
This error occurs when the ECS agent is stuck in a pending state after the instance type is changed. Unlike other EC2 instances, you can't stop an ECS container instance, change the instance type, and then start it again. To change the instance type in ECS, you must terminate the container instance, and then launch a new container instance with the desired instance size in your desired cluster using the latest Amazon ECS-optimized Amazon Linux 2 AMI. You can also create a new launch configuration, and then update the Auto Scaling group to use this launch configuration.
Error: Unable to register as a container instance with ECS: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-00aa11bb22cc33def is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster . status code: 400, request id: 0a123456-7899-10101-a987-6543210deff
2019-06-29T16:10:09Z [ERROR] Error re-registering: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster status code: 400, request id: 0a123456-7899-10101-a987-123456pqrs
These errors are caused by missing IAM permissions. To resolve these errors, review the instructions in the section Verify the IAM role and policies associated with the instance.
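As a quick first check, you can pull the role name out of the error message and list the managed policies attached to that role. The following is a sketch, assuming that the AWS CLI is configured with permission to call iam:ListAttachedRolePolicies. The function names are hypothetical.

```shell
#!/bin/sh
# Extract the IAM role name from an AccessDeniedException message on stdin.
# The ARN format is arn:aws:sts::ACCOUNT:assumed-role/ROLE_NAME/SESSION_NAME.
role_from_error() {
  sed -n 's#.*assumed-role/\([^/]*\)/.*#\1#p' | head -n 1
}

# List the names of the managed policies attached to a role ($1 = role name).
attached_policies() {
  aws iam list-attached-role-policies --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' --output text
}
```

For the errors shown above, role_from_error returns ecsInstanceRole. If attached_policies ecsInstanceRole doesn't list AmazonEC2ContainerServiceforEC2Role or another policy that allows ecs:RegisterContainerInstance, then the missing permission is the likely cause. Note that inline policies don't appear in this output.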