How do I resolve cluster creation errors in Amazon EKS?
Last updated: 2020-02-12
I get service errors when I provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster using AWS CloudFormation or eksctl.
Consider the following troubleshooting options:
- If you receive an error message stating that your targeted Availability Zone doesn't have sufficient capacity to support the cluster, then complete the steps in the Recreate the cluster in a different Availability Zone section.
- If you receive an error message stating that resource creation failed, then complete the steps in the Confirm that you have the correct IAM permissions to create a cluster section, or in the Monitor your Amazon VPC resources section.
- If you receive an error message stating that the creation timed out waiting for worker nodes, then complete the steps in the Confirm that your worker nodes can reach the control plane API endpoint section.
Recreate the cluster in a different Availability Zone
If you launch control plane instances in an Availability Zone with limited capacity, you could receive an error similar to the following:
Cannot create cluster 'sample-cluster' because us-east-1d, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c
To resolve this error, create the cluster again using the recommended Availability Zones from the error message.
If you're provisioning the cluster using AWS CloudFormation, then for the Subnets parameter, pass in subnets that are located in the recommended Availability Zones.
If you're using eksctl, then use the --zones flag to pass in the values for the different Availability Zones. For example, if you receive the preceding error, then run the following command:
$ eksctl create cluster 'sample-cluster' --zones us-east-1a,us-east-1b,us-east-1c
Note: Replace sample-cluster with your cluster name. Replace us-east-1a, us-east-1b, and us-east-1c with your Availability Zones.
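If you script cluster creation, the recommended zones can be read from the error text instead of copied by hand. The following is a minimal sketch that assumes the message keeps the wording shown above. The recommended_zones function is a hypothetical helper, not part of eksctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of eksctl): reads an error message on stdin and
# prints the recommended zones as a comma-separated list suitable for --zones.
recommended_zones() {
  # Drop everything up to and including "availability zones:", then remove spaces.
  sed -n 's/.*choose from these availability zones: *//p' | tr -d ' '
}

# Sample input: the error message shown earlier in this section.
error_message="Cannot create cluster 'sample-cluster' because us-east-1d, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c"

zones="$(printf '%s\n' "$error_message" | recommended_zones)"

# Print the command for review rather than running it.
echo "eksctl create cluster 'sample-cluster' --zones $zones"
```

The sketch prints the command instead of running it, so you can confirm the zone list before you create any resources.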
Confirm that you have the correct IAM permissions to create a cluster
Verify that you have the correct AWS Identity and Access Management (IAM) permissions when you create a cluster, including the correct policies for the Amazon EKS service IAM role.
You can use eksctl to create the prerequisite resources for your cluster, such as the IAM roles and security groups. The minimum permissions required depend on the eksctl configuration that you're launching. For more information, review troubleshooting solutions from the eksctl GitHub community.
If your cluster has issues with IAM permissions, you could receive an error similar to the following in eksctl:
API: iam:CreateRole User: arn:aws:iam::your-account-id:user/your-user-name is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::your-account-id:role/eksctl-newtest22-cluster-ServiceRole-10NXBYLSN4ULP
Tip: For an easier-to-read error message, review the error in the AWS CloudFormation console.
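One way to confirm a missing permission before you retry is the IAM policy simulator in the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is installed and configured and that your own credentials allow iam:SimulatePrincipalPolicy. The denied_action and check_action functions are hypothetical helpers, and the sed pattern assumes the error wording shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an authorization error on stdin and prints the
# action that was denied (for example, iam:CreateRole).
denied_action() {
  sed -n 's/.*is not authorized to perform: \([^ ]*\).*/\1/p'
}

# Hypothetical helper: asks the IAM policy simulator whether a principal is
# allowed to perform one action. Assumes a configured AWS CLI.
check_action() {
  local principal_arn="$1" action="$2"
  aws iam simulate-principal-policy \
    --policy-source-arn "$principal_arn" \
    --action-names "$action" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output table
}

# Sample input: the error message shown earlier in this section.
error_message="API: iam:CreateRole User: arn:aws:iam::your-account-id:user/your-user-name is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::your-account-id:role/eksctl-newtest22-cluster-ServiceRole-10NXBYLSN4ULP"

action="$(printf '%s\n' "$error_message" | denied_action)"
echo "Denied action: $action"

# Usage (replace the ARN with your own user or role ARN):
#   check_action "arn:aws:iam::your-account-id:user/your-user-name" "$action"
```

A decision other than allowed in the simulator output suggests that the principal's policies don't grant the action.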
Monitor your Amazon VPC resources
By default, eksctl creates a new Amazon Virtual Private Cloud (Amazon VPC) when you create a cluster, unless you specify your own custom Amazon VPC and subnets in the configuration file.
If your cluster has issues with your Amazon VPC limits, then you could receive the following error message:
The maximum number of VPCs has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: VpcLimitExceeded; Request ID: a12b34cd-567e-890-123f-ghi4j56k7lmn)
To resolve this error, monitor your resources in the AWS Region where you create the cluster, such as the number of Amazon VPCs or internet gateways. For more information, see Amazon VPC Quotas.
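A quick way to compare current usage with the quota is to query both from the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is installed and configured and that the quota is listed under the name "VPCs per Region" in Service Quotas. The count_vpcs, vpc_quota, and at_quota functions are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the first two assume a configured AWS CLI.

# Number of VPCs that currently exist in a Region.
count_vpcs() {
  aws ec2 describe-vpcs --region "$1" --query 'length(Vpcs)' --output json
}

# Applied quota value, looked up by quota name (the name is an assumption).
vpc_quota() {
  aws service-quotas list-service-quotas --region "$1" --service-code vpc \
    --query "Quotas[?QuotaName=='VPCs per Region'].Value | [0]" --output json
}

# Succeeds (exit status 0) when usage has reached the quota.
# The quota value can print as a decimal such as 5.0, so trim the fraction first.
at_quota() {
  local used="$1" quota="${2%.*}"
  [ "$used" -ge "$quota" ]
}

# Usage:
#   if at_quota "$(count_vpcs us-east-1)" "$(vpc_quota us-east-1)"; then
#     echo "VPC quota reached in us-east-1"
#   fi
```

The same pattern applies to other resources, such as internet gateways, if you swap in the matching describe call and quota name.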
If you have an issue regarding resource constraints on the number of Amazon VPC resources in your Region, consider one of the following options:
(Option 1) Use an existing Amazon VPC to overcome resource constraints
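Create a configuration file that specifies the Amazon VPC and the subnets where you want your cluster's worker nodes to be provisioned. Then, to create the cluster from that file, run the following command:
$ eksctl create cluster -f cluster.yaml
Note: When you use a configuration file, set the cluster name in the metadata section of the file instead of passing it on the command line.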
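For reference, a configuration file that reuses an existing Amazon VPC might look like the following sketch, written here with a shell here-document. The VPC ID, subnet IDs, node group name, instance type, and node count are hypothetical placeholders, and the example assumes that the private subnets already have a route that lets worker nodes reach the control plane.

```shell
#!/usr/bin/env bash
# Write an example eksctl configuration file that reuses an existing VPC.
# All IDs and sizes below are hypothetical placeholders; replace them with your own.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: sample-cluster
  region: us-east-1

vpc:
  id: vpc-0123456789abcdef0            # existing VPC (placeholder ID)
  subnets:
    private:
      us-east-1a:
        id: subnet-0aaaaaaaaaaaaaaa1   # placeholder ID
      us-east-1b:
        id: subnet-0bbbbbbbbbbbbbbb2   # placeholder ID

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    privateNetworking: true            # worker nodes launch in the private subnets
EOF
```

Because the file supplies the VPC and subnets, eksctl doesn't need to create a new Amazon VPC, which avoids the VpcLimitExceeded error.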
(Option 2) Request a service quota increase to overcome resource constraints
Review the AWS CloudFormation stack events for the cluster that eksctl provisioned to identify which resources reached a quota. Then, request a service quota increase for those resources.
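To find the resources that failed, you can filter the stack events from the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is installed and configured and that eksctl used its default stack name of eksctl-<cluster-name>-cluster (the same prefix that appears in the role name in the IAM error earlier). The cluster_stack_name and failed_stack_events functions are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Build the assumed default name of the cluster stack that eksctl creates.
cluster_stack_name() {
  printf 'eksctl-%s-cluster\n' "$1"
}

# List the resources that failed to create, with the reason for each failure.
# Assumes a configured AWS CLI and a stack that still exists.
failed_stack_events() {
  aws cloudformation describe-stack-events \
    --stack-name "$(cluster_stack_name "$1")" \
    --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
    --output table
}

# Usage (replace sample-cluster with your cluster name):
#   failed_stack_events sample-cluster
```

The ResourceStatusReason column usually names the limit that was reached, which tells you which quota to raise.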
Confirm that your worker nodes can reach the control plane API endpoint
When eksctl deploys your cluster, it waits for the worker nodes that are launched to join the cluster and reach Ready status. If your worker nodes can't reach the control plane or have an invalid IAM role, then you could receive the following error:
timed out (after 25m0s) waiting for at least 4 nodes to join the cluster and become ready in "eksfbots-ng1"
To resolve this error, get your worker nodes to join the cluster, and confirm that your worker nodes are in Ready status.
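To check how many worker nodes have reached Ready status, you can count them from kubectl output. The following is a minimal sketch, assuming that kubectl is configured for the cluster. The count_ready_nodes function is a hypothetical helper, and the sample node list is illustrative, not output from a real cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts nodes whose STATUS column is exactly "Ready"
# in the output of `kubectl get nodes --no-headers` read from stdin.
# Nodes with a combined status such as Ready,SchedulingDisabled are not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | count_ready_nodes
#
# Checks that can help when the count is lower than expected:
#   kubectl describe configmap aws-auth -n kube-system   # node IAM role mapping
#   aws eks describe-cluster --name sample-cluster \
#     --query 'cluster.resourcesVpcConfig.[endpointPublicAccess,endpointPrivateAccess]'

# Illustrative sample input, not taken from a real cluster.
sample_output='ip-192-168-1-10.ec2.internal   Ready      <none>   5m    v1.14.9
ip-192-168-2-20.ec2.internal   NotReady   <none>   5m    v1.14.9'

printf '%s\n' "$sample_output" | count_ready_nodes
```

If the count stays below the number that eksctl waits for, the node IAM role mapping and the endpoint access settings are reasonable places to look first.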