How do I troubleshoot issues with the API server endpoint of my Amazon EKS cluster?

Last updated: 2022-08-17

I can't run kubectl commands. Also, I changed the endpoint access setting from public to private on my Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Now, my cluster is stuck in the Failed state.

Short description

If you have issues with your Kubernetes API server endpoint, complete the steps in one of the following sections:

  • You can't run kubectl commands on the new or existing cluster
  • You can't run kubectl commands on the cluster after you change the endpoint access from public to private
  • Your cluster is stuck in the Failed state and you can't change the endpoint access setting from public to private

Note: To set up access to the Kubernetes API server endpoint, see How do I set up public and private access to the API server in Amazon EKS?

Resolution

You can't run kubectl commands on the new or existing cluster

1.    Confirm that you're using the correct kubeconfig files to connect with your cluster. For more information, see Organizing cluster access using kubeconfig files (from the Kubernetes website).

2.    Check whether your kubeconfig files contain multiple contexts, and confirm that each cluster entry points to the correct API server endpoint.

Example command and output:

kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'
Cluster name    Server
new200.us-east-2.eksctl.io       https://D8DC9092A7985668FF67C3D1C789A9F5.gr7.us-east-2.eks.amazonaws.com

If the existing kubeconfig files don't have the correct cluster details, then run the following command to create or update the kubeconfig entry for your cluster:

aws eks update-kubeconfig --name cluster_name --region region

Note: Replace cluster_name with your cluster's name and region with your AWS Region.
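When a kubeconfig holds several contexts, it can help to see which API server each context points to. The following is a minimal sketch, assuming the input is a dictionary shaped like the output of kubectl config view -o json. The helper name context_servers and the sample entry named old-cluster are hypothetical and used only for illustration.

```python
def context_servers(kubeconfig: dict) -> dict:
    """Map each context name to the API server URL of the cluster it references."""
    servers = {
        entry["name"]: entry["cluster"]["server"]
        for entry in kubeconfig.get("clusters") or []
    }
    return {
        ctx["name"]: servers.get(ctx["context"]["cluster"])
        for ctx in kubeconfig.get("contexts") or []
    }


# Hypothetical sample shaped like `kubectl config view -o json` output.
sample = {
    "clusters": [
        {
            "name": "new200.us-east-2.eksctl.io",
            "cluster": {
                "server": "https://D8DC9092A7985668FF67C3D1C789A9F5.gr7.us-east-2.eks.amazonaws.com"
            },
        },
        {"name": "old-cluster", "cluster": {"server": "https://old.example.com"}},
    ],
    "contexts": [
        {"name": "dev", "context": {"cluster": "new200.us-east-2.eksctl.io", "user": "dev-user"}},
        {"name": "legacy", "context": {"cluster": "old-cluster", "user": "legacy-user"}},
    ],
    "current-context": "dev",
}

for name, server in context_servers(sample).items():
    marker = "*" if name == sample["current-context"] else " "
    print(f"{marker} {name}\t{server}")
```

A context that maps to None references a cluster entry that is missing from the kubeconfig, which is one sign that the file needs to be regenerated.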

3.    Use telnet on port 443 to validate connectivity from your device to the API server endpoint.

Example command and output:

echo exit | telnet D8DC9092A7985668FF67C3D1C789A9F5.gr7.us-east-2.eks.amazonaws.com 443
Trying 18.224.160.210...
Connected to D8DC9092A7985668FF67C3D1C789A9F5.gr7.us-east-2.eks.amazonaws.com.
Escape character is '^]'.
Connection closed by foreign host.
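If telnet isn't installed on your device, the same TCP reachability check can be approximated with a short script. This is a sketch only: the function name can_connect and the 5-second default timeout are assumptions, not part of any AWS tooling.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (replace the hostname with your own API server endpoint):
# can_connect("D8DC9092A7985668FF67C3D1C789A9F5.gr7.us-east-2.eks.amazonaws.com")
```

A False result doesn't say why the connection failed. A DNS failure, a firewall, or an endpoint access restriction all look the same here, so the checks that follow are still needed.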

If telnet can't connect, then use the following steps to troubleshoot:

Check the DNS resolver

If the API server endpoint hostname doesn't resolve to an IP address, then there's an issue with the DNS resolver.

Run the following command from the same device where the kubectl commands failed:

nslookup APISERVER_endpoint

Note: Replace APISERVER_endpoint with the hostname of your API server endpoint.
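The DNS lookup can also be scripted, which is convenient when you want to run it alongside other checks. The following sketch uses the resolver configured on the local device. The function name resolve is hypothetical.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted IP addresses that the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name did not resolve; an empty list signals a DNS problem.
        return []
    return sorted({info[4][0] for info in infos})


# Example call (replace the hostname with your own API server endpoint):
# resolve("D8DC9092A7985668FF67C3D1C789A9F5.gr7.us-east-2.eks.amazonaws.com")
```

An empty list points to the DNS resolver rather than to the cluster. A non-empty list means that name resolution works and the problem is further along the network path.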

Check if you restricted public access to the API server endpoint

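If you specified CIDR blocks to limit access to the public API server endpoint, then requests from IP addresses outside those blocks are denied. Confirm that the public IP address of your device falls within one of the allowed CIDR blocks. When you restrict public access in this way, it's a best practice to also activate private endpoint access.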
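To see whether a client falls inside the allowed ranges, you can compare its public IP address against the CIDR blocks configured for the cluster. This is a minimal sketch, assuming you already know both values. The function name ip_allowed is hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress


def ip_allowed(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if client_ip falls inside at least one of the CIDR blocks."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical values for illustration only.
print(ip_allowed("203.0.113.10", ["203.0.113.0/24"]))  # inside the block
print(ip_allowed("198.51.100.7", ["203.0.113.0/24"]))  # outside the block
```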

4.    Check the API server endpoint access behavior. See Modifying cluster endpoint access.

You can't run kubectl commands on the cluster after you change the endpoint access from public to private

1.    Confirm that you're using a bastion host or connected networks, such as peered VPCs, AWS Direct Connect, or VPNs, to access the Amazon EKS API endpoint.

Note: In private access mode, you can access the Amazon EKS API endpoint only from within the cluster's VPC or from a network that's connected to it.

2.    Check whether security groups or network access control lists are blocking the API calls.

If you access your cluster across a peered VPC, then confirm that the control plane security groups allow inbound access from the peered VPC on port 443. Also, verify that the peered VPCs have port 53 open to each other. Port 53 is used for DNS resolution.
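The security group check can be expressed as a small rule matcher. The following sketch assumes the rules are dictionaries shaped like the IpPermissions entries that the Amazon EC2 DescribeSecurityGroups API returns. It looks only at IPv4 CIDR-based rules and ignores rules that reference other security groups. The function name allows_port and the sample CIDR ranges are hypothetical.

```python
import ipaddress


def allows_port(ip_permissions: list[dict], source_cidr: str,
                port: int, protocol: str = "tcp") -> bool:
    """Return True if any inbound rule admits source_cidr on the port and protocol."""
    source = ipaddress.ip_network(source_cidr)
    for rule in ip_permissions:
        rule_protocol = rule.get("IpProtocol")
        if rule_protocol == "-1":
            port_matches = True  # "-1" means all protocols and all ports
        elif rule_protocol == protocol:
            port_matches = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_matches:
            continue
        for ip_range in rule.get("IpRanges", []):
            if source.subnet_of(ipaddress.ip_network(ip_range["CidrIp"])):
                return True
    return False


# Hypothetical inbound rules on a control plane security group.
rules = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "10.1.0.0/16"}]},
]
print(allows_port(rules, "10.1.2.0/24", 443))  # peered subnet covered by the rule
print(allows_port(rules, "10.2.0.0/16", 443))  # range not covered by any rule
```

The same function can be called with port 53 and protocol "udp" or "tcp" to check the DNS rules. Network access control lists are evaluated separately and aren't modeled here.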

Your cluster is stuck in the Failed state and you can't change the endpoint access setting from public to private

Your cluster might be in the Failed state because of a permissions issue with AWS Identity and Access Management (IAM).

1.    Confirm that the IAM role for the user is authorized to perform the AssociateVPCWithHostedZone action.

Note: If the action isn't blocked, then check whether the user's account has AWS Organizations policies that are blocking the API calls and causing the cluster to fail.

2.    Confirm that the IAM user's permissions aren't implicitly or explicitly blocked at any level above the account.

Note: A permission is implicitly blocked if no Allow policy statement includes it. It's explicitly blocked if a Deny policy statement includes it. The permission stays blocked even if the account administrator attaches the AdministratorAccess IAM policy with */* permissions to the user, because AWS Organizations policies override the permissions for IAM entities.
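The interaction between identity policies and AWS Organizations policies can be illustrated with a deliberately simplified evaluator. This sketch is not the real IAM evaluation logic: it matches on Action only and ignores Resource, Condition, NotAction, permissions boundaries, and the separate root, OU, and account levels of service control policies. The function names are hypothetical, and the action is written with the route53 service prefix, which is where AssociateVPCWithHostedZone belongs.

```python
from fnmatch import fnmatchcase


def _matches(statement: dict, action: str) -> bool:
    """Return True if any Action pattern in the statement covers the action."""
    patterns = statement["Action"]
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def _layer_decision(statements: list[dict], action: str):
    """Return 'deny', 'allow', or None for a single layer of policy."""
    effects = [s["Effect"] for s in statements if _matches(s, action)]
    if "Deny" in effects:
        return "deny"
    if "Allow" in effects:
        return "allow"
    return None


def evaluate(action: str, identity_statements: list[dict],
             scp_statements: list[dict]) -> str:
    """Combine one identity-policy layer with one Organizations-policy layer."""
    layers = [_layer_decision(identity_statements, action),
              _layer_decision(scp_statements, action)]
    if "deny" in layers:
        return "explicit deny"
    if all(decision == "allow" for decision in layers):
        return "allowed"
    return "implicit deny"


ACTION = "route53:AssociateVPCWithHostedZone"
admin_access = [{"Effect": "Allow", "Action": "*", "Resource": "*"}]

# The identity policy allows everything, but the organization policy denies Route 53.
scp_with_deny = [{"Effect": "Allow", "Action": "*"},
                 {"Effect": "Deny", "Action": "route53:*"}]
print(evaluate(ACTION, admin_access, scp_with_deny))

# The organization policy allows only Amazon EKS actions, so the action is never allowed.
scp_allow_list = [{"Effect": "Allow", "Action": "eks:*"}]
print(evaluate(ACTION, admin_access, scp_allow_list))
```

In both sample cases the user holds full administrator permissions, yet the action is still blocked, which matches the behavior described in the note above.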