How can I change the status of my nodes from NotReady or Unknown status to Ready status?
Last updated: 2020-01-28
My Amazon Elastic Kubernetes Service (Amazon EKS) worker nodes are in NotReady or Unknown status. I want to get my worker nodes back in Ready status again.
You can't schedule pods on a node that's in NotReady or Unknown status. You can schedule pods only on a node that's in Ready status.
The following resolution addresses nodes in NotReady or Unknown status.
If your node is in the MemoryPressure, DiskPressure, or PIDPressure status, then you must manage your resources to allow additional pods to be scheduled on the node. If your node is in NetworkUnavailable status, then you must properly configure the network on the node. For more information, see Node Status.
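To find the affected nodes, you can filter the STATUS column of the kubectl get nodes output. The following is a minimal sketch that assumes kubectl's default table output. not_ready_nodes is a hypothetical helper name and isn't part of kubectl.

```shell
# Sketch: list nodes whose status isn't Ready (for example, NotReady or Unknown).
# Reads the output of `kubectl get nodes` on stdin. The STATUS column can hold
# comma-separated values such as "Ready,SchedulingDisabled" for a cordoned node,
# so only the first value is compared with "Ready".
not_ready_nodes() {
  awk 'NR > 1 {
    split($2, status, ",")
    if (status[1] != "Ready") print $1, $2
  }'
}
```

Example usage: $ kubectl get nodes | not_ready_nodes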
Check the aws-node and kube-proxy pods to see why the nodes are in NotReady status
A node in NotReady status isn't available for pods to be scheduled on.
1. To check the status of your aws-node and kube-proxy pods, run the following command:
$ kubectl get pods -n kube-system -o wide
2. Check the status of the aws-node and kube-proxy pods by reviewing the output from step 1.
Note: The aws-node and kube-proxy pods are managed by DaemonSets, so each node in the cluster must have one aws-node pod and one kube-proxy pod running on it. If no aws-node or kube-proxy pods are listed, skip to step 4.
If your node status is normal, then your aws-node and kube-proxy pods should be in Running status. For example:
$ kubectl get pods -n kube-system -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP               NODE
aws-node-qvqr2     1/1     Running   0          4h31m   192.168.54.115   ip-192-168-54-115.ec2.internal
kube-proxy-292b4   1/1     Running   0          4h31m   192.168.54.115   ip-192-168-54-115.ec2.internal
If either pod is in a status other than Running, run the following command:
$ kubectl describe pod yourPodName -n kube-system
3. To get additional information from the aws-node and kube-proxy pod logs, run the following command:
$ kubectl logs yourPodName -n kube-system
The logs and the events from the describe output can show why the pods aren't in Running status. For a node to change to Ready status, both the aws-node and kube-proxy pods must be Running on that node.
Note: Your pod names will differ from aws-node-qvqr2 and kube-proxy-292b4, which are the names shown in the preceding example.
4. If the aws-node and kube-proxy pods aren't listed after running the command from step 1, then run the following commands:
$ kubectl describe daemonset aws-node -n kube-system
$ kubectl describe daemonset kube-proxy -n kube-system
Search the output of the preceding commands for a reason why the pods can't be started.
Tip: You can search the Amazon EKS control plane logs for information on why the pods can't be scheduled.
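If you have many nodes, scanning the output of steps 1 and 4 by eye gets tedious. The following is a minimal sketch of two filters that you can pipe kubectl output into. It assumes kubectl's default table layout, where the first three pod columns are NAME, READY, and STATUS, and the DaemonSet columns start with NAME, DESIRED, CURRENT, READY. The helper names are hypothetical.

```shell
# Sketch: spot aws-node and kube-proxy problems from kubectl table output.
# These helper names are hypothetical; pipe kubectl output into them.

# Reads `kubectl get pods -n kube-system -o wide` on stdin. Prints NAME, READY,
# and STATUS for each aws-node or kube-proxy pod that isn't Running or whose
# containers aren't all ready (for example, 0/1).
unhealthy_node_pods() {
  awk 'NR > 1 && $1 ~ /^(aws-node|kube-proxy)-/ {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Reads `kubectl get daemonset -n kube-system` on stdin. Prints the aws-node or
# kube-proxy DaemonSet when fewer pods are ready than desired, which means some
# nodes lack a ready copy of the pod.
daemonset_gaps() {
  awk 'NR > 1 && ($1 == "aws-node" || $1 == "kube-proxy") && $2 != $4 {
    print $1, "desired=" $2, "ready=" $4
  }'
}
```

Example usage: $ kubectl get pods -n kube-system -o wide | unhealthy_node_pods. Each pod name that this prints is a candidate for the kubectl describe pod and kubectl logs commands from steps 2 and 3.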
In some scenarios, the node can be in Unknown status. This means that the kubelet on the node can't report the node's current status to the control plane.
To troubleshoot nodes in Unknown status, complete the steps in the following sections:
- Check the network configuration between nodes and the control plane
- Check the status of the kubelet
- Check that the Amazon EC2 API endpoint is reachable
Check the network configuration between nodes and the control plane
1. Confirm that there are no network ACL rules on your subnets blocking traffic between the Amazon EKS control plane and your worker nodes.
2. Confirm that the security groups for your control plane and nodes comply with the minimum inbound and outbound requirements.
3. (Optional) If your nodes are configured to use a proxy, confirm that the proxy allows traffic to the API server endpoints.
4. To verify that the node has access to the API server, run the following netcat command:
$ nc -vz 9FCF4EA77D81408ED82517B9B7E60D52.yl4.eu-north-1.eks.amazonaws.com 443
Connection to 9FCF4EA77D81408ED82517B9B7E60D52.yl4.eu-north-1.eks.amazonaws.com 443 port [tcp/https] succeeded!
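The netcat command in step 4 needs the API server host name without the https:// scheme. The following minimal sketch strips the scheme from the endpoint URL and runs the same check with a connection timeout. It assumes that you look up the endpoint URL on a machine with AWS credentials (for example, with aws eks describe-cluster) and then run the check on the node. The helper names and the your-cluster placeholder are hypothetical.

```shell
# Sketch: test TCP 443 reachability of the cluster's API server from a node.
# Get the endpoint URL from a machine with AWS credentials, for example:
#   aws eks describe-cluster --name your-cluster --query cluster.endpoint --output text
# "your-cluster" is a placeholder for your cluster name.

# Strips the https:// scheme and any path, leaving the host name for nc.
endpoint_host() {
  local url="${1#https://}"
  printf '%s\n' "${url%%/*}"
}

# Runs the same netcat check as step 4, with a 5-second connection timeout.
check_api_server() {
  nc -vz -w 5 "$(endpoint_host "$1")" 443
}
```

Example usage: $ check_api_server https://9FCF4EA77D81408ED82517B9B7E60D52.yl4.eu-north-1.eks.amazonaws.com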
Check the status of the kubelet
1. Use SSH to connect to the affected worker node.
2. To check the kubelet logs, run the following command:
$ journalctl -u kubelet > kubelet.log
Note: The kubelet.log file contains information on kubelet operations that can help you find the root cause of the node status issue.
If the logs don't provide information on the source of the issue, then run the following command to check the status of the kubelet on the worker node:
$ sudo systemctl status kubelet
kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-eksclt.al2.conf
   Active: inactive (dead) since Wed 2019-12-04 08:57:33 UTC; 40s ago
If the kubelet isn't active (running), as in the preceding example where it's inactive (dead), then run the following command to restart the kubelet:
$ sudo systemctl restart kubelet
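The kubelet checks above can be combined into two small helpers: one that pulls error lines out of the kubelet.log file saved in step 2, and one that restarts the kubelet only when systemd doesn't report it as active. This is a minimal sketch; the helper names and the "error|fail" search pattern are assumptions, not an official filter.

```shell
# Sketch: triage the kubelet on a worker node. Helper names are hypothetical.

# Prints up to the last N lines (default 20) of a saved kubelet log that
# mention "error" or "fail", ignoring case.
kubelet_errors() {
  local log_file="$1" limit="${2:-20}"
  grep -iE 'error|fail' "$log_file" | tail -n "$limit"
}

# Restarts the kubelet only if systemd doesn't report the unit as active.
ensure_kubelet_running() {
  if systemctl is-active --quiet kubelet; then
    echo "kubelet is active"
  else
    echo "kubelet isn't active; restarting it" >&2
    sudo systemctl restart kubelet
    systemctl is-active kubelet
  fi
}
```

Example usage: after running $ journalctl -u kubelet > kubelet.log, run $ kubelet_errors kubelet.log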
Check that the Amazon EC2 API endpoint is reachable
1. Use SSH to connect to one of the worker nodes.
2. To check if the Amazon Elastic Compute Cloud (Amazon EC2) API endpoint for your AWS Region is reachable, run the following command:
$ nc -vz ec2.<region>.amazonaws.com 443
Connection to ec2.us-east-1.amazonaws.com 443 port [tcp/https] succeeded!
Note: Replace <region> with the AWS Region where your worker node is located (for example, us-east-1).
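To avoid typing the Region by hand, you can build the endpoint host name from the Region code and, optionally, read the Region from instance metadata. The following is a minimal sketch that assumes the node can reach the instance metadata service with IMDSv2 and that the Region uses the standard amazonaws.com domain. The helper names are hypothetical.

```shell
# Sketch: check the Regional Amazon EC2 API endpoint from a worker node.
# Assumes the standard amazonaws.com domain; Regions in other partitions
# (for example, China Regions) use different domains.

# Builds the EC2 endpoint host name for a Region code such as us-east-1.
ec2_endpoint_host() {
  printf 'ec2.%s.amazonaws.com\n' "$1"
}

# Reads the node's Region from instance metadata (IMDSv2).
node_region() {
  local token
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/placement/region"
}

# Runs the same netcat check as step 2, with a 5-second connection timeout.
# Uses the Region passed as $1, or the node's own Region if none is given.
check_ec2_endpoint() {
  nc -vz -w 5 "$(ec2_endpoint_host "${1:-$(node_region)}")" 443
}
```

Example usage: $ check_ec2_endpoint us-east-1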