Introduction
When a Kubernetes node becomes NotReady, the control plane stops treating it as a healthy scheduling target. In many incidents, the underlying cause is kubelet failure or degraded node-level health rather than a problem in the Pods themselves. The right response is to inspect the kubelet, the node’s local resources, and its connectivity to the API server before trying to fix workloads on top of it.
Symptoms
- `kubectl get nodes` shows the node as `NotReady`
- New Pods stop scheduling to the node
- Existing Pods become `Unknown`, are evicted, or stop receiving normal updates
- Events point to `KubeletNotReady`, disk pressure, or connectivity issues
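When triaging, it helps to filter the node list down to the unhealthy entries. A minimal sketch, assuming the standard `NAME STATUS ...` column layout of `kubectl get nodes` (the `not_ready_nodes` helper name is ours, not a kubectl feature):

```shell
# not_ready_nodes: read `kubectl get nodes` output on stdin and keep
# only rows whose STATUS column is not plain "Ready". This also
# catches composite statuses like "NotReady,SchedulingDisabled".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Typical usage (requires cluster access):
#   kubectl get nodes | not_ready_nodes
```

An empty result means every node reports `Ready`; anything printed is a candidate for the steps below.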
Common Causes
- The kubelet service stopped or is crashing repeatedly
- Disk pressure or other local resource exhaustion degraded node health
- The node lost connectivity to the API server
- Kubelet configuration or node certificates became invalid
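The last two causes can be checked directly from the node. A sketch, assuming kubeadm-style defaults (the certificate path and port 6443 vary by distribution, and `cert_expiry` is our helper name):

```shell
# cert_expiry: print the notAfter date of a certificate. A date in
# the past means the kubelet can no longer authenticate to the API
# server and the node will go NotReady.
cert_expiry() {
  openssl x509 -noout -enddate -in "$1"
}

# Typical usage on the affected node (paths/URLs are placeholders
# that only exist on a real node):
#   cert_expiry /var/lib/kubelet/pki/kubelet-client-current.pem
#   curl -sk --max-time 5 https://<api-server>:6443/healthz
```

If the `curl` probe times out, focus on network or firewall changes before touching the kubelet itself.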
Step-by-Step Fix
1. Inspect node conditions and recent events. Start from the cluster view so you know whether the issue is readiness, pressure, or communication.

   ```shell
   kubectl describe node my-node
   ```

2. Check kubelet service health on the node. If the kubelet is down or crashing, no higher-level Kubernetes debugging will help until the node agent is stable again.

   ```shell
   sudo systemctl status kubelet
   sudo journalctl -u kubelet --since "1 hour ago"
   ```

3. Verify local disk and basic node health. Full disks and corrupted local state are common reasons the kubelet degrades or stops.

   ```shell
   df -h
   ```

4. Restart the kubelet only after checking why it failed. A restart may restore service temporarily, but you still need to understand whether the root cause is configuration, connectivity, or resource pressure.

   ```shell
   sudo systemctl restart kubelet
   ```

Prevention
- Monitor kubelet service health and node conditions directly
- Alert on disk pressure before nodes become NotReady
- Keep node bootstrap and certificate rotation procedures documented
- Treat repeated kubelet restarts as a real incident signal, not just noise
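The disk-pressure alert can be as simple as a threshold check run from cron on each node. A minimal sketch (the `disk_usage_warn` helper name and the 85% threshold are our choices; the upstream kubelet's default hard-eviction threshold is `nodefs.available` below 10%, i.e. roughly 90% used, so alerting earlier leaves room to react):

```shell
# disk_usage_warn: read `df -P` output on stdin and print a WARN line
# for every filesystem at or above the given usage threshold
# (default 85%). POSIX `df -P` guarantees the column layout.
disk_usage_warn() {
  awk -v t="${1:-85}" 'NR > 1 { use = $5; sub(/%/, "", use);
    if (use + 0 >= t) printf "WARN %s at %s%% used (mount %s)\n", $1, use, $6 }'
}

# e.g. run periodically on each node and wire the output to alerting:
df -P | disk_usage_warn 85
```

Catching a filling disk here is much cheaper than debugging the evictions and `NotReady` flaps it causes later.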