## Introduction

A Kubernetes node in the NotReady state receives no new pods from the scheduler, and pods already running on it may be evicted (by default, after a 300-second toleration period). This reduces cluster capacity and can trigger cascading rescheduling across the cluster.

## Symptoms

- `kubectl get nodes` shows `STATUS` as `NotReady`
- The node's `Ready` condition is `False` or `Unknown`, often alongside `MemoryPressure`, `DiskPressure`, or `PIDPressure` reporting `True`
- Pods on the node are evicted or terminated
- Events show: `Node <name> status is now: NodeNotReady`
- kube-scheduler cannot place new pods on the node
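
The first two symptoms can be checked mechanically instead of by eye. The following is a minimal sketch. It assumes the default `kubectl get nodes` table layout, with STATUS in the second column. The second function reads `Type=Status` pairs produced by a jsonpath query. Both function names are hypothetical helpers, not part of kubectl.

```bash
# Hypothetical triage helpers; the names are illustrative only.

# list_notready_nodes: read `kubectl get nodes` output on stdin and print
# the name of every node whose STATUS column starts with NotReady
# (this also matches "NotReady,SchedulingDisabled").
list_notready_nodes() {
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}

# flag_bad_conditions: read "Type=Status" lines on stdin and print the
# conditions that indicate trouble: Ready that is not True, or any other
# condition (MemoryPressure, DiskPressure, PIDPressure, ...) that is True.
flag_bad_conditions() {
  while IFS='=' read -r type status; do
    if [ -z "$type" ]; then
      continue
    fi
    if [ "$type" = "Ready" ]; then
      if [ "$status" != "True" ]; then
        echo "$type=$status"
      fi
    elif [ "$status" = "True" ]; then
      echo "$type=$status"
    fi
  done
}
```

Usage: `kubectl get nodes | list_notready_nodes`, and for a single node `kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}' | flag_bad_conditions`. The jsonpath form is used for conditions because it does not depend on the column widths of `kubectl describe`.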

## Common Causes

- The kubelet process has crashed or cannot communicate with the API server
- Disk pressure: the node's root partition or image filesystem is full
- Memory pressure: the node is running out of memory (OOM conditions)
- The network plugin (CNI) is not running
- Expired kubelet certificates (client or serving)
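
You can check certificate expiration on the node itself. The following is a minimal sketch. It assumes GNU `date` and the kubeadm default client certificate path `/var/lib/kubelet/pki/kubelet-client-current.pem`; other installers may store the certificate elsewhere. `days_until` is a hypothetical helper. It converts the `notAfter=` line printed by `openssl x509 -enddate` into the number of days remaining.

```bash
# days_until: convert an OpenSSL "notAfter=<date>" line into whole days
# remaining. The optional second argument overrides "now" (epoch seconds),
# which makes the function easy to test. Requires GNU date for -d parsing.
days_until() {
  local end_date end_epoch now
  end_date="${1#notAfter=}"
  now="${2:-$(date +%s)}"
  end_epoch=$(date -d "$end_date" +%s) || return 1
  # Shell division truncates toward zero; a negative result means the
  # certificate has already expired.
  echo $(( (end_epoch - now) / 86400 ))
}
```

Usage: `days_until "$(sudo openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem)"`.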

## Step-by-Step Fix

1. **Check node conditions**:
   ```bash
   # -A10 leaves room for the header rows and every condition, including Ready
   kubectl describe node <node-name> | grep -A10 "Conditions:"
   ```

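2. **SSH to the node and check the kubelet**:
   ```bash
   ssh <node-ip>
   sudo systemctl status kubelet
   sudo journalctl -u kubelet --since "30 minutes ago" | tail -50
   ```
3. **Restart the kubelet**:
   ```bash
   sudo systemctl restart kubelet
   # From a machine with cluster access, verify the node reconnects
   kubectl get nodes | grep <node-name>
   # Or block until the node reports Ready (gives up after 2 minutes)
   kubectl wait --for=condition=Ready node/<node-name> --timeout=120s
   ```
4. **Check disk space and clean up**:
   ```bash
   df -h
   # Remove container images not used by any container
   sudo crictl rmi --prune
   # Or, if the node uses Docker Engine as its runtime
   # (this also removes stopped containers, unused networks, and build cache)
   sudo docker system prune -af
   ```
5. **Check the CNI plugin**:
   ```bash
   # CNI plugins usually run as DaemonSet pods; check the ones on this node
   kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>
   # List pod sandboxes known to the container runtime on the node
   sudo crictl pods
   # Check that a CNI config file exists
   ls /etc/cni/net.d/
   ```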
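
For the kubelet journal check above, you can filter error lines automatically when the log is long. The following is a minimal sketch. It assumes the kubelet logs in the default klog text format, where each message begins with a severity letter (`I`, `W`, `E`, `F`) followed by the month and day. `kubelet_errors` is a hypothetical helper.

```bash
# kubelet_errors: read `journalctl -u kubelet` output on stdin and print only
# error (E) and fatal (F) klog lines. A klog header such as
# "E0101 12:00:00.000000 ..." follows the journal's "kubelet[PID]: " prefix.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}
```

Usage: `sudo journalctl -u kubelet --since "30 minutes ago" | kubelet_errors | tail -n 20`. Warnings (`W`) are left out on purpose; widen the character class to `[WEF]` if they are needed.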
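
Instead of reading the `df -h` output by eye, the disk check can flag filesystems that cross a usage threshold. The following is a minimal sketch using POSIX `df -P` output. In that format, column 5 is the use percentage and column 6 is the mount point, so the sketch does not handle mount points that contain spaces. `over_threshold` is a hypothetical helper. Its default of 80 matches the 20%-free target under Prevention. For comparison, the kubelet's default hard eviction thresholds include `nodefs.available<10%` and `imagefs.available<15%`.

```bash
# over_threshold: read `df -P` (or `df -Pi`) output on stdin and print
# "<mount> <use%>" for every filesystem at or above LIMIT percent.
over_threshold() {
  local limit="${1:-80}"
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)          # "91%" -> "91"; "-" becomes 0 below
    if (use + 0 >= limit + 0) print $6, $5
  }'
}
```

Usage: `df -P | over_threshold 80` for space, and `df -Pi | over_threshold 80` for inodes, which use the same column layout.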
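
For the CNI check, an empty or misnamed `/etc/cni/net.d/` is a frequent reason the runtime reports its network as not ready. The following is a minimal sketch. It assumes the runtime follows the usual libcni convention of loading the lexically first `.conf`, `.conflist`, or `.json` file in that directory. `first_cni_config` is a hypothetical helper.

```bash
# first_cni_config: print the CNI config file a runtime following the libcni
# convention would load from DIR (default /etc/cni/net.d): the first .conf,
# .conflist or .json file in byte-wise sort order. Fails if none exists.
first_cni_config() {
  local dir="${1:-/etc/cni/net.d}" f
  f=$(ls -1 "$dir" 2>/dev/null | grep -E '\.(conf|conflist|json)$' | LC_ALL=C sort | head -n 1)
  if [ -z "$f" ]; then
    echo "no CNI config found in $dir" >&2
    return 1
  fi
  echo "$dir/$f"
}
```

Usage: `sudo bash -c "$(declare -f first_cni_config); first_cni_config"` if the directory is not readable by your user, or simply `first_cni_config` otherwise.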

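## Prevention

- Monitor node resource usage with the Prometheus node exporter, and node conditions with kube-state-metrics (`kube_node_status_condition`)
- Set up alerts for NotReady nodes
- Configure image GC thresholds (for example `imageGCHighThresholdPercent: 70`, with `imageGCLowThresholdPercent` lowered below it, such as 60)
- Use Node Problem Detector for automatic issue reporting
- Rotate kubelet certificates regularly, for example by enabling `rotateCertificates: true` in the kubelet configuration
- Maintain at least 20% free disk space on nodes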
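
The image GC setting above interacts with a second field. Once usage passes `imageGCHighThresholdPercent` (default 85), the kubelet frees space down to `imageGCLowThresholdPercent` (default 80). It rejects a configuration where the low threshold is not below the high one, so raising only the high value to 70 is not enough. The following is a minimal pre-flight check. It assumes a flat KubeletConfiguration YAML at the kubeadm default path `/var/lib/kubelet/config.yaml` with top-level keys. `check_image_gc` is a hypothetical helper and is not a general YAML parser.

```bash
# check_image_gc: read image GC thresholds from a flat kubelet config file
# and confirm low < high. Missing keys fall back to the kubelet defaults
# (high 85, low 80). Only top-level "key: value" lines are recognised.
check_image_gc() {
  local cfg="${1:-/var/lib/kubelet/config.yaml}" high low
  if [ ! -r "$cfg" ]; then
    echo "cannot read $cfg" >&2
    return 2
  fi
  high=$(awk -F': *' '$1 == "imageGCHighThresholdPercent" { print $2 }' "$cfg")
  low=$(awk -F': *' '$1 == "imageGCLowThresholdPercent" { print $2 }' "$cfg")
  high="${high:-85}"
  low="${low:-80}"
  if [ "$low" -ge "$high" ]; then
    echo "invalid: imageGCLowThresholdPercent ($low) must be below imageGCHighThresholdPercent ($high)"
    return 1
  fi
  echo "ok: image GC starts above ${high}% disk usage and frees space down to ${low}%"
}
```

Usage: run `check_image_gc` (or `check_image_gc /path/to/config.yaml`) before restarting the kubelet with new thresholds.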