## Introduction

GKE pods stuck in the Pending state with "Insufficient cpu" or "Insufficient memory" events cannot be scheduled because no node in the cluster has enough allocatable CPU or memory to satisfy the pod's resource requests.

## Symptoms

- `kubectl get pods` shows STATUS = Pending
- `kubectl describe pod` shows: "0/X nodes are available: X Insufficient cpu"
- Cluster node CPU/memory utilization near 100%
- Cluster Autoscaler events show scaling attempts, but new nodes do not appear
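To get a quick count of affected pods cluster-wide, the output of `kubectl get pods -A` can be filtered on its STATUS column. A small helper sketch (the function name `count_pending` is my own, not a kubectl feature):

```shell
# count_pending: reads `kubectl get pods -A` output on stdin and
# counts the rows whose STATUS column (field 4) is "Pending".
count_pending() {
  awk 'NR > 1 && $4 == "Pending"' | wc -l
}

# Typical usage against a live cluster:
#   kubectl get pods -A | count_pending
```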

## Common Causes

- Node pool at maximum size (autoscaler `max-nodes` reached)
- Resource requests too high for available node capacity
- Pod anti-affinity rules preventing co-location
- GPU node pool exhausted
- ResourceQuota or LimitRange blocking new pods
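To check the second cause by hand, compare a pod's CPU request against a node's allocatable CPU. Kubernetes expresses both as quantities such as `2` (cores) or `500m` (millicores), so normalizing to millicores first makes the comparison trivial. A minimal sketch (the helper name `cpu_to_millicores` is mine; it assumes whole-core or `m`-suffixed values, since plain shell arithmetic cannot handle decimals like `0.5`):

```shell
# cpu_to_millicores: converts a Kubernetes CPU quantity ("2", "500m")
# to an integer number of millicores, so pod requests can be compared
# against node allocatable capacity with plain integer arithmetic.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;           # already in millicores
    *)  echo "$(( $1 * 1000 ))" ;; # whole cores -> millicores
  esac
}
```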

## Step-by-Step Fix

1. **Check scheduling events**:

   ```bash
   kubectl describe pod <pod-name> -n <namespace>
   ```

2. **Check node pool status**:

   ```bash
   gcloud container node-pools describe <pool-name> --cluster <cluster-name> --zone <zone> \
     --format="value(autoscaling,initialNodeCount,config.machineType)"
   ```

3. **Increase node pool max size**:

   ```bash
   gcloud container clusters update <cluster-name> --zone <zone> \
     --enable-autoscaling --node-pool <pool-name> --min-nodes 1 --max-nodes 20
   ```

4. **Add a new node pool**:

   ```bash
   gcloud container node-pools create high-cpu-pool --cluster <cluster-name> --zone <zone> \
     --machine-type c2-standard-8 --num-nodes 2 --enable-autoscaling --min-nodes 1 --max-nodes 10
   ```

5. **Optimize resource requests**:

   ```bash
   kubectl top pods -n <namespace>
   kubectl get deployment <name> -n <namespace> -o jsonpath='{.spec.template.spec.containers[0].resources}'
   ```
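When right-sizing requests from `kubectl top pods` output, it helps to normalize memory quantities to a single unit before comparing actual usage against what the deployment requests. A small helper sketch (the function name `mem_to_mib` is my own; it assumes whole-number binary-suffix quantities):

```shell
# mem_to_mib: converts a Kubernetes memory quantity ("2Gi", "512Mi",
# "1048576Ki", or plain bytes) to an integer number of MiB.
mem_to_mib() {
  case "$1" in
    *Gi) echo "$(( ${1%Gi} * 1024 ))" ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo "$(( ${1%Ki} / 1024 ))" ;;
    *)   echo "$(( $1 / 1048576 ))" ;;  # plain bytes
  esac
}
```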

## Prevention

- Set resource requests based on actual usage
- Use Cluster Autoscaler with appropriate min/max bounds
- Monitor pending pod count with a Cloud Monitoring alert
- Use GKE Autopilot for automatic resource management
- Set up ResourceQuotas per namespace
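As a sketch of the last point, a minimal per-namespace ResourceQuota might look like the following (the namespace `team-a` and the specific limits are hypothetical; adjust to your workloads):

```yaml
# Hypothetical quota: caps the total CPU/memory that pods in one
# namespace may request, so a single team cannot exhaust the cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a   # hypothetical namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```

Apply it with `kubectl apply -f quota.yaml` and inspect current usage against the quota with `kubectl describe resourcequota compute-quota -n team-a`.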