Introduction
The GKE cluster autoscaler on Google Cloud may stop adding nodes when a project quota or resource policy prevents scaling. This guide walks through diagnosing and resolving the problem step by step.
Symptoms
Typical error output:
```text
Warning: autoscaler: FailedScaleOut
Cannot add node: quota 'CPUS' exceeded in project 'my-project'
Node pool "default-pool" cannot scale beyond 10 nodes
```
Common Causes
- Project CPU quota (`CPUS` in the cluster's region) exceeded
- Node pool maximum node count (`--max-nodes`) reached
- Subnet IP address exhaustion (node or Pod ranges)
- Insufficient machine capacity available in the region or its zones
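As a first triage step, an error message can be mapped to the likely cause. The sketch below is a hypothetical helper (`classify_scaleout_error` is not part of any Google Cloud tool) that only recognizes the sample messages shown under Symptoms; real messages may be worded differently, so treat anything it cannot classify as a prompt to check subnet IP usage and regional capacity.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a scale-out error message to one of the common causes.
# Assumption: messages resemble the samples under Symptoms; real wording may differ.
classify_scaleout_error() {
  local msg=$1
  case "$msg" in
    *"quota 'CPUS' exceeded"*)
      echo "cpu-quota" ;;
    *"cannot scale beyond"*)
      echo "max-nodes" ;;
    *)
      echo "unclassified: check subnet IP usage and regional capacity" ;;
  esac
}

classify_scaleout_error "Cannot add node: quota 'CPUS' exceeded in project 'my-project'"
# prints: cpu-quota
```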
Step-by-Step Fix
Step 1: Check Current State
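```bash
# Inspect the cluster configuration, including each node pool's autoscaling settings
gcloud container clusters describe my-cluster --region=us-central1
# List the current nodes and their capacity
kubectl get nodes
kubectl describe nodes | grep -A5 "Capacity"
```
Step 2: Identify Root Cause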
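The cluster autoscaler records its scale-up decisions in a dedicated visibility log, which can be more direct than a project-wide error search. The helper below is a hypothetical sketch that builds a Cloud Logging filter for those entries; it assumes they are logged with `resource.type="k8s_cluster"` and a log name containing `cluster-autoscaler-visibility`, so verify the names against your project's logs before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a Cloud Logging filter for cluster autoscaler visibility events.
# Assumption: entries use resource.type="k8s_cluster" and a log name containing
# "cluster-autoscaler-visibility"; the ":" operator performs a substring (has) match.
autoscaler_log_filter() {
  local cluster=$1
  printf '%s' "resource.type=\"k8s_cluster\" AND resource.labels.cluster_name=\"${cluster}\" AND logName:\"cluster-autoscaler-visibility\""
}

autoscaler_log_filter my-cluster
echo
```

The filter can then be passed as the positional argument of `gcloud logging read`, for example `gcloud logging read "$(autoscaler_log_filter my-cluster)" --project=<project> --freshness=1d --limit=20`, and the returned entries inspected for the reason a scale-up was not performed.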
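```bash
# List recent error-level log entries (the filter is a positional argument)
gcloud logging read 'severity>=ERROR' --project=<project> --freshness=1d --limit=50
```
Step 3: Apply Primary Fix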
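Before raising `--max-nodes`, it helps to estimate whether the regional `CPUS` quota can absorb the new maximum. For a regional cluster the node pool limits apply per zone, so a pool of 4-vCPU nodes (such as `e2-standard-4`) spread across an assumed 3 zones with `--max-nodes=20` can grow to 60 nodes, or 240 vCPUs. The function names below are hypothetical; this is plain arithmetic, not a quota API.

```bash
#!/usr/bin/env bash
# Hypothetical helper: vCPUs a node pool needs at its autoscaling maximum.
# For regional clusters --max-nodes is per zone, so total nodes = per-zone max * zones.
required_vcpus() {
  local max_nodes_per_zone=$1 zones=$2 vcpus_per_node=$3
  echo $(( max_nodes_per_zone * zones * vcpus_per_node ))
}

# Hypothetical helper: succeed (return 0) if the regional CPUS quota can absorb growing
# the pool from its current vCPUs to the target. "usage" is the region's current CPUS
# usage, which already includes the pool's existing nodes.
quota_fits() {
  local limit=$1 usage=$2 current_pool_vcpus=$3 target_pool_vcpus=$4
  local extra=$(( target_pool_vcpus - current_pool_vcpus ))
  (( usage + extra <= limit ))
}

echo "vCPUs needed at maximum: $(required_vcpus 20 3 4)"
# prints: vCPUs needed at maximum: 240
```

The limit and usage values come from the `CPUS` entry in `gcloud compute regions describe`; if `quota_fits` fails, request a quota increase before raising the node pool maximum.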
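```bash
# Update node pool autoscaling limits (for regional clusters, --min-nodes and --max-nodes apply per zone)
gcloud container clusters update my-cluster --region=us-central1 --node-pool=default-pool --enable-autoscaling --min-nodes=3 --max-nodes=20

# Check the regional CPU quota (look for the entry with metric: CPUS)
gcloud compute regions describe us-central1 --project=<project> --format="yaml(quotas)"
# Request a quota increase on the Quotas page of the Google Cloud console
```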
Step 4: Apply Alternative Fix
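```bash
# Alternative fix for subnet IP exhaustion: find the cluster's subnet and inspect it
gcloud container clusters describe my-cluster --region=us-central1 --format="value(subnetwork)"
gcloud compute networks subnets describe <subnet> --region=us-central1 --project=<project> --format=yaml
# Expand the subnet's primary (node) range; a smaller prefix length gives a larger range
gcloud compute networks subnets expand-ip-range <subnet> --region=us-central1 --project=<project> --prefix-length=<prefix-length>
# Verify the fix
gcloud compute networks subnets describe <subnet> --region=us-central1 --project=<project> --format="value(ipCidrRange)"
```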
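To tell whether node IPs or Pod IPs are the bottleneck, the node ceiling of each range can be computed. The sketch below assumes a VPC-native cluster where each node takes one address from the subnet's primary range (Google Cloud reserves 4 addresses in every primary range) and receives one Pod CIDR from the Pod secondary range, /24 by default (up to 110 Pods per node). Your cluster's ranges and per-node Pod CIDR size may differ, and the function names are hypothetical. Expanding the primary range only lifts the first limit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: nodes that fit in a subnet primary range of the given prefix length.
# Google Cloud reserves 4 addresses in each primary range; each node uses one address.
max_nodes_by_subnet() {
  local prefix_len=$1
  echo $(( (1 << (32 - prefix_len)) - 4 ))
}

# Hypothetical helper: nodes a Pod secondary range supports when each node receives a
# Pod CIDR of size /node_pod_prefix (assumed /24 by default).
max_nodes_by_pod_range() {
  local pod_range_prefix=$1 node_pod_prefix=${2:-24}
  echo $(( 1 << (node_pod_prefix - pod_range_prefix) ))
}

echo "Subnet /24 fits $(max_nodes_by_subnet 24) nodes"         # prints: Subnet /24 fits 252 nodes
echo "Pod range /14 fits $(max_nodes_by_pod_range 14) nodes"   # prints: Pod range /14 fits 1024 nodes
```

The cluster can grow only to the smaller of the two values, so compare both against the planned node pool maximum.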
Step 5: Verify the Fix
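```bash
# Confirm the new autoscaling maximum on the node pool (per zone for regional clusters)
gcloud container node-pools describe default-pool --cluster=my-cluster --region=us-central1 --format="value(autoscaling.maxNodeCount)"
# Watch new nodes join the cluster
kubectl get nodes
```
Common Pitfalls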
- Forgetting to check regional quotas before provisioning
- Not waiting for async operations to complete before next step
- Missing IAM permissions for GCP resource operations
- Confusing zone-level and region-level resources
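To avoid moving on before an asynchronous change has taken effect, a script can poll a readiness check. The sketch below is a hypothetical, generic polling helper (`wait_until` is not a gcloud command); for cluster operations themselves, `gcloud container operations wait <operation-id> --region=us-central1` blocks until the operation finishes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, up to max_tries attempts,
# sleeping sleep_secs between attempts. Returns 0 on success, 1 if all attempts fail.
wait_until() {
  local max_tries=$1 sleep_secs=$2
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_tries; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < max_tries )); then
      sleep "$sleep_secs"
    fi
  done
  return 1
}
```

For example, `wait_until 30 20 bash -c '[ "$(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool --no-headers | wc -l)" -ge 6 ]'` waits up to roughly ten minutes for the pool to reach six nodes, using the `cloud.google.com/gke-nodepool` label that GKE sets on its nodes.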
Best Practices
- Always check quotas before provisioning new resources
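- Use Cloud Monitoring to track quota usage and node counts, with alerts before limits are reached
- Check the exit code of every gcloud command in scripts, for example by running them under `set -euo pipefail`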
- Enable logging for all critical GCP resources
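The error-handling practice above can be sketched as a small wrapper. `run_checked` is a hypothetical name; it reports the failing command and exit code on stderr, then returns that code so the calling script (or `set -e`) can stop.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command; on failure, report it on stderr and propagate
# its exit code. The "|| status=$?" form keeps set -e from exiting before the report.
run_checked() {
  local status=0
  "$@" || status=$?
  if (( status != 0 )); then
    printf 'command failed (exit %d): %s\n' "$status" "$*" >&2
  fi
  return "$status"
}
```

In a script that starts with `set -euo pipefail`, a call such as `run_checked gcloud container clusters update my-cluster --region=us-central1 --node-pool=default-pool --enable-autoscaling --max-nodes=20` prints a diagnostic and then stops the script if the update fails.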
Related Issues
- GCP Quota Exceeded
- GCP Resource Deployment Failed
- GCP Network Connectivity Issues
- GCP IAM Permission Denied