Introduction

The GKE cluster autoscaler does not add nodes when a project quota or a resource policy blocks scale-out. This guide walks through diagnosing which limit was hit and how to resolve it.

Symptoms

Typical error output:

bash
Warning: autoscaler: FailedScaleOut
Cannot add node: quota 'CPUS' exceeded in project 'my-project'
Node pool "default-pool" cannot scale beyond 10 nodes

Common Causes

  1. Project quota limit for CPUs exceeded
  2. Node pool maximum node count reached
  3. Subnet IP address exhaustion
  4. Insufficient resource availability in the region
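
As a rough aid for triage, the four causes above can be mapped from the text of a failure message. The sketch below is only illustrative: the function name `classify_scale_failure` is hypothetical, the first two patterns come from the sample output in the Symptoms section, and the `IP_SPACE_EXHAUSTED` and `ZONE_RESOURCE_POOL_EXHAUSTED` strings are assumptions about what your logs contain, so adjust them to what you actually see.

```bash
# Map an autoscaler failure message to one of the four causes listed above.
# The matched substrings are assumptions; tune them to your own log output.
classify_scale_failure() {
  case "$1" in
    *[Qq]uota*)                                echo "quota" ;;
    *"cannot scale beyond"*)                   echo "max-nodes" ;;
    *IP_SPACE_EXHAUSTED*|*"IP space"*)         echo "subnet-ip" ;;
    *ZONE_RESOURCE_POOL_EXHAUSTED*|*stockout*) echo "capacity" ;;
    *)                                         echo "unknown" ;;
  esac
}
```

The order of the patterns matters: a message that mentions both a quota and a node limit is reported as a quota problem, since quota is checked first.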

Step-by-Step Fix

Step 1: Check Current State

bash
gcloud container clusters describe my-cluster --region=us-central1
kubectl get nodes
kubectl describe nodes | grep -A5 "Capacity"

Step 2: Identify Root Cause

bash
gcloud logging read "severity>=ERROR" --project=<project> --limit=50
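
A project-wide severity query can return a lot of unrelated errors. One way to narrow it, sketched here under assumptions, is to restrict the query to the cluster autoscaler's own events. The helper name `autoscaler_log_filter` is hypothetical, and the sketch assumes the events are written to the `container.googleapis.com/cluster-autoscaler-visibility` log under the `k8s_cluster` resource type; confirm both in Logs Explorer for your cluster before relying on it.

```bash
# Build a Cloud Logging filter for cluster autoscaler events.
# Assumption: events land in the cluster-autoscaler-visibility log under the
# k8s_cluster resource type. $1 is the cluster name.
autoscaler_log_filter() {
  printf 'resource.type="k8s_cluster" AND resource.labels.cluster_name="%s" AND log_id("container.googleapis.com/cluster-autoscaler-visibility")' "$1"
}

# Usage (needs gcloud credentials), shown commented out:
# gcloud logging read "$(autoscaler_log_filter my-cluster)" --project=<project> --limit=20 --freshness=1d
```

The filter is passed to `gcloud logging read` as its positional argument.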

Step 3: Apply Primary Fix

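```bash
# Update node pool autoscaling limits (--enable-autoscaling is required with --min-nodes/--max-nodes)
gcloud container clusters update my-cluster --region=us-central1 --node-pool=default-pool --enable-autoscaling --min-nodes=3 --max-nodes=20

# Check regional CPUS usage against its limit; request an increase (e.g. to 50) on the console's IAM & Admin > Quotas page
gcloud compute regions describe us-central1 --flatten="quotas[]" --format="table(quotas.metric,quotas.usage,quotas.limit)"
```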
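
Raising the node limit only helps if the CPU quota can absorb the extra nodes, so it can be worth doing the arithmetic first. The sketch below is a minimal check under stated assumptions: the function name `quota_allows_scale_out` is hypothetical, and the usage, limit, node count, and vCPUs-per-node values are all supplied by you (nothing here queries GCP).

```bash
# Succeed (exit 0) if adding nodes fits under the regional CPU quota.
# Arguments: current CPU usage, CPU quota limit, nodes to add, vCPUs per node.
# All four values come from the caller; this function does not call gcloud.
quota_allows_scale_out() {
  usage=$1 limit=$2 nodes=$3 vcpus=$4
  needed=$(( nodes * vcpus ))
  [ $(( usage + needed )) -le "$limit" ]
}
```

For example, with 10 nodes of a hypothetical 4-vCPU machine type, usage is 40 CPUs. Against a limit of 50, `quota_allows_scale_out 40 50 2 4` succeeds (48 CPUs), while `quota_allows_scale_out 40 50 10 4` fails (80 CPUs), which means the quota has to be raised before the larger node limit can take effect.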

Step 4: Apply Alternative Fix

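```bash
# Alternative fix: check configuration
# Template only: "gcloud resource" is not a real command group. Substitute the command group
# and name of whatever is blocking scale-out (for example, the node pool or the subnet).
gcloud resource describe <resource> --project=<project> --format=yaml

# Update specific properties
gcloud resource update <resource> --project=<project> --<flag>=<value>

# Verify the fix
gcloud resource describe <resource> --project=<project> --format=json
```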

Step 5: Verify the Fix

bash
gcloud container clusters describe my-cluster --region=us-central1 --format="value(nodePools[0].autoscaling.maxNodeCount)"
kubectl get nodes
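
To confirm that new nodes actually joined, it can help to count the nodes that report `Ready` rather than reading the list by eye. This is a small sketch, not part of any official tooling: the helper name `count_ready_nodes` is hypothetical, and it assumes the default column layout of `kubectl get nodes --no-headers`, where the second column is STATUS.

```bash
# Count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output from stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster, shown commented out:
# kubectl get nodes --no-headers | count_ready_nodes
```

Because the match is exact, cordoned nodes (`Ready,SchedulingDisabled`) and `NotReady` nodes are not counted.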

Common Pitfalls

  • Forgetting to check regional quotas before provisioning
  • Not waiting for asynchronous operations to complete before starting the next step
  • Missing IAM permissions for GCP resource operations
  • Confusing zone-level and region-level resources
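
For the pitfall about asynchronous operations, one possible approach is to poll for the expected state instead of moving on immediately. The helper below is a generic sketch with a hypothetical name, `wait_until`; the attempt count, the delay, and the command being polled are all choices you make.

```bash
# Poll a command until it succeeds or the attempt budget runs out.
# Arguments: max attempts, seconds between attempts, then the command to run.
wait_until() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$(( i + 1 ))
    # Skip the final sleep: no attempt follows it.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage, shown commented out: wait up to 5 minutes for at least 12 nodes.
# wait_until 30 10 sh -c '[ "$(kubectl get nodes --no-headers | wc -l)" -ge 12 ]'
```

A non-zero return means the condition never became true within the budget, so a calling script can stop instead of running the next step against a cluster that is still changing.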

Best Practices

  • Always check quotas before provisioning new resources
  • Use GCP Cloud Monitoring for observability
  • Implement proper error handling in gcloud scripts
  • Enable logging for all critical GCP resources