Introduction

When the AKS cluster autoscaler does not add nodes, the cause is usually a node pool setting or a subscription quota that blocks scale-out. This guide walks through diagnosing and resolving the problem step by step.

Symptoms

Typical error output:

```text
Warning: FailedScaleOut
Cluster autoscaler could not add node: quota exceeded for 'standardDSv5Family' in region 'eastus'
Nodepool 'agentpool' cannot scale beyond 10 nodes
```

Common Causes

  1. Subscription quota limit exceeded for the node VM family
  2. Autoscaler max node count reached
  3. Subnet IP address exhaustion
  4. Node pool configuration prevents scaling
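
Cause 3 can be estimated with simple arithmetic. The sketch below assumes Azure CNI with a node subnet, where each node reserves one address for itself plus one per potential pod, and it accounts for the five addresses Azure reserves in every subnet. The function name and the 30-pod figure are illustrative assumptions; overlay and kubenet networking consume addresses differently, so treat the result as a rough upper bound.

```bash
# Hypothetical helper (not part of the Azure CLI): estimate how many nodes a
# subnet can hold under Azure CNI node-subnet networking.
#   $1 = subnet prefix length (for example 24 for a /24)
#   $2 = max pods per node (assumed value; check your node pool's setting)
max_nodes_for_subnet() {
  local prefix="$1"
  local max_pods="$2"
  # Azure reserves 5 addresses in every subnet.
  local usable=$(( (1 << (32 - prefix)) - 5 ))
  # Each node needs 1 address for itself plus 1 per potential pod.
  echo $(( usable / (max_pods + 1) ))
}

# Example call:
#   max_nodes_for_subnet 24 30
```

Under these assumptions a /24 subnet with 30 pods per node holds 8 nodes, which would stall a pool before it reaches a maximum of 10. Other node pools and temporary surge nodes created during upgrades draw from the same subnet, so the real ceiling can be lower.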

Step-by-Step Fix

Step 1: Check Current State

```bash
az aks show --resource-group MyRG --name MyAKS --query agentPoolProfiles
az aks command invoke --resource-group MyRG --name MyAKS --command "kubectl get nodes"
kubectl describe nodes | grep -A5 "Capacity"
```

Step 2: Identify Root Cause

```bash
az monitor activity-log list --resource-group MyRG --status Failed
```
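
Once a failure message is in hand, it can be matched against the causes listed earlier. The following is a minimal sketch: `classify_scale_failure` is a hypothetical helper, and its match patterns are assumptions drawn from the sample output in the Symptoms section rather than an exhaustive list of autoscaler messages.

```bash
# Hypothetical helper: map a failure message to one of the causes in this guide.
# The patterns are assumptions based on the sample messages shown under Symptoms.
classify_scale_failure() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *quota*)                  echo "quota" ;;
    *"cannot scale beyond"*)  echo "max-count" ;;
    *subnet*|*"ip address"*)  echo "subnet" ;;
    *)                        echo "unknown" ;;
  esac
}

# Example call:
#   classify_scale_failure "Nodepool 'agentpool' cannot scale beyond 10 nodes"
```

Pass it a message copied from the activity log or from `kubectl get events`. An `unknown` result suggests reviewing the node pool configuration by hand.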

Step 3: Apply Primary Fix

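```bash
# Update the node pool autoscaler limits
az aks nodepool update --resource-group MyRG --cluster-name MyAKS --name agentpool \
  --update-cluster-autoscaler --min-count 3 --max-count 20

# Request a quota increase for the VM family (requires the "quota" CLI extension;
# confirm the exact syntax with `az quota update --help`)
az quota update --scope /subscriptions/<sub-id>/providers/Microsoft.Compute/locations/eastus \
  --resource-name standardDSv5Family --resource-type dedicated --limit-object value=50
```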
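
Before requesting a new limit, it helps to work out how many vCPUs the pool needs at its maximum size, since VM family quotas are counted in vCPUs rather than nodes. Below is a minimal sketch of that arithmetic. The function names are illustrative, and the 4-vCPU node size used in the note that follows is an assumption, because this guide does not state which VM size `agentpool` uses.

```bash
# Hypothetical helper: vCPUs a node pool needs at its autoscaler maximum.
#   $1 = max node count, $2 = vCPUs per node for the chosen VM size
required_vcpus() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: how many more nodes fit under the family quota.
#   $1 = quota limit (vCPUs), $2 = current usage (vCPUs), $3 = vCPUs per node
nodes_that_fit() {
  local headroom=$(( $1 - $2 ))
  # Report zero rather than a negative count when usage exceeds the limit.
  if [ "$headroom" -lt 0 ]; then
    headroom=0
  fi
  echo $(( headroom / $3 ))
}

# Example calls:
#   required_vcpus 20 4
#   nodes_that_fit 50 0 4
```

With a maximum of 20 nodes at an assumed 4 vCPUs each, the pool needs 80 vCPUs, so a limit of 50 would still stop scale-out at 12 nodes. Current usage and limits per family can be read from `az vm list-usage --location eastus -o table`.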

Step 4: Apply Alternative Fix

```bash
# Alternative fix: inspect the resource configuration
az resource show --resource-group MyRG --name MyResource --resource-type <resource-type> -o yaml

# Update specific properties
az resource update --resource-group MyRG --name MyResource --resource-type <resource-type> --set properties.<key>=<value>

# Verify the fix
az resource show --resource-group MyRG --name MyResource --resource-type <resource-type> --query properties.<key>
```

Step 5: Verify the Fix

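```bash
# Quote the JMESPath query so the shell does not interpret the brackets
az aks show --resource-group MyRG --name MyAKS --query "agentPoolProfiles[0].count"
# --no-headers keeps the header row out of the node count
kubectl get nodes --no-headers | wc -l
```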
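
Scale-out is asynchronous, so the node count may lag behind the new limits for several minutes. The sketch below polls a command until it prints an expected value. `wait_for_value` is a hypothetical helper, and the attempt count and delay in the usage note are arbitrary assumptions.

```bash
# Hypothetical helper: re-run a command until it prints the expected value.
#   $1 = expected output
#   $2 = maximum number of attempts
#   $3 = seconds to sleep between attempts
#   remaining arguments = the command to run
wait_for_value() {
  local expected="$1"
  local attempts="$2"
  local delay="$3"
  shift 3
  local actual=""
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Treat a failing command as "no value yet" and keep polling.
    actual=$("$@" 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "timed out waiting for '$expected' (last value: '$actual')" >&2
  return 1
}
```

For example, `wait_for_value 12 30 20 az aks nodepool show --resource-group MyRG --cluster-name MyAKS --name agentpool --query count -o tsv` would check every 20 seconds, up to 30 times, until the pool reports 12 nodes. The target of 12 is a placeholder for whatever count the pending workload requires.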

Common Pitfalls

  • Forgetting to check quota limits before resize operations
  • Not waiting for asynchronous operations to complete before starting the next step
  • Missing RBAC permissions for Azure resource operations
  • Confusing subscription-level and resource-level quotas

Best Practices

  • Always check quota before provisioning new resources
  • Use Azure Resource Health for monitoring
  • Implement proper error handling in Azure CLI scripts
  • Enable diagnostic settings for all critical resources