What's Actually Happening
The Kubernetes Horizontal Pod Autoscaler (HPA) scales up too quickly, creating many pods for minor load increases. This causes resource exhaustion, cost spikes, and unnecessary pod churn: pods scale up, then immediately scale back down.
The Error You'll See
Rapid pod scaling:
```bash
$ kubectl get hpa my-app -w
NAME     REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/app   60%/50%   2         20        5          1m
my-app   Deployment/app   60%/50%   2         20        10         2m   # Doubled!
my-app   Deployment/app   60%/50%   2         20        20         3m   # Max!

$ kubectl get pods -l app=my-app | wc -l
20   # Maximum pods reached quickly
```
Scaling events:
```bash
$ kubectl describe hpa my-app
Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---  ----                       -------
  Normal  SuccessfulRescale  1m   horizontal-pod-autoscaler  New size: 5
  Normal  SuccessfulRescale  30s  horizontal-pod-autoscaler  New size: 10
  Normal  SuccessfulRescale  10s  horizontal-pod-autoscaler  New size: 20
```
Scale down immediately:
```bash
$ kubectl get hpa my-app
NAME     REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS
my-app   Deployment/app   40%/50%   2         20        20   # High replicas for low load
```
Why This Happens
1. Aggressive scaling policy - the default scale-up policy allows adding up to 100% of current pods at once
2. No cooldown - scaling up again immediately after the previous scale
3. Metric spikes - temporary load spikes trigger scaling
4. Threshold too close - the target is too close to the app's typical utilization
5. Missing stabilization window - no delay before acting on scaling decisions
6. No scaling limits - replicas can double in a single step
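The compounding effect above follows from the HPA core formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal Python sketch (illustrative only, not the controller's code) of how a sustained 60%-vs-50% overshoot ratchets replicas upward:

```python
import math

def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    # HPA core formula: ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    return math.ceil(current_replicas * current_util / target_util)

# 60% observed vs 50% target: each evaluation asks for 20% more pods.
replicas = 5
for _ in range(4):
    replicas = desired_replicas(replicas, 60, 50)
    print(replicas)
# prints 6, 8, 10, 12 - if utilization stays above target during a
# sustained spike, replicas keep climbing until maxReplicas caps them
```

With the default behavior (no scale-up stabilization, 100% increase allowed per period), nothing slows this ramp down.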
Step 1: Check Current HPA Configuration
```bash
# Get HPA details
kubectl get hpa my-app -o yaml

# Check scaling behavior (autoscaling/v2, GA in Kubernetes 1.23+)
kubectl get hpa my-app -o jsonpath='{.spec.behavior}'

# Check metrics and targets
kubectl get hpa my-app -o jsonpath='{.spec.metrics}'

# Check min/max replicas
kubectl get hpa my-app -o jsonpath='{.spec.minReplicas}-{.spec.maxReplicas}'

# Describe HPA for events
kubectl describe hpa my-app

# Check current replicas
kubectl get hpa my-app -o jsonpath='{.status.currentReplicas}'
```
Step 2: Add Stabilization Window
```yaml
# Add stabilization windows to prevent rapid scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 1 minute before scaling up
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
```

```bash
# Apply
kubectl apply -f hpa.yaml

# Verify
kubectl get hpa my-app -o yaml | grep stabilizationWindowSeconds
```
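The stabilization window is a rolling lookback over recent recommendations: for scale down, HPA acts on the highest recommendation seen within the window (for scale up, the lowest), so a brief dip in load does not immediately remove pods. A plain-Python sketch of the scale-down case (an illustration of the documented semantics, not the controller's actual code; the window here is counted in evaluations rather than seconds):

```python
from collections import deque

class ScaleDownStabilizer:
    """Keep the last `window` recommendations; only scale down to the
    highest recommendation seen within that window."""
    def __init__(self, window: int):
        self.history = deque(maxlen=window)

    def recommend(self, desired: int) -> int:
        self.history.append(desired)
        return max(self.history)

stab = ScaleDownStabilizer(window=5)
for desired in (10, 10, 4, 4, 4, 4, 4, 4):
    print(stab.recommend(desired))
# prints 10 six times, then 4, 4 - the lower recommendation must
# persist for the whole window before any pods are removed
```

This is why a 300-second window smooths out metric dips shorter than 5 minutes.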
Step 3: Limit Scale Up Rate
```yaml
# Limit how many pods can be added at once
behavior:
  scaleUp:
    # selectPolicy chooses between the policies below:
    #   Max = allow the largest change (the default), Min = allow the smallest
    selectPolicy: Min
    policies:
      # Option 1: Add at most 4 pods per minute
      - type: Pods
        value: 4
        periodSeconds: 60
      # Option 2: Add at most 50% of current pods per minute
      - type: Percent
        value: 50
        periodSeconds: 60
      # Option 3: Add at most 25% per 2 minutes
      - type: Percent
        value: 25
        periodSeconds: 120

# With selectPolicy: Min, the least aggressive policy wins.
# Example: at 10 pods, 25% = 3 pods vs 50% = 5 pods vs 4 pods -> adds 3, not 10.
# This prevents doubling replicas in one step.
```
Step 4: Limit Scale Down Rate
```yaml
# Prevent rapid scale down that causes thrashing
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # 5-minute cooldown
    selectPolicy: Min                 # Use the most restrictive policy
    policies:
      # Option 1: Remove at most 1 pod per minute
      - type: Pods
        value: 1
        periodSeconds: 60
      # Option 2: Remove at most 10% per minute
      - type: Percent
        value: 10
        periodSeconds: 60

# Gradual scale down prevents removing many pods at once
# and gives you time to confirm the scaling decision was right.
```

```bash
# Apply
kubectl apply -f hpa.yaml

# Verify behavior
kubectl get hpa my-app -o jsonpath='{.spec.behavior.scaleDown}'
```
Step 5: Adjust Target Threshold
```bash
# Check current target
kubectl get hpa my-app -o jsonpath='{.spec.metrics[0].resource.target.averageUtilization}'

# If the threshold is too close to typical usage, increase the gap.
# Current: 50% target, but the app runs at 45-55% normally -
# this causes constant scaling decisions.
```

```yaml
# Increase the threshold to 60% or 70%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Increased from 50
```

```bash
# Apply
kubectl patch hpa my-app --type merge -p '{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":70}}}]}}'

# Check new target
kubectl get hpa my-app
```
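HPA also applies a tolerance (10% by default, set cluster-wide via the controller manager's `--horizontal-pod-autoscaler-tolerance` flag): if the current/target ratio is within that band, no scaling happens. A small Python sketch (illustrative numbers, not controller code) showing why a target near steady-state utilization flaps while a higher target settles:

```python
import math

TOLERANCE = 0.1  # default: skip scaling when the ratio is within 10% of 1.0

def desired(current: int, util: float, target: float) -> int:
    ratio = util / target
    if abs(ratio - 1.0) <= TOLERANCE:
        return current  # within tolerance: keep the current replica count
    return math.ceil(current * ratio)

# App normally floats between 44% and 56% CPU with 10 pods.
for util in (44, 50, 56):
    print(f"util {util}%: target 50 -> {desired(10, util, 50)}, target 70 -> {desired(10, util, 70)}")
# util 44%: target 50 -> 9, target 70 -> 7
# util 50%: target 50 -> 10, target 70 -> 8
# util 56%: target 50 -> 12, target 70 -> 8
```

With a 50% target, normal jitter pushes the recommendation between 9, 10, and 12 pods, so the HPA keeps making scaling decisions; with a 70% target the same jitter lands on a stable count.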
Step 6: Use Multiple Metrics
```yaml
# Use CPU and memory to make scaling decisions
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

# With multiple metrics, HPA computes a desired replica count for each
# metric and scales to the largest result - any one metric can still
# trigger scale up, but scale down requires every metric to be below target.

# Or use custom metrics (often more accurate for request-driven apps):
metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: 100

# Or external metrics:
metrics:
  - type: External
    external:
      metric:
        name: queue_messages
      target:
        type: AverageValue
        averageValue: 10
```
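The "largest result wins" rule can be sketched in a few lines of Python (illustrative only, with made-up utilization numbers):

```python
import math

def desired_for_metric(current: int, value: float, target: float) -> int:
    # Per-metric HPA formula: ceil(currentReplicas * current / target)
    return math.ceil(current * value / target)

def desired_across_metrics(current: int, metrics) -> int:
    # HPA evaluates each metric independently and takes the largest result
    return max(desired_for_metric(current, v, t) for v, t in metrics)

# 10 pods: CPU at 90% against a 70% target, memory at 40% against 80%
print(desired_across_metrics(10, [(90, 70), (40, 80)]))  # 13: CPU drives scaling
```

So adding a second metric does not dampen scale up; its benefit is making scale down safer, since all metrics must agree the load has dropped.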
Step 7: Fix Metric Collection Issues
```bash
# HPA needs metrics from Metrics Server

# Check Metrics Server is installed
kubectl get deployment metrics-server -n kube-system

# Check Metrics Server pods
kubectl get pods -n kube-system -l k8s-app=metrics-server

# Verify metrics are available
kubectl top pods
kubectl top nodes

# If metrics are not available, install Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check Metrics Server logs
kubectl logs -n kube-system deployment/metrics-server

# For custom metrics, check the Prometheus Adapter
kubectl get pods -n monitoring -l app=prometheus-adapter
```
Step 8: Monitor HPA Scaling Events
```bash
# Watch HPA events
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler -w

# Check scaling history
kubectl describe hpa my-app | grep -A 20 Events

# Create a monitoring script
cat << 'EOF' > monitor_hpa.sh
#!/bin/bash
HPA=my-app

echo "=== HPA Status ==="
kubectl get hpa $HPA

echo ""
echo "=== Current Metrics ==="
kubectl get hpa $HPA -o jsonpath='{.status.currentMetrics}'

echo ""
echo "=== Scaling Events (Last 10) ==="
kubectl get events --field-selector involvedObject.name=$HPA | tail -10

echo ""
echo "=== Pod Count ==="
kubectl get pods -l app=my-app | grep Running | wc -l
EOF

chmod +x monitor_hpa.sh

# Prometheus queries for HPA metrics (kube-state-metrics v2 names):
# kube_horizontalpodautoscaler_status_current_replicas
# kube_horizontalpodautoscaler_status_desired_replicas
# kube_horizontalpodautoscaler_spec_target_metric
```
Step 9: Disable HPA Temporarily for Debug
```bash
# If HPA is causing too much churn, disable it temporarily

# Set min and max to the same value (stops scaling)
kubectl patch hpa my-app -p '{"spec":{"minReplicas":5,"maxReplicas":5}}'

# Or delete the HPA
kubectl delete hpa my-app

# Monitor manually
kubectl get pods -l app=my-app -w

# Recreate the HPA with a better configuration
kubectl apply -f hpa.yaml

# Or dampen scale up with a long stabilization window (3600s is the maximum)
kubectl patch hpa my-app --type merge -p '{"spec":{"behavior":{"scaleUp":{"stabilizationWindowSeconds":3600}}}}'
```
Step 10: Set Pod Disruption Budget
```yaml
# Limit how many pods can be disrupted at once
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2   # Always keep 2 pods running
  selector:
    matchLabels:
      app: my-app

# Or use maxUnavailable instead:
spec:
  maxUnavailable: 1   # Allow only 1 pod disruption at a time
  selector:
    matchLabels:
      app: my-app
```

```bash
# Apply
kubectl apply -f pdb.yaml

# Verify
kubectl get pdb my-app-pdb

# Note: a PDB limits voluntary disruptions (evictions, node drains);
# it does not block HPA scale down itself, but it keeps a minimum
# number of pods available while the cluster churns.
```
Kubernetes HPA Scaling Settings Reference
| Setting | Default | Purpose |
|---|---|---|
| stabilizationWindowSeconds (scaleUp) | 0 | Delay before scaling up |
| stabilizationWindowSeconds (scaleDown) | 300 (5 min) | Delay before scaling down |
| scaleUp Percent policy | 100% per 15s | Max % increase per period |
| scaleUp Pods policy | 4 per 15s | Max pods to add per period |
| scaleDown Percent policy | 100% per 15s | Max % decrease per period |
| scaleDown Pods policy | none set | Max pods to remove per period |
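How these policies combine can be sketched in plain Python (an illustration of the documented semantics, not the controller's actual code): each policy yields a cap on the change allowed in its period, and selectPolicy picks between them.

```python
import math

def scale_up_limit(current: int, policies, select_policy: str = "Max") -> int:
    """Max replicas reachable in one step, given scaleUp policies.
    policies: list of (type, value) pairs, type "Pods" or "Percent"."""
    allowed = []
    for ptype, value in policies:
        if ptype == "Pods":
            allowed.append(value)          # absolute number of pods
        elif ptype == "Percent":
            allowed.append(math.ceil(current * value / 100))  # % of current
    pick = max if select_policy == "Max" else min
    return current + pick(allowed)

# Kubernetes defaults: 100% per 15s and 4 pods per 15s, selectPolicy Max
default_policies = [("Percent", 100), ("Pods", 4)]
print(scale_up_limit(10, default_policies, "Max"))  # 20: defaults allow doubling
print(scale_up_limit(10, default_policies, "Min"))  # 14: Min makes it restrictive
```

This is why the default behavior can double a 10-pod deployment in one step, and why switching selectPolicy to Min (or tightening the policy values) is the fix for runaway scale up.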
Verify the Fix
```bash
# After adjusting HPA scaling behavior

# 1. Check HPA configuration
kubectl get hpa my-app -o yaml | grep -A 20 behavior
# Shows stabilization windows and policies

# 2. Monitor scaling events
kubectl get events --field-selector involvedObject.name=my-app -w
# Scaling events should be spaced out

# 3. Watch replica count
kubectl get hpa my-app -w
# Changes should be gradual, not doubling

# 4. Check pod churn
kubectl get pods -l app=my-app -w
# Pods shouldn't be created/deleted rapidly

# 5. Verify stabilization
kubectl describe hpa my-app | grep -i stabilization
# Should show the stabilization delay

# 6. Monitor actual metrics
kubectl top pods -l app=my-app
# Should match HPA decisions
```
Related Issues
- [Fix Kubernetes HPA Not Scaling](/articles/fix-kubernetes-hpa-not-scaling)
- [Fix Kubernetes HPA Custom Metric Not Scaling](/articles/fix-kubernetes-hpa-custom-metric-not-scaling)
- [Fix Kubernetes Deployment Not Rolling Out](/articles/fix-kubernetes-deployment-not-rolling-out)