What's Actually Happening
The Kubernetes Horizontal Pod Autoscaler (HPA) scales up too quickly, creating many pods for minor load increases. This causes resource exhaustion, cost spikes, and unnecessary pod churn: pods scale up, then immediately scale back down.
The Error You'll See
Rapid pod scaling:
```bash
$ kubectl get hpa my-app -w
NAME     REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/app   60%/50%   2         20        5          1m
my-app   Deployment/app   60%/50%   2         20        10         2m   # Doubled!
my-app   Deployment/app   60%/50%   2         20        20         3m   # Max!

$ kubectl get pods -l app=my-app | wc -l
20   # Maximum pods reached quickly
```
Scaling events:
```bash
$ kubectl describe hpa my-app
Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---  ----                       -------
  Normal  SuccessfulRescale  1m   horizontal-pod-autoscaler  New size: 5
  Normal  SuccessfulRescale  30s  horizontal-pod-autoscaler  New size: 10
  Normal  SuccessfulRescale  10s  horizontal-pod-autoscaler  New size: 20
```
Scale down immediately:
```bash
$ kubectl get hpa my-app
NAME     REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS
my-app   Deployment/app   40%/50%   2         20        20   # High replicas for low load
```
Why This Happens
1. Aggressive scaling policy - the default scale-up policy allows adding up to 100% of current pods at once
2. No cooldown - scaling up again immediately after the previous scale
3. Metric spikes - temporary load spikes trigger scaling
4. Threshold too close - the target is too close to the app's typical utilization
5. Missing stabilization window - no delay before acting on scaling decisions
6. No scaling limits - replicas can double in a single step
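The compounding effect above follows from the HPA core formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal Python sketch (illustrative only, not the controller's code) of how a sustained 60%-vs-50% overshoot ratchets replicas upward:

```python
import math

def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    # HPA core formula: ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    return math.ceil(current_replicas * current_util / target_util)

# 60% observed vs 50% target: each evaluation asks for 20% more pods.
replicas = 5
for _ in range(4):
    replicas = desired_replicas(replicas, 60, 50)
    print(replicas)
# prints 6, 8, 10, 12 - if utilization stays above target during a
# sustained spike, replicas keep climbing until maxReplicas caps them
```

With the default behavior (no scale-up stabilization, 100% increase allowed per period), nothing slows this ramp down.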
Step 1: Check Current HPA Configuration
```bash
# Get HPA details
kubectl get hpa my-app -o yaml

# Check scaling behavior (autoscaling/v2, GA in Kubernetes 1.23+)
kubectl get hpa my-app -o jsonpath='{.spec.behavior}'

# Check metrics and targets
kubectl get hpa my-app -o jsonpath='{.spec.metrics}'

# Check min/max replicas
kubectl get hpa my-app -o jsonpath='{.spec.minReplicas}-{.spec.maxReplicas}'

# Describe HPA for events
kubectl describe hpa my-app

# Check current replicas
kubectl get hpa my-app -o jsonpath='{.status.currentReplicas}'
```
Step 2: Add Stabilization Window
```yaml
# Add stabilization windows to prevent rapid scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 1 minute before scaling up
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
```

```bash
# Apply
kubectl apply -f hpa.yaml

# Verify
kubectl get hpa my-app -o yaml | grep stabilizationWindowSeconds
```
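The stabilization window is a rolling lookback over recent recommendations: for scale down, HPA acts on the highest recommendation seen within the window (for scale up, the lowest), so a brief dip in load does not immediately remove pods. A plain-Python sketch of the scale-down case (an illustration of the documented semantics, not the controller's actual code; the window here is counted in evaluations rather than seconds):

```python
from collections import deque

class ScaleDownStabilizer:
    """Keep the last `window` recommendations; only scale down to the
    highest recommendation seen within that window."""
    def __init__(self, window: int):
        self.history = deque(maxlen=window)

    def recommend(self, desired: int) -> int:
        self.history.append(desired)
        return max(self.history)

stab = ScaleDownStabilizer(window=5)
for desired in (10, 10, 4, 4, 4, 4, 4, 4):
    print(stab.recommend(desired))
# prints 10 six times, then 4, 4 - the lower recommendation must
# persist for the whole window before any pods are removed
```

This is why a 300-second window smooths out metric dips shorter than 5 minutes.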
Step 3: Limit Scale Up Rate
```yaml
# Limit how many pods can be added at once
behavior:
  scaleUp:
    # selectPolicy chooses between the policies below:
    #   Max = allow the largest change (the default), Min = allow the smallest
    selectPolicy: Min
    policies:
      # Option 1: Add at most 4 pods per minute
      - type: Pods
        value: 4
        periodSeconds: 60
      # Option 2: Add at most 50% of current pods per minute
      - type: Percent
        value: 50
        periodSeconds: 60
      # Option 3: Add at most 25% per 2 minutes
      - type: Percent
        value: 25
        periodSeconds: 120

# With selectPolicy: Min, the least aggressive policy wins.
# Example: at 10 pods, 25% = 3 pods vs 50% = 5 pods vs 4 pods -> adds 3, not 10.
# This prevents doubling replicas in one step.
```
Step 4: Limit Scale Down Rate
```yaml
# Prevent rapid scale down that causes thrashing
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # 5-minute cooldown
    selectPolicy: Min                 # Use the most restrictive policy
    policies:
      # Option 1: Remove at most 1 pod per minute
      - type: Pods
        value: 1
        periodSeconds: 60
      # Option 2: Remove at most 10% per minute
      - type: Percent
        value: 10
        periodSeconds: 60

# Gradual scale down prevents removing many pods at once
# and gives you time to confirm the scaling decision was right.
```

```bash
# Apply
kubectl apply -f hpa.yaml

# Verify behavior
kubectl get hpa my-app -o jsonpath='{.spec.behavior.scaleDown}'
```
Step 5: Adjust Target Threshold
```bash
# Check current target
kubectl get hpa my-app -o jsonpath='{.spec.metrics[0].resource.target.averageUtilization}'

# If the threshold is too close to typical usage, increase the gap.
# Current: 50% target, but the app runs at 45-55% normally -
# this causes constant scaling decisions.
```

```yaml
# Increase the threshold to 60% or 70%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Increased from 50
```

```bash
# Apply
kubectl patch hpa my-app --type merge -p '{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":70}}}]}}'

# Check new target
kubectl get hpa my-app
```
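HPA also applies a tolerance (10% by default, set cluster-wide via the controller manager's `--horizontal-pod-autoscaler-tolerance` flag): if the current/target ratio is within that band, no scaling happens. A small Python sketch (illustrative numbers, not controller code) showing why a target near steady-state utilization flaps while a higher target settles:

```python
import math

TOLERANCE = 0.1  # default: skip scaling when the ratio is within 10% of 1.0

def desired(current: int, util: float, target: float) -> int:
    ratio = util / target
    if abs(ratio - 1.0) <= TOLERANCE:
        return current  # within tolerance: keep the current replica count
    return math.ceil(current * ratio)

# App normally floats between 44% and 56% CPU with 10 pods.
for util in (44, 50, 56):
    print(f"util {util}%: target 50 -> {desired(10, util, 50)}, target 70 -> {desired(10, util, 70)}")
# util 44%: target 50 -> 9, target 70 -> 7
# util 50%: target 50 -> 10, target 70 -> 8
# util 56%: target 50 -> 12, target 70 -> 8
```

With a 50% target, normal jitter pushes the recommendation between 9, 10, and 12 pods, so the HPA keeps making scaling decisions; with a 70% target the same jitter lands on a stable count.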
Step 6: Use Multiple Metrics
```yaml
# Use CPU and memory to make scaling decisions
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

# With multiple metrics, HPA computes a desired replica count for each
# metric and scales to the largest result - any one metric can still
# trigger scale up, but scale down requires every metric to be below target.

# Or use custom metrics (often more accurate for request-driven apps):
metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: 100

# Or external metrics:
metrics:
  - type: External
    external:
      metric:
        name: queue_messages
      target:
        type: AverageValue
        averageValue: 10
```
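The "largest result wins" rule can be sketched in a few lines of Python (illustrative only, with made-up utilization numbers):

```python
import math

def desired_for_metric(current: int, value: float, target: float) -> int:
    # Per-metric HPA formula: ceil(currentReplicas * current / target)
    return math.ceil(current * value / target)

def desired_across_metrics(current: int, metrics) -> int:
    # HPA evaluates each metric independently and takes the largest result
    return max(desired_for_metric(current, v, t) for v, t in metrics)

# 10 pods: CPU at 90% against a 70% target, memory at 40% against 80%
print(desired_across_metrics(10, [(90, 70), (40, 80)]))  # 13: CPU drives scaling
```

So adding a second metric does not dampen scale up; its benefit is making scale down safer, since all metrics must agree the load has dropped.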
Step 7: Fix Metric Collection Issues
```bash
# HPA needs metrics from Metrics Server

# Check Metrics Server is installed
kubectl get deployment metrics-server -n kube-system

# Check Metrics Server pods
kubectl get pods -n kube-system -l k8s-app=metrics-server

# Verify metrics are available
kubectl top pods
kubectl top nodes

# If metrics are not available, install Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check Metrics Server logs
kubectl logs -n kube-system deployment/metrics-server

# For custom metrics, check the Prometheus Adapter
kubectl get pods -n monitoring -l app=prometheus-adapter
```
Step 8: Monitor HPA Scaling Events
```bash
# Watch HPA events
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler -w

# Check scaling history
kubectl describe hpa my-app | grep -A 20 Events

# Create a monitoring script
cat << 'EOF' > monitor_hpa.sh
#!/bin/bash
HPA=my-app

echo "=== HPA Status ==="
kubectl get hpa $HPA

echo ""
echo "=== Current Metrics ==="
kubectl get hpa $HPA -o jsonpath='{.status.currentMetrics}'

echo ""
echo "=== Scaling Events (Last 10) ==="
kubectl get events --field-selector involvedObject.name=$HPA | tail -10

echo ""
echo "=== Pod Count ==="
kubectl get pods -l app=my-app | grep Running | wc -l
EOF

chmod +x monitor_hpa.sh

# Prometheus queries for HPA metrics (kube-state-metrics v2 names):
# kube_horizontalpodautoscaler_status_current_replicas
# kube_horizontalpodautoscaler_status_desired_replicas
# kube_horizontalpodautoscaler_spec_target_metric
```
Step 9: Disable HPA Temporarily for Debug
```bash
# If HPA is causing too much churn, disable it temporarily

# Set min and max to the same value (stops scaling)
kubectl patch hpa my-app -p '{"spec":{"minReplicas":5,"maxReplicas":5}}'

# Or delete the HPA
kubectl delete hpa my-app

# Monitor manually
kubectl get pods -l app=my-app -w

# Recreate the HPA with a better configuration
kubectl apply -f hpa.yaml

# Or dampen scale up with a long stabilization window (3600s is the maximum)
kubectl patch hpa my-app --type merge -p '{"spec":{"behavior":{"scaleUp":{"stabilizationWindowSeconds":3600}}}}'
```
Step 10: Set Pod Disruption Budget
```yaml
# Limit how many pods can be disrupted at once
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2   # Always keep 2 pods running
  selector:
    matchLabels:
      app: my-app

# Or use maxUnavailable instead:
spec:
  maxUnavailable: 1   # Allow only 1 pod disruption at a time
  selector:
    matchLabels:
      app: my-app
```

```bash
# Apply
kubectl apply -f pdb.yaml

# Verify
kubectl get pdb my-app-pdb

# Note: a PDB limits voluntary disruptions (evictions, node drains);
# it does not block HPA scale down itself, but it keeps a minimum
# number of pods available while the cluster churns.
```
Kubernetes HPA Scaling Settings Reference
| Setting | Default | Purpose |
|---|---|---|
| stabilizationWindowSeconds (scaleUp) | 0 | Delay before scaling up |
| stabilizationWindowSeconds (scaleDown) | 300 (5 min) | Delay before scaling down |
| scaleUp Percent policy | 100% per 15s | Max % increase per period |
| scaleUp Pods policy | 4 per 15s | Max pods to add per period |
| scaleDown Percent policy | 100% per 15s | Max % decrease per period |
| scaleDown Pods policy | none set | Max pods to remove per period |
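How these policies combine can be sketched in plain Python (an illustration of the documented semantics, not the controller's actual code): each policy yields a cap on the change allowed in its period, and selectPolicy picks between them.

```python
import math

def scale_up_limit(current: int, policies, select_policy: str = "Max") -> int:
    """Max replicas reachable in one step, given scaleUp policies.
    policies: list of (type, value) pairs, type "Pods" or "Percent"."""
    allowed = []
    for ptype, value in policies:
        if ptype == "Pods":
            allowed.append(value)          # absolute number of pods
        elif ptype == "Percent":
            allowed.append(math.ceil(current * value / 100))  # % of current
    pick = max if select_policy == "Max" else min
    return current + pick(allowed)

# Kubernetes defaults: 100% per 15s and 4 pods per 15s, selectPolicy Max
default_policies = [("Percent", 100), ("Pods", 4)]
print(scale_up_limit(10, default_policies, "Max"))  # 20: defaults allow doubling
print(scale_up_limit(10, default_policies, "Min"))  # 14: Min makes it restrictive
```

This is why the default behavior can double a 10-pod deployment in one step, and why switching selectPolicy to Min (or tightening the policy values) is the fix for runaway scale up.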
Verify the Fix
```bash
# After adjusting HPA scaling behavior

# 1. Check HPA configuration
kubectl get hpa my-app -o yaml | grep -A 20 behavior
# Shows stabilization windows and policies

# 2. Monitor scaling events
kubectl get events --field-selector involvedObject.name=my-app -w
# Scaling events should be spaced out

# 3. Watch replica count
kubectl get hpa my-app -w
# Changes should be gradual, not doubling

# 4. Check pod churn
kubectl get pods -l app=my-app -w
# Pods shouldn't be created/deleted rapidly

# 5. Verify stabilization
kubectl describe hpa my-app | grep -i stabilization
# Should show the stabilization delay

# 6. Monitor actual metrics
kubectl top pods -l app=my-app
# Should match HPA decisions
```
Related Issues
- [Fix Kubernetes HPA Not Scaling](/articles/fix-kubernetes-hpa-not-scaling)
- [Fix Kubernetes HPA Custom Metric Not Scaling](/articles/fix-kubernetes-hpa-custom-metric-not-scaling)
- [Fix Kubernetes Deployment Not Rolling Out](/articles/fix-kubernetes-deployment-not-rolling-out)