Introduction
Memory-based HPA scaling is less forgiving than CPU scaling because it depends on both working metrics and meaningful resource requests. If the cluster cannot provide memory metrics, or the Pods never declared memory requests, the HPA cannot calculate utilization and scaling simply does not happen. Even when metrics are present, memory may be the wrong signal if the workload allocates large resident sets but does not benefit from replica growth.
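For reference, a memory-based HPA that satisfies these preconditions looks roughly like the sketch below. This is an illustrative `autoscaling/v2` manifest, not taken from the original text: the target Deployment name `my-app` and the 80% threshold are placeholder assumptions.

```yaml
# Minimal sketch of a memory-based HPA (autoscaling/v2).
# "my-app" and the 80% utilization target are illustrative values.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization        # percentage of resources.requests.memory
        averageUtilization: 80
```

Note that `type: Utilization` only works when every container in the target Pods declares a memory request, which is exactly the failure mode this article covers.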
Symptoms
- The HPA shows unknown or stale memory utilization
- Pods do not scale even though memory consumption is clearly high
- `kubectl top pods` works inconsistently or not at all
- The HPA definition looks correct, but memory-based scaling never triggers
Common Causes
- Metrics for memory are missing or delayed
- Pods do not define `resources.requests.memory`
- The HPA references memory utilization, but the workload's memory pattern does not scale linearly with replicas
- Cluster quota or max replica settings prevent visible scaling
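The last cause is easy to miss because nothing in the HPA itself looks wrong. A namespace quota like the illustrative one below can silently cap replica growth; the name `team-quota` and the limits are hypothetical.

```yaml
# Illustrative: a namespace ResourceQuota that can silently cap scaling.
# The HPA may want more replicas, but the scheduler cannot admit them.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota        # hypothetical name
spec:
  hard:
    pods: "10"            # scaling stops here regardless of the HPA target
    requests.memory: 8Gi  # new replicas are rejected once requests sum past this
```

When scaling stalls against a quota, `kubectl describe hpa` typically surfaces the failure in the events rather than in the utilization numbers.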
Step-by-Step Fix
1. Confirm the HPA is receiving memory metrics. If the metrics path is broken, the HPA cannot make decisions regardless of the target threshold.

   ```shell
   kubectl get hpa my-hpa -o wide
   kubectl describe hpa my-hpa
   kubectl top pods
   ```

2. Verify the workload defines memory requests. HPA utilization percentages require requests as the baseline.

   ```yaml
   resources:
     requests:
       memory: 256Mi
   ```

3. Recheck whether memory is the right scaling signal. Some applications hold memory for caches or JVM heap in ways that do not map well to autoscaling decisions.

4. Validate cluster limits and replica ceilings. Even correct HPA logic cannot scale beyond policy or quota boundaries.
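The utilization percentage the HPA derives from memory requests can be sketched with hypothetical numbers; the real inputs come from metrics-server, but the arithmetic is just usage over request:

```shell
# Sketch of the HPA's utilization math; 384 Mi usage and a 256 Mi request
# are hypothetical values, not measurements from a real cluster.
usage_mi=384       # current working-set memory of the Pod, in Mi
request_mi=256     # resources.requests.memory, in Mi
utilization=$(( usage_mi * 100 / request_mi ))
echo "${utilization}%"    # with these numbers: 150%
```

This is why a missing request breaks scaling outright: with no denominator, the percentage is undefined and the HPA reports the metric as unknown.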
Prevention
- Always define memory requests on workloads that use memory-based HPA
- Keep metrics-server and resource metrics healthy cluster-wide
- Choose autoscaling signals that match the workload’s real bottlenecks
- Review HPA max replicas and quota constraints during scaling incidents
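The first prevention item looks like the fragment below in a full Deployment spec; the names, image, and sizes are illustrative placeholders, not values from the original text.

```yaml
# Illustrative Deployment fragment: every container declares a memory
# request so a memory-based HPA always has a utilization baseline.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:latest   # placeholder image
        resources:
          requests:
            memory: 256Mi      # baseline for HPA utilization math
          limits:
            memory: 512Mi
```

Enforcing this at admission time (for example with a namespace `LimitRange` that supplies default requests) avoids relying on every team remembering to set it.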