Your pod was running fine, then suddenly disappeared. When you check the status, you see OOMKilled - the container was terminated because it used more memory than allowed. This is one of the most common causes of unexpected pod restarts in Kubernetes, and fixing it requires understanding both your application's memory behavior and Kubernetes resource management.
Understanding OOMKilled
When a container exceeds its memory limit, the Linux kernel's OOM (Out of Memory) killer terminates the process. Kubernetes reports this as OOMKilled with exit code 137. The pod may restart automatically if configured with a restart policy, leading to crash loops.
The tricky part is that OOMKilled can happen even when you think you have plenty of memory. Applications can spike during certain operations, memory leaks develop over time, or the JVM doesn't respect container limits properly.
Diagnosis Commands
Start by identifying the pod and examining its status:
```bash
# Find pods that have been OOMKilled
kubectl get pods -n namespace --field-selector=status.phase=Failed

# Check pod status and previous container state
kubectl get pod pod-name -n namespace -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
kubectl get pod pod-name -n namespace -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

# Get detailed pod information
kubectl describe pod pod-name -n namespace

# Check current memory usage
kubectl top pods -n namespace
kubectl top pod pod-name -n namespace --containers
```
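The jsonpath queries above inspect only the first container in the pod. When pods run sidecars, a small jq sketch can scan every container for an OOMKilled last state. The example below works offline against a saved manifest (the JSON is a minimal stand-in; in a cluster you would pipe `kubectl get pod pod-name -o json` instead) and requires jq to be installed:

```shell
# Save a minimal pod manifest to scan (stand-in for `kubectl get pod ... -o json`)
cat > /tmp/pod.json <<'EOF'
{"status":{"containerStatuses":[
  {"name":"app","lastState":{"terminated":{"reason":"OOMKilled","exitCode":137}}},
  {"name":"sidecar","lastState":{}}
]}}
EOF

# Print every container whose last termination reason was OOMKilled
jq -r '.status.containerStatuses[]
       | select(.lastState.terminated.reason? == "OOMKilled")
       | "\(.name) exit=\(.lastState.terminated.exitCode)"' /tmp/pod.json
# prints: app exit=137
```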
Look for the OOMKilled indicator:
```bash
# Check for OOMKilled in events
kubectl describe pod pod-name -n namespace | grep -A 5 OOMKilled

# The describe output pads the field, so match the label rather than a literal "Exit Code: 137"
kubectl describe pod pod-name -n namespace | grep -A 5 "Exit Code"
```

Check the resource limits:
```bash
# View pod resource limits
kubectl get pod pod-name -n namespace -o jsonpath='{.spec.containers[*].resources}'
kubectl get pod pod-name -n namespace -o yaml | grep -A 10 resources
```

Common Solutions
Solution 1: Increase Memory Limits
The most direct fix is to increase the memory limit:
```yaml
# Before - insufficient memory
resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"  # Pod keeps getting OOMKilled

# After - increased memory
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Increased to accommodate memory spikes
```
But how do you know how much memory to allocate? Check historical memory usage:
```bash
# If you have metrics server
kubectl top pods -n namespace --sort-by=memory

# Check Prometheus metrics for memory history (if available)
# Query: container_memory_working_set_bytes{pod="pod-name"}
```
Solution 2: Fix JVM Applications Not Respecting Limits
Java applications are notorious for OOMKilled issues because older JVMs (before Java 10, or 8u191 for the backport) size the default heap from the host's memory rather than the container's limit:
```bash
# Check if JVM is ignoring container limits
kubectl logs pod-name -n namespace | grep -i "heap"
```

Fix by setting JVM heap size explicitly:
```yaml
env:
- name: JAVA_OPTS
  value: "-Xms256m -Xmx384m"  # Set max heap to ~75% of container limit

# Or use Java 10+ container-aware flags
- name: JAVA_OPTS
  value: "-XX:MaxRAMPercentage=75.0"
```

For JVM-based applications, the heap should be about 75% of the container memory limit. The remaining 25% is for JVM overhead, native memory, and thread stacks.
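As a quick sanity check, the 75/25 split works out like this for a 512Mi limit (plain shell arithmetic, no cluster needed):

```shell
# 75/25 split for a 512Mi container limit
limit_mi=512
heap_mi=$(( limit_mi * 75 / 100 ))     # max heap (-Xmx or MaxRAMPercentage target)
overhead_mi=$(( limit_mi - heap_mi ))  # left for metaspace, thread stacks, native memory
echo "heap=${heap_mi}Mi overhead=${overhead_mi}Mi"
# prints: heap=384Mi overhead=128Mi
```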
```yaml
# Example: 512Mi container limit
resources:
  limits:
    memory: "512Mi"
env:
- name: JAVA_OPTS
  value: "-XX:MaxRAMPercentage=75.0"  # JVM will use ~384Mi for heap
```

Solution 3: Fix Memory Leaks
Sometimes the application has a memory leak that causes memory usage to grow over time:
```bash
# Monitor memory growth over time
watch -n 5 'kubectl top pod pod-name -n namespace'

# Get heap dump for Java applications (before OOM)
kubectl exec -it pod-name -n namespace -- jmap -dump:format=b,file=/tmp/heap.hprof 1

# Copy heap dump from pod
kubectl cp namespace/pod-name:/tmp/heap.hprof ./heap.hprof
```
For Node.js applications:
```bash
# Generate heap snapshot
kubectl exec -it pod-name -n namespace -- node -e "require('v8').writeHeapSnapshot('/tmp/heapdump.heapsnapshot')"

# Copy the snapshot locally
kubectl cp namespace/pod-name:/tmp/heapdump.heapsnapshot ./heapdump.heapsnapshot
```

Solution 4: Set Appropriate Memory Requests
Memory requests affect scheduling but don't prevent OOMKilled. However, setting them correctly helps:
```yaml
resources:
  requests:
    memory: "256Mi"  # What the pod typically uses
  limits:
    memory: "512Mi"  # What the pod can burst to
```

The ratio between requests and limits matters:
- Guaranteed QoS: requests == limits (most stable)
- Burstable QoS: requests < limits (can burst)
- BestEffort QoS: no requests/limits (first to be evicted)
For critical workloads, use Guaranteed QoS:
```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

Solution 5: Fix Application Configuration
Many applications have memory-related configuration that needs tuning:
```yaml
# Node.js applications
env:
- name: NODE_OPTIONS
  value: "--max-old-space-size=384"  # ~75% of container limit

# Python applications
env:
- name: PYTHONMALLOC
  value: "malloc"  # Use system malloc for better container integration

# PostgreSQL in container
env:
- name: POSTGRES_SHARED_BUFFERS
  value: "128MB"  # Should be ~25% of container memory
```
Solution 6: Handle Memory Spikes During Startup
Some applications spike in memory during initialization:
```yaml
# Use init container to pre-warm
initContainers:
- name: memory-warmup
  image: your-image
  command: ['sh', '-c', 'your-warmup-command']
  resources:
    limits:
      memory: "1Gi"  # Higher limit during init
    requests:
      memory: "512Mi"

containers:
- name: app
  resources:
    limits:
      memory: "512Mi"  # Normal limit after startup
```
Or adjust startup probes to give more time:
```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30  # Allows 30 × 10s = 300s for startup
  periodSeconds: 10
```

Solution 7: Use Vertical Pod Autoscaler (VPA)
VPA can automatically recommend and set appropriate memory:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  updatePolicy:
    updateMode: "Auto"  # Or "Off" to only generate recommendations
```

Check VPA recommendations:

```bash
kubectl get vpa my-app-vpa -o yaml | grep -A 20 "recommendation"
```

Solution 8: Check for Eviction Threshold Pressure
Sometimes nodes are under memory pressure, causing evictions:
```bash
# Check node memory pressure
kubectl describe node node-name | grep -A 10 Conditions

# Check node allocatable
kubectl describe node node-name | grep -A 5 Allocatable

# Check for eviction events (pods evicted under node memory pressure)
kubectl get events -n namespace --field-selector reason=Evicted
```
Verification
After applying fixes, verify the pod stays running:
```bash
# Watch pod status
kubectl get pod pod-name -n namespace -w

# Monitor memory usage
kubectl top pod pod-name -n namespace --containers

# Check for OOMKilled in previous state
kubectl get pod pod-name -n namespace -o jsonpath='{.status.containerStatuses[0].lastState}'

# Check events for new issues
kubectl get events -n namespace --field-selector involvedObject.name=pod-name
```
OOMKilled Exit Codes Reference
| Exit Code | Meaning | Action |
|---|---|---|
| 137 | OOMKilled (128 + 9 SIGKILL) | Increase memory limit or fix leak |
| 1 | Application error | Check application logs |
| 139 | Segmentation fault | Application bug or library issue |
| 143 | SIGTERM (graceful shutdown) | Normal termination |
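The 128 + signal-number convention behind the table can be reproduced in any POSIX shell, no cluster required:

```shell
# Exit codes above 128 encode the fatal signal: 128 + 9 (SIGKILL) = 137,
# the same code the kubelet reports for an OOMKilled container.
sleep 30 &
pid=$!
kill -9 "$pid"   # what the kernel OOM killer effectively does to the process
wait "$pid"
code=$?
echo "$code"
# prints: 137
```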
Prevention Best Practices
- Monitor memory usage trends and set alerts at 80% of the limit.
- Use VPA to automatically adjust resources.
- Profile applications under realistic load before deploying.
- For JVM applications, always use container-aware flags like -XX:MaxRAMPercentage.
- Test memory behavior during peak operations and scale accordingly.
- Implement health checks that detect memory pressure before OOM.
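The 80%-of-limit alert can be sketched as a PrometheusRule. This is a sketch, not a drop-in rule: it assumes the Prometheus Operator CRD is installed and that cAdvisor and kube-state-metrics are being scraped; the rule name is made up and label names vary by setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-near-limit  # hypothetical name
spec:
  groups:
  - name: memory
    rules:
    - alert: ContainerMemoryNearLimit
      # Working-set memory as a fraction of the configured limit, per container
      expr: |
        max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
          / on (namespace, pod, container)
        kube_pod_container_resource_limits{resource="memory"} > 0.8
      for: 5m
      labels:
        severity: warning
```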
Memory issues in Kubernetes are rarely about having too little memory - they're usually about having misconfigured limits. The key is understanding your application's actual memory behavior through monitoring and then setting appropriate limits with room for spikes.