Your pod was running fine, then suddenly disappeared. When you check the status, you see OOMKilled - the container was terminated because it used more memory than allowed. This is one of the most common causes of unexpected pod restarts in Kubernetes, and fixing it requires understanding both your application's memory behavior and Kubernetes resource management.
Understanding OOMKilled
When a container exceeds its memory limit, the Linux kernel's OOM (Out of Memory) killer terminates the process. Kubernetes reports this as OOMKilled with exit code 137. The pod may restart automatically if configured with a restart policy, leading to crash loops.
The tricky part is that OOMKilled can happen even when you think you have plenty of memory. Applications can spike during certain operations, memory leaks develop over time, or the JVM doesn't respect container limits properly.
Diagnosis Commands
Start by identifying the pod and examining its status:
```bash
# Find pods that have been OOMKilled
kubectl get pods -n namespace --field-selector=status.phase=Failed

# Check pod status and previous container state
kubectl get pod pod-name -n namespace -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
kubectl get pod pod-name -n namespace -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

# Get detailed pod information
kubectl describe pod pod-name -n namespace

# Check current memory usage
kubectl top pods -n namespace
kubectl top pod pod-name -n namespace --containers
```
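The jsonpath queries above inspect only the first container in the pod. When pods run sidecars, a small jq sketch can scan every container for an OOMKilled last state. The example below works offline against a saved manifest (the JSON is a minimal stand-in; in a cluster you would pipe `kubectl get pod pod-name -o json` instead) and requires jq to be installed:

```shell
# Save a minimal pod manifest to scan (stand-in for `kubectl get pod ... -o json`)
cat > /tmp/pod.json <<'EOF'
{"status":{"containerStatuses":[
  {"name":"app","lastState":{"terminated":{"reason":"OOMKilled","exitCode":137}}},
  {"name":"sidecar","lastState":{}}
]}}
EOF

# Print every container whose last termination reason was OOMKilled
jq -r '.status.containerStatuses[]
       | select(.lastState.terminated.reason? == "OOMKilled")
       | "\(.name) exit=\(.lastState.terminated.exitCode)"' /tmp/pod.json
# prints: app exit=137
```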
Look for the OOMKilled indicator:
```bash
# Check for OOMKilled in events
kubectl describe pod pod-name -n namespace | grep -A 5 OOMKilled

# The describe output pads the field, so match the label rather than a literal "Exit Code: 137"
kubectl describe pod pod-name -n namespace | grep -A 5 "Exit Code"
```

Check the resource limits:
```bash
# View pod resource limits
kubectl get pod pod-name -n namespace -o jsonpath='{.spec.containers[*].resources}'
kubectl get pod pod-name -n namespace -o yaml | grep -A 10 resources
```

Common Solutions
Solution 1: Increase Memory Limits
The most direct fix is to increase the memory limit:
```yaml
# Before - insufficient memory
resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"  # Pod keeps getting OOMKilled

# After - increased memory
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Increased to accommodate memory spikes
```
But how do you know how much memory to allocate? Check historical memory usage:
```bash
# If you have metrics server
kubectl top pods -n namespace --sort-by=memory

# Check Prometheus metrics for memory history (if available)
# Query: container_memory_working_set_bytes{pod="pod-name"}
```
Solution 2: Fix JVM Applications Not Respecting Limits
Java applications are notorious for OOMKilled issues because older JVMs (before Java 10, or 8u191 for the backport) size the default heap from the host's memory rather than the container's limit:
```bash
# Check if JVM is ignoring container limits
kubectl logs pod-name -n namespace | grep -i "heap"
```

Fix by setting JVM heap size explicitly:
```yaml
env:
- name: JAVA_OPTS
  value: "-Xms256m -Xmx384m"  # Set max heap to ~75% of container limit

# Or use Java 10+ container-aware flags
- name: JAVA_OPTS
  value: "-XX:MaxRAMPercentage=75.0"
```

For JVM-based applications, the heap should be about 75% of the container memory limit. The remaining 25% is for JVM overhead, native memory, and thread stacks.
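As a quick sanity check, the 75/25 split works out like this for a 512Mi limit (plain shell arithmetic, no cluster needed):

```shell
# 75/25 split for a 512Mi container limit
limit_mi=512
heap_mi=$(( limit_mi * 75 / 100 ))     # max heap (-Xmx or MaxRAMPercentage target)
overhead_mi=$(( limit_mi - heap_mi ))  # left for metaspace, thread stacks, native memory
echo "heap=${heap_mi}Mi overhead=${overhead_mi}Mi"
# prints: heap=384Mi overhead=128Mi
```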
```yaml
# Example: 512Mi container limit
resources:
  limits:
    memory: "512Mi"
env:
- name: JAVA_OPTS
  value: "-XX:MaxRAMPercentage=75.0"  # JVM will use ~384Mi for heap
```

Solution 3: Fix Memory Leaks
Sometimes the application has a memory leak that causes memory usage to grow over time:
```bash
# Monitor memory growth over time
watch -n 5 'kubectl top pod pod-name -n namespace'

# Get heap dump for Java applications (before OOM)
kubectl exec -it pod-name -n namespace -- jmap -dump:format=b,file=/tmp/heap.hprof 1

# Copy heap dump from pod
kubectl cp namespace/pod-name:/tmp/heap.hprof ./heap.hprof
```
For Node.js applications:
```bash
# Generate heap snapshot
kubectl exec -it pod-name -n namespace -- node -e "require('v8').writeHeapSnapshot('/tmp/heapdump.heapsnapshot')"

# Copy the snapshot locally
kubectl cp namespace/pod-name:/tmp/heapdump.heapsnapshot ./heapdump.heapsnapshot
```

Solution 4: Set Appropriate Memory Requests
Memory requests affect scheduling but don't prevent OOMKilled. However, setting them correctly helps:
```yaml
resources:
  requests:
    memory: "256Mi"  # What the pod typically uses
  limits:
    memory: "512Mi"  # What the pod can burst to
```

The ratio between requests and limits matters:
- Guaranteed QoS: requests == limits (most stable)
- Burstable QoS: requests < limits (can burst)
- BestEffort QoS: no requests/limits (first to be evicted)
For critical workloads, use Guaranteed QoS:
```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

Solution 5: Fix Application Configuration
Many applications have memory-related configuration that needs tuning:
```yaml
# Node.js applications
env:
- name: NODE_OPTIONS
  value: "--max-old-space-size=384"  # ~75% of container limit

# Python applications
env:
- name: PYTHONMALLOC
  value: "malloc"  # Use system malloc for better container integration

# PostgreSQL in container
env:
- name: POSTGRES_SHARED_BUFFERS
  value: "128MB"  # Should be ~25% of container memory
```
Solution 6: Handle Memory Spikes During Startup
Some applications spike in memory during initialization:
```yaml
# Use init container to pre-warm
initContainers:
- name: memory-warmup
  image: your-image
  command: ['sh', '-c', 'your-warmup-command']
  resources:
    limits:
      memory: "1Gi"  # Higher limit during init
    requests:
      memory: "512Mi"

containers:
- name: app
  resources:
    limits:
      memory: "512Mi"  # Normal limit after startup
```
Or adjust startup probes to give more time:
```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30  # Allows 30 × 10s = 300s for startup
  periodSeconds: 10
```

Solution 7: Use Vertical Pod Autoscaler (VPA)
VPA can automatically recommend and set appropriate memory:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  updatePolicy:
    updateMode: "Auto"  # Or "Off" to only generate recommendations
```

Check VPA recommendations:

```bash
kubectl get vpa my-app-vpa -o yaml | grep -A 20 "recommendation"
```

Solution 8: Check for Eviction Threshold Pressure
Sometimes nodes are under memory pressure, causing evictions:
```bash
# Check node memory pressure
kubectl describe node node-name | grep -A 10 Conditions

# Check node allocatable
kubectl describe node node-name | grep -A 5 Allocatable

# Check for eviction events (pods evicted under node memory pressure)
kubectl get events -n namespace --field-selector reason=Evicted
```
Verification
After applying fixes, verify the pod stays running:
```bash
# Watch pod status
kubectl get pod pod-name -n namespace -w

# Monitor memory usage
kubectl top pod pod-name -n namespace --containers

# Check for OOMKilled in previous state
kubectl get pod pod-name -n namespace -o jsonpath='{.status.containerStatuses[0].lastState}'

# Check events for new issues
kubectl get events -n namespace --field-selector involvedObject.name=pod-name
```
OOMKilled Exit Codes Reference
| Exit Code | Meaning | Action |
|---|---|---|
| 137 | OOMKilled (128 + 9 SIGKILL) | Increase memory limit or fix leak |
| 1 | Application error | Check application logs |
| 139 | Segmentation fault | Application bug or library issue |
| 143 | SIGTERM (graceful shutdown) | Normal termination |
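The 128 + signal-number convention behind the table can be reproduced in any POSIX shell, no cluster required:

```shell
# Exit codes above 128 encode the fatal signal: 128 + 9 (SIGKILL) = 137,
# the same code the kubelet reports for an OOMKilled container.
sleep 30 &
pid=$!
kill -9 "$pid"   # what the kernel OOM killer effectively does to the process
wait "$pid"
code=$?
echo "$code"
# prints: 137
```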
Prevention Best Practices
- Monitor memory usage trends and set alerts at 80% of the limit.
- Use VPA to automatically adjust resources.
- Profile applications under realistic load before deploying.
- For JVM applications, always use container-aware flags like -XX:MaxRAMPercentage.
- Test memory behavior during peak operations and scale accordingly.
- Implement health checks that detect memory pressure before OOM.
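The 80%-of-limit alert can be sketched as a PrometheusRule. This is a sketch, not a drop-in rule: it assumes the Prometheus Operator CRD is installed and that cAdvisor and kube-state-metrics are being scraped; the rule name is made up and label names vary by setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-near-limit  # hypothetical name
spec:
  groups:
  - name: memory
    rules:
    - alert: ContainerMemoryNearLimit
      # Working-set memory as a fraction of the configured limit, per container
      expr: |
        max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
          / on (namespace, pod, container)
        kube_pod_container_resource_limits{resource="memory"} > 0.8
      for: 5m
      labels:
        severity: warning
```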
Memory issues in Kubernetes are rarely about having too little memory - they're usually about having misconfigured limits. The key is understanding your application's actual memory behavior through monitoring and then setting appropriate limits with room for spikes.