Introduction

Linux Out of Memory (OOM) killer errors occur when the system exhausts available memory and the kernel's OOM killer terminates processes to reclaim it and prevent a full system crash. The OOM killer selects victims by oom_score, which reflects memory consumption plus the per-process oom_score_adj setting, killing the highest-scoring (least protected) processes first. While this prevents complete system lockup, it can cause service outages, data loss, and application failures. Common causes include application memory leaks consuming all available RAM, insufficient physical memory for the workload, swap disabled or exhausted, overly restrictive cgroup memory limits, memory fragmentation preventing large allocations, kernel memory leaks, too many concurrent processes, database buffer pools oversized for system RAM, misconfigured Java heaps, and container memory limits set without proper monitoring. The fix requires understanding Linux memory management, the OOM killer's selection algorithm, cgroup memory accounting, swap configuration, and the relevant debugging tools. This guide provides production-proven troubleshooting for OOM killer issues across RHEL/CentOS, Ubuntu/Debian, SUSE, and containerized environments.

Symptoms

  • Process killed unexpectedly with "Killed" message
  • dmesg shows "Out of memory: Kill process"
  • Service restarts automatically (systemd restart on failure)
  • OOM killer invoked in system logs
  • Specific process repeatedly killed
  • System becomes unresponsive before kill
  • Container/pod killed with OOMKilled status
  • Memory usage at 100% before kill
  • Swap heavily used before OOM
  • Multiple processes killed in cascade

Common Causes

  • Application memory leak
  • Insufficient RAM for workload
  • Swap disabled or too small
  • Cgroup memory limit too low
  • Database buffer pool oversized
  • Java heap misconfiguration
  • Too many concurrent processes
  • Memory fragmentation
  • Kernel memory leak
  • Container without memory limits

Step-by-Step Fix

### 1. Diagnose OOM killer events

Check OOM killer logs:

```bash
# View OOM killer messages
dmesg | grep -i "out of memory"
dmesg | grep -i "oom"
dmesg | grep -i "killed process"

# Or via journalctl
journalctl -k | grep -i "oom"
journalctl -k | grep -i "out of memory"

# Typical OOM output:
# Out of memory: Kill process 1234 (java) score 500 or sacrifice child
# Killed process 1234 (java) total-vm:8000000kB, anon-rss:4000000kB
# oom_reaper: reaped process 1234 (java), now anon-rss:0kB
```
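To make these log lines easier to scan, a small awk sketch can extract the victim's PID, command name, and resident memory. The field layout is assumed from the sample message above; adjust the pattern if your kernel formats the message differently.

```shell
#!/bin/sh
# oom_summary: pull pid, command, and anon-rss out of kernel "Killed process" lines.
# Field positions are assumed from the sample message format shown above.
oom_summary() {
  awk '/Killed process/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "process") {
        pid = $(i + 1)
        cmd = $(i + 2); gsub(/[()]/, "", cmd)   # strip the parentheses around the name
      }
      if ($i ~ /^anon-rss:/) { rss = substr($i, 10) }
    }
    printf "pid=%s cmd=%s anon-rss=%s\n", pid, cmd, rss
  }'
}

# Example with the sample line from the log excerpt:
echo 'Killed process 1234 (java) total-vm:8000000kB, anon-rss:4000000kB' | oom_summary
# -> pid=1234 cmd=java anon-rss=4000000kB

# Real usage:
# journalctl -k | oom_summary
```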

Understand OOM output:

```bash
# OOM killer output fields:
# - score: OOM score (higher = more likely to be killed)
# - total-vm: Total virtual memory allocated
# - anon-rss: Anonymous resident set (actual RAM used)
# - file-rss: File-backed memory (page cache)
# - shmem-rss: Shared memory

# Check a specific process's OOM score
cat /proc/1234/oom_score
cat /proc/1234/oom_score_adj

# Higher oom_score_adj = more likely to be killed
# Range: -1000 (never kill) to 1000 (always kill first)
```

Check memory status:

```bash
# Overall memory status
cat /proc/meminfo

# Key fields:
# MemTotal: Total physical RAM
# MemFree: Currently free memory
# MemAvailable: Available for new allocations
# Buffers: Kernel buffer cache
# Cached: Page cache
# SwapTotal: Total swap space
# SwapFree: Free swap
# SwapCached: Swap that can be freed

# Check memory over time
vmstat 1 10

# Look at:
# - swpd: Virtual memory used
# - free: Free memory
# - buff: Buffers
# - cache: Page cache
# - si/so: Swap in/out (high = memory pressure)
```

### 2. Identify memory-consuming processes

Find top memory consumers:

```bash
# Top processes by memory
ps aux --sort=-%mem | head -20

# Key columns:
# %MEM: Percentage of physical RAM
# VSZ: Virtual memory size
# RSS: Resident set size (actual RAM)

# More detailed view
ps -eo pid,ppid,cmd,%mem,rss,vsz --sort=-rss | head -20

# Using htop
htop --sort-key PERCENT_MEM

# Show memory totals for a specific process
cat /proc/<PID>/status | grep -E "VmSize|VmRSS|VmData"
```

Check process memory details:

```bash
# Detailed memory breakdown
cat /proc/<PID>/smaps | head -50

# Or summarized
cat /proc/<PID>/smaps_rollup

# Output shows:
# Private pages (unique to this process)
# Shared pages (shared with other processes)
# Heap, stack, shared libraries
# Page tables

# Check for memory leaks
watch -n 5 'cat /proc/<PID>/status | grep Vm'

# If VmRSS keeps growing = potential leak
```
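The watch loop above is interactive; for scripted leak triage, a small sketch (hypothetical helper, with illustrative default PID and interval) samples VmRSS twice and prints the delta:

```shell
#!/bin/sh
# rss_delta: sample a process's VmRSS twice and report the growth.
# A leak-triage sketch; the PID and interval defaults are illustrative.
rss_delta() {
  pid="${1:-$$}"
  interval="${2:-5}"
  r1=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
  sleep "$interval"
  r2=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
  echo "pid=$pid start=${r1}kB end=${r2}kB delta=$((r2 - r1))kB"
}

rss_delta "$$" 1   # sample this shell over 1 second
```

A steadily positive delta across repeated runs suggests a leak; a one-off spike may just be normal allocation.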

Check cgroup memory usage:

```bash
# For systemd services
systemctl show service-name | grep -i memory

# Check cgroup v2
cat /sys/fs/cgroup/system.slice/service-name.service/memory.current
cat /sys/fs/cgroup/system.slice/service-name.service/memory.max

# Check cgroup v1
cat /sys/fs/cgroup/memory/system.slice/service-name.service/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/system.slice/service-name.service/memory.limit_in_bytes

# List all cgroups (v1) by memory usage
for dir in /sys/fs/cgroup/memory/system.slice/*/; do
    usage=$(cat "${dir}memory.usage_in_bytes" 2>/dev/null)
    limit=$(cat "${dir}memory.limit_in_bytes" 2>/dev/null)
    echo "$dir: $((usage/1024/1024))MB / $((limit/1024/1024))MB"
done | sort -t: -k2 -rn | head -20
```

### 3. Fix application memory issues

Fix Java memory configuration:

```bash
# A common cause of Java OOM kills: misconfigured heap size
# Edit the Java application startup command

# WRONG: no heap limits
java -jar application.jar

# CORRECT: set heap limits (~75% of available RAM on a dedicated Java server)
java -Xms4g -Xmx4g -jar application.jar

# For containerized Java (respect cgroup limits)
# Java 8u191+:
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar application.jar

# Java 10+ (container support is on by default):
java -XX:MaxRAMPercentage=75.0 -jar application.jar

# Enable GC logging for debugging (Java 9+ unified logging)
java -Xlog:gc*:file=/var/log/app-gc.log:time,uptime:filecount=5,filesize=10M \
     -Xms4g -Xmx4g -jar application.jar

# Check GC activity
jstat -gc <PID> 1000 10

# A rapidly rising FGC (full GC) count indicates memory pressure
```

Fix database memory configuration:

```ini
# MySQL/MariaDB
# /etc/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf

[mysqld]
# Set buffer pool to 50-70% of RAM on a dedicated DB server
innodb_buffer_pool_size = 4G

# Scale down on smaller hosts (e.g. 1G on a 2 GB system)
# innodb_buffer_pool_size = 1G

# Other memory settings
key_buffer_size = 256M
query_cache_size = 0        # Query cache was removed in MySQL 8.0
tmp_table_size = 64M
max_heap_table_size = 64M

# PostgreSQL
# /etc/postgresql/*/main/postgresql.conf

# Shared buffers (~25% of RAM)
shared_buffers = 2GB

# Work memory (per sort/hash operation)
work_mem = 64MB

# Maintenance work memory
maintenance_work_mem = 256MB

# Effective cache size (planner hint, not an allocation)
effective_cache_size = 6GB
```

Fix application memory leaks:

```bash
# Identify a memory leak pattern:
# check whether RSS grows continuously
watch -n 10 'ps -eo pid,comm,rss --sort=-rss | head -10'

# If a specific process grows unbounded:
# 1. Check application logs for errors
# 2. Review application code for leaks
# 3. Implement memory limits

# For Node.js applications: increase heap size
export NODE_OPTIONS="--max-old-space-size=4096"

# Or with garbage collection tuning
node --max-old-space-size=4096 --max-semi-space-size=128 app.js

# For Python applications: use memory profiling
# pip install memory_profiler
# python -m memory_profiler script.py
```

### 4. Configure swap space

Check current swap:

```bash
# Check swap status
swapon --show

# Or
free -h

# Check if swap is being used
vmstat 1 5

# Look at si/so columns (swap in/out)
# Non-zero = swap activity
```

Add swap file:

```bash
# Create a swap file (4GB example)
sudo fallocate -l 4G /swapfile

# Or with dd (if fallocate is not available)
sudo dd if=/dev/zero of=/swapfile bs=1G count=4

# Set correct permissions
sudo chmod 600 /swapfile

# Format as swap
sudo mkswap /swapfile

# Enable swap
sudo swapon /swapfile

# Make permanent
echo "/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab

# Verify
swapon --show
free -h
```
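Choosing the 4GB in the example is workload-dependent. One common rule of thumb (illustrative thresholds, not a standard) sizes swap at RAM up to 2 GB, half of RAM up to 8 GB, and a fixed 4 GB above that; a sketch:

```shell
#!/bin/sh
# suggest_swap: suggest a swap size (MB) from installed RAM using one common
# rule of thumb. The thresholds are illustrative; tune them for your workload.
suggest_swap() {
  ram_kb="$1"                      # pass MemTotal in kB
  ram_mb=$((ram_kb / 1024))
  if [ "$ram_mb" -le 2048 ]; then
    echo "$ram_mb"                 # <= 2 GB RAM: swap = RAM
  elif [ "$ram_mb" -le 8192 ]; then
    echo $((ram_mb / 2))           # <= 8 GB RAM: swap = RAM / 2
  else
    echo 4096                      # larger hosts: fixed 4 GB
  fi
}

mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "RAM: $((mem_kb / 1024)) MB -> suggested swap: $(suggest_swap "$mem_kb") MB"
```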

Configure swap behavior:

```bash
# Check current swappiness
cat /proc/sys/vm/swappiness

# Swappiness controls how eagerly the kernel swaps:
# 0 = Avoid swap as much as possible
# 60 = Default on most systems
# 100 = Aggressive swapping

# For servers, lower is usually better
sudo sysctl vm.swappiness=10

# Make permanent
echo "vm.swappiness = 10" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Also check vfs_cache_pressure
cat /proc/sys/vm/vfs_cache_pressure
# Higher = reclaim dentry/inode caches more aggressively
```

### 5. Fix cgroup memory limits

Check cgroup limits:

```bash
# For systemd services
systemctl show service-name | grep -E "Memory|OOM"

# Key properties:
# MemoryMax: Hard memory limit
# MemoryHigh: Throttle threshold
# OOMScoreAdjust: OOM killer priority

# Check if the service is being OOM killed due to its cgroup limit
journalctl -u service-name | grep -i "oom"
```

Configure cgroup memory limits:

```ini
# Edit the systemd service unit
# /etc/systemd/system/service-name.service

[Service]
# Memory limit (hard limit, OOM kill if exceeded)
MemoryMax=2G

# Memory high (throttle before the hard limit)
MemoryHigh=1800M

# OOM score adjustment (-1000 to 1000)
# Lower = less likely to be killed
OOMScoreAdjust=-500

# Or use a memory reservation (soft limit)
# MemoryLow=512M
```

Reload and apply:

```bash
# Reload systemd configuration
sudo systemctl daemon-reload

# Restart the service
sudo systemctl restart service-name

# Verify limits
systemctl show service-name | grep -E "Memory|OOM"

# Check cgroup status (cgroup v2)
cat /sys/fs/cgroup/system.slice/service-name.service/memory.current
cat /sys/fs/cgroup/system.slice/service-name.service/memory.max
```

### 6. Adjust OOM killer behavior

Check OOM score:

```bash
# View OOM scores for all processes (read from /proc)
for p in /proc/[0-9]*; do
    printf "%s %s %s %s\n" \
        "$(cat "$p/oom_score" 2>/dev/null)" \
        "$(cat "$p/oom_score_adj" 2>/dev/null)" \
        "${p##*/}" \
        "$(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head -20

# Check a specific process
cat /proc/<PID>/oom_score
cat /proc/<PID>/oom_score_adj

# In modern kernels the score is based mainly on memory usage
# (rss + swap + page tables) plus oom_score_adj
```
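Roughly, the score works out to memory-fraction-of-RAM times 1000, plus oom_score_adj. A sketch of that arithmetic (an approximation only; the kernel also counts swap and page tables, so /proc/&lt;pid&gt;/oom_score is authoritative):

```shell
#!/bin/sh
# approx_oom_score: rough estimate of the kernel's OOM score as
# (rss / total RAM) * 1000 + oom_score_adj. Approximation only.
approx_oom_score() {
  rss_kb="$1"     # resident memory in kB
  total_kb="$2"   # MemTotal in kB
  adj="$3"        # oom_score_adj
  echo $(( rss_kb * 1000 / total_kb + adj ))
}

rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$$/status")
total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
adj=$(cat "/proc/$$/oom_score_adj")
echo "approx=$(approx_oom_score "$rss" "$total" "$adj") actual=$(cat "/proc/$$/oom_score")"
```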

Adjust OOM score:

```bash
# Make a process less likely to be killed
# Range: -1000 (never kill) to 1000 (kill first)

# For critical services (database, init)
echo -500 | sudo tee /proc/<PID>/oom_score_adj

# Or via systemd:
# /etc/systemd/system/service-name.service
#   [Service]
#   OOMScoreAdjust=-800

# For less important services:
#   [Service]
#   OOMScoreAdjust=500

# Important processes to protect:
# - SSH daemon (-900)
# - Database (-800)
# - Init system (-1000)

# Processes that can be sacrificed:
# - Worker processes (positive values)
# - Batch jobs (positive values)
# - Development tools (positive values)
```

Disable OOM killer (not recommended):

```bash
# WARNING: exempting processes can cause a system lockup
# Only use for debugging or truly critical daemons

# Exempt a specific process from the OOM killer
echo -1000 | sudo tee /proc/<PID>/oom_score_adj

# This can cause:
# - System freeze if memory is exhausted
# - Kernel panic in extreme cases
# - Other processes being killed instead
```

### 7. Monitor memory usage

Create memory monitoring script:

```bash
#!/bin/bash
# /usr/local/bin/memory-monitor.sh

THRESHOLD=90
ALERT_EMAIL="admin@example.com"

# Get memory usage
MEM_INFO=$(free | grep Mem)
TOTAL=$(echo $MEM_INFO | awk '{print $2}')
USED=$(echo $MEM_INFO | awk '{print $3}')
USAGE=$((USED * 100 / TOTAL))

# Get top memory consumers
TOP_PROCESSES=$(ps aux --sort=-%mem | head -6)

if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "High memory usage: ${USAGE}%"
    echo ""
    echo "Top processes:"
    echo "$TOP_PROCESSES"

    # Send alert
    echo "Memory usage at ${USAGE}%" | \
        mail -s "Alert: High Memory Usage" "$ALERT_EMAIL"

    # Log details
    echo "$(date): Memory at ${USAGE}%" >> /var/log/memory-alerts.log
fi

# Check for OOM kills
OOM_KILLS=$(dmesg | grep -c "Out of memory")
if [ "$OOM_KILLS" -gt 0 ]; then
    echo "OOM killer invoked $OOM_KILLS times"
fi
```

Configure monitoring with systemd:

```ini
# /etc/systemd/system/memory-monitor.service
[Unit]
Description=Memory Usage Monitor
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/memory-monitor.sh

[Install]
WantedBy=multi-user.target
```

```ini
# /etc/systemd/system/memory-monitor.timer
[Unit]
Description=Check memory every 5 minutes

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Unit=memory-monitor.service

[Install]
WantedBy=timers.target
```

### 8. Debug memory issues

Enable memory debugging:

```bash
# Check kernel memory allocation
cat /proc/slabinfo | head -20

# Look for growing slab caches
watch -n 5 'cat /proc/slabinfo | grep -v "^slabinfo"'

# Check for kernel memory leaks
# Requires a kernel built with CONFIG_DEBUG_KMEMLEAK (and debugfs mounted)
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak

# Check for memory fragmentation
cat /proc/buddyinfo

# High fragmentation can cause OOM even with free memory
```
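The raw /proc/buddyinfo columns are free-block counts per allocation order. A small awk sketch (field positions assumed from the standard `Node N, zone NAME <order0> ... <order10>` layout) totals the higher-order free blocks per zone, which is what large allocations need:

```shell
#!/bin/sh
# buddy_summary: total free blocks of order >= 4 (>= 64 KiB with 4 KiB pages)
# per memory zone. Field positions assume the standard /proc/buddyinfo layout.
buddy_summary() {
  awk '/zone/ {
    node = $2; sub(/,$/, "", node)
    hi = 0
    for (i = 9; i <= NF; i++) hi += $i   # fields 9..15 hold orders 4..10
    printf "node %s zone %-8s high-order free blocks: %d\n", node, $4, hi
  }' "${1:-/proc/buddyinfo}"
}

[ -r /proc/buddyinfo ] && buddy_summary
```

Zones reporting zero (or near-zero) high-order blocks despite plenty of order-0 pages point to the fragmentation case described above.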

Capture memory state on OOM:

```bash
# Configure kernel behavior on OOM
# /etc/sysctl.conf

# Keep the OOM killer enabled (0); setting 1 panics the system instead
vm.panic_on_oom = 0

# Dump the task list to the kernel log on every OOM event
vm.oom_dump_tasks = 1

# Allow magic SysRq (needed for the manual trigger below)
kernel.sysrq = 1

# Trigger the OOM killer manually (for testing only!)
# echo f > /proc/sysrq-trigger

# View the current sysrq setting
cat /proc/sys/kernel/sysrq
```

Analyze memory dumps:

```bash
# After an OOM event, capture the full memory state
# Requires the crash utility and a kernel dump (kdump)

# Install crash
apt install crash    # Debian/Ubuntu
yum install crash    # RHEL/CentOS

# Analyze the kernel dump
crash /usr/lib/debug/boot/vmlinux /var/crash/vmcore

# Useful crash commands:
# ps   - Show process list
# vm   - Show virtual memory
# kmem - Show kernel memory
# sys  - Show system info
```

### 9. Fix container memory issues

Docker container memory:

```bash
# Check container memory usage
docker stats --no-stream

# Check if a container was OOM killed
docker inspect <container> | grep -A5 "OOMKilled"

# Or
docker inspect <container> --format='{{.State.OOMKilled}}'

# Set memory limits
docker run --memory="2g" --memory-swap="2g" image

# Or update an existing container
docker update --memory="2g" --memory-swap="2g" <container>

# Memory options:
# --memory: Hard limit
# --memory-swap: Total (memory + swap); -1 for unlimited swap
# --memory-reservation: Soft limit
# --kernel-memory: Kernel memory limit (deprecated)
```

Kubernetes pod memory:

```yaml
# Pod specification with memory limits
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "1Gi"
```

If the pod is OOMKilled, check actual usage against the limit before raising it:

```bash
# 1. Check actual usage
kubectl top pod my-app

# 2. Increase limits if needed
# 3. Fix application memory leaks

# Check pod status
kubectl describe pod my-app | grep -A5 "Last State"

# If OOMKilled:
#     Last State:  Terminated
#       Reason:    OOMKilled
#       Exit Code: 137
```
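The 137 above is not arbitrary: exit codes above 128 encode 128 plus the terminating signal number, so 137 means signal 9 (SIGKILL), which is what the OOM killer sends. A tiny sketch decodes any such status:

```shell
#!/bin/sh
# decode_exit: translate an exit status above 128 into the signal behind it.
# 137 -> 128 + 9 -> SIGKILL, the signal the OOM killer delivers.
decode_exit() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    echo "exit $code = 128 + signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit $code (not signal-related)"
  fi
}

decode_exit 137
# -> exit 137 = 128 + signal 9 (SIGKILL)
```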

### 10. Implement memory protection

Configure early OOM detection:

```bash
# Install earlyoom (acts before the kernel OOM killer)
apt install earlyoom    # Debian/Ubuntu
yum install earlyoom    # RHEL/CentOS

# Configure thresholds
# /etc/default/earlyoom
EARLYOOM_ARGS="-m 5 -s 2"    # act at 5% free memory or 2% free swap

# earlyoom kills the process with the highest oom_score first

# Enable the service
systemctl enable earlyoom
systemctl start earlyoom

# Check status
systemctl status earlyoom
journalctl -u earlyoom -f
```

Configure systemd OOM protection:

```ini
# Protect critical services
# /etc/systemd/system/service-name.service

[Service]
# Protect from the OOM killer
OOMScoreAdjust=-1000

# Or use systemd's own OOM handling (systemd-oomd):
# kills the service gracefully before the kernel OOM killer fires
ManagedOOMSwap=kill
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=70%
```

Prevention

  • Monitor memory usage with alerting at 80% threshold
  • Set appropriate memory limits for all services
  • Implement early OOM detection (earlyoom)
  • Configure swap space as buffer
  • Regular memory profiling of applications
  • Document memory requirements for each service
  • Use cgroups for process isolation
  • Test applications under memory pressure

Common Error Messages

  • **Killed**: Process terminated by OOM killer
  • **OOMKilled**: Container killed due to memory limit
  • **Exit code 137**: 128 + 9 (SIGKILL from OOM)
  • **Cannot allocate memory**: malloc/new failed
  • **Out of memory**: Kernel OOM killer invoked