Your server suddenly becomes unresponsive. You check the logs and see messages like "Out of memory: Kill process 1234 (java) score 567 or sacrifice child." The OOM killer has struck again, terminating processes to keep the system running.
Understanding the Problem
The Linux kernel's OOM (Out-Of-Memory) killer activates when the system exhausts available memory. When no more memory can be allocated and no process will voluntarily release memory, the kernel selects a process to terminate based on its OOM score.
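The kernel's choice is visible per process under /proc: each task exposes the badness score the OOM killer consults, plus an adjustment knob. A quick look at the interface, using the current shell as the example process:

```shell
# Inspect the OOM bookkeeping for the current process.
# oom_score is the kernel-computed badness (higher = killed first);
# oom_score_adj is the administrator bias, ranging from -1000 to 1000.
cat /proc/self/oom_score
cat /proc/self/oom_score_adj
```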
Typical Error Messages
```
Out of memory: Kill process 1842 (mysqld) score 527 or sacrifice child
Killed process 1842 (mysqld) total-vm:8388608kB, anon-rss:4194304kB, file-rss:0kB
Memory cgroup out of memory: Kill process 1842 (mysqld) score 527 or sacrifice child
```

You might also notice:
- Services crashing unexpectedly
- System becoming extremely slow before recovering
- SSH connections dropping
- Applications reporting "Cannot allocate memory" errors
Diagnosing the Issue
First, verify that OOM killer was indeed the culprit:
```bash
# Check for OOM killer activity in the kernel ring buffer
dmesg | grep -i "out of memory"
dmesg | grep -i "killed process"

# Search system logs for OOM events
journalctl -k --since "1 hour ago" | grep -i oom
grep -i "out of memory" /var/log/syslog
grep -i "oom-killer" /var/log/messages
```
Examine current memory usage to understand the scope:
```bash
# Quick overview of memory
free -h

# Detailed memory breakdown
head -20 /proc/meminfo

# Top memory-consuming processes
ps aux --sort=-%mem | head -15

# Alternative: use smem if installed
smem -t -k -s rss | tail -20
```
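Raw totals can mislead once caches are counted; the kernel's MemAvailable field estimates how much memory is obtainable without swapping, which makes it a better pressure signal than plain free memory. A one-liner sketch that turns it into a percentage:

```shell
# Estimate headroom as MemAvailable / MemTotal from /proc/meminfo.
# MemAvailable accounts for reclaimable caches, so it reflects real
# pressure better than the "free" column alone.
awk '/^MemTotal/ {t=$2} /^MemAvailable/ {a=$2}
     END {printf "%.1f%% of RAM available\n", a / t * 100}' /proc/meminfo
```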
For a more detailed analysis, check the OOM scores of running processes:
```bash
# List processes with their OOM scores (higher = more likely to be killed)
for f in /proc/[0-9]*/oom_score; do
    pid=${f#/proc/}; pid=${pid%/oom_score}
    echo "$pid $(cat "$f") $(cat "/proc/$pid/comm" 2>/dev/null)"
done | sort -k2 -rn | head -20
```

Solutions
Immediate Relief: Free Up Memory
If the system is still responsive but under memory pressure:
```bash
# Clear the page cache (safe: only frees cached data)
sync && echo 1 > /proc/sys/vm/drop_caches

# Clear dentries and inodes
echo 2 > /proc/sys/vm/drop_caches

# Clear all caches (use cautiously)
echo 3 > /proc/sys/vm/drop_caches

# For PostgreSQL, trigger a checkpoint to flush dirty buffers
sudo -u postgres psql -c "CHECKPOINT;"

# Restart memory-hungry services during low traffic
systemctl restart application-name
```
Protect Critical Processes
Prevent essential services from being killed by adjusting their OOM score:
```bash
# Lower the OOM score adjustment (-1000 to 1000; lower = less likely to be killed)
# -1000 disables OOM killing for the process entirely
echo -1000 > /proc/$(pidof -s sshd)/oom_score_adj

# For systemd services, add to the service file:
# [Service]
# OOMScoreAdjust=-500

# Verify the adjustment
cat /proc/$(pidof -s sshd)/oom_score_adj
```
For a systemd-managed service like PostgreSQL:
```bash
# Create an override
systemctl edit postgresql

# Add:
# [Service]
# OOMScoreAdjust=-500

# Reload and restart
systemctl daemon-reload
systemctl restart postgresql
```
Tune Virtual Memory Parameters
Adjust kernel parameters to handle memory pressure better:
```bash
# View current settings
sysctl vm.swappiness vm.vfs_cache_pressure vm.overcommit_memory

# Reduce swappiness (default 60; lower = less swap usage)
sysctl -w vm.swappiness=10

# Increase cache pressure to reclaim inode/dentry caches more aggressively
sysctl -w vm.vfs_cache_pressure=200

# Control overcommit behavior:
#   0: heuristic overcommit (default)
#   1: always overcommit
#   2: never overcommit; strict accounting
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80
```
Make these changes persistent by adding to /etc/sysctl.conf or creating a file in /etc/sysctl.d/:
```bash
# Create persistent configuration
cat > /etc/sysctl.d/99-memory-tuning.conf << 'EOF'
vm.swappiness = 10
vm.vfs_cache_pressure = 200
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
EOF

# Apply changes
sysctl -p /etc/sysctl.d/99-memory-tuning.conf
```
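With vm.overcommit_memory=2 the kernel enforces a hard commit ceiling of roughly swap plus overcommit_ratio percent of RAM; allocations beyond it fail with "Cannot allocate memory" instead of triggering the OOM killer. Before enabling strict mode, it is worth comparing that ceiling against what is already committed:

```shell
# CommitLimit ~= swap + RAM * vm.overcommit_ratio / 100
# Committed_AS is the total address space already promised to processes;
# if it exceeds the would-be limit, strict accounting will break allocations.
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```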
Add Swap Space
If you lack sufficient swap, create an additional swap file:
```bash
# Check current swap
swapon --show

# Create a 4 GB swap file
sudo fallocate -l 4G /swapfile
# (if fallocate is unsupported on your filesystem, use:
#  sudo dd if=/dev/zero of=/swapfile bs=1M count=4096)
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Verify
swapon --show
free -h

# Make permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```
Monitor and Alert
Set up monitoring to catch memory issues early:
```bash
# Quick monitoring script
cat > /usr/local/bin/memory-check.sh << 'EOF'
#!/bin/bash
THRESHOLD=90
MEM_USAGE=$(free | awk '/Mem:/ {printf "%.0f", $3/$2 * 100}')
if [ "$MEM_USAGE" -gt "$THRESHOLD" ]; then
    echo "WARNING: Memory usage at ${MEM_USAGE}%"
    ps aux --sort=-%mem | head -10 | mail -s "Memory Alert" admin@example.com
fi
EOF
chmod +x /usr/local/bin/memory-check.sh

# Add to crontab for periodic checks
# (note: "crontab -" replaces the current crontab; append instead if you have entries)
echo '*/5 * * * * /usr/local/bin/memory-check.sh' | crontab -
```
Verification
After implementing fixes, verify the system is stable:
```bash
# Monitor memory in real time
watch -n 1 free -h

# Check for recent OOM events
dmesg -T | grep -i "out of memory" | tail -5

# Verify OOM score adjustments
cat /proc/$(pidof -s sshd)/oom_score_adj

# Confirm swap is active
swapon --show

# Check that sysctl settings are applied
sysctl vm.swappiness vm.overcommit_memory
```
Prevention Tips
- Set appropriate ulimit values for applications to prevent runaway memory usage
- Use containerization (Docker, Podman) with memory limits to isolate memory-hungry applications
- Monitor memory trends over time with tools like Prometheus, Grafana, or Zabbix
- Review application configurations for memory-related settings (JVM heap size, PHP memory_limit, etc.)
- Consider upgrading physical RAM if consistently near capacity during normal operations
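The same isolation idea works without containers: on cgroup-v2 systems, a systemd drop-in can cap a service's memory so a leak takes down only that service rather than the whole host. A sketch, assuming a hypothetical myapp.service and limits chosen purely for illustration:

```ini
# /etc/systemd/system/myapp.service.d/memory.conf (hypothetical unit name)
[Service]
# Start reclaiming and throttling the service before the hard limit is hit
MemoryHigh=384M
# Hard ceiling: the service's cgroup is OOM-killed if it exceeds this
MemoryMax=512M
```

After dropping the file in place, run systemctl daemon-reload and restart the service for the limits to take effect.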