Introduction

When Linux runs out of memory, the kernel invokes the OOM (Out of Memory) killer to free memory by terminating processes. The OOM killer uses a heuristic scoring algorithm to decide which process to kill, and sometimes it selects a critical database or web server instead of a memory-leaking background job. The oom_score_adj value allows administrators to influence this decision.

Symptoms

  • Critical service (PostgreSQL, Nginx) killed while a less important process survives
  • dmesg shows Out of memory: Killed process 1234 (postgres)
  • System becomes unresponsive after losing an essential daemon
  • OOM events recurring despite adequate physical RAM
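
To see which services the OOM killer keeps choosing, the log line quoted above can be reduced to a PID and a command name. The helper below is a minimal sketch: `oom_victims` is a hypothetical name, and the pattern assumes the `Killed process <pid> (<name>)` wording shown in the symptom list, which may differ between kernel versions.

```bash
# Hypothetical helper: read kernel log text on stdin and print "PID NAME"
# for every line that reports a killed process. Other lines are ignored.
oom_victims() {
  sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*/\1 \2/p'
}
```

It can be fed from the kernel log, for example `dmesg -T | oom_victims`, to list each victim in the order it was killed.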

Common Causes

  • OOM killer heuristic scores critical process higher due to its large memory footprint
  • All processes running with default oom_score_adj of 0
  • Memory-hungry but essential services naturally score higher
  • Container workloads running without memory limits, so their processes compete with host services in the same global OOM ranking
  • No memory limits (cgroups) set for individual services
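
The causes above come down to how the score is built: roughly, a process's share of memory plus its oom_score_adj. The sketch below is a deliberately simplified model, not the kernel's exact calculation (which varies by kernel version). `estimate_badness` and its arguments are hypothetical names used only for illustration.

```bash
# Simplified model (an assumption, not the kernel's exact code):
#   score ~ 1000 * memory_used / memory_total + oom_score_adj, floored at 0
# Arguments: used_kb total_kb adj
estimate_badness() {
  local used_kb=$1 total_kb=$2 adj=$3
  local score=$(( used_kb * 1000 / total_kb + adj ))
  # A strongly negative adjustment cannot push the estimate below zero.
  (( score < 0 )) && score=0
  echo "$score"
}
```

Under this model, on a 16 GiB host a 4 GiB database at the default adjustment (`estimate_badness 4194304 16777216 0`) scores 250, while a 100 MiB batch job raised to +500 (`estimate_badness 102400 16777216 500`) scores 506 and would be chosen first.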

Step-by-Step Fix

  1. Review OOM killer log entries:
     ```bash
     dmesg -T | grep -Ei "oom|killed process"
     journalctl -k | grep -i "out of memory"
     ```
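  2. Check current OOM scores for all processes:
     ```bash
     # Columns: oom_score, oom_score_adj, PID, command (read directly from /proc)
     for p in /proc/[0-9]*; do echo "$(cat "$p/oom_score") $(cat "$p/oom_score_adj") ${p#/proc/} $(cat "$p/comm")"; done 2>/dev/null | sort -rn | head -20
     ```
     A higher oom_score means the process is more likely to be killed. oom_score_adj ranges from -1000 (never killed by the OOM killer) to +1000 (strongly preferred as the first victim).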
  3. Protect critical processes by lowering their OOM score:
     ```bash
     # pgrep -o selects the oldest matching process, i.e. the parent (postmaster / master)
     # Protect PostgreSQL (set to minimum adjustment)
     echo -1000 | sudo tee /proc/$(pgrep -o -x postgres)/oom_score_adj

     # Protect Nginx master process
     echo -500 | sudo tee /proc/$(pgrep -o -x nginx)/oom_score_adj
     ```

  4. Increase OOM score for non-critical batch jobs:
     ```bash
     # Loop in case several backup-script processes are running
     for pid in $(pgrep -x backup-script); do echo 500 | sudo tee /proc/$pid/oom_score_adj; done
     ```
  5. Make OOM adjustments persistent via systemd:
     ```bash
     sudo systemctl edit postgresql.service
     ```
     Add:
     ```ini
     [Service]
     OOMScoreAdjust=-500
     ```
     Then reload:
     ```bash
     sudo systemctl daemon-reload
     sudo systemctl restart postgresql.service
     ```
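  6. Set memory limits using cgroups to prevent runaway consumption:
     ```bash
     sudo systemctl edit myapp.service
     ```
     Add:
     ```ini
     [Service]
     MemoryMax=2G
     MemoryHigh=1500M
     ```
     MemoryHigh throttles the service and pushes it to reclaim memory once it passes 1500M. MemoryMax is the hard cap: exceeding it triggers an OOM kill confined to the service's own cgroup. Both settings require cgroup v2. Restart myapp.service for the limits to take effect.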
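
After applying the steps above, it is worth confirming that the running process actually carries the intended adjustment. The following is a minimal sketch of such a check; `check_oom_adj` is a hypothetical helper, and its optional third argument (an alternative /proc root) exists only so the function can be exercised without touching real processes.

```bash
# Hypothetical helper: succeed only if PID's oom_score_adj equals EXPECTED.
# Usage: check_oom_adj PID EXPECTED [PROC_ROOT]
check_oom_adj() {
  local pid=$1 expected=$2 proc_root=${3:-/proc}
  local file="$proc_root/$pid/oom_score_adj" actual
  # Return 2 if the file cannot be read (process gone or wrong PID).
  if ! actual=$(cat "$file" 2>/dev/null); then
    echo "cannot read $file" >&2
    return 2
  fi
  # Return 1 if the value differs from what was configured.
  if [[ $actual != "$expected" ]]; then
    echo "PID $pid: oom_score_adj is $actual, expected $expected" >&2
    return 1
  fi
  echo "PID $pid: oom_score_adj is $actual"
}
```

For the systemd case, the main PID can be looked up first, for example `check_oom_adj "$(systemctl show -p MainPID --value postgresql.service)" -500`.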

Prevention

  • Always set OOMScoreAdjust in systemd unit files for critical services
  • Use MemoryMax= and MemoryHigh= cgroup limits to contain memory usage
  • Monitor memory trends with tools like vmstat, free -m, or Prometheus node_exporter
  • Configure swap as a buffer to reduce OOM trigger frequency: sudo fallocate -l 4G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile (add an /etc/fstab entry to keep the swap file across reboots)
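
Alongside the monitoring tools listed above, a small check against /proc/meminfo can flag low memory before the OOM killer runs. This is a sketch under assumptions: `mem_available_pct` is a hypothetical helper, it relies on the MemTotal and MemAvailable fields of /proc/meminfo, and the optional argument (an alternative meminfo file) is there only for testing.

```bash
# Hypothetical helper: print MemAvailable as a whole-number percentage of
# MemTotal. Reads /proc/meminfo unless another file is given as $1.
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END {
         # Fail if MemTotal was not found, to avoid dividing by zero.
         if (total > 0) printf "%d\n", avail * 100 / total
         else exit 1
       }' "${1:-/proc/meminfo}"
}
```

A cron job or timer could call it and warn when the result drops below a chosen threshold, for example `[ "$(mem_available_pct)" -lt 10 ] && echo "low memory"`. The 10 percent threshold is an arbitrary illustration, not a recommendation.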