## Introduction

When the Elasticsearch JVM's total memory footprint grows beyond available physical memory, the Linux kernel's OOM killer terminates the Java process. This causes data nodes to crash unexpectedly, triggering shard reallocation and potential data loss. The issue is particularly insidious because the Elasticsearch heap is only one component of memory usage—off-heap caches, Lucene segments, and the OS page cache also consume memory.

## Symptoms

- Elasticsearch process disappears without graceful shutdown logs
- `dmesg | grep -i oom` shows `Out of memory: Killed process ... (java)`
- Cluster health turns red after node termination
- `jcmd <pid> GC.heap_info` shows heap near `-Xmx` before the crash
- Node logs show frequent full GCs preceding the OOM kill
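When triaging, it helps to pull the killed PID out of the `dmesg` output programmatically. A minimal sketch, run here against a hypothetical sample line (the exact message format varies by kernel version):

```shell
# Hypothetical sample dmesg line; in practice feed in: dmesg -T | grep -i oom
line='[Mon Apr 8 14:32:15 2026] Out of memory: Killed process 1234 (java) total-vm:16777216kB'

# Extract the PID of the killed java process
pid=$(printf '%s\n' "$line" | sed -n 's/.*Killed process \([0-9]*\) (java).*/\1/p')
echo "OOM-killed java PID: $pid"
```

Cross-reference the extracted PID against your Elasticsearch node logs to confirm which node was killed.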

## Common Causes

- Heap set too high (>32GB), disabling compressed ordinary object pointers
- Heap set too close to total RAM, leaving no room for off-heap memory
- Heavy aggregations or complex queries consuming heap
- Too many concurrent search requests overwhelming memory
- Lucene segment memory not accounted for in heap sizing

## Step-by-Step Fix

1. **Check OOM killer logs**:

   ```bash
   dmesg -T | grep -i "out of memory\|oom\|killed process"
   # Example:
   # [Mon Apr 8 14:32:15 2026] Out of memory: Killed process 1234 (java) total-vm:16777216kB
   ```

2. **Verify and fix JVM heap configuration**:

   ```bash
   # Check current heap settings
   grep -E "^-X" /etc/elasticsearch/jvm.options

   # Fix: set heap to 50% of RAM, max 31GB (stay under the compressed oops limit)
   # /etc/elasticsearch/jvm.options
   -Xms16g
   -Xmx16g
   ```

3. **Configure cgroup memory limits for containerized Elasticsearch**:

   ```yaml
   # docker-compose.yml
   services:
     elasticsearch:
       image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
       environment:
         - ES_JAVA_OPTS=-Xms16g -Xmx16g
       deploy:
         resources:
           limits:
             memory: 24g  # Heap + off-heap buffer
   ```
4. **Set up OOM killer priority adjustment**:

   ```bash
   # Lower the OOM score to make Elasticsearch less likely to be killed,
   # but prefer proper sizing over OOM-score manipulation.
   # Note: pgrep -f elasticsearch assumes exactly one matching process.
   echo -500 > /proc/$(pgrep -f elasticsearch)/oom_score_adj
   ```
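   Writing to `/proc/<pid>/oom_score_adj` does not survive a restart of the process. On systemd hosts, the same adjustment can be made persistent with a drop-in unit (sketch assuming the stock `elasticsearch.service` unit name):

   ```ini
   # /etc/systemd/system/elasticsearch.service.d/oom.conf
   [Service]
   OOMScoreAdjust=-500
   ```

   Apply it with `systemctl daemon-reload && systemctl restart elasticsearch`.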
5. **Reduce heap-consuming operations**:

   Thread pool settings are static node settings, so they belong in `elasticsearch.yml` on each node (followed by a restart); they cannot be changed through the cluster settings API.

   ```yaml
   # elasticsearch.yml -- limit concurrent search work per node
   thread_pool.search.size: 12
   thread_pool.search.queue_size: 1000
   ```

   ```bash
   # Limit aggregation bucket sizes
   curl -X POST localhost:9200/my_index/_search -H 'Content-Type: application/json' -d '{
     "aggs": {
       "top_categories": {
         "terms": { "field": "category", "size": 100 }
       }
     },
     "track_total_hits": false
   }'
   ```

## Prevention

- Never set heap above 31GB, to maintain compressed ordinary object pointers
- Leave at least 50% of RAM for off-heap memory (Lucene, OS page cache)
- Use `_nodes/stats` to monitor `jvm.mem.heap_used_percent` and alert at 75%
- Limit aggregation sizes and use `composite` aggregations for large result sets
- Use dedicated coordinating nodes for heavy search workloads
- Rely on circuit breakers (enabled by default) as a safety net before OOM conditions
- Use cgroup v2 memory limits in containerized deployments
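The monitoring point above can be sketched as a small polling check. The JSON here is a hypothetical, truncated `_nodes/stats` response; in production, fetch the real one with `curl -s localhost:9200/_nodes/stats/jvm` and keep the same extraction:

```shell
# Hypothetical truncated _nodes/stats/jvm response for one node
sample='{"nodes":{"node1":{"jvm":{"mem":{"heap_used_percent":82}}}}}'

# Extract heap_used_percent and compare against the 75% alert threshold
pct=$(printf '%s' "$sample" | grep -o '"heap_used_percent":[0-9]*' | cut -d: -f2)
if [ "$pct" -ge 75 ]; then
  echo "ALERT: heap at ${pct}% (threshold 75%)"
fi
```

Run under cron or a monitoring agent, this catches sustained heap pressure well before the kernel's OOM killer does.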