## Introduction

When the Elasticsearch JVM's total memory footprint grows beyond available physical memory, the Linux kernel's OOM killer terminates the Java process. This causes data nodes to crash unexpectedly, triggering shard reallocation and potential data loss. The issue is particularly insidious because the Elasticsearch heap is only one component of memory usage—off-heap caches, Lucene segments, and the OS page cache also consume memory.

## Symptoms

- Elasticsearch process disappears without graceful shutdown logs
- `dmesg | grep -i oom` shows `Out of memory: Killed process ... (java)`
- Cluster health turns red after node termination
- `jcmd <pid> GC.heap_info` shows heap near `-Xmx` before the crash
- Node logs show frequent full GCs preceding the OOM kill
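When triaging, it helps to pull the killed PID out of the `dmesg` output programmatically. A minimal sketch, run here against a hypothetical sample line (the exact message format varies by kernel version):

```shell
# Hypothetical sample dmesg line; in practice feed in: dmesg -T | grep -i oom
line='[Mon Apr 8 14:32:15 2026] Out of memory: Killed process 1234 (java) total-vm:16777216kB'

# Extract the PID of the killed java process
pid=$(printf '%s\n' "$line" | sed -n 's/.*Killed process \([0-9]*\) (java).*/\1/p')
echo "OOM-killed java PID: $pid"
```

Cross-reference the extracted PID against your Elasticsearch node logs to confirm which node was killed.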

## Common Causes

- Heap set too high (>32GB), disabling compressed ordinary object pointers
- Heap set too close to total RAM, leaving no room for off-heap memory
- Heavy aggregations or complex queries consuming heap
- Too many concurrent search requests overwhelming memory
- Lucene segment memory not accounted for in heap sizing

## Step-by-Step Fix

1. **Check OOM killer logs**:

   ```bash
   dmesg -T | grep -i "out of memory\|oom\|killed process"
   # Example:
   # [Mon Apr 8 14:32:15 2026] Out of memory: Killed process 1234 (java) total-vm:16777216kB
   ```

2. **Verify and fix JVM heap configuration**:

   ```bash
   # Check current heap settings
   grep -E "^-X" /etc/elasticsearch/jvm.options

   # Fix: set heap to 50% of RAM, max 31GB (stay under the compressed oops limit)
   # /etc/elasticsearch/jvm.options
   -Xms16g
   -Xmx16g
   ```

3. **Configure cgroup memory limits for containerized Elasticsearch**:

   ```yaml
   # docker-compose.yml
   services:
     elasticsearch:
       image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
       environment:
         - ES_JAVA_OPTS=-Xms16g -Xmx16g
       deploy:
         resources:
           limits:
             memory: 24g  # Heap + off-heap buffer
   ```
4. **Set up OOM killer priority adjustment**:

   ```bash
   # Lower the OOM score to make Elasticsearch less likely to be killed,
   # but prefer proper sizing over OOM-score manipulation.
   # Note: pgrep -f elasticsearch assumes exactly one matching process.
   echo -500 > /proc/$(pgrep -f elasticsearch)/oom_score_adj
   ```
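   Writing to `/proc/<pid>/oom_score_adj` does not survive a restart of the process. On systemd hosts, the same adjustment can be made persistent with a drop-in unit (sketch assuming the stock `elasticsearch.service` unit name):

   ```ini
   # /etc/systemd/system/elasticsearch.service.d/oom.conf
   [Service]
   OOMScoreAdjust=-500
   ```

   Apply it with `systemctl daemon-reload && systemctl restart elasticsearch`.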
5. **Reduce heap-consuming operations**:

   Thread pool settings are static node settings, so they belong in `elasticsearch.yml` on each node (followed by a restart); they cannot be changed through the cluster settings API.

   ```yaml
   # elasticsearch.yml -- limit concurrent search work per node
   thread_pool.search.size: 12
   thread_pool.search.queue_size: 1000
   ```

   ```bash
   # Limit aggregation bucket sizes
   curl -X POST localhost:9200/my_index/_search -H 'Content-Type: application/json' -d '{
     "aggs": {
       "top_categories": {
         "terms": { "field": "category", "size": 100 }
       }
     },
     "track_total_hits": false
   }'
   ```

## Prevention

- Never set heap above 31GB, to maintain compressed ordinary object pointers
- Leave at least 50% of RAM for off-heap memory (Lucene, OS page cache)
- Use `_nodes/stats` to monitor `jvm.mem.heap_used_percent` and alert at 75%
- Limit aggregation sizes and use `composite` aggregations for large result sets
- Use dedicated coordinating nodes for heavy search workloads
- Rely on circuit breakers (enabled by default) as a safety net before OOM conditions
- Use cgroup v2 memory limits in containerized deployments
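The monitoring point above can be sketched as a small polling check. The JSON here is a hypothetical, truncated `_nodes/stats` response; in production, fetch the real one with `curl -s localhost:9200/_nodes/stats/jvm` and keep the same extraction:

```shell
# Hypothetical truncated _nodes/stats/jvm response for one node
sample='{"nodes":{"node1":{"jvm":{"mem":{"heap_used_percent":82}}}}}'

# Extract heap_used_percent and compare against the 75% alert threshold
pct=$(printf '%s' "$sample" | grep -o '"heap_used_percent":[0-9]*' | cut -d: -f2)
if [ "$pct" -ge 75 ]; then
  echo "ALERT: heap at ${pct}% (threshold 75%)"
fi
```

Run under cron or a monitoring agent, this catches sustained heap pressure well before the kernel's OOM killer does.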