## Introduction

When the Elasticsearch process's memory footprint grows beyond available physical memory, the Linux kernel's OOM killer terminates the Java process. This causes data nodes to crash without a graceful shutdown, triggering shard reallocation and potential data loss. The issue is particularly insidious because the JVM heap is only one component of Elasticsearch's memory usage: off-heap caches, memory-mapped Lucene segments, and the OS page cache also consume RAM.
## Symptoms

- Elasticsearch process disappears without graceful shutdown logs
- `dmesg | grep -i oom` shows `Out of memory: Killed process <pid> (java)`
- Cluster health turns red after the node is terminated
- `jcmd <pid> GC.heap_info` shows heap near `-Xmx` before the crash
- Node logs show frequent full GCs preceding the OOM kill
## Common Causes

- Heap set too high (>32GB), disabling compressed ordinary object pointers
- Heap set too close to total RAM, leaving no room for off-heap memory
- Heavy aggregations or complex queries consuming heap
- Too many concurrent search requests overwhelming memory
- Lucene segment memory not accounted for in heap sizing
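One way to check the first cause: Elasticsearch logs the compressed-oops status at startup. A minimal sketch, assuming a package install logging to `/var/log/elasticsearch/` (the sample line is echoed here so the snippet runs standalone; the real check is the commented `grep` against your log file):

```shell
# Real check (log path is an assumption for a package install):
#   grep "compressed ordinary object pointers" /var/log/elasticsearch/*.log
# Sample of the startup line that grep would find:
sample='heap size [16gb], compressed ordinary object pointers [true]'
echo "$sample" | grep -o 'compressed ordinary object pointers \[true\]'
```

If the log reports `[false]`, the heap is above the compressed-oops threshold and should be lowered.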
## Step-by-Step Fix

1. **Check OOM killer logs:**

   ```bash
   dmesg -T | grep -i "out of memory\|oom\|killed process"
   # Example:
   # [Mon Apr  8 14:32:15 2026] Out of memory: Killed process 1234 (java) total-vm:16777216kB
   ```
2. **Verify and fix JVM heap configuration:**

   ```bash
   # Check current heap settings
   grep -E "^-X" /etc/elasticsearch/jvm.options

   # Fix: set heap to 50% of RAM, max 31GB (stay under the compressed-oops limit)
   # /etc/elasticsearch/jvm.options
   -Xms16g
   -Xmx16g
   ```
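The 50%-of-RAM-capped-at-31GB rule can be sketched as a quick calculation. A minimal sketch, assuming a Linux host with `/proc/meminfo` (this is an illustration, not an official sizing tool):

```shell
# Suggested heap: half of physical RAM, capped at 31 GB so compressed
# oops stay enabled; floored at 1 GB for very small hosts
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
heap_gb=$(( total_kb / 1024 / 1024 / 2 ))
[ "$heap_gb" -gt 31 ] && heap_gb=31
[ "$heap_gb" -lt 1 ] && heap_gb=1
echo "suggested: -Xms${heap_gb}g -Xmx${heap_gb}g"
```

On a 64 GB host this lands on the 31 GB cap rather than 32 GB, which is the point of the cap.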
3. **Configure cgroup memory limits for containerized Elasticsearch:**

   ```yaml
   # docker-compose.yml
   services:
     elasticsearch:
       image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
       environment:
         - ES_JAVA_OPTS=-Xms16g -Xmx16g
       deploy:
         resources:
           limits:
             memory: 24g  # heap + off-heap buffer
   ```
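To verify that a memory limit actually applied, you can read the cgroup files from inside the container. A minimal sketch covering both cgroup v2 and v1 paths (which path exists depends on the host; on an unlimited or non-container host this prints `max` or `unknown`):

```shell
# Effective memory limit as seen from inside the container:
# cgroup v2 exposes memory.max, cgroup v1 exposes memory.limit_in_bytes
limit=$(cat /sys/fs/cgroup/memory.max 2>/dev/null \
  || cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null \
  || echo "unknown")
echo "cgroup memory limit: $limit"
```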
4. **Set up OOM killer priority adjustment:**

   ```bash
   # Lower the OOM score to make Elasticsearch less likely to be killed,
   # but prefer proper sizing over OOM-score manipulation
   echo -500 > /proc/$(pgrep -f elasticsearch)/oom_score_adj
   ```
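Writing to `/proc` does not survive a process restart. If Elasticsearch runs under systemd, the same adjustment can be made persistent with a drop-in; this is a sketch, and the unit name `elasticsearch.service` assumes a package install:

```ini
# /etc/systemd/system/elasticsearch.service.d/oom.conf
[Service]
OOMScoreAdjust=-500
```

Apply it with `systemctl daemon-reload && systemctl restart elasticsearch`.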
5. **Reduce heap-consuming operations:**

   ```bash
   # Limit concurrent search requests. Thread pool settings are static
   # node settings, so set them in elasticsearch.yml and restart the node:
   #   thread_pool.search.queue_size: 1000
   #   thread_pool.search.size: 12

   # Limit aggregation bucket sizes and skip exact hit counting
   curl -X POST localhost:9200/my_index/_search -H 'Content-Type: application/json' -d '{
     "aggs": {
       "top_categories": {
         "terms": { "field": "category", "size": 100 }
       }
     },
     "track_total_hits": false
   }'
   ```