# How to Fix Elasticsearch Heap Size Issues

Your Elasticsearch nodes are either crashing with OutOfMemoryError or performing poorly due to heap misconfiguration. Getting the JVM heap size right is crucial for stability and performance.

## Recognizing Heap Size Problems

### Signs of Undersized Heap

You'll see these symptoms when heap is too small:

OutOfMemoryError in logs:

```text
java.lang.OutOfMemoryError: Java heap space
[ERROR][o.e.b.HierarchyCircuitBreakerService] circuit breaker triggered
```

Circuit breaker trips frequently:

```bash
curl -X GET "localhost:9200/_nodes/stats/breaker?pretty"
```

```json
{
  "breakers" : {
    "parent" : {
      "tripped" : 150
    }
  }
}
```

Constantly high heap usage:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
```

```json
{
  "jvm" : {
    "mem" : {
      "heap_used_percent" : 95
    }
  }
}
```

### Signs of Oversized Heap

Counterintuitively, too much heap also causes problems:

Long garbage collection pauses:

```text
[WARN ][o.e.m.j.JvmMonitorService] [node-1] detected slow gc [G1 Young Generation] duration [1.5s]
[WARN ][o.e.m.j.JvmMonitorService] [node-1] detected slow gc [G1 Old Generation] duration [15s]
```

Node becomes unresponsive during GC:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty" | grep gc
```

```json
{
  "gc" : {
    "collectors" : {
      "old" : {
        "collection_time_in_millis" : 450000
      }
    }
  }
}
```

Queries time out intermittently.

## Understanding JVM Heap

The JVM heap stores:

  • Lucene index segments (in-memory structures)
  • Field data cache for aggregations
  • Query cache for repeated searches
  • Request memory for active operations
  • Cluster state metadata

Lucene also uses off-heap memory for index segments stored on disk. This is important: the heap is not the only memory Elasticsearch uses.

## Checking Current Heap Configuration

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
```

```json
{
  "nodes" : {
    "node-1" : {
      "jvm" : {
        "mem" : {
          "heap_used_in_bytes" : 8000000000,
          "heap_max_in_bytes" : 16000000000,
          "heap_used_percent" : 50
        }
      }
    }
  }
}
```
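The byte values in the stats output are easier to reason about as gibibytes. A minimal sketch in plain bash (the sample value mirrors the `heap_max_in_bytes` figure above; note that a "16 GB" decimal value is only about 14.9 GiB):

```shell
#!/bin/bash
# Convert a heap size in bytes to whole GiB (1 GiB = 1073741824 bytes).
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Example: the 16000000000-byte heap_max_in_bytes from the stats above
bytes_to_gib 16000000000   # prints 14
```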

Check the JVM options file:

```bash
grep "Xm" /etc/elasticsearch/jvm.options
```

```text
-Xms8g
-Xmx8g
```

Or in newer versions:

```bash
cat /etc/elasticsearch/jvm.options.d/heap.options
```

## The Heap Size Rule

Elasticsearch recommends:

  1. Set minimum and maximum to the same value (-Xms = -Xmx)
  2. Don't exceed 50% of physical RAM
  3. Don't exceed 31GB (the compressed OOPs threshold)

The 50% rule leaves memory for:

  • Lucene off-heap segment memory
  • Operating system file cache
  • Network buffers
  • Other processes

The 31GB limit preserves compressed ordinary object pointers (OOPs). Below roughly 31GB, the JVM can address objects with 32-bit compressed pointers, saving heap. Above that threshold it must fall back to 64-bit pointers, so a 32GB heap can actually hold less data than a 31GB one.
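A quick sanity check for an `-Xmx` value, as a sketch in plain bash. The 31g cutoff here is a conservative assumption; the exact JVM threshold is slightly below 32 GiB and varies by JVM build:

```shell
#!/bin/bash
# Warn when an -Xmx value in whole gigabytes (e.g. "32g") is likely to
# disable compressed ordinary object pointers. 31g is an assumed,
# conservative cutoff, not the exact JVM-internal threshold.
check_compressed_oops() {
  local gb=${1%g}          # strip trailing "g": "24g" -> "24"
  if (( gb > 31 )); then
    echo "WARN: ${1} likely disables compressed OOPs"
  else
    echo "OK: ${1} keeps compressed OOPs"
  fi
}

check_compressed_oops 24g
check_compressed_oops 32g
```

The authoritative check is to ask the JVM itself: `java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops`.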

## Solution 1: Increase Heap Size

For undersized heap, increase it:

```bash
# Edit /etc/elasticsearch/jvm.options
# Change:
-Xms4g
-Xmx4g

# To:
-Xms8g
-Xmx8g
```

For Elasticsearch 7.7 and later (including 8.x), prefer the `ES_JAVA_OPTS` environment variable or a custom file under `jvm.options.d`:

```bash
# Create a custom options file
echo '-Xms8g' > /etc/elasticsearch/jvm.options.d/heap.options
echo '-Xmx8g' >> /etc/elasticsearch/jvm.options.d/heap.options
```

Restart Elasticsearch:

```bash
systemctl restart elasticsearch
```

Verify the change:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_max_in_bytes&pretty"
```
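If you script this verification, compare the reported `heap_max_in_bytes` against what you configured. A small sketch in plain bash; the 5% tolerance is an assumption, since G1 can report a maximum slightly under the configured `-Xmx`:

```shell
#!/bin/bash
# Compare a configured heap size (GiB) against heap_max_in_bytes from
# _nodes/stats. Allows a small tolerance (5%, an assumed margin) because
# the reported maximum may not be exactly expected_gib * 2^30.
heap_matches() {
  local expected_gib=$1 reported_bytes=$2
  local expected_bytes=$(( expected_gib * 1073741824 ))
  local diff=$(( expected_bytes - reported_bytes ))
  (( diff < 0 )) && diff=$(( -diff ))
  if (( diff * 20 <= expected_bytes )); then   # diff <= 5% of expected
    echo "match"
  else
    echo "mismatch"
  fi
}

heap_matches 8 8589934592    # prints "match"
heap_matches 8 4294967296    # prints "mismatch"
```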

## Solution 2: Decrease Heap Size

For oversized heap (causing GC pauses), reduce it:

```bash
# From 32GB down to 24GB
-Xms24g
-Xmx24g
```

This keeps you under the 31GB compressed OOPs threshold and leaves more RAM for the OS file cache.

Monitor GC after the change:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc&pretty"
```

## Solution 3: Change Garbage Collector

For large heaps, G1GC is the default and usually best. But for specific cases, you might adjust:

```bash
# In jvm.options
-XX:+UseG1GC
-XX:G1HeapRegionSize=32m
-XX:InitiatingHeapOccupancyPercent=30
```

InitiatingHeapOccupancyPercent controls how full the heap must be before G1 starts a concurrent marking cycle. Lower values start marking earlier, which can reduce old-generation pause times at the cost of more concurrent GC work.

Do NOT use the old CMS collector for modern Elasticsearch:

```bash
# Don't use this
# -XX:+UseConcMarkSweepGC
```
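To judge whether a GC change helped, one common heuristic is the fraction of process uptime spent in old-generation collections. A sketch in plain bash using two fields from `_nodes/stats/jvm` (`collection_time_in_millis` and `uptime_in_millis`); the "over ~5% is bad" rule of thumb is an assumption, not an official threshold:

```shell
#!/bin/bash
# Percent of JVM uptime spent in old-gen GC. Inputs are
# collection_time_in_millis and uptime_in_millis from _nodes/stats/jvm.
# Sustained values above a few percent are usually worth investigating.
gc_overhead_pct() {
  local gc_ms=$1 uptime_ms=$2
  echo $(( gc_ms * 100 / uptime_ms ))
}

# Example: the 450000 ms figure shown earlier against 1 hour of uptime
gc_overhead_pct 450000 3600000   # prints 12
```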

## Solution 4: Configure Heap Dump on OOM

To diagnose OOM errors, enable heap dumps:

```bash
# Add to jvm.options
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/elasticsearch/heapdump.hprof
```

Analyze the dump with tools like Eclipse Memory Analyzer or VisualVM.

## Solution 5: Reduce Memory Usage

If you can't increase heap, reduce memory consumption:

Clear caches:

```bash
curl -X POST "localhost:9200/_cache/clear"
```

Reduce field data cache:

`indices.fielddata.cache.size` is a static node-level setting, so it cannot be changed through the cluster settings API. Set it in `elasticsearch.yml` on each node and restart:

```yaml
indices.fielddata.cache.size: 20%
```

Reduce query cache:

`indices.queries.cache.size` is also static; set it in `elasticsearch.yml` and restart:

```yaml
indices.queries.cache.size: 5%
```
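Because these limits are percentages of heap, it helps to see the absolute budgets they imply. A minimal sketch in plain bash (the 20% and 5% figures mirror the settings above; the 8 GiB heap is an example value):

```shell
#!/bin/bash
# Absolute cache budget in MiB for a given heap size (GiB) and a
# percentage limit, e.g. fielddata at 20% or query cache at 5%.
cache_budget_mib() {
  local heap_gib=$1 pct=$2
  echo $(( heap_gib * 1024 * pct / 100 ))
}

cache_budget_mib 8 20   # fielddata budget for an 8 GiB heap: 1638 MiB
cache_budget_mib 8 5    # query cache budget: 409 MiB
```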

Use doc_values instead of fielddata:

Doc values are stored on disk rather than in heap, and are enabled by default for keyword and numeric fields:

```bash
curl -X PUT "localhost:9200/your-index/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "category": {
      "type": "keyword",
      "doc_values": true
    }
  }
}
'
```

## Solution 6: Scale Horizontally

If you're hitting heap limits constantly, add nodes:

```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,heap.percent"
```

```text
name   heap.percent
node-1 85
node-2 90
node-3 82
```

New nodes will receive shards, distributing the load.
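When deciding whether to add nodes, you can flag hot ones from that same `_cat/nodes` output automatically. A sketch using awk; the 85% threshold is an assumption, and the input format matches the `name,heap.percent` columns shown above:

```shell
#!/bin/bash
# Print the names of nodes whose heap.percent meets or exceeds a limit.
# Expects `_cat/nodes?v&h=name,heap.percent` output (header row + data rows).
flag_hot_nodes() {
  awk -v limit="$1" 'NR > 1 && $2 + 0 >= limit { print $1 }'
}

flag_hot_nodes 85 <<'EOF'
name   heap.percent
node-1 85
node-2 90
node-3 82
EOF
# prints node-1 and node-2
```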

## Heap Size Calculator

Use this formula:

```
Recommended Heap = min(50% of RAM, 31GB)

Examples:
- 16GB RAM server:  Heap = 8GB
- 32GB RAM server:  Heap = 16GB (not 32GB; leave the rest for the OS)
- 64GB RAM server:  Heap = 31GB (compressed OOPs limit)
- 128GB RAM server: Heap = 31GB (still 31GB max)
```
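The formula above is easy to encode as a helper (plain bash, whole gigabytes):

```shell
#!/bin/bash
# Recommended heap = min(50% of RAM, 31 GB), per the formula above.
recommended_heap_gb() {
  local ram_gb=$1
  local half=$(( ram_gb / 2 ))
  if (( half > 31 )); then half=31; fi
  echo "$half"
}

recommended_heap_gb 16    # prints 8
recommended_heap_gb 64    # prints 31 (32 would exceed the OOPs limit)
recommended_heap_gb 128   # prints 31 (still capped)
```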

For memory-heavy workloads (many aggregations, large fielddata):

```text
Heap = 40-50% of RAM (up to 31GB)
```

For search-heavy workloads (Lucene dominant):

```text
Heap = 20-30% of RAM (more for OS file cache)
```

## Verification Steps

After changing heap size:

  1. Check JVM settings applied:

     ```bash
     curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem&pretty"
     ```

  2. Monitor heap usage over time:

     ```bash
     watch -n 10 'curl -s localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent | jq'
     ```

  3. Check GC behavior:

     ```bash
     curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc&pretty"
     ```

  4. Run a workload test with your real queries. Elasticsearch has no built-in `_bench` endpoint; use a benchmarking tool such as Rally, or replay production traffic.

## Common Heap Size Mistakes

| Mistake | Problem | Fix |
| --- | --- | --- |
| Setting max > min | Heap resize overhead at runtime | Set -Xms = -Xmx |
| Exceeding 31GB | Lost compressed OOPs, slower GC | Cap at 30GB |
| Using 100% of RAM | No room for OS/Lucene | Use 50% max |
| Different sizes per node | Inconsistent behavior | Standardize config |
| Ignoring physical RAM | Server-specific issues | Configure per server |

## Heap Monitoring Dashboard

Create a monitoring script:

```bash
#!/bin/bash
echo "=== Elasticsearch Heap Monitor ==="
while true; do
  echo "$(date)"
  curl -s "localhost:9200/_nodes/stats/jvm" | jq '
    .nodes | to_entries[] | {
      node: .value.name,
      heap_percent: .value.jvm.mem.heap_used_percent,
      heap_max_gb: (.value.jvm.mem.heap_max_in_bytes / 1073741824 | floor),
      gc_old_time_s: (.value.jvm.gc.collectors.old.collection_time_in_millis / 1000 | floor)
    }
  '
  echo ""
  sleep 30
done
```

## Summary

Configure Elasticsearch heap correctly by:

  1. Setting -Xms and -Xmx to the same value
  2. Using at most 50% of physical RAM
  3. Not exceeding 31GB to preserve compressed OOPs
  4. Monitoring heap usage and GC behavior
  5. Adjusting based on workload type
  6. Scaling horizontally when heap limits are reached

Proper heap configuration prevents both OOM crashes and GC-induced latency. Monitor continuously and adjust as your workload evolves.