# How to Fix Elasticsearch Heap Size Issues

Your Elasticsearch nodes are either crashing with OutOfMemoryError or performing poorly due to heap misconfiguration. Getting the JVM heap size right is crucial for stability and performance.

## Recognizing Heap Size Problems

### Signs of Undersized Heap

You'll see these symptoms when heap is too small:

OutOfMemoryError in logs:

```text
java.lang.OutOfMemoryError: Java heap space
[ERROR][o.e.b.HierarchyCircuitBreakerService] circuit breaker triggered
```

Circuit breaker trips frequently:

```bash
curl -X GET "localhost:9200/_nodes/stats/breaker?pretty"
```

```json
{
  "breakers" : {
    "parent" : {
      "tripped" : 150
    }
  }
}
```

Constantly high heap usage:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
```

```json
{
  "jvm" : {
    "mem" : {
      "heap_used_percent" : 95
    }
  }
}
```

### Signs of Oversized Heap

Counterintuitively, too much heap also causes problems:

Long garbage collection pauses:

```text
[WARN ][o.e.m.j.JvmMonitorService] [node-1] detected slow gc [G1 Young Generation] duration [1.5s]
[WARN ][o.e.m.j.JvmMonitorService] [node-1] detected slow gc [G1 Old Generation] duration [15s]
```

Node becomes unresponsive during GC:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty" | grep gc
```

```json
{
  "gc" : {
    "collectors" : {
      "old" : {
        "collection_time_in_millis" : 450000
      }
    }
  }
}
```

Queries time out intermittently.

## Understanding JVM Heap

The JVM heap stores:

  • Lucene index segments (in-memory structures)
  • Field data cache for aggregations
  • Query cache for repeated searches
  • Request memory for active operations
  • Cluster state metadata

Lucene also uses off-heap memory for index segments stored on disk. This is important: the heap is not the only memory Elasticsearch uses.

## Checking Current Heap Configuration

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
```

```json
{
  "nodes" : {
    "node-1" : {
      "jvm" : {
        "mem" : {
          "heap_used_in_bytes" : 8000000000,
          "heap_max_in_bytes" : 16000000000,
          "heap_used_percent" : 50
        }
      }
    }
  }
}
```
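The byte values in the stats output are easier to reason about as gibibytes. A minimal sketch in plain bash (the sample value mirrors the `heap_max_in_bytes` figure above; note that a "16 GB" decimal value is only about 14.9 GiB):

```shell
#!/bin/bash
# Convert a heap size in bytes to whole GiB (1 GiB = 1073741824 bytes).
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Example: the 16000000000-byte heap_max_in_bytes from the stats above
bytes_to_gib 16000000000   # prints 14
```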

Check the JVM options file:

```bash
grep "Xm" /etc/elasticsearch/jvm.options
```

```text
-Xms8g
-Xmx8g
```

Or in newer versions:

```bash
cat /etc/elasticsearch/jvm.options.d/heap.options
```

## The Heap Size Rule

Elasticsearch recommends:

  1. Set minimum and maximum to the same value (-Xms = -Xmx)
  2. Don't exceed 50% of physical RAM
  3. Don't exceed 31GB (the compressed OOPs threshold)

The 50% rule leaves memory for:

  • Lucene off-heap segment memory
  • Operating system file cache
  • Network buffers
  • Other processes

The 31GB limit preserves compressed ordinary object pointers (OOPs). Below roughly 31GB, the JVM can address objects with 32-bit compressed pointers, saving heap. Above that threshold it must fall back to 64-bit pointers, so a 32GB heap can actually hold less data than a 31GB one.
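A quick sanity check for an `-Xmx` value, as a sketch in plain bash. The 31g cutoff here is a conservative assumption; the exact JVM threshold is slightly below 32 GiB and varies by JVM build:

```shell
#!/bin/bash
# Warn when an -Xmx value in whole gigabytes (e.g. "32g") is likely to
# disable compressed ordinary object pointers. 31g is an assumed,
# conservative cutoff, not the exact JVM-internal threshold.
check_compressed_oops() {
  local gb=${1%g}          # strip trailing "g": "24g" -> "24"
  if (( gb > 31 )); then
    echo "WARN: ${1} likely disables compressed OOPs"
  else
    echo "OK: ${1} keeps compressed OOPs"
  fi
}

check_compressed_oops 24g
check_compressed_oops 32g
```

The authoritative check is to ask the JVM itself: `java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops`.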

## Solution 1: Increase Heap Size

For undersized heap, increase it:

```bash
# Edit /etc/elasticsearch/jvm.options
# Change:
-Xms4g
-Xmx4g

# To:
-Xms8g
-Xmx8g
```

For Elasticsearch 7.7 and later (including 8.x), prefer the `ES_JAVA_OPTS` environment variable or a custom file under `jvm.options.d`:

```bash
# Create a custom options file
echo '-Xms8g' > /etc/elasticsearch/jvm.options.d/heap.options
echo '-Xmx8g' >> /etc/elasticsearch/jvm.options.d/heap.options
```

Restart Elasticsearch:

```bash
systemctl restart elasticsearch
```

Verify the change:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_max_in_bytes&pretty"
```
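If you script this verification, compare the reported `heap_max_in_bytes` against what you configured. A small sketch in plain bash; the 5% tolerance is an assumption, since G1 can report a maximum slightly under the configured `-Xmx`:

```shell
#!/bin/bash
# Compare a configured heap size (GiB) against heap_max_in_bytes from
# _nodes/stats. Allows a small tolerance (5%, an assumed margin) because
# the reported maximum may not be exactly expected_gib * 2^30.
heap_matches() {
  local expected_gib=$1 reported_bytes=$2
  local expected_bytes=$(( expected_gib * 1073741824 ))
  local diff=$(( expected_bytes - reported_bytes ))
  (( diff < 0 )) && diff=$(( -diff ))
  if (( diff * 20 <= expected_bytes )); then   # diff <= 5% of expected
    echo "match"
  else
    echo "mismatch"
  fi
}

heap_matches 8 8589934592    # prints "match"
heap_matches 8 4294967296    # prints "mismatch"
```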

## Solution 2: Decrease Heap Size

For oversized heap (causing GC pauses), reduce it:

```bash
# From 32GB down to 24GB
-Xms24g
-Xmx24g
```

This keeps you under the 31GB compressed OOPs threshold and leaves more RAM for the OS file cache.

Monitor GC after the change:

```bash
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc&pretty"
```

## Solution 3: Change Garbage Collector

For large heaps, G1GC is the default and usually best. But for specific cases, you might adjust:

```bash
# In jvm.options
-XX:+UseG1GC
-XX:G1HeapRegionSize=32m
-XX:InitiatingHeapOccupancyPercent=30
```

InitiatingHeapOccupancyPercent controls how full the heap must be before G1 starts a concurrent marking cycle. Lower values start marking earlier, which can reduce old-generation pause times at the cost of more concurrent GC work.

Do NOT use the old CMS collector for modern Elasticsearch:

```bash
# Don't use this
# -XX:+UseConcMarkSweepGC
```
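To judge whether a GC change helped, one common heuristic is the fraction of process uptime spent in old-generation collections. A sketch in plain bash using two fields from `_nodes/stats/jvm` (`collection_time_in_millis` and `uptime_in_millis`); the "over ~5% is bad" rule of thumb is an assumption, not an official threshold:

```shell
#!/bin/bash
# Percent of JVM uptime spent in old-gen GC. Inputs are
# collection_time_in_millis and uptime_in_millis from _nodes/stats/jvm.
# Sustained values above a few percent are usually worth investigating.
gc_overhead_pct() {
  local gc_ms=$1 uptime_ms=$2
  echo $(( gc_ms * 100 / uptime_ms ))
}

# Example: the 450000 ms figure shown earlier against 1 hour of uptime
gc_overhead_pct 450000 3600000   # prints 12
```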

## Solution 4: Configure Heap Dump on OOM

To diagnose OOM errors, enable heap dumps:

```bash
# Add to jvm.options
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/elasticsearch/heapdump.hprof
```

Analyze the dump with tools like Eclipse Memory Analyzer or VisualVM.

## Solution 5: Reduce Memory Usage

If you can't increase heap, reduce memory consumption:

Clear caches:

```bash
curl -X POST "localhost:9200/_cache/clear"
```

Reduce field data cache:

`indices.fielddata.cache.size` is a static node-level setting, so it cannot be changed through the cluster settings API. Set it in `elasticsearch.yml` on each node and restart:

```yaml
indices.fielddata.cache.size: 20%
```

Reduce query cache:

`indices.queries.cache.size` is also static; set it in `elasticsearch.yml` and restart:

```yaml
indices.queries.cache.size: 5%
```
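Because these limits are percentages of heap, it helps to see the absolute budgets they imply. A minimal sketch in plain bash (the 20% and 5% figures mirror the settings above; the 8 GiB heap is an example value):

```shell
#!/bin/bash
# Absolute cache budget in MiB for a given heap size (GiB) and a
# percentage limit, e.g. fielddata at 20% or query cache at 5%.
cache_budget_mib() {
  local heap_gib=$1 pct=$2
  echo $(( heap_gib * 1024 * pct / 100 ))
}

cache_budget_mib 8 20   # fielddata budget for an 8 GiB heap: 1638 MiB
cache_budget_mib 8 5    # query cache budget: 409 MiB
```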

Use doc_values instead of fielddata:

Doc values are stored on disk rather than in heap, and are enabled by default for keyword and numeric fields:

```bash
curl -X PUT "localhost:9200/your-index/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "category": {
      "type": "keyword",
      "doc_values": true
    }
  }
}
'
```

## Solution 6: Scale Horizontally

If you're hitting heap limits constantly, add nodes:

```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,heap.percent"
```

```text
name   heap.percent
node-1 85
node-2 90
node-3 82
```

New nodes will receive shards, distributing the load.
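When deciding whether to add nodes, you can flag hot ones from that same `_cat/nodes` output automatically. A sketch using awk; the 85% threshold is an assumption, and the input format matches the `name,heap.percent` columns shown above:

```shell
#!/bin/bash
# Print the names of nodes whose heap.percent meets or exceeds a limit.
# Expects `_cat/nodes?v&h=name,heap.percent` output (header row + data rows).
flag_hot_nodes() {
  awk -v limit="$1" 'NR > 1 && $2 + 0 >= limit { print $1 }'
}

flag_hot_nodes 85 <<'EOF'
name   heap.percent
node-1 85
node-2 90
node-3 82
EOF
# prints node-1 and node-2
```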

## Heap Size Calculator

Use this formula:

```
Recommended Heap = min(50% of RAM, 31GB)

Examples:
- 16GB RAM server:  Heap = 8GB
- 32GB RAM server:  Heap = 16GB (not 32GB; leave the rest for the OS)
- 64GB RAM server:  Heap = 31GB (compressed OOPs limit)
- 128GB RAM server: Heap = 31GB (still 31GB max)
```
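The formula above is easy to encode as a helper (plain bash, whole gigabytes):

```shell
#!/bin/bash
# Recommended heap = min(50% of RAM, 31 GB), per the formula above.
recommended_heap_gb() {
  local ram_gb=$1
  local half=$(( ram_gb / 2 ))
  if (( half > 31 )); then half=31; fi
  echo "$half"
}

recommended_heap_gb 16    # prints 8
recommended_heap_gb 64    # prints 31 (32 would exceed the OOPs limit)
recommended_heap_gb 128   # prints 31 (still capped)
```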

For memory-heavy workloads (many aggregations, large fielddata):

```text
Heap = 40-50% of RAM (up to 31GB)
```

For search-heavy workloads (Lucene dominant):

```text
Heap = 20-30% of RAM (more for OS file cache)
```

## Verification Steps

After changing heap size:

  1. Check JVM settings applied:

     ```bash
     curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem&pretty"
     ```

  2. Monitor heap usage over time:

     ```bash
     watch -n 10 'curl -s localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent | jq'
     ```

  3. Check GC behavior:

     ```bash
     curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc&pretty"
     ```

  4. Run a workload test with your real queries. Elasticsearch has no built-in `_bench` endpoint; use a benchmarking tool such as Rally, or replay production traffic.

## Common Heap Size Mistakes

| Mistake | Problem | Fix |
| --- | --- | --- |
| Setting max > min | Heap resize overhead at runtime | Set -Xms = -Xmx |
| Exceeding 31GB | Lost compressed OOPs, slower GC | Cap at 30GB |
| Using 100% of RAM | No room for OS/Lucene | Use 50% max |
| Different sizes per node | Inconsistent behavior | Standardize config |
| Ignoring physical RAM | Server-specific issues | Configure per server |

## Heap Monitoring Dashboard

Create a monitoring script:

```bash
#!/bin/bash
echo "=== Elasticsearch Heap Monitor ==="
while true; do
  echo "$(date)"
  curl -s "localhost:9200/_nodes/stats/jvm" | jq '
    .nodes | to_entries[] | {
      node: .value.name,
      heap_percent: .value.jvm.mem.heap_used_percent,
      heap_max_gb: (.value.jvm.mem.heap_max_in_bytes / 1073741824 | floor),
      gc_old_time_s: (.value.jvm.gc.collectors.old.collection_time_in_millis / 1000 | floor)
    }
  '
  echo ""
  sleep 30
done
```

## Summary

Configure Elasticsearch heap correctly by:

  1. Setting -Xms and -Xmx to the same value
  2. Using at most 50% of physical RAM
  3. Not exceeding 31GB to preserve compressed OOPs
  4. Monitoring heap usage and GC behavior
  5. Adjusting based on workload type
  6. Scaling horizontally when heap limits are reached

Proper heap configuration prevents both OOM crashes and GC-induced latency. Monitor continuously and adjust as your workload evolves.