Introduction
Elasticsearch cluster health issues manifest as RED or YELLOW cluster status, indicating data loss risk or degraded search performance. A RED cluster means primary shards are unassigned (data loss), while YELLOW indicates replica shards are unassigned (reduced fault tolerance). Common causes include node failures, disk space exhaustion, heap memory pressure, network partitions causing split-brain, shard allocation filtering, version incompatibility during upgrades, and corrupted Lucene indices. The fix requires understanding Elasticsearch architecture (nodes, shards, replicas, indices), cluster state management, shard allocation explanation API, heap memory tuning, and recovery procedures. This guide provides production-proven troubleshooting for Elasticsearch cluster health issues across single-node development clusters to multi-node production deployments.
Symptoms
- Cluster health returns `RED` or `YELLOW` instead of `GREEN`
- `_cat/shards` shows UNASSIGNED shards
- Search queries return incomplete results or timeout errors
- Indexing operations fail with `cluster_block_exception`
- Nodes frequently disconnect and rejoin cluster
- Master node elections occurring repeatedly (master flips)
- High heap memory usage (>90%) triggering GC pressure
- Disk usage exceeds flood stage watermark (95%)
- Split-brain scenario with multiple masters
- Slow search performance with high latency
Common Causes
- Node crash or network failure leaving shards unassigned
- Disk space below low/high/flood stage watermarks
- Heap memory exhausted causing OOM or excessive GC
- `cluster.routing.allocation.enable` set to `none`
- Shard allocation filtering excluding target nodes
- Version mismatch preventing shard assignment
- Corrupted Lucene index segments
- Insufficient replicas for number of available nodes
- Master node ineligible (master: false in node.roles)
- Network partition causing split-brain
- Index closed or read-only due to disk pressure
- Too many open indices exceeding cluster limits
Step-by-Step Fix
### 1. Diagnose cluster health status
Check overall cluster state:
```bash
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Output:
# {
#   "cluster_name" : "production-cluster",
#   "status" : "yellow",
#   "timed_out" : false,
#   "number_of_nodes" : 3,
#   "number_of_data_nodes" : 2,
#   "active_primary_shards" : 50,
#   "active_shards" : 100,
#   "relocating_shards" : 0,
#   "initializing_shards" : 0,
#   "unassigned_shards" : 50,
#   "delayed_unassigned_shards" : 0,
#   "number_of_pending_tasks" : 0,
#   "number_of_in_flight_fetch" : 0,
#   "task_max_waiting_in_queue_millis" : 0,
#   "active_shards_percent_as_number" : 66.67
# }

# Status meanings:
# GREEN  - All primary and replica shards allocated
# YELLOW - All primaries allocated, some replicas unassigned
# RED    - Some primary shards unassigned (data loss risk)

# Check cluster state details
curl -X GET "localhost:9200/_cluster/state?pretty"

# Check pending tasks
curl -X GET "localhost:9200/_cluster/pending_tasks?pretty"

# Check allocation explanation
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}'

# Returns WHY a shard is unassigned and WHERE it can be allocated
```
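When many shards are unassigned, it helps to feed each one into the explain API in turn. Here is a minimal sketch of the parsing step, run against sample `_cat/shards` output (the helper name and sample data are illustrative, not part of the Elasticsearch API):

```bash
# Hypothetical helper: pull index, shard number, and prirep flag for every
# UNASSIGNED row out of `_cat/shards` output (state is the 4th column).
list_unassigned() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Sample lines as `curl -s localhost:9200/_cat/shards` might return them:
sample='my-index 0 p STARTED 10000 50mb 10.0.0.1 node-1
my-index 0 r UNASSIGNED
my-index 1 r UNASSIGNED'

printf '%s\n' "$sample" | list_unassigned
# Prints:
# my-index 0 r
# my-index 1 r
```

Each printed line can then be turned into one `_cluster/allocation/explain` request body, so every unassigned shard gets an explanation rather than just the first one the API picks.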
Identify unassigned shards:
```bash
# List all shards with status
curl -X GET "localhost:9200/_cat/shards?v"

# Output:
# index    shard prirep state      docs  store ip       node
# my-index 0     p      STARTED    10000 50mb  10.0.0.1 node-1
# my-index 0     r      UNASSIGNED
# my-index 1     p      STARTED    8000  40mb  10.0.0.2 node-2
# my-index 1     r      UNASSIGNED

# Filter unassigned shards
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,docs,store,node" | grep UNASSIGNED

# Check node status
curl -X GET "localhost:9200/_cat/nodes?v&h=ip,name,node.role,master,ram.percent,heap.percent,load_1m"

# Output:
# ip       name   node.role master ram.percent heap.percent load_1m
# 10.0.0.1 node-1 dim       *      85          78           2.5
# 10.0.0.2 node-2 d         -      92          95           4.2
# 10.0.0.3 node-3 m         -      45          30           0.5

# Node roles:
# m - master eligible
# d - data node
# i - ingest node
# v - voting-only node
```
### 2. Fix disk space issues
Disk watermarks control shard allocation:
```bash
# Check disk usage per node
curl -X GET "localhost:9200/_cat/allocation?v"

# Output:
# shards disk.indices disk.used disk.avail disk.total disk.percent host     ip       node
# 65     500gb        800gb     200gb      1000gb     80           10.0.0.1 10.0.0.1 node-1
# 50     400gb        750gb     250gb      1000gb     75           10.0.0.2 10.0.0.2 node-2

# Check watermark settings
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&pretty" \
  | grep -A5 watermark

# Default watermarks:
# cluster.routing.allocation.disk.watermark.low: 85%
# cluster.routing.allocation.disk.watermark.high: 90%
# cluster.routing.allocation.disk.watermark.flood_stage: 95%

# If disk usage exceeds flood_stage, indices become read-only
# Check index blocks
curl -X GET "localhost:9200/_settings?pretty" | grep -A3 "read_only"
```
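To make the percentage watermarks concrete, here is the arithmetic applied to the 1000gb disks shown above (a sketch with a hard-coded disk size; against a real cluster, read `disk.total` from `_cat/allocation` instead):

```bash
# Default watermarks applied to a 1000gb disk:
disk_total_gb=1000
low=$(( disk_total_gb * 85 / 100 ))    # above this, new shards stop allocating to the node
high=$(( disk_total_gb * 90 / 100 ))   # above this, shards start relocating away
flood=$(( disk_total_gb * 95 / 100 ))  # above this, indices flip to read_only_allow_delete
echo "low=${low}gb high=${high}gb flood=${flood}gb"
# → low=850gb high=900gb flood=950gb
```

So node-1 above, at 800gb used, is below all three thresholds; at 850gb used it would stop receiving new shards.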
Fix disk watermark issues:
```bash
# Option 1: Free disk space
# Delete old indices (wildcard deletes may require
# action.destructive_requires_name: false on recent versions)
curl -X DELETE "localhost:9200/logs-2025.*"

# Delete old snapshots
curl -X DELETE "localhost:9200/_snapshot/my-snapshot/snapshot-2025"

# Clear field data cache
curl -X POST "localhost:9200/_cache/clear?fielddata=true"

# Option 2: Temporarily raise watermarks (not recommended long-term)
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}'

# Option 3: Use absolute byte values instead of percentages
# NOTE: byte values specify minimum FREE space, so low > high > flood_stage
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "20gb"
  }
}'

# Clear the read-only block set by flood_stage
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index.blocks.read_only_allow_delete": null
}'
```
### 3. Fix heap memory pressure
Memory issues cause GC pressure and node instability:
```bash
# Check heap usage
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty" | grep -A10 heap

# Check GC stats
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty" | grep -A20 gc

# Output:
# "gc": {
#   "collectors": {
#     "young": {
#       "collection_count": 15000,
#       "collection_time_in_millis": 450000
#     },
#     "old": {
#       "collection_count": 500,
#       "collection_time_in_millis": 180000
#     }
#   }
# }

# High GC time indicates memory pressure
# Rule of thumb: GC time > 10% of uptime = problem
```
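The 10% rule of thumb applied to the sample GC stats above. The uptime value here is an assumption for illustration; on a real node, pull `jvm.uptime_in_millis` and the two `collection_time_in_millis` values from `_nodes/stats/jvm` (e.g. with jq):

```bash
# Assumed JVM uptime; real value comes from _nodes/stats/jvm
uptime_ms=5000000
gc_ms=$(( 450000 + 180000 ))            # young + old collection_time_in_millis
gc_pct=$(( gc_ms * 100 / uptime_ms ))   # integer percentage of uptime spent in GC
echo "GC time is ${gc_pct}% of uptime"
if [ "$gc_pct" -gt 10 ]; then echo "WARNING: GC pressure"; else echo "OK"; fi
# → GC time is 12% of uptime
# → WARNING: GC pressure
```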
Fix heap memory configuration:
```bash
# Edit jvm.options (typically /etc/elasticsearch/jvm.options)

# Set heap size (50% of available RAM, max 31GB for compressed oops)
-Xms30g
-Xmx30g

# GC settings (G1GC recommended for heap > 4GB)
-XX:+UseG1GC

# GC logging for analysis
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

# Heap dump on OOM
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch

# Recommended additional settings
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Djava.locale.providers=COMPAT

# After changing JVM options, restart Elasticsearch
systemctl restart elasticsearch
```
Memory tuning best practices:
```
# Heap sizing guidelines:
# - Never exceed 31GB (compressed oops limit)
# - Set Xms = Xmx (equal min and max)
# - Leave 50% of RAM for the OS filesystem cache Lucene relies on
# - Minimum: 2GB for development, 4GB for production

# Example configurations:
# 16GB RAM server:  -Xms8g  -Xmx8g
# 32GB RAM server:  -Xms16g -Xmx16g
# 64GB RAM server:  -Xms31g -Xmx31g (max useful heap)
# 128GB RAM server: -Xms31g -Xmx31g (additional RAM helps the filesystem cache)

# Monitor memory after tuning
watch 'curl -s localhost:9200/_nodes/stats/jvm | jq ".nodes[].jvm.mem.heap_used_percent"'
```
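The sizing table can be reduced to one small function. This is a sketch of the guideline (heap = half of RAM, capped at 31GB, floored at 2GB), not an official formula; `heap_gb` is a name chosen here:

```bash
# heap = min(RAM / 2, 31) GB, floored at 2 GB for small dev machines
heap_gb() {
  local heap=$(( $1 / 2 ))
  if [ "$heap" -gt 31 ]; then heap=31; fi
  if [ "$heap" -lt 2 ]; then heap=2; fi
  echo "$heap"
}

for ram in 16 32 64 128; do
  echo "${ram}GB RAM -> -Xms$(heap_gb "$ram")g -Xmx$(heap_gb "$ram")g"
done
```

The loop reproduces the four example configurations above (8g, 16g, 31g, 31g).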
### 4. Fix shard allocation issues
Shards may be prevented from allocating:
```bash
# Check allocation settings
curl -X GET "localhost:9200/_cluster/settings?pretty" \
  | grep -E "allocation|shard"

# Check if allocation is disabled
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&pretty" \
  | grep "cluster.routing.allocation.enable"

# Possible values:
# all           - (default) Allow all shard allocations
# primaries     - Only allocate primary shards
# new_primaries - Only allocate primaries for new indices
# none          - No shard allocation

# Check shard allocation filtering
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&pretty" \
  | grep -A10 "allocation.include"
```
Enable shard allocation:
```bash
# Re-enable allocation if disabled
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'

# Remove allocation filtering
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.include._ip": null,
    "cluster.routing.allocation.exclude._name": null,
    "cluster.routing.allocation.require.box_type": null
  }
}'

# Retry shards whose allocation failed too many times
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"

# Manually allocate a replica shard
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "my-index",
        "shard": 0,
        "node": "node-1"
      }
    }
  ]
}'

# Explain why a shard cannot be allocated
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}'

# Returns a detailed explanation:
# - Which nodes the shard can be allocated to
# - Why allocation is blocked
# - What needs to change
```
### 5. Fix split-brain scenario
Split-brain occurs when cluster partitions into multiple masters:
```bash
# Detect split-brain: check which master each node reports
curl -X GET "node1:9200/_cat/master?v"
curl -X GET "node2:9200/_cat/master?v"

# If the nodes report different masters, split-brain is confirmed

# Check cluster UUID on each node
curl -X GET "node1:9200/_cluster/state?pretty" | grep cluster_uuid
curl -X GET "node2:9200/_cluster/state?pretty" | grep cluster_uuid

# If the UUIDs differ, the nodes formed separate clusters
```
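The comparison above can be automated: collect the `node` column from `_cat/master?h=node` on each node you can reach and count distinct answers. A sketch, with the per-node responses hard-coded as an assumption (the helper name is illustrative):

```bash
# More than one distinct master name across nodes means the cluster split
distinct_masters() { tr ' ' '\n' | sed '/^$/d' | sort -u | wc -l | tr -d ' '; }

# Assumed responses from querying three nodes; on a real cluster, build this
# string from: curl -s "$node/_cat/master?h=node"
reported="node-1 node-1 node-2"
count=$(echo "$reported" | distinct_masters)
if [ "$count" -gt 1 ]; then
  echo "SPLIT-BRAIN: $count distinct masters reported"
else
  echo "single master"
fi
```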
Prevent split-brain configuration:
```yaml
# elasticsearch.yml - ALL master-eligible nodes must have:

# Same cluster name
cluster.name: production-cluster

# List ALL master-eligible nodes for bootstrap (first cluster start only)
cluster.initial_master_nodes:
  - node-1
  - node-2
  - node-3

# For production, also set:
discovery.seed_hosts:
  - 10.0.0.1
  - 10.0.0.2
  - 10.0.0.3

# Minimum master nodes to prevent split-brain
# Formula: (master_eligible_nodes / 2) + 1
# For 3 master-eligible nodes: (3/2) + 1 = 2
# Note: ES 7+ manages the voting quorum automatically;
# discovery.zen.minimum_master_nodes only applies to 6.x and earlier
cluster.no_master_block: write

# Network settings
network.host: 0.0.0.0
network.publish_host: ${network.host}

# Discovery settings (legacy discovery.zen.* names; 6.x and earlier only)
discovery.zen.ping.timeout: 3s
discovery.zen.fd.ping_timeout: 10s
```
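The quorum formula from the comments above, as arithmetic. ES 7+ computes the voting configuration itself, but the math still guides how many master-eligible nodes to deploy:

```bash
# quorum = (master_eligible_nodes / 2) + 1  (integer division)
quorum() { echo $(( $1 / 2 + 1 )); }

for n in 1 2 3 5; do
  echo "$n master-eligible -> quorum of $(quorum "$n")"
done
```

Note that 2 nodes give a quorum of 2, so losing either node blocks elections entirely; that is why 3 master-eligible nodes is the practical minimum for fault tolerance.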
Recover from split-brain:
```bash
# WARNING: Split-brain recovery may cause data loss
# Choose the cluster with the most recent/wanted data

# Step 1: Stop all data writes to both clusters

# Step 2: Identify the authoritative cluster
# Check which side has more recent data
curl -X GET "cluster1:9200/_cat/indices?v&s=index"
curl -X GET "cluster2:9200/_cat/indices?v&s=index"

# Step 3: Shut down the minority cluster nodes
systemctl stop elasticsearch

# Step 4: Clear cluster state on the minority nodes
# WARNING: this deletes all local shard data on those nodes
cd /var/lib/elasticsearch
rm -rf nodes/*

# Step 5: Restart the nodes so they rejoin the primary cluster
systemctl start elasticsearch

# Step 6: Verify cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
```
### 6. Fix master election issues
Frequent master elections indicate instability:
```bash
# Check master node status
curl -X GET "localhost:9200/_cat/master?v"

# Watch for master flips
watch -n 1 'curl -s localhost:9200/_cat/master?v'

# Check node eligibility
curl -X GET "localhost:9200/_nodes?pretty" | grep -A5 "roles"

# Master-eligible nodes should have "master" in roles

# Check discovery configuration
curl -X GET "localhost:9200/_nodes/settings?pretty" | grep -A10 "discovery"
```
Fix master election configuration:
```yaml
# elasticsearch.yml for dedicated master nodes

# Node 1 (master-eligible)
node.name: master-1
node.roles: ["master"]
cluster.initial_master_nodes:
  - master-1
  - master-2
  - master-3

# Node 2 (master-eligible)
node.name: master-2
node.roles: ["master"]

# Node 3 (master-eligible)
node.name: master-3
node.roles: ["master"]

# Data nodes (NOT master-eligible)
node.name: data-1
node.roles: ["data", "data_content", "data_hot", "data_warm", "data_cold", "data_frozen", "ingest"]

# Fault-detection tuning (legacy discovery.zen.* names; 6.x and earlier only)
discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 3
```
Check master node health:
```bash
# Monitor master node resources
# High CPU or memory on the master causes election issues

# Check master node heap
curl -X GET "localhost:9200/_nodes/master-1/stats/jvm?pretty" | grep heap_used_percent

# Check master node load
curl -X GET "localhost:9200/_nodes/master-1/stats/process?pretty" | grep load_average

# Check cluster state size in bytes (a very large state is a problem)
curl -s -X GET "localhost:9200/_cluster/state" | wc -c

# A large cluster state usually means too many indices/shards
# Solution: use index lifecycle management (ILM) to cap index count
```
### 7. Fix corrupted index issues
Corrupted Lucene segments cause shard failures:
```bash
# Sanity-check that indices still answer queries
curl -X POST "localhost:9200/_all/_validate/query?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} }
}'

# Check index segments
curl -X GET "localhost:9200/my-index/_segments?pretty"

# Force merge to expunge deleted documents and compact segments
# (this does NOT repair corrupted segments)
curl -X POST "localhost:9200/my-index/_forcemerge?only_expunge_deletes=true&max_num_segments=1"
```
Repair corrupted index:
```bash
# Option 1: Force-allocate the primary from a surviving copy (may lose data)
# allocate_stale_primary accepts an older shard copy on the named node as
# primary; allocate_empty_primary creates an EMPTY shard as a last resort
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}'

# Option 2: Restore from snapshot
# List available snapshots
curl -X GET "localhost:9200/_snapshot/my-snapshot/_all?pretty"

# Restore from snapshot
curl -X POST "localhost:9200/_snapshot/my-snapshot/snapshot-2025/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "my-index",
  "ignore_unavailable": true,
  "include_global_state": false
}'

# Option 3: Reindex from a remote cluster
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": {
      "host": "http://remote-cluster:9200",
      "username": "user",
      "password": "password"
    },
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-recovered"
  }
}'

# Option 4: Last resort - delete and recreate
# WARNING: Data loss!
curl -X DELETE "localhost:9200/my-index"

curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "message": { "type": "text" }
    }
  }
}'
```
### 8. Fix version upgrade issues
Version mismatches prevent shard allocation:
```bash
# Check node versions
curl -X GET "localhost:9200/_nodes?pretty" | grep '"version"'

# Output (one line per node):
# "version" : "7.17.0",

# Full build details from the root info API
curl -X GET "localhost:9200/?pretty"
# "version" : {
#   "number" : "7.17.0",
#   "build_flavor" : "default",
#   "build_type" : "deb",
#   "build_hash" : "abc123"
# }

# Check compatibility matrix
# Major version upgrades require a rolling upgrade path:
# 5.x -> 6.x -> 7.x -> 8.x (cannot skip major versions)
```
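The "cannot skip major versions" rule means every intermediate major must be visited in order, which is easy to express as a sequence:

```bash
# Rolling upgrade path: every major between current and target, in order
from_major=5
to_major=8
path=$(seq "$from_major" "$to_major" | paste -sd'>' -)
echo "rolling upgrade path: $path"
# → rolling upgrade path: 5>6>7>8
```

Within each major, upgrade to the latest minor of that line before moving to the next major.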
Safe upgrade procedure:
```bash
# Step 1: Disable shard allocation before upgrade
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}'

# Step 2: Stop non-essential indexing, then flush all indices
# (synced flush was removed in 8.0; use plain _flush there)
curl -X POST "localhost:9200/_flush/synced"

# Step 3: Upgrade nodes one at a time
# Stop the node
systemctl stop elasticsearch

# Update repository and package
# Debian/Ubuntu:
apt-get update && apt-get install elasticsearch=8.x.x

# RHEL/CentOS:
yum update elasticsearch-8.x.x

# Start the node
systemctl start elasticsearch

# Wait for the node to rejoin
curl -X GET "localhost:9200/_cat/nodes?v"

# Wait for shards to recover
curl -X GET "localhost:9200/_cat/shards?v" | grep -v STARTED

# Step 4: Repeat for each node

# Step 5: Re-enable allocation after all nodes are upgraded
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}'

# Step 6: Verify cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
```
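The "wait for the node to rejoin" steps can be scripted as a polling loop. A sketch: the loop itself is generic, and a stub status source stands in for the real check so the logic runs anywhere; against a live cluster the status source would be `curl -s "localhost:9200/_cluster/health" | jq -r .status` (the function names here are illustrative):

```bash
# Poll a status command until it reports green, or give up after N tries
wait_for_green() {
  tries=$1; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$("$@")" = "green" ]; then echo "cluster green"; return 0; fi
    i=$(( i + 1 ))
    # sleep 10   # uncomment against a real cluster
  done
  echo "gave up after $tries checks"
  return 1
}

# Stub: reports yellow twice (recovery in progress), then green
state_file=$(mktemp)
echo 0 > "$state_file"
fake_status() {
  n=$(cat "$state_file")
  echo $(( n + 1 )) > "$state_file"
  if [ "$n" -ge 2 ]; then echo green; else echo yellow; fi
}

wait_for_green 5 fake_status
# → cluster green
```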
### 9. Monitor cluster health
Set up comprehensive monitoring:
```bash
# Check cluster health API
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check node stats
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Check index stats
curl -X GET "localhost:9200/_stats?pretty"

# Check pending tasks
curl -X GET "localhost:9200/_cluster/pending_tasks?pretty"
```
Prometheus/Grafana monitoring:
```bash
# Install the Elasticsearch exporter
# https://github.com/prometheus-community/elasticsearch_exporter
docker run -d --name es-exporter \
  -p 9114:9114 \
  quay.io/prometheuscommunity/elasticsearch-exporter:latest \
  --es.uri=http://localhost:9200
```

```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: 'elasticsearch'
    static_configs:
      - targets: ['localhost:9114']
```

```yaml
# Prometheus alerting rules
groups:
  - name: elasticsearch_health
    rules:
      - alert: ElasticsearchClusterRed
        expr: elasticsearch_cluster_health_status{color="red"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Elasticsearch cluster is RED"
          description: "Primary shards are unassigned - data loss risk"
      - alert: ElasticsearchClusterYellow
        expr: elasticsearch_cluster_health_status{color="yellow"} == 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Elasticsearch cluster is YELLOW"
          description: "Replica shards unassigned - reduced fault tolerance"
      - alert: ElasticsearchHeapHigh
        expr: elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"} > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Elasticsearch heap usage high"
          description: "Node {{ $labels.node }} heap usage is {{ $value | humanizePercentage }}"
      - alert: ElasticsearchDiskHigh
        expr: elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes < 0.15
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Elasticsearch disk usage high"
          description: "Node {{ $labels.node }} disk available is {{ $value | humanizePercentage }}"
```
Index Lifecycle Management (ILM):
```json
// Create ILM policy
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "7d" },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

// Apply ILM policy via an index template
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}

// Create the initial index with the rollover alias
PUT logs-000001
{
  "aliases": {
    "logs": { "is_write_index": true }
  }
}
```
Prevention
- Monitor cluster health with automated alerting (Prometheus/Grafana)
- Set appropriate disk watermarks for your data growth rate
- Size heap correctly (50% RAM, max 31GB)
- Use dedicated master nodes for production clusters (3 nodes minimum)
- Configure index lifecycle management (ILM) for data retention
- Enable shard allocation awareness for multi-zone deployments
- Regular snapshots for disaster recovery
- Test upgrade procedures in staging before production
- Use enough replicas for fault tolerance (1 replica = 1 node failure tolerance)
- Monitor GC pressure and heap usage trends
Related Errors
- **ClusterBlockException**: Cluster or index blocked due to disk pressure
- **MasterNotDiscoveredException**: Cannot connect to master node
- **UnavailableShardsException**: Shards not available for query
- **TooManyBucketsException**: Aggregation exceeds memory limit
- **CircuitBreakingException**: Memory circuit breaker triggered