What's Actually Happening

A yellow cluster status means every primary shard is allocated but at least one replica shard is not. Searches and writes still work, but redundancy is reduced: if a node holding the only copy of a shard fails, that data becomes unavailable.

The Error You'll See

Cluster health yellow:

```bash
$ curl -X GET "localhost:9200/_cluster/health?pretty"

{
  "cluster_name" : "my-cluster",
  "status" : "yellow",
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10
}
```
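
For scripting, the status field can be pulled out of that response without extra tooling. The following is a minimal sketch, assuming the JSON has the shape shown above; `es_status` is a hypothetical helper name, and a JSON-aware tool such as jq would be more robust than a sed pattern.

```bash
# es_status: read a _cluster/health JSON document on stdin and print the
# value of its "status" field (green, yellow, or red).
# Hypothetical helper; the pattern assumes the value is a lowercase string.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s "localhost:9200/_cluster/health" | es_status
```

The function only reads stdin, so it works equally on a live response and on a saved health report.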

Unassigned shards:

```bash
$ curl -X GET "localhost:9200/_cat/shards?v&s=state"

index    shard prirep state      docs store ip       node
my-index 0     p      STARTED    1000 5mb   10.0.0.1 node-1
my-index 0     r      UNASSIGNED
```

Why This Happens

  1. Single-node cluster - No other nodes for replicas
  2. Node failure - Node holding replica went down
  3. Disk space low - Not enough space for replica allocation
  4. Allocation disabled - Cluster routing allocation disabled
  5. Node attributes mismatch - Replica requires specific attributes
  6. Shard limit reached - Too many shards per node
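
Several of these causes show up in the `unassigned.reason` column of `_cat/shards` (for example `NODE_LEFT` after a node failure, or `INDEX_CREATED` for replicas that never had a node to go to). As a rough sketch, the helper below tallies unassigned shards by reason; `count_unassigned_reasons` is a hypothetical name, and it assumes the five-column layout `index,shard,prirep,state,unassigned.reason` without a header row.

```bash
# count_unassigned_reasons: read `_cat/shards` output on stdin
# (columns: index, shard, prirep, state, unassigned.reason) and print
# "<reason> <count>" for every unassigned shard, sorted by reason.
count_unassigned_reasons() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (r in count) print r, count[r] }' | sort
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
#     | count_unassigned_reasons
```

A tally dominated by one reason usually points at which of the steps below to start with.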

Step 1: Check Cluster Health

```bash
# Get cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Get detailed health
curl -X GET "localhost:9200/_cluster/health?level=shards&pretty"

# Check cluster state
curl -X GET "localhost:9200/_cluster/state?pretty"

# List unassigned shards
curl -s -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED

# Check allocation explain (requests with a body need the Content-Type header)
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "index": "my-index", "shard": 0, "primary": false }'
```

Step 2: Check Node Count

```bash
# List nodes
curl -X GET "localhost:9200/_cat/nodes?v"

# Check node count (omit ?v so the header row is not counted)
curl -s -X GET "localhost:9200/_cat/nodes?h=name" | wc -l

# Replicas need at least 2 data nodes
# A single-node cluster will always be yellow if replicas are configured

# Check node roles
curl -X GET "localhost:9200/_cat/nodes?v&h=name,node.role"

# Check node attributes
curl -X GET "localhost:9200/_cat/nodeattrs?v"
```
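
The node-count rule comes down to simple arithmetic: Elasticsearch never places two copies of the same shard on one node, so an index can have at most one replica fewer than there are data nodes. A minimal sketch of that rule, with `max_replicas` as a hypothetical helper name:

```bash
# max_replicas: print the highest number_of_replicas that can be fully
# allocated with the given number of data nodes (nodes - 1, never below 0).
max_replicas() {
  local nodes="$1"
  if [ "$nodes" -gt 1 ]; then
    echo $(( nodes - 1 ))
  else
    echo 0
  fi
}
```

For example, `max_replicas 3` prints `2`, and `max_replicas 1` prints `0`, which is why a single-node cluster with any replicas configured stays yellow.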

Step 3: Fix Single Node Cluster

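```bash
# Option 1: Set replicas to 0 for a single-node cluster
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "number_of_replicas": 0 }'

# For all indices:
curl -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "number_of_replicas": 0 }'

# Set in an index template so new indices inherit it
# (legacy _template API; Elasticsearch 7.8+ also has composable templates under _index_template):
curl -X PUT "localhost:9200/_template/default" \
  -H 'Content-Type: application/json' \
  -d '{ "index_patterns": ["*"], "settings": { "number_of_replicas": 0 } }'

# Option 2: Add more nodes to the cluster
# Start additional nodes with the same cluster name
```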

Step 4: Check Disk Space

```bash
# Check disk allocation
curl -X GET "localhost:9200/_cat/allocation?v"

# Check node stats
curl -X GET "localhost:9200/_nodes/stats/fs?pretty"

# Check disk watermark settings (pretty output puts one setting per line for grep)
curl -s -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep disk

# Default watermarks:
# cluster.routing.allocation.disk.watermark.low: 85%
# cluster.routing.allocation.disk.watermark.high: 90%
# cluster.routing.allocation.disk.watermark.flood_stage: 95%

# If a node's disk usage is above 85%, new replicas won't be allocated to it

# Update watermarks if needed (transient settings are lost on a full cluster restart):
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "90%",
      "cluster.routing.allocation.disk.watermark.high": "95%",
      "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
    }
  }'

# Free up disk space:
# - Delete old indices
# - Delete snapshots
# - Clear caches
```
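
To spot which nodes are blocking replica allocation, the `disk.percent` column of `_cat/allocation` can be compared against the low watermark. This is a sketch under stated assumptions: `nodes_over_watermark` is a hypothetical helper, the input is requested with `h=node,disk.percent` and no header row, and the default threshold of 85 mirrors the default low watermark listed above.

```bash
# nodes_over_watermark: read "<node> <disk.percent>" lines on stdin and print
# the nodes whose disk usage is above the threshold (default: 85).
# Rows without a disk percentage (such as the UNASSIGNED row) are skipped.
nodes_over_watermark() {
  local threshold="${1:-85}"
  awk -v t="$threshold" 'NF == 2 && $2 + 0 > t + 0 { print $1 " " $2 "%" }'
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s "localhost:9200/_cat/allocation?h=node,disk.percent" | nodes_over_watermark 85
```

If the watermarks were changed, pass the new low watermark as the argument so the check matches the cluster's actual setting.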

Step 5: Check Allocation Settings

```bash
# Check allocation settings
curl -X GET "localhost:9200/_cluster/settings?pretty"

# Check if allocation is disabled:
curl -s -X GET "localhost:9200/_cluster/settings?flat_settings=true&pretty" | grep allocation

# If "cluster.routing.allocation.enable": "none", re-enable:
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{ "transient": { "cluster.routing.allocation.enable": "all" } }'

# Values: all, primaries, new_primaries, none

# Check allocation awareness:
curl -s -X GET "localhost:9200/_cluster/settings?flat_settings=true&pretty" | grep awareness

# If allocation awareness is configured, ensure nodes have the required attributes
```

Step 6: Check Shard Limit

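```bash
# Check shards per node (see the "shards" column)
curl -X GET "localhost:9200/_cat/allocation?v"

# Count all shards in the cluster
curl -s -X GET "localhost:9200/_cat/shards" | wc -l

# Check shard limit setting
curl -s -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep max_shards

# Default: 1000 shards per data node
# The limit applies cluster-wide: max_shards_per_node x number of data nodes

# Increase limit if needed:
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{ "transient": { "cluster.max_shards_per_node": 2000 } }'

# Or in elasticsearch.yml:
# cluster.max_shards_per_node: 2000
```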
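
Because the limit is evaluated for the cluster as a whole, the remaining headroom is the per-node limit times the number of data nodes, minus the shards that already exist. A minimal sketch of that calculation, where `shard_headroom` is a hypothetical helper and the default of 1000 is the default value listed above:

```bash
# shard_headroom: print how many more shards fit under the cluster-wide limit.
#   $1 = number of data nodes
#   $2 = current number of shards (primaries plus replicas)
#   $3 = cluster.max_shards_per_node (default: 1000)
# A negative result means the cluster is already over the limit.
shard_headroom() {
  local data_nodes="$1" shards="$2" per_node_limit="${3:-1000}"
  echo $(( data_nodes * per_node_limit - shards ))
}
```

For example, `shard_headroom 3 2500` prints `500`. When headroom is small, reducing shard count is generally a sturdier fix than raising the limit.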

Step 7: Manually Allocate Shards

```bash
# If automatic allocation fails, manually allocate:

# Find unassigned shard details
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "index": "my-index", "shard": 0, "primary": false }'

# Manually allocate a replica to a specific node:
curl -X POST "localhost:9200/_cluster/reroute?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "commands": [ { "allocate_replica": { "index": "my-index", "shard": 0, "node": "node-2" } } ] }'

# If a primary shard is unassigned (use with caution, this can lose data):
curl -X POST "localhost:9200/_cluster/reroute?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "commands": [ { "allocate_stale_primary": { "index": "my-index", "shard": 0, "node": "node-1", "accept_data_loss": true } } ] }'
```
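
When several replicas need manual allocation, typing the reroute body by hand gets error-prone. The sketch below builds the `allocate_replica` body from three arguments; `allocate_replica_body` is a hypothetical helper, and it assumes index and node names contain no characters that need JSON escaping.

```bash
# allocate_replica_body: print a _cluster/reroute request body that allocates
# one replica shard to one node.
#   $1 = index name, $2 = shard number, $3 = target node name
# Names are inserted as-is, so quotes or backslashes in them are not escaped.
allocate_replica_body() {
  local index="$1" shard="$2" node="$3"
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%d,"node":"%s"}}]}\n' \
    "$index" "$shard" "$node"
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -X POST "localhost:9200/_cluster/reroute?pretty" \
#     -H 'Content-Type: application/json' \
#     -d "$(allocate_replica_body my-index 0 node-2)"
```

Keeping the body in a function makes it easy to loop over the shard numbers reported as unassigned.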

Step 8: Check Node Attributes

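```bash
# Check if cluster-level allocation requires specific attributes
curl -s -X GET "localhost:9200/_cluster/settings?flat_settings=true&pretty" | grep require

# Check index-level allocation filters
curl -s -X GET "localhost:9200/my-index/_settings?flat_settings=true&pretty" | grep routing.allocation

# Example: replica requires "hot" tier
# "index.routing.allocation.require._tier": "hot"

# Check custom node attributes
curl -X GET "localhost:9200/_cat/nodeattrs?v"

# Check node tiers (tiers such as data_hot are node roles, not custom attributes)
curl -X GET "localhost:9200/_cat/nodes?v&h=name,node.role"

# If nodes don't have the required attributes:
# 1. Add attributes or roles to nodes
# 2. Or remove allocation requirements

# Remove allocation requirement:
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index.routing.allocation.require._tier": null }'

# In elasticsearch.yml, add data_hot to the node's node.roles list for the hot tier,
# or set a custom attribute with node.attr.<name>: <value>
```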

Step 9: Fix Failed Node Scenario

```bash
# If a node failed and left unassigned replicas:

# Check for missing nodes
curl -X GET "localhost:9200/_cat/nodes?v"

# If the node was removed permanently, allocate replicas to remaining nodes
curl -X POST "localhost:9200/_cluster/reroute?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "commands": [ { "allocate_replica": { "index": "my-index", "shard": 0, "node": "remaining-node" } } ] }'

# If the node rejoins, replicas may auto-allocate
# Preview a reroute without applying it (reroute is a POST; dry_run and explain are query parameters):
curl -X POST "localhost:9200/_cluster/reroute?dry_run=true&explain=true&pretty"
```

Step 10: Monitor Cluster Health

```bash
# Create monitoring script
cat << 'EOF' > monitor_es_cluster.sh
#!/bin/bash

echo "=== Cluster Health ==="
curl -s "localhost:9200/_cluster/health?pretty"

echo ""
echo "=== Unassigned Shards ==="
curl -s "localhost:9200/_cat/shards?v" | grep UNASSIGNED

echo ""
echo "=== Node Allocation ==="
curl -s "localhost:9200/_cat/allocation?v"

echo ""
echo "=== Allocation Explain ==="
# With no request body, this explains the first unassigned shard found
# and returns an error if nothing is unassigned
curl -s -X GET "localhost:9200/_cluster/allocation/explain?pretty"
EOF

chmod +x monitor_es_cluster.sh

# Set up Prometheus metrics:
# Key metrics (names as exposed by the Prometheus elasticsearch_exporter):
# - elasticsearch_cluster_health_status
# - elasticsearch_cluster_health_unassigned_shards
# - elasticsearch_cluster_health_active_shards
```

Alert for yellow status:

```yaml
- alert: ElasticsearchYellowCluster
  expr: elasticsearch_cluster_health_status{color="yellow"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Elasticsearch cluster yellow - unassigned replicas"
```

Elasticsearch Yellow Status Checklist

| Check | Command | Expected |
|---|---|---|
| Cluster status | `_cluster/health` | green |
| Unassigned shards | `_cat/shards` | 0 |
| Node count | `_cat/nodes` | >= 2 for replicas |
| Disk space | `_cat/allocation` | < 85% |
| Allocation enabled | `_cluster/settings` | "all" |
| Replicas count | index settings | 0 for single node |

Verify the Fix

```bash
# After fixing yellow status

# 1. Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
# "status" : "green"

# 2. Verify no unassigned shards
curl -s -X GET "localhost:9200/_cat/shards?v" | grep UNASSIGNED
# Empty result

# 3. Check allocation
curl -X GET "localhost:9200/_cat/allocation?v"
# All shards allocated

# 4. Verify replica count
curl -s -X GET "localhost:9200/my-index/_settings" | jq '.["my-index"].settings.index.number_of_replicas'
# Appropriate for node count

# 5. Test cluster resilience
# Stop one node (if multi-node)
curl -X GET "localhost:9200/_cluster/health?pretty"
# May turn yellow while replicas are reassigned, then return to green if enough nodes remain

# 6. Check disk usage
curl -X GET "localhost:9200/_cat/allocation?v"
# Below watermark thresholds
```
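
Rather than polling the health endpoint by hand, the cluster health API can block until a target status is reached through its `wait_for_status` and `timeout` parameters. The following is a sketch, not a definitive implementation: `wait_for_green` and the `ES_URL` variable are hypothetical names, and the check assumes the compact (non-pretty) response format.

```bash
# wait_for_green: ask Elasticsearch to hold the request until the cluster is
# green or the timeout expires. Returns 0 if green was reached, 1 otherwise.
#   $1     = timeout in Elasticsearch time units (default: 60s)
#   ES_URL = base URL of the cluster (assumed default: localhost:9200)
wait_for_green() {
  local url="${ES_URL:-localhost:9200}" timeout="${1:-60s}" response
  response=$(curl -s "${url}/_cluster/health?wait_for_status=green&timeout=${timeout}") || return 1
  case "$response" in
    *'"timed_out":false'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   wait_for_green 120s && echo "cluster is green" || echo "still not green"
```

The exit status makes the function usable as a gate in deployment or restart scripts.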

  • [Fix Elasticsearch Cluster Red Status](/articles/fix-elasticsearch-cluster-red-status)
  • [Fix Elasticsearch Shard Allocation Failed](/articles/fix-elasticsearch-shard-allocation-failed)
  • [Fix Elasticsearch Query Taking Too Long](/articles/fix-elasticsearch-query-taking-too-long)