Introduction

Elasticsearch shard allocation failures occur when the cluster cannot assign primary or replica shards to available nodes, leaving shards UNASSIGNED and degrading cluster health to yellow or red. Unassigned primaries make data inaccessible; unassigned replicas reduce resilience. Common causes include exceeded disk watermark thresholds (a node has insufficient free space), a node leaving the cluster during allocation, network partitions isolating nodes, shard allocation filtering rules blocking assignment, misconfigured awareness attributes (wrong zone/rack values), version mismatches (a shard cannot move from a newer node to an older one), corrupted shard data, cluster-wide shard count limits, recovery conflicts after restart, too few nodes for replica allocation, and allocation throttling during rebalancing. The fix is to use the allocation explain API to diagnose why a shard cannot be allocated, then address the specific decider blocking it (disk, awareness, filtering, etc.). This guide provides production-proven troubleshooting for shard allocation failures across Elasticsearch versions 6.x through 8.x and managed services (Elastic Cloud, AWS OpenSearch).

Symptoms

  • Cluster health yellow (replica shards unassigned) or red (primary shards unassigned)
  • _cat/shards shows UNASSIGNED shards
  • _cluster/allocation/explain returns allocation decider failures
  • Nodes online but shards not allocating
  • "cannot allocate because allocation is not permitted to any of the nodes" errors from the allocation explain API
  • Disk usage warnings in cluster health response
  • Shard stuck in INITIALIZING state indefinitely
  • Cluster status degraded after node failure or restart
  • New indices created but shards remain unassigned
  • Snapshot restore fails with unassigned shards
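
For a fast first pass over these symptoms, unassigned shards can be grouped by `unassigned.reason` with a short awk pipeline. The sketch below runs against a captured `_cat/shards` listing written to a temp file (the file path and sample rows are illustrative); in production you would pipe in live output from `curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason"`.

```shell
# Group unassigned shards by reason. Sample _cat/shards output is captured
# to a file here so the pipeline itself can be shown; live data would come
# from the curl command in the lead-in.
cat <<'EOF' > /tmp/shards.txt
logs-2024.01 0 p STARTED
logs-2024.01 0 r UNASSIGNED CLUSTER_RECOVERED
logs-2024.01 1 p STARTED
logs-2024.01 1 r UNASSIGNED ALLOCATION_FAILED
metrics      0 p UNASSIGNED NODE_LEFT
metrics      0 r UNASSIGNED NODE_LEFT
EOF

# Count UNASSIGNED shards per unassigned.reason (column 4 is the state,
# column 5 the reason, given the header list above)
awk '$4 == "UNASSIGNED" { count[$5]++ }
     END { for (r in count) print count[r], r }' /tmp/shards.txt | sort -rn
```

A cluster dominated by one reason (for example `NODE_LEFT`) usually points to a single root cause rather than many unrelated failures.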

Common Causes

  • Disk usage exceeds high watermark (90% default) - disk_threshold decider
  • Not enough nodes to allocate replicas (single node cluster)
  • Shard allocation filtering (cluster.routing.allocation.* settings)
  • Awareness attributes mismatch (zone, rack attributes)
  • Cluster-wide shard limit reached (cluster.max_shards_per_node)
  • Version incompatibility (cannot allocate to older version node)
  • Corrupted shard data on all potential target nodes
  • Cluster in read-only mode (index blocks)
  • Insufficient heap memory on target nodes
  • Recovery throttling (node_concurrent_recoveries limit)
  • Custom allocation deciders blocking assignment
  • Index settings prevent allocation (require/exclude rules)

Step-by-Step Fix

### 1. Diagnose shard allocation status

Check cluster health and shard status:

```bash
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Output shows:
# {
#   "cluster_name": "production-cluster",
#   "status": "yellow",                  # or red
#   "timed_out": false,
#   "number_of_nodes": 3,
#   "number_of_data_nodes": 2,
#   "active_primary_shards": 50,
#   "active_shards": 95,
#   "relocating_shards": 0,
#   "initializing_shards": 0,
#   "unassigned_shards": 5,              # These are the problem
#   "delayed_unassigned_shards": 0,
#   "number_of_pending_tasks": 0,
#   "number_of_in_flight_fetch": 0,
#   "task_max_waiting_in_queue_millis": 0,
#   "active_shards_percent_as_number": 95.0
# }

# Check which shards are unassigned
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason"

# Output shows:
# index        shard prirep state      node   unassigned.reason
# logs-2024.01 0     p      STARTED    node-1
# logs-2024.01 0     r      UNASSIGNED        CLUSTER_RECOVERED
# logs-2024.01 1     p      STARTED    node-2
# logs-2024.01 1     r      UNASSIGNED        ALLOCATION_FAILED
# metrics      0     p      UNASSIGNED        NODE_LEFT
# metrics      0     r      UNASSIGNED        NODE_LEFT

# Key unassigned.reason values:
# - CLUSTER_RECOVERED: Shard unassigned after full cluster recovery
# - INDEX_CREATED: New index, shard not yet allocated
# - NODE_LEFT: Node holding the shard left the cluster
# - ALLOCATION_FAILED: Previous allocation attempt failed
# - REPLICA_ADDED: Replica added, waiting for allocation
# - REINITIALIZED: Shard being reinitialized
```

Use allocation explain API:

```bash
# Get a detailed explanation for why a shard cannot be allocated
# (with no body, this explains the first unassigned shard found)
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

# If multiple shards are unassigned, specify which one.
# Set "primary" to false for a replica, true for a primary.
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d '
{
  "index": "logs-2024.01",
  "shard": 0,
  "primary": false
}'

# Output shows a detailed allocation explanation:
# {
#   "index": "logs-2024.01",
#   "shard": 0,
#   "primary": false,
#   "current_state": "unassigned",
#   "unassigned_info": {
#     "reason": "ALLOCATION_FAILED",
#     "at": "2024-01-15T10:00:00.000Z",
#     "failed_allocation_attempts": 5,
#     "details": "failed to allocate shard"
#   },
#   "can_allocate": "no",
#   "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
#   "node_allocation_decisions": [
#     {
#       "node_id": "abc123",
#       "node_name": "node-1",
#       "transport_address": "10.0.1.1:9300",
#       "node_decision": "no",
#       "node_decision_explanation": "this node cannot allocate the shard",
#       "deciders": [
#         {
#           "decider": "disk_threshold",
#           "decision": "NO",
#           "explanation": "the node is above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%], having less than the minimum required [9.5gb] free space, actual free: [6gb]"
#         },
#         {
#           "decider": "same_shard",
#           "decision": "NO",
#           "explanation": "a copy of this shard is already allocated to this node"
#         }
#       ]
#     },
#     {
#       "node_id": "def456",
#       "node_name": "node-2",
#       "transport_address": "10.0.1.2:9300",
#       "node_decision": "no",
#       "deciders": [
#         {
#           "decider": "disk_threshold",
#           "decision": "NO",
#           "explanation": "the node is above the high watermark cluster setting..."
#         }
#       ]
#     }
#   ]
# }

# The "deciders" array shows exactly WHY allocation failed
# Focus on deciders with "decision": "NO"
```
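
On larger clusters the explain response can contain dozens of node decisions. A jq pipeline (assuming jq is installed) can condense it to just the blocking deciders per node. The sample below is a trimmed, hypothetical explain response saved to a file; in practice you would pipe in `curl -s "localhost:9200/_cluster/allocation/explain"` directly.

```shell
# Trimmed sample of an allocation explain response (illustrative values)
cat <<'EOF' > /tmp/explain.json
{
  "node_allocation_decisions": [
    {"node_name": "node-1",
     "deciders": [
       {"decider": "disk_threshold", "decision": "NO", "explanation": "above high watermark"},
       {"decider": "same_shard", "decision": "YES", "explanation": "no copy on this node"}
     ]},
    {"node_name": "node-2",
     "deciders": [
       {"decider": "disk_threshold", "decision": "NO", "explanation": "above high watermark"}
     ]}
  ]
}
EOF

# Print only the deciders that said NO, one line per node/decider pair
jq -r '.node_allocation_decisions[]
       | .node_name as $n
       | .deciders[]
       | select(.decision == "NO")
       | "\($n): \(.decider) - \(.explanation)"' /tmp/explain.json
```

If every node shows the same blocking decider (as here, `disk_threshold`), fix that one cluster-wide condition rather than rerouting shards by hand.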

### 2. Fix disk watermark allocation failures

Understand disk watermarks:

```bash
# Check current watermark settings
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&pretty" \
  | grep -A2 "watermark"

# Default watermarks:
# - low: 85%         - ES won't allocate NEW shards to the node
# - high: 90%        - ES relocates shards AWAY from the node
# - flood_stage: 95% - ES sets indices on the node to read-only

# Check disk usage per node
curl -X GET "localhost:9200/_cat/allocation?v"

# Output:
# shards disk.indices disk.used disk.avail disk.total disk.percent host        ip       node
#    121         45gb      89gb        6gb       95gb           94 192.168.1.1 10.0.1.1 node-1
#     98         42gb      88gb        7gb       95gb           93 192.168.1.2 10.0.1.2 node-2

# If disk.percent > 90%, the disk_threshold decider blocks allocation
```

Fix disk watermark issues:

```bash
# Option 1: Free disk space (recommended)
# Delete old indices (on 8.x, wildcard deletes require
# action.destructive_requires_name to be false)
curl -X DELETE "localhost:9200/logs-2023.*?pretty"

# Delete documents by query
curl -X POST "localhost:9200/logs-*/_delete_by_query?pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "range": {
      "@timestamp": { "lte": "now-90d" }
    }
  }
}'

# Force merge to reclaim space from deleted documents
# (do not combine only_expunge_deletes with max_num_segments)
curl -X POST "localhost:9200/_all/_forcemerge?only_expunge_deletes=true"

# Option 2: Temporarily increase the watermark threshold
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}'

# Option 3: Use absolute free space instead of percentages
# (the same watermark settings accept byte values)
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "50gb",
    "cluster.routing.allocation.disk.watermark.high": "30gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb"
  }
}'

# Option 4: Disable disk-based allocation (NOT recommended for production)
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}'

# After freeing space or adjusting settings, retry allocation
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```

### 3. Fix allocation filtering and rules

Check allocation filtering settings:

```bash
# Check current allocation settings
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&pretty" \
  | grep -A5 "allocation"

# Common filtering settings:
# cluster.routing.allocation.enable: all / none / primaries / new_primaries
# cluster.routing.allocation.include._name: node name patterns
# cluster.routing.allocation.exclude._name: node name patterns
# cluster.routing.allocation.require._name: node name patterns

# Check for active filtering
curl -X GET "localhost:9200/_cluster/settings?pretty"

# Output might show:
# {
#   "persistent": {
#     "cluster.routing.allocation.enable": "none",        # ALL allocation disabled!
#     "cluster.routing.allocation.exclude._ip": "10.0.1.5"
#   }
# }

# Fix: Re-enable allocation
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}'

# Remove exclusion rules if no longer needed
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": null
  }
}'
```

Check index-level allocation rules:

```bash
# Check index settings for allocation rules
curl -X GET "localhost:9200/my-index/_settings?pretty" | grep -A10 "routing"

# Index settings might include:
# "index.routing.allocation.include._name": "hot-*",
# "index.routing.allocation.exclude._name": "cold-*",
# "index.routing.allocation.require.box_type": "hot"

# These settings restrict where shards can allocate

# Fix: Update or remove allocation rules
curl -X PUT "localhost:9200/my-index/_settings?pretty" -H 'Content-Type: application/json' -d '
{
  "index.routing.allocation.require.box_type": null
}'

# Or update to a different value
curl -X PUT "localhost:9200/my-index/_settings?pretty" -H 'Content-Type: application/json' -d '
{
  "index.routing.allocation.require.box_type": "warm"
}'

# Check node attributes (used by filtering and awareness)
curl -X GET "localhost:9200/_cat/nodeattrs?v&h=node,attr,value"

# Output:
# node   attr     value
# node-1 box_type hot
# node-1 zone     us-east-1a
# node-2 box_type warm
# node-2 zone     us-east-1b
# node-3 box_type cold
# node-3 zone     us-east-1c

# If the index requires box_type=hot but no hot nodes are available, allocation fails
```

### 4. Fix awareness attribute issues

Configure shard awareness:

```bash
# Check cluster awareness settings
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&pretty" \
  | grep -A5 "awareness"

# Typical awareness configuration:
# cluster.routing.allocation.awareness.attributes: zone,rack
# cluster.routing.allocation.awareness.force.zone.values: us-east-1a,us-east-1b

# Awareness ensures copies of a shard land in different zones/racks.
# Problem: if there are not enough distinct zones, allocation fails.
# Example: 2 nodes in the same zone with awareness forcing different
# zones means the replica cannot allocate.

# Fix Option 1: Add nodes in the missing zones
# Fix Option 2: Reduce awareness requirements

# Temporarily disable awareness for recovery
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": ""
  }
}'

# Or use fewer attributes (here "rack" is dropped, keeping only "zone")
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}'

# Clear forced awareness values (emergency only)
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.awareness.force.zone.values": null
  }
}'
```

### 5. Fix node capacity issues

Check node shard limits:

```bash
# Check current node limits
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&pretty" \
  | grep -E "max_shards_per_node|concurrent"

# Default limits:
# cluster.max_shards_per_node: 1000
# cluster.routing.allocation.node_concurrent_recoveries: 2
# cluster.routing.allocation.cluster_concurrent_rebalance: 2

# Check shard count per node
curl -s "localhost:9200/_cat/shards?h=node" | sort | uniq -c

# If the cluster-wide shard limit is reached, new shards cannot allocate

# Fix: Increase the shard limit
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}'

# Increase concurrent recoveries and recovery bandwidth
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4,
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}'

# Note: Higher recovery concurrency = faster recovery but more cluster load
```

Check node heap and resources:

```bash
# Check node stats
curl -X GET "localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m"

# Output:
# name   heap.percent ram.percent cpu load_1m
# node-1           85          92  50    2.50   # High load
# node-2           45          60  10    0.50   # Normal

# If heap.percent is consistently > 85%, the node may reject allocations

# Check circuit breaker status
curl -X GET "localhost:9200/_nodes/stats/breaker?pretty"

# Look for the "tripped" count per breaker.
# If it is high, circuit breakers are rejecting operations on that node.
```
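
To spot tripped breakers quickly across many nodes, the breaker stats can be filtered with jq (assumed installed). The sample response below is a trimmed, hypothetical `_nodes/stats/breaker` payload; in practice pipe in `curl -s "localhost:9200/_nodes/stats/breaker"`.

```shell
# Trimmed sample of a _nodes/stats/breaker response (illustrative values)
cat <<'EOF' > /tmp/breakers.json
{
  "nodes": {
    "abc123": {
      "name": "node-1",
      "breakers": {
        "parent":    {"tripped": 12, "limit_size": "20gb"},
        "fielddata": {"tripped": 0,  "limit_size": "8gb"}
      }
    }
  }
}
EOF

# List only breakers that have tripped at least once, per node
jq -r '.nodes[]
       | .name as $n
       | .breakers
       | to_entries[]
       | select(.value.tripped > 0)
       | "\($n): \(.key) breaker tripped \(.value.tripped) times"' /tmp/breakers.json
```

A frequently tripping parent breaker usually means the node needs more heap or fewer shards, not a breaker limit increase.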

### 6. Force shard allocation (advanced)

Manual shard rerouting:

```bash
# Force allocate a replica shard to a specific node
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d '
{
  "commands": [{
    "allocate_replica": {
      "index": "logs-2024.01",
      "shard": 0,
      "node": "node-1"
    }
  }]
}'

# Force allocate a stale primary shard (DANGEROUS - may cause data loss)
# Only use when the primary is lost and no in-sync replicas exist
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d '
{
  "commands": [{
    "allocate_stale_primary": {
      "index": "metrics",
      "shard": 0,
      "node": "node-1",
      "accept_data_loss": true
    }
  }]
}'

# Cancel a relocating shard (stop an ongoing relocation)
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d '
{
  "commands": [{
    "cancel": {
      "index": "logs-2024.01",
      "shard": 0,
      "node": "node-1"
    }
  }]
}'

# Move a shard from one node to another
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d '
{
  "commands": [{
    "move": {
      "index": "logs-2024.01",
      "shard": 0,
      "from_node": "node-1",
      "to_node": "node-2"
    }
  }]
}'

# Retry all failed allocations
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```

### 7. Debug allocation with profiling

Enable allocation debugging:

```bash
# Enable debug logging for allocation decisions
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "logger.org.elasticsearch.cluster.routing.allocation": "DEBUG"
  }
}'

# Check logs for allocation decisions
tail -f /var/log/elasticsearch/elasticsearch.log | grep -i "allocation\|decider"

# Revert to INFO after debugging
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "logger.org.elasticsearch.cluster.routing.allocation": "INFO"
  }
}'

# Explain an already-assigned shard by naming its current node
# (useful to understand why a shard stays where it is)
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d '
{
  "index": "logs-2024.01",
  "shard": 0,
  "primary": false,
  "current_node": "node-1"
}'
```

Prevention

  • Monitor disk usage with alerts at 70%, 80%, 85% thresholds
  • Use ILM policies to manage index lifecycle and shard count
  • Configure appropriate shard sizes (10-50GB typical)
  • Set max_shards_per_node based on cluster capacity
  • Use hot-warm-cold architecture with proper allocation rules
  • Monitor allocation decider metrics for early warnings
  • Test allocation behavior during capacity planning
  • Document runbook for shard allocation troubleshooting
  • Implement synthetic allocation tests in staging
  • Consider managed Elasticsearch for automatic shard management

Quick Reference

  • **Yellow cluster status**: Replica shards unassigned
  • **Red cluster status**: Primary shards unassigned
  • **Cluster read-only**: Flood stage watermark exceeded
  • **Circuit breaker tripped**: Memory limit exceeded
  • **Cluster state too large**: Too many shards/indices
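
The tiered disk-usage alerts suggested under Prevention (70/80/85%) can be sketched with a small awk filter over `_cat/allocation` output. The thresholds, sample node values, and file path below are illustrative; live data would come from `curl -s "localhost:9200/_cat/allocation?h=disk.percent,node"`.

```shell
# Sample _cat/allocation output: disk.percent followed by node name
cat <<'EOF' > /tmp/alloc.txt
94 node-1
73 node-2
55 node-3
EOF

# Emit one alert line per node above a threshold; nodes below 70% are skipped
awk '{
  pct = $1; node = $2
  if      (pct >= 85) level = "CRITICAL"
  else if (pct >= 80) level = "WARNING"
  else if (pct >= 70) level = "NOTICE"
  else next
  printf "%s: %s at %d%% disk usage\n", level, node, pct
}' /tmp/alloc.txt
```

Wiring the output into a pager or chat webhook gives warning well before the 85% low watermark starts blocking allocation.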