## Introduction

Elasticsearch shards in the `UNASSIGNED` state cannot serve queries or accept writes. When shards remain stuck in this state after a node failure, restart, or cluster expansion, data becomes partially unavailable and cluster health degrades to yellow or red.

## Symptoms

- `GET /_cluster/health` shows `"status": "yellow"` or `"red"`
- `GET /_cat/shards?v` shows multiple shards in the `UNASSIGNED` state
- `GET /_cluster/allocation/explain` returns a detailed reason for each unassigned shard
- Index search results are incomplete (documents on unassigned shards are missing)
- Write operations to affected indices fail with `UnavailableShardsException`
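As a quick triage step, you can filter the `_cat/shards` output down to only the unassigned shards. A minimal sketch, run here against illustrative sample rows; against a real cluster, replace the heredoc with `curl -s localhost:9200/_cat/shards`:

```shell
# Filter _cat/shards output down to unassigned shards only.
# Against a live cluster, replace the heredoc with:
#   curl -s localhost:9200/_cat/shards
# The sample rows below are illustrative.
unassigned=$(cat <<'EOF' | awk '$4 == "UNASSIGNED" { printf "%s shard %s (%s)\n", $1, $2, $3 }'
my_index 0 p STARTED    1200 2.1mb 10.0.0.1 node-1
my_index 0 r UNASSIGNED
my_index 1 p STARTED    1180 2.0mb 10.0.0.2 node-2
my_index 1 r UNASSIGNED
EOF
)
echo "$unassigned"
```

The third column (`p`/`r`) tells you whether a primary or only a replica is missing, which decides how risky the fix is.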

## Common Causes

- Disk watermark exceeded (the default low watermark triggers at 85% disk usage, i.e. less than 15% free)
- Too many shards for the number of available nodes
- `cluster.routing.allocation.enable` set to `none`
- Node holding the shard data permanently lost (disk failure)
- Allocation deciders throttling recovery after rapid node restarts

## Step-by-Step Fix

1. **Explain why shards are unassigned:**

   ```bash
   curl -s localhost:9200/_cluster/allocation/explain?pretty
   # Output shows, for example:
   # "explanation": "the node was not eligible because the node had too many shards"
   # or
   # "explanation": "cannot allocate because all found nodes are over the high watermark"
   ```
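The explain response is verbose JSON; when `jq` is not available, a plain `grep` can pull out just the decision fields. A sketch against a trimmed sample response — the field names match the API, but the values here are illustrative, not real cluster output:

```shell
# Pull the human-readable decision fields out of an allocation/explain
# response without jq. Against a live cluster:
#   curl -s localhost:9200/_cluster/allocation/explain?pretty | grep -E '"(allocate_)?explanation"'
# The JSON below is a trimmed, illustrative sample.
explain_json='{
  "index": "my_index",
  "shard": 0,
  "primary": false,
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [ {
    "node_name": "node-2",
    "deciders": [ {
      "decider": "disk_threshold",
      "explanation": "the node is above the high disk watermark"
    } ]
  } ]
}'
printf '%s\n' "$explain_json" | grep -E '"(allocate_)?explanation"'
```

The per-node `deciders` entries usually name the exact rule (disk threshold, awareness, throttling) blocking allocation.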

2. **Check disk watermarks and free space:**

   ```bash
   curl -s localhost:9200/_cat/allocation?v
   # Shows disk usage per node

   # If disk is the issue, free space or adjust watermarks
   curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
     "persistent": {
       "cluster.routing.allocation.disk.watermark.low": "90%",
       "cluster.routing.allocation.disk.watermark.high": "95%",
       "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
     }
   }'
   ```
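Before raising watermarks, it helps to see which nodes are actually over the line. A sketch that flags offending nodes from `_cat/allocation`-style output (the 90% threshold and sample rows are assumptions; substitute your own output):

```shell
# Flag nodes above the high watermark (90% in this sketch) from
# _cat/allocation output. Against a live cluster, replace the heredoc with:
#   curl -s localhost:9200/_cat/allocation?v
# The rows below are illustrative. disk.percent is column 6, node is column 9.
over=$(cat <<'EOF' | awk 'NR > 1 && $6 + 0 > 90 { print $9, $6 "%" }'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
12 40gb 45.5gb 4.5gb 50gb 91 10.0.0.1 10.0.0.1 node-1
12 20gb 25.5gb 24.5gb 50gb 51 10.0.0.2 10.0.0.2 node-2
EOF
)
echo "$over"
```

Freeing disk on the flagged nodes is generally safer than loosening the watermarks themselves.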

3. **Enable shard allocation if disabled:**

   ```bash
   curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
     "persistent": {
       "cluster.routing.allocation.enable": "all"
     }
   }'
   ```

4. **Force assign a replica shard:**

   ```bash
   # Find the node name
   curl -s localhost:9200/_cat/nodes?v

   # Retry allocation for a specific shard
   curl -X POST localhost:9200/_cluster/reroute -H 'Content-Type: application/json' -d '{
     "commands": [{
       "allocate_replica": {
         "index": "my_index",
         "shard": 0,
         "node": "node-2"
       }
     }]
   }'
   ```
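Typing one reroute command per shard gets tedious when many replicas are unassigned. A sketch that generates an `allocate_replica` command for each unassigned replica from `_cat/shards` output — `target_node` and the sample rows are assumptions; review the generated JSON before POSTing it:

```shell
# Build allocate_replica commands for every unassigned replica shard.
# target_node is an assumption; pick a node with capacity from _cat/nodes.
target_node="node-2"
commands=$(cat <<'EOF' | awk -v node="$target_node" '$4 == "UNASSIGNED" && $3 == "r" {
    printf "{\"allocate_replica\":{\"index\":\"%s\",\"shard\":%s,\"node\":\"%s\"}}\n", $1, $2, node
  }'
my_index 0 p STARTED    1200 2.1mb 10.0.0.1 node-1
my_index 0 r UNASSIGNED
my_index 1 r UNASSIGNED
EOF
)
echo "$commands"
# Join the lines with commas, wrap them in {"commands":[ ... ]}, and POST
# the result to localhost:9200/_cluster/reroute
```

If shards became unassigned only because they hit the allocation retry limit, `POST /_cluster/reroute?retry_failed=true` retries them all without naming each shard.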

5. **Force assign a primary shard (data loss risk):**

   ```bash
   # Only use when the original data is permanently lost
   curl -X POST localhost:9200/_cluster/reroute -H 'Content-Type: application/json' -d '{
     "commands": [{
       "allocate_empty_primary": {
         "index": "my_index",
         "shard": 0,
         "node": "node-1",
         "accept_data_loss": true
       }
     }]
   }'
   ```
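After any reroute, confirm the cluster actually recovers. In practice you can block on `curl -s 'localhost:9200/_cluster/health?wait_for_status=green&timeout=60s'`; the sketch below instead shows a small local helper (`extract_status` is hypothetical, not an Elasticsearch API) that pulls the `status` field out of a health response for use in scripts:

```shell
# Extract the "status" field from a _cluster/health JSON response.
# extract_status is a local helper, not an Elasticsearch API.
extract_status() {
  grep -o '"status" *: *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Sample response; in practice pipe in `curl -s localhost:9200/_cluster/health`.
status=$(echo '{"cluster_name":"prod","status":"green","unassigned_shards":0}' | extract_status)
echo "$status"
```

Anything other than `green` after your fixes means some shards are still unassigned or initializing, and the allocation/explain loop starts again.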

## Prevention

- Monitor disk usage per node and alert at 70% utilization
- Keep the shard count per node under 20 shards per GB of heap
- Use Index Lifecycle Management (ILM) to automatically roll over and shrink indices
- Set appropriate replica counts based on the number of data nodes
- Test node failure scenarios regularly to validate recovery procedures
- Use dedicated master nodes to prevent split-brain and allocation issues
- Monitor `_cat/shards` for `UNASSIGNED` shards in automated health checks
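The 20-shards-per-GB-of-heap guideline above can be turned into a concrete budget check; `heap_gb` and `data_nodes` here are assumptions to replace with your cluster's values:

```shell
# Turn the 20-shards-per-GB-heap guideline into a concrete budget.
# heap_gb and data_nodes are assumptions; substitute your cluster's values.
heap_gb=8
data_nodes=3
per_node=$((heap_gb * 20))
cluster_max=$((per_node * data_nodes))
echo "per-node limit: $per_node shards, cluster total: $cluster_max shards"
```

Comparing `cluster_max` against the shard count reported by `_cat/shards` gives an early warning well before allocation starts failing.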