## Introduction

Elasticsearch shards in the `UNASSIGNED` state cannot serve queries or accept writes. When shards remain stuck in this state after a node failure, restart, or cluster expansion, data becomes partially unavailable and cluster health degrades to yellow or red.
## Symptoms

- `GET /_cluster/health` shows `"status": "yellow"` or `"status": "red"`
- `GET /_cat/shards?v` shows one or more shards in the `UNASSIGNED` state
- `GET /_cluster/allocation/explain` returns a detailed reason why a shard is unassigned
- Search results are incomplete (documents held by unassigned shards are missing)
- Write operations to affected indices fail with `UnavailableShardsException`
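On a cluster with many indices the `_cat/shards` output gets long, so it can help to tally unassigned shards by reason first. The helper below is only a sketch: the function name `unassigned_by_reason` is made up for illustration, and it assumes the input comes from `_cat/shards` with the columns `index,shard,prirep,state,unassigned.reason` in exactly that order.

```bash
# Hypothetical helper (not part of Elasticsearch): counts UNASSIGNED shards per reason.
# Expects whitespace-separated lines of the form:
#   index shard prirep state unassigned.reason
unassigned_by_reason() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (r in count) print r, count[r] }' | LC_ALL=C sort
}

# Example usage against a cluster assumed to listen on localhost:9200:
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | unassigned_by_reason
```

A single dominant reason (for example, many shards sharing the same value) usually points to one root cause rather than several unrelated ones.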
## Common Causes

- Low disk watermark exceeded (by default, less than 15% free disk space on a node)
- Too many shards for the number of available nodes
- `cluster.routing.allocation.enable` set to `none`
- Node holding the shard data permanently lost (for example, disk failure)
- Allocation throttled after rapid node restarts
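The disk watermark cause is the easiest one to check mechanically. As a minimal sketch, the helper below filters `_cat/allocation` output for nodes at or above a usage threshold. The name `nodes_over_watermark` is hypothetical, the default of 85 assumes the low watermark has not been changed from its default, and node names are assumed to contain no spaces.

```bash
# Hypothetical helper (not part of Elasticsearch): lists nodes whose disk usage
# is at or above a percentage threshold (default 85, the default low watermark).
# Expects whitespace-separated lines of the form: disk.percent node
# Rows without a numeric percentage (such as the UNASSIGNED row) are skipped.
nodes_over_watermark() {
  awk -v t="${1:-85}" \
    'NF >= 2 && $1 ~ /^[0-9]+$/ && $1 + 0 >= t + 0 { print $2, $1 "%" }'
}

# Example usage against a cluster assumed to listen on localhost:9200:
#   curl -s 'localhost:9200/_cat/allocation?h=disk.percent,node' | nodes_over_watermark 85
```

If this prints nothing, disk pressure is unlikely to be the reason and the other causes in the list are worth checking next.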
## Step-by-Step Fix

1. **Explain why shards are unassigned:**

   ```bash
   curl -s 'localhost:9200/_cluster/allocation/explain?pretty'
   # The output includes an explanation such as:
   # "explanation": "the node was not eligible because the node had too many shards"
   # or
   # "explanation": "cannot allocate because all found nodes are over the high watermark"
   ```
2. **Check disk watermarks and free space:**

   ```bash
   # Show disk usage per node
   curl -s 'localhost:9200/_cat/allocation?v'

   # If disk is the issue, free space or adjust watermarks
   curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
     "persistent": {
       "cluster.routing.allocation.disk.watermark.low": "90%",
       "cluster.routing.allocation.disk.watermark.high": "95%",
       "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
     }
   }'
   ```
3. **Enable shard allocation if disabled:**

   ```bash
   curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
     "persistent": {
       "cluster.routing.allocation.enable": "all"
     }
   }'
   ```

4. **Force assign a replica shard:**

   ```bash
   # Find the node name
   curl -s 'localhost:9200/_cat/nodes?v'

   # Allocate the replica of a specific shard to that node
   curl -X POST localhost:9200/_cluster/reroute -H 'Content-Type: application/json' -d '{
     "commands": [{
       "allocate_replica": {
         "index": "my_index",
         "shard": 0,
         "node": "node-2"
       }
     }]
   }'
   ```
5. **Force assign a primary shard (data loss risk):**

   ```bash
   # Only use when the original data is permanently lost
   curl -X POST localhost:9200/_cluster/reroute -H 'Content-Type: application/json' -d '{
     "commands": [{
       "allocate_empty_primary": {
         "index": "my_index",
         "shard": 0,
         "node": "node-1",
         "accept_data_loss": true
       }
     }]
   }'
   ```
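When several replicas need the same reroute command, typing the JSON body by hand for each one is error-prone. The sketch below builds the `allocate_replica` request body from three arguments. The function name `reroute_replica_body` is hypothetical, and the helper does no JSON escaping, so it assumes index and node names contain no quotes or backslashes.

```bash
# Hypothetical helper (not part of Elasticsearch): prints the JSON body for an
# allocate_replica reroute command.
# Arguments: index name, shard number, target node name.
reroute_replica_body() {
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%d,"node":"%s"}}]}\n' \
    "$1" "$2" "$3"
}

# Example usage against a cluster assumed to listen on localhost:9200:
#   curl -X POST localhost:9200/_cluster/reroute -H 'Content-Type: application/json' \
#     -d "$(reroute_replica_body my_index 0 node-2)"
```

The helper deliberately covers only `allocate_replica`. Keeping `allocate_empty_primary` as a hand-written command makes it harder to trigger data loss by accident.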