# How to Fix Elasticsearch Unassigned Shards
You've discovered unassigned shards in your Elasticsearch cluster. Left unresolved, these can lead to reduced redundancy or even data loss. Let's walk through diagnosing and fixing this common issue.
## Detecting Unassigned Shards
Start by checking your cluster health:
```bash
curl -X GET "localhost:9200/_cluster/health?pretty"
```

A response with `unassigned_shards` greater than 0 indicates a problem:
```json
{
  "cluster_name" : "production-cluster",
  "status" : "yellow",
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0
}
```

List all unassigned shards with details:
```bash
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node&s=state"
```

You'll see output like:
```
index        shard prirep state      unassigned.reason node
logs-2024-01 0     r      UNASSIGNED ALLOCATION_FAILED
products     2     r      UNASSIGNED NODE_LEFT
analytics    1     r      UNASSIGNED CLUSTER_RECOVERED
```

## Understanding Unassigned Reasons
The `unassigned.reason` field tells you why the shard became unassigned:
| Reason | Description |
|---|---|
| INDEX_CREATED | New index created, shards not yet assigned |
| CLUSTER_RECOVERED | Full cluster recovery after restart |
| INDEX_REOPENED | Closed index was reopened |
| DANGLING_INDEX_IMPORTED | Index imported from another cluster |
| NEW_INDEX_RESTORED | Restored from snapshot |
| EXISTING_INDEX_RESTORED | Restored over existing index |
| REPLICA_ADDED | Replica count increased |
| ALLOCATION_FAILED | Allocation attempts failed |
| NODE_LEFT | Node containing shard left cluster |
| REROUTE_CANCELLED | Explicit reroute cancellation |
| REINITIALIZED | Primary shard reinitialized |
| REALLOCATED_REPLICA | Replica relocated for rebalancing |
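When many shards are unassigned at once, a quick tally by reason tells you which fix to reach for first. A minimal Python sketch (the `tally_unassigned` helper is hypothetical and assumes the exact column order requested in the `_cat/shards` command above):

```python
from collections import Counter

def tally_unassigned(cat_shards_output: str) -> Counter:
    """Tally unassigned shards by reason from _cat/shards text output.

    Assumes the column order: index shard prirep state unassigned.reason node.
    Adjust the field positions if you request different columns.
    """
    reasons = Counter()
    for line in cat_shards_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # UNASSIGNED rows have no node column, so the reason is the 5th field
        if len(fields) >= 5 and fields[3] == "UNASSIGNED":
            reasons[fields[4]] += 1
    return reasons

sample = """index shard prirep state unassigned.reason node
logs-2024-01 0 r UNASSIGNED ALLOCATION_FAILED
products 2 r UNASSIGNED NODE_LEFT
analytics 1 r UNASSIGNED CLUSTER_RECOVERED"""
print(tally_unassigned(sample))
```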
## Deep Diagnosis with Allocation Explain
For detailed diagnosis, use the allocation explain API:
```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
```

For a specific shard:
```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "logs-2024-01",
  "shard": 0,
  "primary": false
}
'
```

The response provides comprehensive information:
```json
{
  "index" : "logs-2024-01",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2024-01-15T10:30:00.000Z",
    "failed_attempts" : 5,
    "delayed" : false,
    "details" : "failed to create shard, failure IOException[disk space insufficient]",
    "last_allocation_status" : "deciders_no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "node-1",
      "node_name" : "node-1",
      "transport_address" : "10.0.0.1:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node has insufficient disk space"
        }
      ]
    }
  ]
}
```

## Solution 1: Fix Disk Space Issues
If disk space is the culprit, check allocation across nodes:
```bash
curl -X GET "localhost:9200/_cat/allocation?v&h=shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent"
```

```
shards disk.indices disk.used disk.avail disk.total disk.percent
    15       25.2gb    85.1gb     14.9gb    100.0gb           85
    12       20.5gb    92.3gb      7.7gb    100.0gb           92
```

By default, nodes above 85% disk usage cross the low watermark, preventing new shard allocation on them. Above 90%, the high watermark triggers and Elasticsearch attempts to relocate shards off the node.
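The watermark logic can be sketched as a small decision function. This is only an illustration of the default thresholds, not an Elasticsearch API (the `watermark_status` helper and its messages are hypothetical):

```python
def watermark_status(disk_percent: float, low: float = 85.0, high: float = 90.0) -> str:
    """Classify a node's disk usage against the default allocation watermarks.

    85% / 90% mirror the defaults of
    cluster.routing.allocation.disk.watermark.low / .high; there is also a
    flood-stage watermark (95% by default) that makes indices read-only.
    """
    if disk_percent >= high:
        return "high watermark: shards will be moved off this node"
    if disk_percent >= low:
        return "low watermark: new shards will not be allocated here"
    return "ok"

# The two nodes from the _cat/allocation output above
for node, pct in [("node-1", 85), ("node-2", 92)]:
    print(node, watermark_status(pct))
```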
Free up space by deleting old indices:
```bash
curl -X DELETE "localhost:9200/logs-2023-*"
```

Or adjust the watermark thresholds temporarily:
```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}
'
```

Remember to revert these transient settings once you have freed space; raised watermarks leave nodes less headroom before they fill up completely.

## Solution 2: Re-enable Allocation
If shard allocation was disabled:
```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep allocation.enable
```

Re-enable it:
```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
```

The enable options are:
- `all`: allow all shard allocation (the default)
- `primaries`: allow allocation of primary shards only
- `new_primaries`: allow allocation of primaries for new indices only
- `none`: no shard allocation allowed
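The four options boil down to a small decision table, sketched here as a hypothetical `allocation_allowed` function (not part of any client library):

```python
def allocation_allowed(setting: str, is_primary: bool, is_new_index: bool) -> bool:
    """Decision table for cluster.routing.allocation.enable (illustrative only)."""
    if setting == "all":
        return True
    if setting == "primaries":
        return is_primary
    if setting == "new_primaries":
        return is_primary and is_new_index
    if setting == "none":
        return False
    raise ValueError(f"unknown value: {setting}")

# Under "primaries" a replica stays unassigned, which is why a cluster can sit
# yellow after maintenance if the setting is never flipped back to "all".
print(allocation_allowed("primaries", is_primary=False, is_new_index=False))
```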
## Solution 3: Handle NODE_LEFT Scenarios
When a node leaves, shards may remain unassigned if no replacement exists. First, check if the node will return:
```bash
curl -X GET "localhost:9200/_cat/nodes?v"
```

If the node is permanently removed, you need to handle its orphaned shards:
For replica shards, simply let Elasticsearch reallocate:
```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```

For primary shards from a departed node, you may need to allocate a stale primary:
```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "critical-data",
        "shard": 0,
        "node": "node-2",
        "accept_data_loss": true
      }
    }
  ]
}
'
```

This promotes an out-of-date copy on node-2; any writes that copy missed are lost, which is why `accept_data_loss` is required.

## Solution 4: Reset Failed Allocation Attempts
When allocation fails repeatedly, Elasticsearch stops trying. Reset and retry:
```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```

To see why allocation keeps failing for a specific shard, run allocation explain with all decider decisions included:
```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?include_yes_decisions=true" -H 'Content-Type: application/json' -d'
{
  "index": "logs-2024-01",
  "shard": 0,
  "primary": false
}
'
```

## Solution 5: Manual Shard Allocation
When automatic allocation fails, manually assign shards:
For replica shards:
```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "logs-2024-01",
        "shard": 0,
        "node": "node-2"
      }
    }
  ]
}
'
```

For primary shards (last resort):
```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "test-index",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}
'
```

Use `allocate_empty_primary` only when you accept complete data loss for that shard: it creates a brand-new empty shard, discarding whatever data the lost copy held.
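If you script reroute operations, it is worth forcing an explicit acknowledgement before the destructive commands. A sketch (the `reroute_command` helper is hypothetical, mirroring the `accept_data_loss` flag Elasticsearch itself requires):

```python
def reroute_command(action: str, index: str, shard: int, node: str,
                    accept_data_loss: bool = False) -> dict:
    """Build a _cluster/reroute request body (illustrative helper).

    Refuses the destructive commands unless data loss is explicitly
    acknowledged, mirroring the accept_data_loss flag the API requires.
    """
    destructive = {"allocate_empty_primary", "allocate_stale_primary"}
    cmd = {"index": index, "shard": shard, "node": node}
    if action in destructive:
        if not accept_data_loss:
            raise ValueError(f"{action} requires accept_data_loss=True")
        cmd["accept_data_loss"] = True
    return {"commands": [{action: cmd}]}

body = reroute_command("allocate_empty_primary", "test-index", 0, "node-1",
                       accept_data_loss=True)
print(body)
```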
## Solution 6: Fix Attribute-Based Allocation
If you use attribute-based allocation, verify nodes have correct attributes:
```bash
curl -X GET "localhost:9200/_cat/nodeattrs?v"
```

Check allocation filtering settings:
```bash
curl -X GET "localhost:9200/index-name/_settings?flat_settings=true&pretty" | grep allocation
```

Adjust if necessary:
```bash
curl -X PUT "localhost:9200/index-name/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.include._tier_preference": "data_hot"
}
'
```

## Solution 7: Handle Frozen Indices
Frozen indices may have allocation issues:
```bash
curl -X POST "localhost:9200/index-name/_unfreeze"
```

Then trigger reallocation:
```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```

## Verification
After applying fixes, verify all shards are assigned:
```bash
# Wait for the cluster to go green
curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=120s&pretty"

# Check for remaining unassigned shards
curl -X GET "localhost:9200/_cat/shards?v&s=state" | grep UNASSIGNED

# View shard distribution
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node&s=index,shard,prirep"
```
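If verification is scripted, a minimal check over the parsed `_cluster/health` JSON might look like this (the `shards_fully_assigned` helper is a hypothetical sketch; the field names come from the health response shown earlier):

```python
def shards_fully_assigned(health: dict) -> bool:
    """True when a _cluster/health response shows no outstanding shard work.

    Missing fields default to 1 so an incomplete response never reads as healthy.
    """
    return (health.get("unassigned_shards", 1) == 0
            and health.get("delayed_unassigned_shards", 1) == 0
            and health.get("initializing_shards", 1) == 0)

healthy = {"status": "green", "unassigned_shards": 0,
           "delayed_unassigned_shards": 0, "initializing_shards": 0}
print(shards_fully_assigned(healthy))
```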
## Monitoring for Unassigned Shards
Set up proactive monitoring:
```bash
# Create a watch for unassigned shards
curl -X PUT "localhost:9200/_watcher/watch/unassigned_shards_alert" -H 'Content-Type: application/json' -d'
{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "http": {
      "request": {
        "host": "localhost",
        "port": 9200,
        "path": "/_cluster/health"
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.unassigned_shards": {
        "gt": 0
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "ops@example.com",
        "subject": "Elasticsearch: {{ctx.payload.unassigned_shards}} unassigned shards",
        "body": "Cluster has {{ctx.payload.unassigned_shards}} unassigned shards. Status: {{ctx.payload.status}}"
      }
    }
  }
}
'
```

## Summary Checklist
When troubleshooting unassigned shards:
1. Check cluster health for the unassigned shard count
2. List unassigned shards with their reasons
3. Run allocation explain for the root cause
4. Address the specific issue (disk, nodes, settings)
5. Retry failed allocations
6. Manually allocate if necessary
7. Verify the cluster returns to green
8. Set up monitoring for early detection
Following this systematic approach will help you resolve unassigned shard issues quickly and maintain cluster health.