# How to Fix Elasticsearch Unassigned Shards

You've discovered unassigned shards in your Elasticsearch cluster. Left unresolved, these can lead to reduced redundancy or even data loss. Let's walk through diagnosing and fixing this common issue.

## Detecting Unassigned Shards

Start by checking your cluster health:

```bash
curl -X GET "localhost:9200/_cluster/health?pretty"
```

A response with unassigned_shards > 0 indicates a problem:

```json
{
  "cluster_name" : "production-cluster",
  "status" : "yellow",
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0
}
```

List all unassigned shards with details:

```bash
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node&s=state"
```

You'll see output like:

```bash
index            shard prirep state      unassigned.reason  node
logs-2024-01     0     r      UNASSIGNED ALLOCATION_FAILED
products         2     r      UNASSIGNED NODE_LEFT
analytics        1     r      UNASSIGNED CLUSTER_RECOVERED
```
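When a cluster has dozens of unassigned shards, tallying them by reason makes triage much faster than reading rows one by one. A minimal Python sketch (the sample output above is hard-coded here; in practice you would feed in the `_cat/shards` response):

```python
from collections import Counter

def count_unassigned_reasons(cat_shards_output: str) -> Counter:
    """Tally unassigned shards by reason from _cat/shards text output.

    Expects the column order requested above:
    index, shard, prirep, state, unassigned.reason, node.
    """
    counts = Counter()
    lines = cat_shards_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # Unassigned rows have an empty node column, so 5 fields remain.
        if len(fields) >= 5 and fields[3] == "UNASSIGNED":
            counts[fields[4]] += 1
    return counts

sample = """\
index            shard prirep state      unassigned.reason  node
logs-2024-01     0     r      UNASSIGNED ALLOCATION_FAILED
products         2     r      UNASSIGNED NODE_LEFT
analytics        1     r      UNASSIGNED CLUSTER_RECOVERED
"""

print(count_unassigned_reasons(sample))
```

The per-reason counts tell you which of the solutions below to reach for first.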

## Understanding Unassigned Reasons

The `unassigned.reason` field tells you why the shard became unassigned:

| Reason | Description |
|--------|-------------|
| `INDEX_CREATED` | New index created, shards not yet assigned |
| `CLUSTER_RECOVERED` | Full cluster recovery after restart |
| `INDEX_REOPENED` | Closed index was reopened |
| `DANGLING_INDEX_IMPORTED` | Index imported from another cluster |
| `NEW_INDEX_RESTORED` | Restored from snapshot |
| `EXISTING_INDEX_RESTORED` | Restored over existing index |
| `REPLICA_ADDED` | Replica count increased |
| `ALLOCATION_FAILED` | Allocation attempts failed |
| `NODE_LEFT` | Node containing shard left cluster |
| `REROUTE_CANCELLED` | Explicit reroute cancellation |
| `REINITIALIZED` | Primary shard reinitialized |
| `REALLOCATED_REPLICA` | Replica relocated for rebalancing |

## Deep Diagnosis with Allocation Explain

For detailed diagnosis, use the allocation explain API:

```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
```

For a specific shard:

```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "logs-2024-01",
  "shard": 0,
  "primary": false
}
'
```

The response provides comprehensive information:

```json
{
  "index" : "logs-2024-01",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2024-01-15T10:30:00.000Z",
    "failed_attempts" : 5,
    "delayed" : false,
    "details" : "failed to create shard, failure IOException[disk space insufficient]",
    "last_allocation_status" : "deciders_no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "node-1",
      "node_name" : "node-1",
      "transport_address" : "10.0.0.1:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node has insufficient disk space"
        }
      ]
    }
  ]
}
```
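On a large cluster the `node_allocation_decisions` array can run to dozens of entries. A small sketch that pulls out only the blocking deciders per node (field names taken from the response shown above):

```python
import json

def blocking_deciders(explain_json: str) -> dict:
    """Map node name -> [(decider, explanation)] for every 'no' decision.

    Parses a _cluster/allocation/explain response like the one above.
    """
    doc = json.loads(explain_json)
    blocked = {}
    for node in doc.get("node_allocation_decisions", []):
        if node.get("node_decision") != "no":
            continue
        blocked[node["node_name"]] = [
            (d["decider"], d["explanation"])
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
    return blocked

# Trimmed version of the explain response above.
sample = """
{
  "node_allocation_decisions": [
    {
      "node_name": "node-1",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node has insufficient disk space"
        }
      ]
    }
  ]
}
"""

print(blocking_deciders(sample))
```

The decider names map directly onto the fixes below: `disk_threshold` points at Solution 1, `enable` at Solution 2, `filter` at Solution 6, and so on.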

## Solution 1: Fix Disk Space Issues

If disk space is the culprit, check allocation across nodes:

```bash
curl -X GET "localhost:9200/_cat/allocation?v&h=shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent"
```

```bash
shards disk.indices disk.used disk.avail disk.total disk.percent
    15       25.2gb    85.1gb     14.9gb    100.0gb           85
    12       20.5gb    92.3gb      7.7gb    100.0gb           92
```

By default, nodes above the 85% low watermark stop receiving new shard allocations; above the 90% high watermark, Elasticsearch actively tries to relocate shards away; and above the 95% flood stage, indices with a shard on that node are marked read-only.
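For scripts that watch node disk usage, the thresholds can be captured in a tiny helper. The defaults below mirror Elasticsearch's stock watermarks (85/90/95%), but clusters often override them, so treat this as a sketch rather than an exact reimplementation of the disk-threshold decider:

```python
def watermark_status(disk_percent: float,
                     low: float = 85.0,
                     high: float = 90.0,
                     flood: float = 95.0) -> str:
    """Classify a node's disk usage against the allocation watermarks."""
    if disk_percent >= flood:
        return "flood_stage"  # affected indices become read-only
    if disk_percent >= high:
        return "high"         # Elasticsearch tries to move shards away
    if disk_percent >= low:
        return "low"          # node stops receiving new shards
    return "ok"

# The two nodes from the _cat/allocation output above:
print(watermark_status(85))  # low watermark reached
print(watermark_status(92))  # high watermark reached
```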

Free up space by deleting old indices:

```bash
curl -X DELETE "localhost:9200/logs-2023-*"
```

Or adjust watermark thresholds temporarily:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}
'
```

## Solution 2: Re-enable Allocation

If shard allocation was disabled:

```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep allocation.enabled
```

Re-enable it:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
```

The enable options are:

- `all`: Allow all shard allocation (default)
- `primaries`: Allow primary shard allocation only
- `new_primaries`: Allow new primary shard allocation only
- `none`: No shard allocation allowed

## Solution 3: Handle NODE_LEFT Scenarios

When a node leaves, shards may remain unassigned if no replacement exists. First, check if the node will return:

```bash
curl -X GET "localhost:9200/_cat/nodes?v"
```

If the node is permanently gone, recovery depends on the shard type.

For replica shards, let Elasticsearch reallocate them to the remaining nodes automatically, retrying any failed attempts:

```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```

For primary shards from a departed node, you may need to allocate a stale primary:

```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "critical-data",
        "shard": 0,
        "node": "node-2",
        "accept_data_loss": true
      }
    }
  ]
}
'
```

## Solution 4: Reset Failed Allocation Attempts

When allocation fails repeatedly (more than `index.allocation.max_retries` attempts, 5 by default), Elasticsearch stops trying. Reset the counter and retry:

```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```

To see why a specific shard still cannot be allocated, run allocation explain with all decider decisions included:

```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?include_yes_decisions=true" -H 'Content-Type: application/json' -d'
{
  "index": "logs-2024-01",
  "shard": 0,
  "primary": false
}
'
```

## Solution 5: Manual Shard Allocation

When automatic allocation fails, manually assign shards:

For replica shards:

```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "logs-2024-01",
        "shard": 0,
        "node": "node-2"
      }
    }
  ]
}
'
```

For primary shards (last resort):

```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "test-index",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}
'
```

Use `allocate_empty_primary` only when you accept complete data loss for that shard: it brings the shard back as a brand-new, empty primary, discarding whatever it held before.
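Because the command is so destructive, it is worth wrapping it so the data-loss acknowledgement is always explicit in the calling code. A hypothetical guard-rail helper (my own sketch, not part of any Elasticsearch client):

```python
import json

def allocate_empty_primary_body(index: str, shard: int, node: str,
                                accept_data_loss: bool = False) -> str:
    """Build the reroute body for allocate_empty_primary.

    Refuses to proceed unless data loss is explicitly acknowledged,
    since this command discards the shard's previous contents.
    """
    if not accept_data_loss:
        raise ValueError("allocate_empty_primary wipes the shard; "
                         "pass accept_data_loss=True to confirm")
    command = {"allocate_empty_primary": {
        "index": index,
        "shard": shard,
        "node": node,
        "accept_data_loss": True,
    }}
    return json.dumps({"commands": [command]})

print(allocate_empty_primary_body("test-index", 0, "node-1",
                                  accept_data_loss=True))
```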

## Solution 6: Fix Attribute-Based Allocation

If you use attribute-based allocation, verify nodes have correct attributes:

```bash
curl -X GET "localhost:9200/_cat/nodeattrs?v"
```

Check allocation filtering settings:

```bash
curl -X GET "localhost:9200/index-name/_settings?pretty" | grep allocation
```

Adjust if necessary:

```bash
curl -X PUT "localhost:9200/index-name/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.include._tier_preference": "data_hot"
}
'
```

## Solution 7: Handle Frozen Indices

Frozen indices may have allocation issues:

```bash
curl -X POST "localhost:9200/index-name/_unfreeze"
```

Then trigger reallocation:

```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```

## Verification

After applying fixes, verify all shards are assigned:

```bash
# Wait for cluster health
curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=120s&pretty"

# Check for remaining unassigned shards
curl -X GET "localhost:9200/_cat/shards?v&s=state" | grep UNASSIGNED

# View shard distribution
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node&s=index,shard,prirep"
```
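If you script this verification, a small polling helper keeps the retry logic in one place. The sketch below takes the health-fetching function as a parameter (for example, a wrapper around curl or an HTTP client), so the loop itself stays self-contained and testable:

```python
import json
import time
from typing import Callable

def wait_for_no_unassigned(fetch_health: Callable[[], str],
                           attempts: int = 24,
                           delay: float = 5.0) -> bool:
    """Poll cluster health until unassigned_shards reaches 0.

    fetch_health returns the _cluster/health response body as a JSON
    string. Returns True once no shards are unassigned, False if the
    attempt budget is exhausted first.
    """
    for i in range(attempts):
        health = json.loads(fetch_health())
        if health.get("unassigned_shards", 0) == 0:
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False
```

In production you would pass something like `lambda: subprocess.check_output(["curl", "-s", "localhost:9200/_cluster/health"]).decode()` as the fetcher.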

## Monitoring for Unassigned Shards

Set up proactive monitoring:

```bash
# Create a watch for unassigned shards
curl -X PUT "localhost:9200/_watcher/watch/unassigned_shards_alert" -H 'Content-Type: application/json' -d'
{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "http": {
      "request": {
        "host": "localhost",
        "port": 9200,
        "path": "/_cluster/health"
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.unassigned_shards": {
        "gt": 0
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "ops@example.com",
        "subject": "Elasticsearch: {{ctx.payload.unassigned_shards}} unassigned shards",
        "body": "Cluster has {{ctx.payload.unassigned_shards}} unassigned shards. Status: {{ctx.payload.status}}"
      }
    }
  }
}
'
```

## Summary Checklist

When troubleshooting unassigned shards:

1. Check cluster health for the unassigned count
2. List unassigned shards with reasons
3. Run allocation explain for the root cause
4. Address the specific issue (disk, nodes, settings)
5. Retry failed allocations
6. Manually allocate if necessary
7. Verify the cluster returns to green
8. Set up monitoring for early detection

Following this systematic approach will help you resolve unassigned shard issues quickly and maintain cluster health.