# How to Fix Elasticsearch Unassigned Shards
You've discovered unassigned shards in your Elasticsearch cluster. Left unresolved, these can lead to reduced redundancy or even data loss. Let's walk through diagnosing and fixing this common issue.
## Detecting Unassigned Shards
Start by checking your cluster health:
```bash
curl -X GET "localhost:9200/_cluster/health?pretty"
```

A response with `unassigned_shards` greater than 0 indicates a problem:
```json
{
  "cluster_name" : "production-cluster",
  "status" : "yellow",
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0
}
```

List all unassigned shards with details:
```bash
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node&s=state"
```

You'll see output like:
```
index        shard prirep state      unassigned.reason node
logs-2024-01 0     r      UNASSIGNED ALLOCATION_FAILED
products     2     r      UNASSIGNED NODE_LEFT
analytics    1     r      UNASSIGNED CLUSTER_RECOVERED
```

## Understanding Unassigned Reasons
The `unassigned.reason` field tells you why the shard became unassigned:
| Reason | Description |
|---|---|
| INDEX_CREATED | New index created, shards not yet assigned |
| CLUSTER_RECOVERED | Full cluster recovery after restart |
| INDEX_REOPENED | Closed index was reopened |
| DANGLING_INDEX_IMPORTED | Index imported from another cluster |
| NEW_INDEX_RESTORED | Restored from snapshot |
| EXISTING_INDEX_RESTORED | Restored over existing index |
| REPLICA_ADDED | Replica count increased |
| ALLOCATION_FAILED | Allocation attempts failed |
| NODE_LEFT | Node containing shard left cluster |
| REROUTE_CANCELLED | Explicit reroute cancellation |
| REINITIALIZED | Primary shard reinitialized |
| REALLOCATED_REPLICA | Replica relocated for rebalancing |
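When many shards are unassigned at once, a quick tally by reason tells you which fix to reach for first. A minimal Python sketch (the `tally_unassigned` helper is hypothetical and assumes the exact column order requested in the `_cat/shards` command above):

```python
from collections import Counter

def tally_unassigned(cat_shards_output: str) -> Counter:
    """Tally unassigned shards by reason from _cat/shards text output.

    Assumes the column order: index shard prirep state unassigned.reason node.
    Adjust the field positions if you request different columns.
    """
    reasons = Counter()
    for line in cat_shards_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # UNASSIGNED rows have no node column, so the reason is the 5th field
        if len(fields) >= 5 and fields[3] == "UNASSIGNED":
            reasons[fields[4]] += 1
    return reasons

sample = """index shard prirep state unassigned.reason node
logs-2024-01 0 r UNASSIGNED ALLOCATION_FAILED
products 2 r UNASSIGNED NODE_LEFT
analytics 1 r UNASSIGNED CLUSTER_RECOVERED"""
print(tally_unassigned(sample))
```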
## Deep Diagnosis with Allocation Explain
For detailed diagnosis, use the allocation explain API:
```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
```

For a specific shard:
```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "logs-2024-01",
  "shard": 0,
  "primary": false
}
'
```

The response provides comprehensive information:
```json
{
  "index" : "logs-2024-01",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2024-01-15T10:30:00.000Z",
    "failed_attempts" : 5,
    "delayed" : false,
    "details" : "failed to create shard, failure IOException[disk space insufficient]",
    "last_allocation_status" : "deciders_no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "node-1",
      "node_name" : "node-1",
      "transport_address" : "10.0.0.1:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node has insufficient disk space"
        }
      ]
    }
  ]
}
```

## Solution 1: Fix Disk Space Issues
If disk space is the culprit, check allocation across nodes:
```bash
curl -X GET "localhost:9200/_cat/allocation?v&h=shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent"
```

```
shards disk.indices disk.used disk.avail disk.total disk.percent
    15       25.2gb    85.1gb     14.9gb    100.0gb           85
    12       20.5gb    92.3gb      7.7gb    100.0gb           92
```

By default, nodes above 85% disk usage cross the low watermark, preventing new shard allocation on them. Above 90%, the high watermark triggers and Elasticsearch attempts to relocate shards off the node.
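The watermark logic can be sketched as a small decision function. This is only an illustration of the default thresholds, not an Elasticsearch API (the `watermark_status` helper and its messages are hypothetical):

```python
def watermark_status(disk_percent: float, low: float = 85.0, high: float = 90.0) -> str:
    """Classify a node's disk usage against the default allocation watermarks.

    85% / 90% mirror the defaults of
    cluster.routing.allocation.disk.watermark.low / .high; there is also a
    flood-stage watermark (95% by default) that makes indices read-only.
    """
    if disk_percent >= high:
        return "high watermark: shards will be moved off this node"
    if disk_percent >= low:
        return "low watermark: new shards will not be allocated here"
    return "ok"

# The two nodes from the _cat/allocation output above
for node, pct in [("node-1", 85), ("node-2", 92)]:
    print(node, watermark_status(pct))
```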
Free up space by deleting old indices:
```bash
curl -X DELETE "localhost:9200/logs-2023-*"
```

Or adjust the watermark thresholds temporarily:
```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}
'
```

Remember to revert these transient settings once you have freed space; raised watermarks leave nodes less headroom before they fill up completely.

## Solution 2: Re-enable Allocation
If shard allocation was disabled:
```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep allocation.enable
```

Re-enable it:
```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
```

The enable options are:
- `all`: allow all shard allocation (the default)
- `primaries`: allow allocation of primary shards only
- `new_primaries`: allow allocation of primaries for new indices only
- `none`: no shard allocation allowed
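The four options boil down to a small decision table, sketched here as a hypothetical `allocation_allowed` function (not part of any client library):

```python
def allocation_allowed(setting: str, is_primary: bool, is_new_index: bool) -> bool:
    """Decision table for cluster.routing.allocation.enable (illustrative only)."""
    if setting == "all":
        return True
    if setting == "primaries":
        return is_primary
    if setting == "new_primaries":
        return is_primary and is_new_index
    if setting == "none":
        return False
    raise ValueError(f"unknown value: {setting}")

# Under "primaries" a replica stays unassigned, which is why a cluster can sit
# yellow after maintenance if the setting is never flipped back to "all".
print(allocation_allowed("primaries", is_primary=False, is_new_index=False))
```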
## Solution 3: Handle NODE_LEFT Scenarios
When a node leaves, shards may remain unassigned if no replacement exists. First, check if the node will return:
```bash
curl -X GET "localhost:9200/_cat/nodes?v"
```

If the node is permanently removed, you need to handle its orphaned shards:
For replica shards, simply let Elasticsearch reallocate:
```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```

For primary shards from a departed node, you may need to allocate a stale primary:
```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "critical-data",
        "shard": 0,
        "node": "node-2",
        "accept_data_loss": true
      }
    }
  ]
}
'
```

This promotes an out-of-date copy on node-2; any writes that copy missed are lost, which is why `accept_data_loss` is required.

## Solution 4: Reset Failed Allocation Attempts
When allocation fails repeatedly, Elasticsearch stops trying. Reset and retry:
```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```

To see why allocation keeps failing for a specific shard, run allocation explain with all decider decisions included:
```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?include_yes_decisions=true" -H 'Content-Type: application/json' -d'
{
  "index": "logs-2024-01",
  "shard": 0,
  "primary": false
}
'
```

## Solution 5: Manual Shard Allocation
When automatic allocation fails, manually assign shards:
For replica shards:
```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "logs-2024-01",
        "shard": 0,
        "node": "node-2"
      }
    }
  ]
}
'
```

For primary shards (last resort):
```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "test-index",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}
'
```

Use `allocate_empty_primary` only when you accept complete data loss for that shard: it creates a brand-new empty shard, discarding whatever data the lost copy held.
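If you script reroute operations, it is worth forcing an explicit acknowledgement before the destructive commands. A sketch (the `reroute_command` helper is hypothetical, mirroring the `accept_data_loss` flag Elasticsearch itself requires):

```python
def reroute_command(action: str, index: str, shard: int, node: str,
                    accept_data_loss: bool = False) -> dict:
    """Build a _cluster/reroute request body (illustrative helper).

    Refuses the destructive commands unless data loss is explicitly
    acknowledged, mirroring the accept_data_loss flag the API requires.
    """
    destructive = {"allocate_empty_primary", "allocate_stale_primary"}
    cmd = {"index": index, "shard": shard, "node": node}
    if action in destructive:
        if not accept_data_loss:
            raise ValueError(f"{action} requires accept_data_loss=True")
        cmd["accept_data_loss"] = True
    return {"commands": [{action: cmd}]}

body = reroute_command("allocate_empty_primary", "test-index", 0, "node-1",
                       accept_data_loss=True)
print(body)
```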
## Solution 6: Fix Attribute-Based Allocation
If you use attribute-based allocation, verify nodes have correct attributes:
```bash
curl -X GET "localhost:9200/_cat/nodeattrs?v"
```

Check allocation filtering settings:
```bash
curl -X GET "localhost:9200/index-name/_settings?flat_settings=true&pretty" | grep allocation
```

Adjust if necessary:
```bash
curl -X PUT "localhost:9200/index-name/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.include._tier_preference": "data_hot"
}
'
```

## Solution 7: Handle Frozen Indices
Frozen indices may have allocation issues:
```bash
curl -X POST "localhost:9200/index-name/_unfreeze"
```

Then trigger reallocation:
```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```

## Verification
After applying fixes, verify all shards are assigned:
```bash
# Wait for the cluster to go green
curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=120s&pretty"

# Check for remaining unassigned shards
curl -X GET "localhost:9200/_cat/shards?v&s=state" | grep UNASSIGNED

# View shard distribution
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node&s=index,shard,prirep"
```
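If verification is scripted, a minimal check over the parsed `_cluster/health` JSON might look like this (the `shards_fully_assigned` helper is a hypothetical sketch; the field names come from the health response shown earlier):

```python
def shards_fully_assigned(health: dict) -> bool:
    """True when a _cluster/health response shows no outstanding shard work.

    Missing fields default to 1 so an incomplete response never reads as healthy.
    """
    return (health.get("unassigned_shards", 1) == 0
            and health.get("delayed_unassigned_shards", 1) == 0
            and health.get("initializing_shards", 1) == 0)

healthy = {"status": "green", "unassigned_shards": 0,
           "delayed_unassigned_shards": 0, "initializing_shards": 0}
print(shards_fully_assigned(healthy))
```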
## Monitoring for Unassigned Shards
Set up proactive monitoring:
```bash
# Create a watch for unassigned shards
curl -X PUT "localhost:9200/_watcher/watch/unassigned_shards_alert" -H 'Content-Type: application/json' -d'
{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "http": {
      "request": {
        "host": "localhost",
        "port": 9200,
        "path": "/_cluster/health"
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.unassigned_shards": {
        "gt": 0
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "ops@example.com",
        "subject": "Elasticsearch: {{ctx.payload.unassigned_shards}} unassigned shards",
        "body": "Cluster has {{ctx.payload.unassigned_shards}} unassigned shards. Status: {{ctx.payload.status}}"
      }
    }
  }
}
'
```

## Summary Checklist
When troubleshooting unassigned shards:
1. Check cluster health for the unassigned shard count
2. List unassigned shards with their reasons
3. Run allocation explain for the root cause
4. Address the specific issue (disk, nodes, settings)
5. Retry failed allocations
6. Manually allocate if necessary
7. Verify the cluster returns to green
8. Set up monitoring for early detection
Following this systematic approach will help you resolve unassigned shard issues quickly and maintain cluster health.