# How to Fix Elasticsearch Yellow Cluster Status
You've checked your Elasticsearch cluster health and noticed it's showing yellow instead of green. While your data is still accessible, this status indicates something isn't quite right with your shard allocation.
## Understanding Yellow Status
A yellow cluster status means that all primary shards are assigned and functioning, but at least one replica shard is unassigned. This isn't a critical failure like red status, but it does mean you've lost your redundancy for some indices.
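The distinction between the three statuses can be summarized in a small helper. This is an illustrative sketch, not part of any Elasticsearch client: `classify_health` and its wording are made up here, and it operates on the parsed JSON dict a health request returns.

```python
# Sketch: summarize what a parsed _cluster/health response implies
# about shard allocation. Function name and messages are illustrative.
def classify_health(health: dict) -> str:
    status = health.get("status")
    if status == "green":
        return "all primary and replica shards assigned"
    if status == "yellow":
        # Primaries are fine; only replicas are missing.
        return f"all primaries assigned, {health.get('unassigned_shards', 0)} replica shard(s) unassigned"
    # Red: at least one primary is unassigned.
    return "one or more primary shards unassigned: data may be unavailable"

example = {"status": "yellow", "unassigned_shards": 15}
print(classify_health(example))  # → all primaries assigned, 15 replica shard(s) unassigned
```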
Here's what you'll typically see when running a health check:
```bash
curl -X GET "localhost:9200/_cluster/health?pretty"
```

The response shows the yellow status:

```json
{
  "cluster_name" : "production-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 15,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
```

## Common Causes
The yellow status typically occurs for these reasons:
1. Single-node cluster: Replica shards are configured but there is no second node to host them
2. Node failures: Some nodes left the cluster, leaving replicas without homes
3. Disk space issues: Nodes don't have enough disk space for replica allocation
4. Allocation settings: Shard allocation has been disabled or restricted
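Each of these causes surfaces as a machine-readable `unassigned.reason` code on the affected shards. A minimal lookup sketch, mapping a few common codes to a likely next step (the hint wording is my own, and the list is deliberately not exhaustive):

```python
# Sketch: map common unassigned.reason codes to a suggested next step.
# Hints are illustrative; always confirm with _cluster/allocation/explain.
REASON_HINTS = {
    "NODE_LEFT": "a node left the cluster; check node health and rejoin it",
    "ALLOCATION_FAILED": "allocation was attempted and failed; see _cluster/allocation/explain",
    "INDEX_CREATED": "a new index's replicas have nowhere to go; compare node count vs. replica count",
    "CLUSTER_RECOVERED": "shards pending after a full cluster restart; usually resolves on its own",
}

def hint_for(reason: str) -> str:
    return REASON_HINTS.get(reason, "unknown reason; run _cluster/allocation/explain")

print(hint_for("NODE_LEFT"))
```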
## Diagnosing the Issue
First, identify which indices have unassigned shards:
```bash
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state"
```

This will show you something like:

```
index        shard prirep state      unassigned.reason
logs-2024-01 0     r      UNASSIGNED ALLOCATION_FAILED
logs-2024-01 1     r      UNASSIGNED ALLOCATION_FAILED
products     0     r      UNASSIGNED NODE_LEFT
```

For more detailed information about why shards aren't being assigned:
```bash
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
```

This returns detailed diagnostics:

```json
{
  "index" : "logs-2024-01",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2024-01-15T10:30:00.000Z",
    "failed_attempts" : 5,
    "details" : "failed to create shard [...]",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes"
}
```

## Solution 1: Single-Node Cluster
If you're running a single-node cluster for development or testing, the simplest solution is to reduce the replica count to zero:
```bash
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "number_of_replicas": 0
  }
}
'
```

To apply this to future indices, update your index templates:
```bash
curl -X PUT "localhost:9200/_template/default_replicas" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "settings": {
    "number_of_replicas": 0
  }
}
'
```

Note that `_template` is the legacy template API; on Elasticsearch 7.8 and later, prefer composable templates via `_index_template`.

## Solution 2: Fix Allocation Settings
Check if shard allocation has been disabled:
```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty"
```

Look for `cluster.routing.allocation.enable`. If it's set to `none`, re-enable it:
```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
```

## Solution 3: Address Disk Space
Check disk usage across your nodes:
```bash
curl -X GET "localhost:9200/_cat/allocation?v"
```

If nodes are above the flood stage watermark (95% by default), you'll need to free up space or add nodes. The default disk watermarks are:

- `cluster.routing.allocation.disk.watermark.low` (85%): stops allocating new shards to the node
- `cluster.routing.allocation.disk.watermark.high` (90%): attempts to relocate shards away from the node
- `cluster.routing.allocation.disk.watermark.flood_stage` (95%): blocks index writes
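The tiered behavior above can be sketched as a simple threshold check. This is an illustrative helper using the default thresholds, not anything Elasticsearch itself exposes:

```python
# Sketch: given a node's disk usage as a fraction, report which default
# watermark (if any) it has crossed. Checked from highest to lowest.
WATERMARKS = [
    (0.95, "flood_stage: index writes blocked"),
    (0.90, "high: shards relocated away from this node"),
    (0.85, "low: no new shards allocated to this node"),
]

def watermark_status(disk_used_fraction: float) -> str:
    for threshold, effect in WATERMARKS:
        if disk_used_fraction >= threshold:
            return effect
    return "below all watermarks"

print(watermark_status(0.92))  # → high: shards relocated away from this node
```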
You can temporarily adjust these settings:
```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}
'
```

However, the better approach is to add more disk capacity or delete old indices.
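If you go the deletion route, a small script can pick candidate indices by their date suffix. This sketch assumes a `name-YYYY-MM` naming convention like the `logs-2024-01` index shown earlier; adapt the parsing to your own scheme:

```python
from datetime import date

# Sketch: select date-suffixed indices older than a cutoff as deletion
# candidates. Assumes "name-YYYY-MM" index names; purely illustrative.
def old_indices(names: list[str], cutoff: date) -> list[str]:
    out = []
    for name in names:
        parts = name.split("-")
        # Only consider names ending in numeric year and month components.
        if len(parts) >= 3 and parts[-2].isdigit() and parts[-1].isdigit():
            if date(int(parts[-2]), int(parts[-1]), 1) < cutoff:
                out.append(name)
    return out

print(old_indices(["logs-2024-01", "logs-2024-03", "products"], date(2024, 3, 1)))
# → ['logs-2024-01']
```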
## Solution 4: Reroute Stuck Shards
Sometimes shards get stuck in an unassigned state. You can retry failed allocations and, as a last resort, force-allocate a stale primary. Note the `accept_data_loss` flag: forcing a stale primary can discard recent writes, so use it only when you have no better copy of the shard:
```bash
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "logs-2024-01",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}
'
```

For replica shards, use `allocate_replica`:
```bash
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "logs-2024-01",
        "shard": 0,
        "node": "node-2"
      }
    }
  ]
}
'
```

## Verifying the Fix
After applying your solution, verify cluster health:
```bash
curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty"
```

Check the shard allocation status:

```bash
curl -X GET "localhost:9200/_cat/shards?v&s=state"
```

You should see all shards showing STARTED status and no UNASSIGNED entries.
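That visual check can be automated by parsing the plain-text `_cat/shards` output. A sketch assuming the column order shown earlier (index, shard, prirep, state, ...):

```python
# Sketch: confirm every shard row in _cat/shards output is STARTED.
# Assumes the default column order with state in the fourth column.
def all_started(cat_shards_output: str) -> bool:
    lines = cat_shards_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        state = line.split()[3]
        if state != "STARTED":
            return False
    return True

sample = """index shard prirep state docs
logs-2024-01 0 p STARTED 1200
logs-2024-01 0 r STARTED 1200"""
print(all_started(sample))  # → True
```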
## Prevention
To prevent yellow status from recurring:
1. Monitor node count: Ensure you have enough nodes to accommodate your replica configuration
2. Set up alerts: Configure alerts for yellow status changes
3. Plan capacity: Monitor disk usage and add capacity before hitting watermarks
4. Use ILM: Implement Index Lifecycle Management to handle index aging and cleanup
Set up a basic alert using Elasticsearch's Watcher feature or integrate with tools like Prometheus and Grafana for ongoing monitoring.
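The core alerting decision is simple: fire when the status degrades. A minimal sketch of that logic, assuming your poller remembers the previous status between checks; wiring it to a scheduler and a notification channel is left to your monitoring stack:

```python
# Sketch: fire an alert only when cluster status gets worse
# (green -> yellow, yellow -> red, etc.), not when it recovers.
def should_alert(previous: str, current: str) -> bool:
    severity = {"green": 0, "yellow": 1, "red": 2}
    # Treat unknown statuses as worst-case.
    return severity.get(current, 2) > severity.get(previous, 0)

print(should_alert("green", "yellow"))  # → True
print(should_alert("yellow", "green"))  # → False
```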