Introduction

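An Elasticsearch cluster reports red status when at least one primary shard is unassigned. The data in those shards cannot be searched or indexed until the primaries are allocated again. This guide walks through diagnosis and resolution step by step.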

Symptoms

Typical output of the cluster health API when the cluster is red:

```
cluster_name: production
status: red
timed_out: false
number_of_nodes: 3
unassigned_shards: 5
initializing_shards: 0
```

Common Causes

  1. Elasticsearch service not running, or a node unreachable
  2. Memory or disk resources exhausted (for example, disk usage above the allocation watermarks)
  3. Cluster, shard allocation, or replication configuration issue
  4. Authentication or permission errors
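To see which of these causes applies, it helps to group the unassigned shards by the reason code Elasticsearch records for them (for example, `NODE_LEFT` points to the first cause). This is a minimal sketch that assumes the cluster answers on `http://localhost:9200` without TLS or authentication. `ES_URL`, `fetch_shards`, and `count_unassigned_reasons` are hypothetical names used only here.

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# One row per shard: index, shard number, p/r (primary/replica),
# state, and the unassigned reason (empty for assigned shards).
fetch_shards() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason"
}

# Read fetch_shards-style rows on stdin and print "REASON COUNT"
# for every reason seen on UNASSIGNED shards, sorted by reason.
count_unassigned_reasons() {
  awk '$4 == "UNASSIGNED" { n[$5]++ } END { for (r in n) print r, n[r] }' | sort
}

# Usage: fetch_shards | count_unassigned_reasons
```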

Step-by-Step Fix

Step 1: Check Current State

```bash
# Check service status
systemctl status elasticsearch
# View the last 50 log lines
journalctl -u elasticsearch -n 50
# Check cluster status
curl "localhost:9200/_cluster/health?pretty"
```
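Cluster health only reports totals. To narrow a red status down to specific indices, the `_cat/indices` API accepts a `health` filter. An index is red when at least one of its primary shards is unassigned. This sketch assumes an unsecured node on `localhost:9200` (no TLS or authentication). `ES_URL`, `cluster_summary`, and `red_indices` are hypothetical names.

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# One-line summary with column headers: status, node count, shard counts.
cluster_summary() {
  curl -s "$ES_URL/_cat/health?v"
}

# List only red indices, one name per line.
red_indices() {
  curl -s "$ES_URL/_cat/indices?health=red&h=index"
}

# Usage: cluster_summary; red_indices
```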

Step 2: Identify Root Cause

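```bash
# Check recent service logs
journalctl -u elasticsearch --since "1 hour ago" --no-pager
# Verify configuration (default path for DEB/RPM installs)
cat /etc/elasticsearch/elasticsearch.yml
# Check memory and disk usage (default data path for DEB/RPM installs)
free -m && df -h /var/lib/elasticsearch
```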
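Besides the service logs, Elasticsearch can explain its own allocation decision for an unassigned shard. The sketch below lists the unassigned primaries (the shards that make the cluster red) and queries the cluster allocation explain API about one of them. It assumes an unsecured node on `localhost:9200` (no TLS or authentication). `ES_URL`, `unassigned_primaries`, and `explain_primary` are hypothetical names.

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print "index shard" for each unassigned primary shard.
unassigned_primaries() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state" \
    | awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Ask why a given primary shard is unassigned.
# Arguments: index name, shard number.
explain_primary() {
  curl -s -X GET "$ES_URL/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\": \"$1\", \"shard\": $2, \"primary\": true}"
}

# Usage:
#   unassigned_primaries
#   explain_primary my-index 0   # "my-index" is a placeholder
```

The `unassigned_info` and `node_allocation_decisions` sections of the response usually name the blocking condition, such as a node that left or a disk watermark.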

Step 3: Apply Primary Fix

```bash
# Primary fix: check the service and restart it if needed
# Check service
systemctl status elasticsearch

# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Restart if needed
sudo systemctl restart elasticsearch
```
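If the nodes are back but primaries stay unassigned because allocation failed repeatedly, Elasticsearch stops retrying after `index.allocation.max_retries` consecutive failures (5 by default). Once the underlying problem is fixed, the cluster reroute API can trigger another attempt. This sketch assumes an unsecured node on `localhost:9200` (no TLS or authentication). `ES_URL` and `retry_failed_allocations` are hypothetical names.

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask the master node to retry shards that exhausted their allocation
# retries; prints only the HTTP status code of the response.
retry_failed_allocations() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST "$ES_URL/_cluster/reroute?retry_failed=true"
}

# Usage: retry_failed_allocations
```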

Step 4: Apply Alternative Fix

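```bash
# Alternative: check logs, resources, and configuration
# View logs (the log file is named after cluster.name, here "production")
tail -f /var/log/elasticsearch/production.log

# Check memory and disk usage
free -m && df -h

# Verify configuration
cat /etc/elasticsearch/elasticsearch.yml
```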
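Two configuration problems commonly keep shards unassigned. The first is a data disk above the disk-based allocation watermarks (by default 85% low, 90% high, 95% flood stage). The second is `cluster.routing.allocation.enable` left at `none` or `primaries`, for example after a rolling restart. This sketch checks both and resets the allocation setting to its default (`all`). It assumes an unsecured node on `localhost:9200` (no TLS or authentication). The function names and `ES_URL` are hypothetical.

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Shard count and disk usage per node; compare disk.percent with the watermarks.
show_disk_allocation() {
  curl -s "$ES_URL/_cat/allocation?v"
}

# Print every value of the allocation setting (persistent, transient, default).
show_allocation_enable() {
  curl -s "$ES_URL/_cluster/settings?include_defaults=true&flat_settings=true" \
    | grep -o '"cluster.routing.allocation.enable":"[a-z_]*"'
}

# Remove a persistent override so the setting falls back to "all".
# A transient override would need the same null under "transient".
reset_allocation_enable() {
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d '{"persistent": {"cluster.routing.allocation.enable": null}}'
}
```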

Step 5: Verify the Fix

```bash
curl "localhost:9200/_cluster/health?pretty"
# "status" should be "green", or "yellow" if only replica shards are still unassigned
systemctl status elasticsearch
```
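Recovery can take a while after the fix. Instead of rerunning the health check by hand, you can let Elasticsearch wait: the `wait_for_status` parameter of the health API blocks until the cluster reaches at least that status or the `timeout` expires, in which case the response reports `timed_out: true`. This sketch assumes an unsecured node on `localhost:9200` (no TLS or authentication). The helper names are hypothetical, and `sed` is used so that `jq` is not required.

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Extract the value of "status" from cluster health JSON on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Wait up to $1 (default 60s) for yellow or better, then print the status.
wait_for_yellow() {
  curl -s "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=${1:-60s}" \
    | health_status
}

# Usage: wait_for_yellow 120s   # prints green, yellow, or red
```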

Common Pitfalls

  • Not monitoring memory usage
  • Using default JVM heap settings
  • Ignoring cluster health status
  • Not testing failover scenarios
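Memory pressure is easier to catch before it takes a node down. The `_cat/nodes` API reports JVM heap usage per node, and the sketch below flags nodes above a threshold. The 85% threshold, `ES_URL`, and the function names are assumptions for illustration, not Elasticsearch defaults. The sketch also assumes an unsecured node on `localhost:9200` (no TLS or authentication).

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# One line per node: name and JVM heap usage in percent (no header row).
node_heap() {
  curl -s "$ES_URL/_cat/nodes?h=name,heap.percent"
}

# Read "name percent" lines on stdin; print nodes at or above $1 percent
# (default 85, an arbitrary threshold chosen for this sketch).
high_heap_nodes() {
  awk -v limit="${1:-85}" '$2 + 0 >= limit { print $1 }'
}

# Usage: node_heap | high_heap_nodes 85
```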

Best Practices

  • Monitor cluster health continuously
  • Set appropriate memory limits
  • Regular backup and restore testing
  • Keep software versions updated
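For continuous monitoring, a small check that returns non-zero on a degraded cluster can run from cron or an existing alerting system. The return codes follow the Nagios plugin convention (0 OK, 1 warning, 2 critical), which is an assumption about your tooling. `ES_URL` and `check_cluster_health` are hypothetical names, and the sketch assumes an unsecured node on `localhost:9200` (no TLS or authentication).

```bash
#!/usr/bin/env bash
# Assumed address of one Elasticsearch node (plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a one-line verdict; return 0 (green), 1 (yellow) or 2 (red/unreachable).
check_cluster_health() {
  local status
  status="$(curl -s --max-time 10 "$ES_URL/_cluster/health" \
    | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p')"
  case "$status" in
    green)  echo "OK: cluster is green"; return 0 ;;
    yellow) echo "WARNING: replica shards unassigned"; return 1 ;;
    red)    echo "CRITICAL: primary shards unassigned"; return 2 ;;
    *)      echo "CRITICAL: no health response from $ES_URL"; return 2 ;;
  esac
}

# Usage (e.g. from cron): check_cluster_health; exit $?
```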