Introduction

Elasticsearch nodes cannot form a cluster when the discovery or network configuration is incorrect. This guide walks through diagnosing and resolving the problem step by step.

Symptoms

Typical error output:

bash
ERROR: Cluster formation failed
Node "es-node-1" cannot join cluster: discovery settings invalid
bootstrap checks failed: seed hosts not resolved

Common Causes

  1. discovery.seed_hosts is not correctly configured
  2. cluster.initial_master_nodes is missing or incorrect
  3. Network binding or a firewall is blocking node-to-node communication
  4. A node has an incompatible version or configuration
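
The first and third causes can be checked from the shell before any configuration is changed. The following is a minimal sketch, assuming seed hosts are written as `host` or `host:port` (IPv6 literals are not handled) and that `getent`, `timeout`, and bash's `/dev/tcp` redirection are available on the machine. `check_seed_host` is a hypothetical helper name, not part of Elasticsearch.

```bash
# Hypothetical helper: report whether a seed host resolves and accepts TCP connections.
# Usage: check_seed_host es-node-1:9300
check_seed_host() {
  local entry host port
  entry="$1"
  host="${entry%%:*}"      # text before the first colon
  port=9300                # default transport port when none is given
  case "$entry" in
    *:*) port="${entry##*:}" ;;
  esac

  if ! getent hosts "$host" > /dev/null 2>&1; then
    echo "$entry: name does not resolve"
    return 1
  fi
  # /dev/tcp is a bash feature; timeout keeps a filtered port from hanging the check.
  if ! timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2> /dev/null; then
    echo "$entry: port $port not reachable"
    return 2
  fi
  echo "$entry: ok"
}
```

Running `check_seed_host es-node-1:9300` from each node separates a name-resolution problem (return code 1) from a blocked or closed port (return code 2).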

Step-by-Step Fix

Step 1: Check Current State

bash
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_nodes/_all/stats?pretty'
curl -XGET 'localhost:9200/_cat/nodes?v'
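
To make the `_cat/nodes` output easier to act on, the sketch below reports which expected nodes have not joined. It assumes the expected names are passed as arguments and the joined names arrive on stdin, one per line. `missing_nodes` is a hypothetical helper, not an Elasticsearch tool.

```bash
# Hypothetical helper: print expected node names that are absent from the
# list of joined nodes supplied on stdin (one name per line).
missing_nodes() {
  local joined name
  # Strip trailing whitespace so padded output still matches exactly.
  joined="$(sed 's/[[:space:]]*$//')"
  for name in "$@"; do
    printf '%s\n' "$joined" | grep -qxF -- "$name" || echo "$name"
  done
}
```

A possible invocation is `curl -s 'localhost:9200/_cat/nodes?h=name' | missing_nodes es-node-1 es-node-2`, which prints nothing when both nodes are present.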

Step 2: Identify Root Cause

bash
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_nodes/stats?pretty'
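
A node that has not joined a cluster may not answer these API calls usefully, so the root cause often has to be read from each node's configuration file. The sketch below assumes the package-install path `/etc/elasticsearch/elasticsearch.yml` and flat, single-line dotted keys. Both are assumptions, and `show_discovery_settings` is a hypothetical helper name.

```bash
# Hypothetical helper: print the discovery-related settings from a config file,
# ignoring commented-out lines.
show_discovery_settings() {
  local config
  config="${1:-/etc/elasticsearch/elasticsearch.yml}"   # assumed default path
  grep -E '^[[:space:]]*(cluster\.name|node\.name|network\.host|discovery\.seed_hosts|cluster\.initial_master_nodes)[[:space:]]*:' "$config"
}
```

Comparing this output across nodes makes a mismatched `cluster.name`, a missing seed host, or a `network.host` still bound to loopback easy to spot.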

Step 3: Apply Primary Fix

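```yaml
# elasticsearch.yml, on every master-eligible node.
# Both settings are static node settings: PUT _cluster/settings rejects them,
# so edit the file and restart the node.
discovery.seed_hosts: ["es-node-1:9300", "es-node-2:9300"]
# Used only when a brand-new cluster bootstraps for the first time.
# Values must match each node's node.name; remove the setting once the cluster has formed.
cluster.initial_master_nodes: ["es-node-1", "es-node-2"]
```

```bash
# Verify cluster formation
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s&pretty'
```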

Step 4: Apply Alternative Fix

```bash
# Alternative fix: check node stats
curl -XGET 'localhost:9200/_nodes/stats?pretty'

# Update specific index settings
curl -XPUT 'localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '{ "index": { "refresh_interval": "30s" } }'

# Verify the fix
curl -XGET 'localhost:9200/_cat/indices/my-index?v'
```

Step 5: Verify the Fix

bash
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_cat/nodes?v'
# Status should be green or yellow
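
The status check can also be scripted so that it yields an exit code. This is a minimal sketch that reads a `_cluster/health` response body on stdin and extracts the `status` field with `sed` instead of a JSON parser. `health_is_usable` is a hypothetical helper name.

```bash
# Hypothetical helper: read a _cluster/health JSON body on stdin and
# succeed only when the status is green or yellow.
health_is_usable() {
  local status
  status="$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1)"
  echo "status=${status:-unknown}"
  [ "$status" = "green" ] || [ "$status" = "yellow" ]
}
```

For example, `curl -s 'localhost:9200/_cluster/health' | health_is_usable` prints the status and returns non-zero for red or for a response without a status field.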

Common Pitfalls

  • Not waiting for cluster state propagation after settings change
  • Using text field for aggregations instead of keyword
  • Setting circuit breaker limits too low for production workload
  • Ignoring disk watermark warnings until cluster blocks

Best Practices

  • Monitor cluster health regularly with _cluster/health API
  • Use keyword fields for aggregations to avoid fielddata
  • Set circuit breaker limits based on heap size
  • Configure ILM policies for automated index management