What's Actually Happening

An etcd cluster cannot elect a leader when it has too few members, suffers a network partition, or loses quorum. While leaderless, the cluster cannot serve writes, which takes down Kubernetes and any other system that depends on etcd.

The Error You'll See

Leader election failure:

```bash
$ etcdctl endpoint status
{"Endpoint":"http://10.0.0.1:2379","Status":{"leader":"0"}}
# leader "0" means no leader elected

$ etcdctl endpoint health
http://10.0.0.1:2379 is unhealthy: failed to commit proposal: etcdserver: leader changed
```
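In the JSON status output, a raft leader ID of 0 means no leader has been elected. A minimal shell sketch of that check, using a hard-coded sample line in place of real `etcdctl` output:

```bash
# Sample status line (an assumption standing in for real `etcdctl endpoint status` output)
status='{"Endpoint":"http://10.0.0.1:2379","Status":{"leader":"0"}}'

# A leader ID of "0" means no leader has been elected
case "$status" in
  *'"leader":"0"'*) result="no leader elected" ;;
  *)                result="leader present" ;;
esac
echo "$result"
```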

Quorum lost:

```bash
$ etcdctl member list
xxx: name=etcd-1 peerURLs=http://10.0.0.1:2380 clientURLs=http://10.0.0.1:2379
yyy: name=etcd-2 peerURLs=http://10.0.0.2:2380 clientURLs=http://10.0.0.2:2379
zzz: name=etcd-3 peerURLs=http://10.0.0.3:2380 clientURLs=http://10.0.0.3:2379 isLearner=false

# But only one member is responsive
```

etcd logs:

```bash
$ journalctl -u etcd | grep -i leader
etcdserver: failed to campaign: election timeout
raft: elected leader at term 0
raft: leader changed from 0 to 0
```

Why This Happens

  1. Insufficient quorum - Fewer than a majority of members are available
  2. Network partition - Members cannot communicate with each other
  3. Disk I/O issues - Slow disks cause election timeouts
  4. Member failures - Multiple members are down
  5. Clock skew - Large time differences between members
  6. Configuration mismatch - Inconsistent cluster configuration
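The quorum rule behind cause 1 is simple majority math: a cluster of N voting members needs floor(N/2) + 1 of them healthy. A quick shell sketch:

```bash
# Quorum for an N-member cluster is floor(N/2) + 1
for cluster_size in 1 3 5; do
  quorum=$(( cluster_size / 2 + 1 ))
  tolerated=$(( cluster_size - quorum ))
  echo "${cluster_size} members: quorum=${quorum}, tolerates ${tolerated} failure(s)"
done
```

This is also why even cluster sizes buy nothing: 4 members still need a quorum of 3 and tolerate only 1 failure, same as a 3-member cluster.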

Step 1: Check Cluster Status

```bash
# Check endpoint status
etcdctl endpoint status --cluster -w table

ENDPOINT               ID    VERSION  DB SIZE  IS LEADER
http://10.0.0.1:2379   xxx   3.5.0    1.2 GB   false
http://10.0.0.2:2379   yyy   3.5.0    1.1 GB   false
http://10.0.0.3:2379   zzz   3.5.0    1.1 GB   false

# Check member list
etcdctl member list -w table

# Check cluster health
etcdctl endpoint health --cluster

# Check if any member is leader
etcdctl endpoint status --cluster -w json | jq '.[] | select(.Status.leader != 0)'
```

Step 2: Check Member Connectivity

```bash
# Check network connectivity between members
ping 10.0.0.2
ping 10.0.0.3

# Check peer port (default 2380)
nc -zv 10.0.0.2 2380
nc -zv 10.0.0.3 2380

# Check client port (default 2379)
nc -zv 10.0.0.2 2379
nc -zv 10.0.0.3 2379

# Check firewall rules
iptables -L -n -v | grep 2379
iptables -L -n -v | grep 2380

# Allow etcd ports
iptables -I INPUT -p tcp --dport 2379 -j ACCEPT
iptables -I INPUT -p tcp --dport 2380 -j ACCEPT

# Check for packet loss
mtr 10.0.0.2
mtr 10.0.0.3
```

Step 3: Check Member Logs

```bash
# Check etcd logs on each member
journalctl -u etcd -n 100 --no-pager

# Look for:
# - election timeout
# - leader changed
# - network partition
# - slow fdatasync

# Check for specific errors
journalctl -u etcd | grep -E "election|leader|raft"

# Check disk I/O latency
iostat -x 1 10

# Check for disk issues
journalctl -u etcd | grep -iE "slow|disk|fsync"

# Check etcd data directory
ls -la /var/lib/etcd/data/
du -sh /var/lib/etcd/data/
```

Step 4: Restart Failed Members

```bash
# Check which members are running
ps aux | grep etcd

# On each member, check service status
systemctl status etcd

# Restart members one at a time, starting with the first:
systemctl restart etcd

# Wait for it to become healthy
etcdctl endpoint health

# Then restart the next member
ssh 10.0.0.2 "systemctl restart etcd"

# Check cluster status
etcdctl endpoint status --cluster

# If members won't start, follow the logs
journalctl -u etcd -f
```

Step 5: Fix Quorum Loss

```bash
# If a majority of members are lost, quorum must be restored manually.

# A 3-node cluster needs at least 2 members
# A 5-node cluster needs at least 3 members

# Option 1: Rebuild from a surviving member
# On the remaining member, check that data exists
ls /var/lib/etcd/data/member/

# Force a new single-member cluster (CAUTION: discards the other members' state!)
etcd --force-new-cluster \
  --data-dir=/var/lib/etcd/data \
  --listen-client-urls=http://10.0.0.1:2379 \
  --advertise-client-urls=http://10.0.0.1:2379

# Then add the other members back:
etcdctl member add etcd-2 --peer-urls=http://10.0.0.2:2380
etcdctl member add etcd-3 --peer-urls=http://10.0.0.3:2380

# Option 2: Restore from a snapshot
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd/data-restore \
  --name etcd-1 \
  --initial-cluster etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://10.0.0.1:2380
```

Step 6: Add New Member to Restore Quorum

```bash
# If a member has permanently failed, replace it with a new one.

# Remove the failed member
etcdctl member remove <member-id>

# Verify removal
etcdctl member list

# Add the new member
etcdctl member add etcd-new --peer-urls=http://10.0.0.4:2380

# Output:
# Member xxx added to cluster yyy
#
# ETCD_NAME="etcd-new"
# ETCD_INITIAL_CLUSTER="etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380,etcd-new=http://10.0.0.4:2380"
# ETCD_INITIAL_CLUSTER_STATE="existing"

# Configure the new member with these values in /etc/etcd/etcd.conf:
# ETCD_NAME=etcd-new
# ETCD_INITIAL_CLUSTER="etcd-1=http://...,etcd-new=http://..."
# ETCD_INITIAL_CLUSTER_STATE=existing
# ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.0.0.4:2380
# ETCD_ADVERTISE_CLIENT_URLS=http://10.0.0.4:2379

# Start the new member
systemctl start etcd
```

Step 7: Check Clock Synchronization

```bash
# Check time on all members
date
ssh 10.0.0.2 date
ssh 10.0.0.3 date

# Check NTP status
timedatectl status
systemctl status chronyd

# Or, for ntpd:
systemctl status ntpd

# Check time offset against a reference server
ntpdate -q pool.ntp.org

# If the skew is large, step the clock:
chronyc makestep
# or
ntpd -gq

# Restart etcd after syncing time
systemctl restart etcd
```

Step 8: Check Election Timeout Configuration

```bash
# Quick check that the cluster can serve reads (fails when there is no leader)
etcdctl get / --prefix --keys-only 2>/dev/null || echo "No leader"

# Timeouts are set in the etcd configuration:
# /etc/etcd/etcd.conf

# Default values:
# ETCD_HEARTBEAT_INTERVAL=100    # 100ms
# ETCD_ELECTION_TIMEOUT=1000     # 1000ms

# If the network is slow, increase both:
# ETCD_HEARTBEAT_INTERVAL=500    # 500ms
# ETCD_ELECTION_TIMEOUT=2500     # 2500ms

# Rule of thumb: election_timeout should be at least 5-10x heartbeat_interval

# Restart etcd after changes
systemctl restart etcd

# Verify the cluster is making progress (raftIndex should advance)
etcdctl endpoint status --cluster -w json | jq '.[].Status.raftIndex'
```
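The rule of thumb can be checked mechanically. A sketch assuming the two values have already been parsed out of the config (the numbers here are etcd's defaults):

```bash
# Assumed values; in practice read these from /etc/etcd/etcd.conf
heartbeat_ms=100
election_ms=1000

# etcd guidance: election timeout should be at least 5-10x the heartbeat interval
min_election=$(( heartbeat_ms * 5 ))
if [ "$election_ms" -ge "$min_election" ]; then
  verdict="OK: ${election_ms}ms >= ${min_election}ms"
else
  verdict="WARN: election timeout too low for heartbeat ${heartbeat_ms}ms"
fi
echo "$verdict"
```

Setting the timeouts too high is also a cost: recovery after a leader failure takes at least one election timeout, so the cluster stays write-unavailable longer.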

Step 9: Defragment and Compact

```bash
# Check database size
etcdctl endpoint status --cluster -w table

# If the DB is large, compact old revisions first.
# Get the current revision:
REVISION=$(etcdctl endpoint status --write-out="json" | jq -r '.[0].Status.header.revision')

# Compact up to that revision
etcdctl compaction $REVISION

# Defragment each endpoint, one at a time
etcdctl defrag --endpoints=http://10.0.0.1:2379
etcdctl defrag --endpoints=http://10.0.0.2:2379
etcdctl defrag --endpoints=http://10.0.0.3:2379

# Check size afterwards
etcdctl endpoint status --cluster -w table

# Raise the backend quota if needed (server flag, requires restart):
# etcd --quota-backend-bytes=8589934592   # 8 GB
```
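`endpoint status` reports `dbSize` in bytes. A sketch of converting it for comparison against the quota (the sample value is an assumption, roughly the 1.2 GB shown in Step 1):

```bash
# Sample dbSize in bytes (assumption); the real value comes from the endpoint status JSON
db_size_bytes=1288490189

# Integer conversion to MB
db_size_mb=$(( db_size_bytes / 1024 / 1024 ))
echo "DB size: ${db_size_mb} MB"
```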

Step 10: Monitor Cluster Health

```bash
# Create a monitoring script
cat << 'EOF' > monitor_etcd.sh
#!/bin/bash

echo "=== Cluster Status ==="
etcdctl endpoint status --cluster -w table

echo ""
echo "=== Member Health ==="
etcdctl member list -w table

echo ""
echo "=== Endpoint Health ==="
etcdctl endpoint health --cluster

echo ""
echo "=== Leader ==="
etcdctl endpoint status --cluster -w json | jq '.[] | select(.Status.leader == .Status.header.member_id) | .Endpoint'

echo ""
echo "=== DB Sizes ==="
etcdctl endpoint status --cluster -w json | jq '.[] | "\(.Endpoint): \(.Status.dbSize / 1024 / 1024) MB"'

echo ""
echo "=== Alarm Status ==="
etcdctl alarm list
EOF

chmod +x monitor_etcd.sh

# etcd exposes Prometheus metrics at the /metrics endpoint
curl http://localhost:2379/metrics | grep etcd_server_has_leader

# Key metrics:
# etcd_server_has_leader: 1 = has leader, 0 = no leader
# etcd_server_leader_changes_seen_total: number of leader changes
# etcd_disk_wal_fsync_duration_seconds: WAL fsync (disk) latency
```

Prometheus alert rule for a missing leader:

```yaml
- alert: EtcdNoLeader
  expr: etcd_server_has_leader == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "etcd has no leader"
```
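The `etcd_server_has_leader` gauge can also be checked from a plain script without Prometheus. A sketch that parses a scraped sample line (the line itself is an assumption standing in for real `/metrics` output):

```bash
# Sample scrape line (assumption); real input: curl -s http://localhost:2379/metrics
metrics='etcd_server_has_leader 1'

# Extract the gauge value
has_leader=$(echo "$metrics" | awk '$1 == "etcd_server_has_leader" {print $2}')

if [ "$has_leader" = "1" ]; then
  echo "leader present"
else
  echo "NO LEADER"
fi
```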

Etcd Leader Election Checklist

| Check | Command | Expected |
|---|---|---|
| Leader exists | `endpoint status` | IS LEADER: true |
| Member count | `member list` | >= quorum |
| Connectivity | `nc -zv <host> 2380` | Connected |
| Clock sync | `date` | Same time on all members |
| DB size | `endpoint status` | < quota |
| Health | `endpoint health` | healthy |

Verify the Fix

```bash
# After resolving the leader election issue:

# 1. Check that a leader is elected
etcdctl endpoint status --cluster
# One member shows IS LEADER: true

# 2. Verify cluster health
etcdctl endpoint health --cluster
# All endpoints healthy

# 3. Test read/write
etcdctl put test/key value
etcdctl get test/key
# Returns the value

# 4. Check member count
etcdctl member list
# All members present

# 5. Monitor for stability
watch -n 5 'etcdctl endpoint status --cluster -w table'
# Leader remains stable

# 6. Check logs for errors
journalctl -u etcd | grep -i error
# No errors
```

  • [Fix Etcd Cluster Unhealthy Quorum Lost](/articles/fix-etcd-cluster-unhealthy-quorum-lost)
  • [Fix Etcd WAL Corrupted](/articles/fix-etcd-wal-corrupted)
  • [Fix Kubernetes API Server Not Starting](/articles/fix-kubernetes-api-server-not-starting)