What's Actually Happening
The etcd cluster cannot elect a leader because too few members are available, the network is partitioned, or quorum has been lost. Without a leader the cluster cannot serve writes, which takes down Kubernetes and anything else that depends on etcd.
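Quorum is a strict majority: floor(n/2) + 1 members. A 3-node cluster survives one failure, a 5-node cluster two; lose more and no leader can be elected. A minimal check, assuming `etcdctl` can still reach the cluster from this host:

```bash
#!/bin/bash
# Quorum = floor(n/2) + 1: a 3-member cluster needs 2 live members,
# a 5-member cluster needs 3.
TOTAL=$(etcdctl member list | wc -l)
QUORUM=$(( TOTAL / 2 + 1 ))
# Note: endpoint health --cluster may itself fail during total quorum loss
HEALTHY=$(etcdctl endpoint health --cluster 2>&1 | grep -c "is healthy")
echo "members=$TOTAL quorum=$QUORUM healthy=$HEALTHY"
[ "$HEALTHY" -ge "$QUORUM" ] && echo "quorum intact" || echo "quorum LOST"
```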
The Error You'll See
Leader election failure:
```bash
$ etcdctl endpoint status
{"Endpoint":"http://10.0.0.1:2379","Status":{"leader":"0"}}
# leader: 0 means no leader elected

$ etcdctl endpoint health
http://10.0.0.1:2379 is unhealthy: failed to commit proposal: etcdserver: leader changed
```
Quorum lost:
```bash
$ etcdctl member list
xxx: name=etcd-1 peerURLs=http://10.0.0.1:2380 clientURLs=http://10.0.0.1:2379
yyy: name=etcd-2 peerURLs=http://10.0.0.2:2380 clientURLs=http://10.0.0.2:2379
zzz: name=etcd-3 peerURLs=http://10.0.0.3:2380 clientURLs=http://10.0.0.3:2379 isLearner=false

# But only 1 member is responsive
```
etcd logs:
```bash
$ journalctl -u etcd | grep -i leader
etcdserver: failed to campaign: election timeout
raft: elected leader at term 0
raft: leader changed from 0 to 0
```
Why This Happens
1. Insufficient quorum - fewer than a majority of members are available
2. Network partition - members cannot communicate with each other
3. Disk I/O issues - slow disk writes cause election timeouts
4. Member failures - multiple members are down
5. Clock skew - large time differences between members
6. Configuration mismatch - inconsistent cluster configuration across members
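Most of these can be triaged from a single member in under a minute. A rough sketch, assuming passwordless SSH, default ports, and the 10.0.0.1-3 member IPs used throughout this article:

```bash
#!/bin/bash
# Rough triage of the causes above. The peer list is an assumption.
PEERS="10.0.0.1 10.0.0.2 10.0.0.3"

echo "--- membership / quorum ---"
etcdctl member list -w table || echo "member list failed (possible quorum loss)"

echo "--- network partition ---"
for p in $PEERS; do
  nc -zv -w 2 "$p" 2380 || echo "peer port unreachable: $p"
done

echo "--- disk latency hints (matching log lines, last hour) ---"
journalctl -u etcd --since "1 hour ago" | grep -icE "slow|fsync" || true

echo "--- clock skew (epoch seconds) ---"
for p in $PEERS; do
  echo "$p: $(ssh -o ConnectTimeout=3 "$p" date +%s 2>/dev/null) local: $(date +%s)"
done
```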
Step 1: Check Cluster Status
```bash
# Check endpoint status
etcdctl endpoint status --cluster -w table
```

| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER |
|---|---|---|---|---|
| http://10.0.0.1:2379 | xxx | 3.5.0 | 1.2 GB | false |
| http://10.0.0.2:2379 | yyy | 3.5.0 | 1.1 GB | false |
| http://10.0.0.3:2379 | zzz | 3.5.0 | 1.1 GB | false |

```bash
# Check member list
etcdctl member list -w table

# Check cluster health
etcdctl endpoint health --cluster

# Check whether any member sees a leader (leader: 0 means none)
etcdctl endpoint status --cluster -w json | jq '.[] | select(.Status.leader != 0)'
```
Step 2: Check Member Connectivity
```bash
# Check network connectivity between members
ping 10.0.0.2
ping 10.0.0.3

# Check peer port (default 2380)
nc -zv 10.0.0.2 2380
nc -zv 10.0.0.3 2380

# Check client port (default 2379)
nc -zv 10.0.0.2 2379
nc -zv 10.0.0.3 2379

# Check firewall rules
iptables -L -n -v | grep 2379
iptables -L -n -v | grep 2380

# Allow etcd ports
iptables -I INPUT -p tcp --dport 2379 -j ACCEPT
iptables -I INPUT -p tcp --dport 2380 -j ACCEPT

# Check for packet loss
mtr 10.0.0.2
mtr 10.0.0.3
```
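Probing each host and port by hand gets tedious on larger clusters; a short loop covers every combination. A sketch, assuming the same member IPs:

```bash
#!/bin/bash
# Probe client (2379) and peer (2380) ports on every member.
# The IP list is an assumption; substitute your members.
for host in 10.0.0.1 10.0.0.2 10.0.0.3; do
  for port in 2379 2380; do
    if nc -zv -w 2 "$host" "$port" 2>/dev/null; then
      echo "OK   $host:$port"
    else
      echo "FAIL $host:$port"
    fi
  done
done
```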
Step 3: Check Member Logs
```bash
# Check etcd logs on each member
journalctl -u etcd -n 100 --no-pager

# Look for:
# - election timeout
# - leader changed
# - network partition
# - slow fdatasync

# Check for specific errors
journalctl -u etcd | grep -E "election|leader|raft"

# Check disk I/O latency
iostat -x 1 10

# Check for disk issues (-E enables the | alternation)
journalctl -u etcd | grep -iE "slow|disk|fsync"

# Check etcd data directory
ls -la /var/lib/etcd/data/
du -sh /var/lib/etcd/data/
```
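etcd's own guidance is that WAL fsync p99 should stay below roughly 10ms; anything much slower invites election timeouts. A sketch that pulls a mean latency from the Prometheus metrics endpoint, assuming it is served on the local client port without TLS:

```bash
#!/bin/bash
# Estimate mean WAL fsync latency from etcd's Prometheus metrics.
# Assumes metrics at http://localhost:2379/metrics (no TLS).
M=$(curl -s http://localhost:2379/metrics)
SUM=$(echo "$M" | awk '$1 == "etcd_disk_wal_fsync_duration_seconds_sum" {print $2}')
CNT=$(echo "$M" | awk '$1 == "etcd_disk_wal_fsync_duration_seconds_count" {print $2}')
awk -v s="$SUM" -v c="$CNT" 'BEGIN {
  if (c > 0) printf "wal fsync: %d samples, mean %.2f ms\n", c, 1000 * s / c
  else       print  "no fsync samples found"
}'
```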
Step 4: Restart Failed Members
```bash
# Check which members are running
ps aux | grep etcd

# On each member, check status
systemctl status etcd

# Restart members one at a time
# Start with the first member:
systemctl restart etcd

# Wait for it to become ready
etcdctl endpoint health

# Restart the next member
ssh 10.0.0.2 "systemctl restart etcd"

# Check cluster status
etcdctl endpoint status --cluster

# If members won't start, check the logs
journalctl -u etcd -f
```
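Restarting members back-to-back can itself trigger fresh elections. A rolling-restart sketch that waits for each member to report healthy before moving on (assumes passwordless SSH and a systemd unit named etcd):

```bash
#!/bin/bash
# Rolling restart: one member at a time, waiting for health in between.
# Host list and unit name are assumptions; adjust to your environment.
set -e
for host in 10.0.0.1 10.0.0.2 10.0.0.3; do
  echo "restarting etcd on $host"
  ssh "$host" "systemctl restart etcd"
  ok=""
  for _ in $(seq 1 12); do          # wait up to ~60s
    if etcdctl endpoint health --endpoints="http://$host:2379" >/dev/null 2>&1; then
      ok=1; break
    fi
    sleep 5
  done
  [ -n "$ok" ] || { echo "$host did not recover; stopping"; exit 1; }
  echo "$host healthy"
done
```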
Step 5: Fix Quorum Loss
```bash
# If the majority of members is lost, quorum must be restored

# A 3-node cluster needs at least 2 members
# A 5-node cluster needs at least 3 members

# Option 1: Restart with existing members
# On the remaining member, check the data
ls /var/lib/etcd/data/member/

# Force a new cluster (CAUTION: discards the other members' data!)
etcd --force-new-cluster \
  --data-dir=/var/lib/etcd/data \
  --listen-client-urls=http://10.0.0.1:2379 \
  --advertise-client-urls=http://10.0.0.1:2379

# Then add back the other members:
etcdctl member add etcd-2 --peer-urls=http://10.0.0.2:2380
etcdctl member add etcd-3 --peer-urls=http://10.0.0.3:2380

# Option 2: Restore from a snapshot
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd/data-restore \
  --initial-cluster etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://10.0.0.1:2380
```
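Both options are destructive if something goes wrong, so snapshot the surviving member first. A minimal sketch, assuming the survivor answers on 10.0.0.1:2379 and /backup exists:

```bash
#!/bin/bash
# Safety net before --force-new-cluster or a restore: snapshot what's left.
# The endpoint and backup path are assumptions.
BACKUP=/backup/etcd-pre-recovery-$(date +%Y%m%d-%H%M%S).db
etcdctl snapshot save "$BACKUP" --endpoints=http://10.0.0.1:2379
etcdctl snapshot status "$BACKUP" -w table   # verify hash, revision, total keys
```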
Step 6: Add New Member to Restore Quorum
```bash
# If a member is permanently failed, replace it with a new member

# Remove the failed member
etcdctl member remove <member-id>

# Verify removal
etcdctl member list

# Add the new member
etcdctl member add etcd-new --peer-urls=http://10.0.0.4:2380

# Output:
# Member xxx added to cluster yyy
# ETCD_NAME="etcd-new"
# ETCD_INITIAL_CLUSTER="etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380,etcd-new=http://10.0.0.4:2380"
# ETCD_INITIAL_CLUSTER_STATE="existing"

# Configure the new member with these values.
# In /etc/etcd/etcd.conf:
ETCD_NAME=etcd-new
ETCD_INITIAL_CLUSTER="etcd-1=http://...,etcd-new=http://..."
ETCD_INITIAL_CLUSTER_STATE=existing
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.0.0.4:2380
ETCD_ADVERTISE_CLIENT_URLS=http://10.0.0.4:2379

# Start the new member
systemctl start etcd
```
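Order matters here: remove before add, and start the new member only after it is registered, with the cluster state set to existing. A sketch wrapping the etcdctl half (the member ID and URLs are placeholders):

```bash
#!/bin/bash
# Replace a permanently failed member. ID and URLs are placeholders.
set -e
FAILED_ID="<member-id>"          # from: etcdctl member list
NEW_NAME="etcd-new"
NEW_PEER="http://10.0.0.4:2380"

etcdctl member remove "$FAILED_ID"
etcdctl member list -w table     # confirm the removal took
etcdctl member add "$NEW_NAME" --peer-urls="$NEW_PEER"
# Now configure /etc/etcd/etcd.conf on the new host with the values this
# command printed (ETCD_INITIAL_CLUSTER_STATE=existing), then:
#   systemctl start etcd
```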
Step 7: Check Clock Synchronization
```bash
# Check time on all members
date
ssh 10.0.0.2 date
ssh 10.0.0.3 date

# Check NTP status
timedatectl status
systemctl status chronyd
# Or for ntpd:
systemctl status ntpd

# Check time difference against a reference
ntpdate -q pool.ntp.org

# If skew is large, step the clocks:
chronyc makestep
# or
ntpd -gq

# Restart etcd after the time sync
systemctl restart etcd
```
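Comparing `date` output by eye misses small skew; comparing epoch seconds makes it explicit. A sketch, assuming passwordless SSH (SSH round-trip adds a little noise, but multi-second skew shows clearly):

```bash
#!/bin/bash
# Report each member's clock offset relative to this host, in seconds.
# The host list is an assumption.
for host in 10.0.0.2 10.0.0.3; do
  LOCAL=$(date +%s)
  REMOTE=$(ssh -o ConnectTimeout=3 "$host" date +%s)
  echo "$host skew: $(( REMOTE - LOCAL ))s"
done
```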
Step 8: Check Election Timeout Configuration
```bash
# Quick liveness probe: reads fail while there is no leader
etcdctl get / --prefix --keys-only 2>/dev/null || echo "No leader"

# In the etcd configuration:
# /etc/etcd/etcd.conf

# Default values:
ETCD_HEARTBEAT_INTERVAL=100   # 100ms
ETCD_ELECTION_TIMEOUT=1000    # 1000ms

# If the network is slow, increase both:
ETCD_HEARTBEAT_INTERVAL=250   # 250ms
ETCD_ELECTION_TIMEOUT=2500    # 2500ms

# Rule: election_timeout should be about 10 * heartbeat_interval

# Restart etcd after changes
systemctl restart etcd

# Confirm the cluster is applying entries again (raft index should advance)
etcdctl endpoint status --cluster -w json | jq '.[].Status.raftIndex'
```
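The usual tuning guidance: set the heartbeat interval near the worst round-trip time between members, and the election timeout to about 10x that. A sketch that derives suggested values from ping on Linux, assuming ICMP latency approximates peer-link latency:

```bash
#!/bin/bash
# Suggest heartbeat/election settings from measured round-trip times.
# Peer IPs are assumptions; parsing expects iputils ping output.
MAX_RTT=0
for host in 10.0.0.2 10.0.0.3; do
  # avg RTT in ms from ping's summary line (rtt min/avg/max/mdev)
  RTT=$(ping -c 5 -q "$host" | awk -F'/' '/^rtt/ {print $5}')
  echo "$host avg rtt: ${RTT} ms"
  MAX_RTT=$(awk -v a="$MAX_RTT" -v b="$RTT" 'BEGIN {print (b > a) ? b : a}')
done
awk -v r="$MAX_RTT" 'BEGIN {
  hb = (r < 100) ? 100 : int(r)   # never go below the 100ms default
  printf "suggested ETCD_HEARTBEAT_INTERVAL=%d\n", hb
  printf "suggested ETCD_ELECTION_TIMEOUT=%d\n", hb * 10
}'
```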
Step 9: Defragment and Compact
```bash
# Check database size
etcdctl endpoint status --cluster -w table

# If the DB is large, compact the key-space history:
# Get current revision
REVISION=$(etcdctl endpoint status --write-out="json" | jq -r '.[0].Status.header.revision')

# Compact up to that revision
etcdctl compaction $REVISION

# Defragment each endpoint (releases the space compaction freed)
etcdctl defrag --endpoints=http://10.0.0.1:2379
etcdctl defrag --endpoints=http://10.0.0.2:2379
etcdctl defrag --endpoints=http://10.0.0.3:2379

# Check size after
etcdctl endpoint status --cluster -w table

# Raise the backend quota if needed; this is a server flag,
# not an etcdctl command:
etcd --quota-backend-bytes=8589934592   # 8GB
```
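Compaction and defragmentation are routine maintenance worth scripting, so the DB never drifts toward the quota again. A sketch, assuming plain HTTP endpoints; defrag briefly blocks each member, so it runs serially:

```bash
#!/bin/bash
# Compact to the current revision, then defragment every endpoint.
# The endpoint list is an assumption.
set -e
ENDPOINTS="http://10.0.0.1:2379 http://10.0.0.2:2379 http://10.0.0.3:2379"

REV=$(etcdctl endpoint status -w json | jq -r '.[0].Status.header.revision')
etcdctl compaction "$REV"

for ep in $ENDPOINTS; do
  etcdctl defrag --endpoints="$ep"   # one member at a time
done

etcdctl alarm disarm   # clear a NOSPACE alarm if the quota had been hit
```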
Step 10: Monitor Cluster Health
```bash
# Create monitoring script
cat << 'EOF' > monitor_etcd.sh
#!/bin/bash

echo "=== Cluster Status ==="
etcdctl endpoint status --cluster -w table

echo ""
echo "=== Member Health ==="
etcdctl member list -w table

echo ""
echo "=== Endpoint Health ==="
etcdctl endpoint health --cluster

echo ""
echo "=== Leader ==="
etcdctl endpoint status --cluster -w json | jq '.[] | select(.Status.leader == .Status.header.member_id) | .Endpoint'

echo ""
echo "=== DB Sizes ==="
etcdctl endpoint status --cluster -w json | jq '.[] | "\(.Endpoint): \(.Status.dbSize / 1024 / 1024) MB"'

echo ""
echo "=== Alarm Status ==="
etcdctl alarm list
EOF

chmod +x monitor_etcd.sh

# Set up Prometheus metrics
# etcd exposes metrics at the /metrics endpoint
curl http://localhost:2379/metrics | grep etcd_server_has_leader

# Key metrics:
# etcd_server_has_leader: 1 = has leader, 0 = no leader
# etcd_server_leader_changes_seen_total: leader changes count
# etcd_disk_wal_fsync_duration_seconds: disk latency
```

Alert rule for no leader:

```yaml
- alert: EtcdNoLeader
  expr: etcd_server_has_leader == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "etcd has no leader"
```
Etcd Leader Election Checklist
| Check | Command | Expected |
|---|---|---|
| Leader exists | endpoint status | IS LEADER: true |
| Member count | member list | >= quorum |
| Connectivity | nc -zv &lt;host&gt; 2380 | Connected |
| Clock sync | date | Same time |
| DB size | endpoint status | < quota |
| Health | endpoint health | healthy |
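The whole checklist can run as one pass. A sketch combining the commands above (member IPs assumed):

```bash
#!/bin/bash
# One pass over the checklist. Member IPs are assumptions.
echo "leaders seen: $(etcdctl endpoint status --cluster -w table | grep -c ' true ')"
echo "member count: $(etcdctl member list | wc -l)"
for h in 10.0.0.2 10.0.0.3; do nc -zv -w 2 "$h" 2380; done
echo "local epoch:  $(date +%s)"
for h in 10.0.0.2 10.0.0.3; do echo "$h epoch: $(ssh "$h" date +%s)"; done
etcdctl endpoint health --cluster
```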
Verify the Fix
```bash
# After resolving the leader election issue

# 1. Check that a leader was elected
etcdctl endpoint status --cluster
# One member shows IS LEADER: true

# 2. Verify cluster health
etcdctl endpoint health --cluster
# All endpoints healthy

# 3. Test read/write
etcdctl put test/key value
etcdctl get test/key
# Returns value

# 4. Check member count
etcdctl member list
# All members present

# 5. Monitor for stability
watch -n 5 'etcdctl endpoint status --cluster -w table'
# Leader stays stable

# 6. Check logs for errors
journalctl -u etcd | grep -i error
# No errors
```
Related Issues
- [Fix Etcd Cluster Unhealthy Quorum Lost](/articles/fix-etcd-cluster-unhealthy-quorum-lost)
- [Fix Etcd WAL Corrupted](/articles/fix-etcd-wal-corrupted)
- [Fix Kubernetes API Server Not Starting](/articles/fix-kubernetes-api-server-not-starting)