What's Actually Happening

An etcd cluster cannot elect a leader when it has too few members, suffers a network partition, or loses quorum. While leaderless, the cluster cannot serve writes, which takes down Kubernetes and any other system that depends on etcd.

The Error You'll See

Leader election failure:

```bash
$ etcdctl endpoint status
{"Endpoint":"http://10.0.0.1:2379","Status":{"leader":"0"}}
# leader "0" means no leader elected

$ etcdctl endpoint health
http://10.0.0.1:2379 is unhealthy: failed to commit proposal: etcdserver: leader changed
```
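In the JSON status output, a raft leader ID of 0 means no leader has been elected. A minimal shell sketch of that check, using a hard-coded sample line in place of real `etcdctl` output:

```bash
# Sample status line (an assumption standing in for real `etcdctl endpoint status` output)
status='{"Endpoint":"http://10.0.0.1:2379","Status":{"leader":"0"}}'

# A leader ID of "0" means no leader has been elected
case "$status" in
  *'"leader":"0"'*) result="no leader elected" ;;
  *)                result="leader present" ;;
esac
echo "$result"
```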

Quorum lost:

```bash
$ etcdctl member list
xxx: name=etcd-1 peerURLs=http://10.0.0.1:2380 clientURLs=http://10.0.0.1:2379
yyy: name=etcd-2 peerURLs=http://10.0.0.2:2380 clientURLs=http://10.0.0.2:2379
zzz: name=etcd-3 peerURLs=http://10.0.0.3:2380 clientURLs=http://10.0.0.3:2379 isLearner=false

# But only one member is responsive
```

etcd logs:

```bash
$ journalctl -u etcd | grep -i leader
etcdserver: failed to campaign: election timeout
raft: elected leader at term 0
raft: leader changed from 0 to 0
```

Why This Happens

  1. Insufficient quorum - Fewer than a majority of members are available
  2. Network partition - Members cannot communicate with each other
  3. Disk I/O issues - Slow disks cause election timeouts
  4. Member failures - Multiple members are down
  5. Clock skew - Large time differences between members
  6. Configuration mismatch - Inconsistent cluster configuration
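The quorum rule behind cause 1 is simple majority math: a cluster of N voting members needs floor(N/2) + 1 of them healthy. A quick shell sketch:

```bash
# Quorum for an N-member cluster is floor(N/2) + 1
for cluster_size in 1 3 5; do
  quorum=$(( cluster_size / 2 + 1 ))
  tolerated=$(( cluster_size - quorum ))
  echo "${cluster_size} members: quorum=${quorum}, tolerates ${tolerated} failure(s)"
done
```

This is also why even cluster sizes buy nothing: 4 members still need a quorum of 3 and tolerate only 1 failure, same as a 3-member cluster.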

Step 1: Check Cluster Status

```bash
# Check endpoint status
etcdctl endpoint status --cluster -w table

ENDPOINT               ID    VERSION  DB SIZE  IS LEADER
http://10.0.0.1:2379   xxx   3.5.0    1.2 GB   false
http://10.0.0.2:2379   yyy   3.5.0    1.1 GB   false
http://10.0.0.3:2379   zzz   3.5.0    1.1 GB   false

# Check member list
etcdctl member list -w table

# Check cluster health
etcdctl endpoint health --cluster

# Check if any member is leader
etcdctl endpoint status --cluster -w json | jq '.[] | select(.Status.leader != 0)'
```

Step 2: Check Member Connectivity

```bash
# Check network connectivity between members
ping 10.0.0.2
ping 10.0.0.3

# Check peer port (default 2380)
nc -zv 10.0.0.2 2380
nc -zv 10.0.0.3 2380

# Check client port (default 2379)
nc -zv 10.0.0.2 2379
nc -zv 10.0.0.3 2379

# Check firewall rules
iptables -L -n -v | grep 2379
iptables -L -n -v | grep 2380

# Allow etcd ports
iptables -I INPUT -p tcp --dport 2379 -j ACCEPT
iptables -I INPUT -p tcp --dport 2380 -j ACCEPT

# Check for packet loss
mtr 10.0.0.2
mtr 10.0.0.3
```

Step 3: Check Member Logs

```bash
# Check etcd logs on each member
journalctl -u etcd -n 100 --no-pager

# Look for:
# - election timeout
# - leader changed
# - network partition
# - slow fdatasync

# Check for specific errors
journalctl -u etcd | grep -E "election|leader|raft"

# Check disk I/O latency
iostat -x 1 10

# Check for disk issues
journalctl -u etcd | grep -iE "slow|disk|fsync"

# Check etcd data directory
ls -la /var/lib/etcd/data/
du -sh /var/lib/etcd/data/
```

Step 4: Restart Failed Members

```bash
# Check which members are running
ps aux | grep etcd

# On each member, check service status
systemctl status etcd

# Restart members one at a time, starting with the first:
systemctl restart etcd

# Wait for it to become healthy
etcdctl endpoint health

# Then restart the next member
ssh 10.0.0.2 "systemctl restart etcd"

# Check cluster status
etcdctl endpoint status --cluster

# If members won't start, follow the logs
journalctl -u etcd -f
```

Step 5: Fix Quorum Loss

```bash
# If a majority of members are lost, quorum must be restored manually.

# A 3-node cluster needs at least 2 members
# A 5-node cluster needs at least 3 members

# Option 1: Rebuild from a surviving member
# On the remaining member, check that data exists
ls /var/lib/etcd/data/member/

# Force a new single-member cluster (CAUTION: discards the other members' state!)
etcd --force-new-cluster \
  --data-dir=/var/lib/etcd/data \
  --listen-client-urls=http://10.0.0.1:2379 \
  --advertise-client-urls=http://10.0.0.1:2379

# Then add the other members back:
etcdctl member add etcd-2 --peer-urls=http://10.0.0.2:2380
etcdctl member add etcd-3 --peer-urls=http://10.0.0.3:2380

# Option 2: Restore from a snapshot
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd/data-restore \
  --name etcd-1 \
  --initial-cluster etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://10.0.0.1:2380
```

Step 6: Add New Member to Restore Quorum

```bash
# If a member has permanently failed, replace it with a new one.

# Remove the failed member
etcdctl member remove <member-id>

# Verify removal
etcdctl member list

# Add the new member
etcdctl member add etcd-new --peer-urls=http://10.0.0.4:2380

# Output:
# Member xxx added to cluster yyy
#
# ETCD_NAME="etcd-new"
# ETCD_INITIAL_CLUSTER="etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380,etcd-new=http://10.0.0.4:2380"
# ETCD_INITIAL_CLUSTER_STATE="existing"

# Configure the new member with these values in /etc/etcd/etcd.conf:
# ETCD_NAME=etcd-new
# ETCD_INITIAL_CLUSTER="etcd-1=http://...,etcd-new=http://..."
# ETCD_INITIAL_CLUSTER_STATE=existing
# ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.0.0.4:2380
# ETCD_ADVERTISE_CLIENT_URLS=http://10.0.0.4:2379

# Start the new member
systemctl start etcd
```

Step 7: Check Clock Synchronization

```bash
# Check time on all members
date
ssh 10.0.0.2 date
ssh 10.0.0.3 date

# Check NTP status
timedatectl status
systemctl status chronyd

# Or, for ntpd:
systemctl status ntpd

# Check time offset against a reference server
ntpdate -q pool.ntp.org

# If the skew is large, step the clock:
chronyc makestep
# or
ntpd -gq

# Restart etcd after syncing time
systemctl restart etcd
```

Step 8: Check Election Timeout Configuration

```bash
# Quick check that the cluster can serve reads (fails when there is no leader)
etcdctl get / --prefix --keys-only 2>/dev/null || echo "No leader"

# Timeouts are set in the etcd configuration:
# /etc/etcd/etcd.conf

# Default values:
# ETCD_HEARTBEAT_INTERVAL=100    # 100ms
# ETCD_ELECTION_TIMEOUT=1000     # 1000ms

# If the network is slow, increase both:
# ETCD_HEARTBEAT_INTERVAL=500    # 500ms
# ETCD_ELECTION_TIMEOUT=2500     # 2500ms

# Rule of thumb: election_timeout should be at least 5-10x heartbeat_interval

# Restart etcd after changes
systemctl restart etcd

# Verify the cluster is making progress (raftIndex should advance)
etcdctl endpoint status --cluster -w json | jq '.[].Status.raftIndex'
```
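The rule of thumb can be checked mechanically. A sketch assuming the two values have already been parsed out of the config (the numbers here are etcd's defaults):

```bash
# Assumed values; in practice read these from /etc/etcd/etcd.conf
heartbeat_ms=100
election_ms=1000

# etcd guidance: election timeout should be at least 5-10x the heartbeat interval
min_election=$(( heartbeat_ms * 5 ))
if [ "$election_ms" -ge "$min_election" ]; then
  verdict="OK: ${election_ms}ms >= ${min_election}ms"
else
  verdict="WARN: election timeout too low for heartbeat ${heartbeat_ms}ms"
fi
echo "$verdict"
```

Setting the timeouts too high is also a cost: recovery after a leader failure takes at least one election timeout, so the cluster stays write-unavailable longer.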

Step 9: Defragment and Compact

```bash
# Check database size
etcdctl endpoint status --cluster -w table

# If the DB is large, compact old revisions first.
# Get the current revision:
REVISION=$(etcdctl endpoint status --write-out="json" | jq -r '.[0].Status.header.revision')

# Compact up to that revision
etcdctl compaction $REVISION

# Defragment each endpoint, one at a time
etcdctl defrag --endpoints=http://10.0.0.1:2379
etcdctl defrag --endpoints=http://10.0.0.2:2379
etcdctl defrag --endpoints=http://10.0.0.3:2379

# Check size afterwards
etcdctl endpoint status --cluster -w table

# Raise the backend quota if needed (server flag, requires restart):
# etcd --quota-backend-bytes=8589934592   # 8 GB
```
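`endpoint status` reports `dbSize` in bytes. A sketch of converting it for comparison against the quota (the sample value is an assumption, roughly the 1.2 GB shown in Step 1):

```bash
# Sample dbSize in bytes (assumption); the real value comes from the endpoint status JSON
db_size_bytes=1288490189

# Integer conversion to MB
db_size_mb=$(( db_size_bytes / 1024 / 1024 ))
echo "DB size: ${db_size_mb} MB"
```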

Step 10: Monitor Cluster Health

```bash
# Create a monitoring script
cat << 'EOF' > monitor_etcd.sh
#!/bin/bash

echo "=== Cluster Status ==="
etcdctl endpoint status --cluster -w table

echo ""
echo "=== Member Health ==="
etcdctl member list -w table

echo ""
echo "=== Endpoint Health ==="
etcdctl endpoint health --cluster

echo ""
echo "=== Leader ==="
etcdctl endpoint status --cluster -w json | jq '.[] | select(.Status.leader == .Status.header.member_id) | .Endpoint'

echo ""
echo "=== DB Sizes ==="
etcdctl endpoint status --cluster -w json | jq '.[] | "\(.Endpoint): \(.Status.dbSize / 1024 / 1024) MB"'

echo ""
echo "=== Alarm Status ==="
etcdctl alarm list
EOF

chmod +x monitor_etcd.sh

# etcd exposes Prometheus metrics at the /metrics endpoint
curl http://localhost:2379/metrics | grep etcd_server_has_leader

# Key metrics:
# etcd_server_has_leader: 1 = has leader, 0 = no leader
# etcd_server_leader_changes_seen_total: number of leader changes
# etcd_disk_wal_fsync_duration_seconds: WAL fsync (disk) latency
```

Prometheus alert rule for a missing leader:

```yaml
- alert: EtcdNoLeader
  expr: etcd_server_has_leader == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "etcd has no leader"
```
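The `etcd_server_has_leader` gauge can also be checked from a plain script without Prometheus. A sketch that parses a scraped sample line (the line itself is an assumption standing in for real `/metrics` output):

```bash
# Sample scrape line (assumption); real input: curl -s http://localhost:2379/metrics
metrics='etcd_server_has_leader 1'

# Extract the gauge value
has_leader=$(echo "$metrics" | awk '$1 == "etcd_server_has_leader" {print $2}')

if [ "$has_leader" = "1" ]; then
  echo "leader present"
else
  echo "NO LEADER"
fi
```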

Etcd Leader Election Checklist

| Check | Command | Expected |
|---|---|---|
| Leader exists | `endpoint status` | IS LEADER: true |
| Member count | `member list` | >= quorum |
| Connectivity | `nc -zv <host> 2380` | Connected |
| Clock sync | `date` | Same time on all members |
| DB size | `endpoint status` | < quota |
| Health | `endpoint health` | healthy |

Verify the Fix

```bash
# After resolving the leader election issue:

# 1. Check that a leader is elected
etcdctl endpoint status --cluster
# One member shows IS LEADER: true

# 2. Verify cluster health
etcdctl endpoint health --cluster
# All endpoints healthy

# 3. Test read/write
etcdctl put test/key value
etcdctl get test/key
# Returns the value

# 4. Check member count
etcdctl member list
# All members present

# 5. Monitor for stability
watch -n 5 'etcdctl endpoint status --cluster -w table'
# Leader remains stable

# 6. Check logs for errors
journalctl -u etcd | grep -i error
# No errors
```

  • [Fix Etcd Cluster Unhealthy Quorum Lost](/articles/fix-etcd-cluster-unhealthy-quorum-lost)
  • [Fix Etcd WAL Corrupted](/articles/fix-etcd-wal-corrupted)
  • [Fix Kubernetes API Server Not Starting](/articles/fix-kubernetes-api-server-not-starting)