## What's Actually Happening

A Patroni-managed PostgreSQL cluster cannot elect a leader node. Every node stays in the replica or uninitialized state, so the cluster accepts no writes.
## The Error You'll See

No leader in cluster:

```bash
$ patronictl list
| Member | Host          | Role    | State   | TL | Lag |
|--------|---------------|---------|---------|----|-----|
| node-1 | 10.0.0.1:5432 | Replica | running |    |     |
| node-2 | 10.0.0.2:5432 | Replica | running |    |     |
| node-3 | 10.0.0.3:5432 | Replica | running |    |     |
# Should show one node as Leader
```
Patroni logs:

```bash
$ journalctl -u patroni | grep -i leader
INFO: no leader node found in DCS
INFO: starting leader election race
WARNING: failed to acquire leader lock
```
DCS unavailable:

```bash
ERROR: Failed to update DCS: connection refused
```

## Why This Happens
1. DCS unavailable - etcd/Consul/Kubernetes API unreachable
2. Network partition - Nodes cannot communicate with each other
3. All nodes down - No healthy node available to elect
4. Quorum loss - Too few nodes for consensus
5. Configuration mismatch - Inconsistent cluster configuration across nodes
6. DCS data corruption - Leader lock key corrupted or missing
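These causes can be triaged in order with a quick script. A minimal sketch, assuming three nodes, the default Patroni REST port (8008), and etcd as the DCS; `NODES` and `ETCD` are placeholders to adjust for your cluster:

```shell
#!/bin/bash
# Quick triage sketch for the causes above: check the DCS first, then
# member reachability. NODES and ETCD are assumptions - override them
# for your environment (e.g. a different DCS client).
ETCD="${ETCD:-etcdctl}"
NODES="${NODES:-node-1 node-2 node-3}"

triage() {
  # Causes 1 and 6: is the DCS reachable at all?
  if ! $ETCD endpoint health >/dev/null 2>&1; then
    echo "DCS unreachable - fix etcd/Consul/K8s API first"
    return 1
  fi
  # Causes 2-4: are enough Patroni members answering?
  local up=0 total=0 n
  for n in $NODES; do
    total=$((total + 1))
    curl -sf "http://$n:8008/patroni" >/dev/null && up=$((up + 1))
  done
  if [ "$up" -le $((total / 2)) ]; then
    echo "only $up/$total members reachable - suspect partition or quorum loss"
    return 1
  fi
  echo "DCS and majority reachable - suspect config mismatch or stale DCS keys"
}
# Usage: triage
```

Run `triage` before working through the steps below; it tells you which layer (DCS, network, or configuration) to start with.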
## Step 1: Check Cluster Status

```bash
# List cluster members:
patronictl list

# Extended status (pending restarts, tags, scheduled actions):
patronictl list -e

# Run a query through a specific member:
patronictl query postgres-cluster --member node-1 --command "SELECT 1"

# Check the DCS (Distributed Configuration Store):
# For etcd:
etcdctl get /service/postgres-cluster/leader

# For Consul:
consul kv get service/postgres-cluster/leader

# For Kubernetes:
kubectl get configmap postgres-cluster-config -o yaml

# Check the Patroni API on each node:
curl http://10.0.0.1:8008/patroni
curl http://10.0.0.2:8008/patroni
curl http://10.0.0.3:8008/patroni
```
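Patroni's REST API answers `GET /leader` with HTTP 200 only on the node that currently holds the leader lock, which makes a leader probe scriptable. A minimal sketch; the addresses in the usage line are assumptions:

```shell
#!/bin/bash
# Probe each member's /leader endpoint: the leader answers 200, everyone
# else answers 503, so the first 200 identifies the leader. With no
# leader, every probe fails and the function returns non-zero.
find_leader() {
  local n
  for n in "$@"; do
    if curl -sf -o /dev/null "http://$n:8008/leader"; then
      echo "$n"
      return 0
    fi
  done
  echo "no leader found" >&2
  return 1
}
# Usage: find_leader 10.0.0.1 10.0.0.2 10.0.0.3
```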
## Step 2: Check DCS Connectivity

```bash
# Check the DCS backend:
# For etcd:
etcdctl endpoint health
etcdctl endpoint status

# Check Patroni's DCS configuration (the etcd/consul/kubernetes
# section of patroni.yml):
grep -A 5 -E '^(etcd3?|consul|kubernetes):' /etc/patroni/patroni.yml

# Example etcd section:
# etcd:
#   host: 10.0.0.10:2379

# Test etcd connectivity:
etcdctl get /service/postgres-cluster --prefix

# For Consul:
consul members
consul kv get -recurse service/postgres-cluster

# For Kubernetes:
kubectl get endpoints
kubectl describe configmap postgres-cluster-config

# If the DCS is unavailable, fix it first:
# - etcd: restore etcd quorum (disk, networking, member health)
# - Consul: check that Consul has a leader
# - Kubernetes: check API server availability
```
## Step 3: Check Node Health

```bash
# Check PostgreSQL on each node:
ssh node-1 "systemctl status postgresql"
ssh node-2 "systemctl status postgresql"
ssh node-3 "systemctl status postgresql"
# Note: when Patroni manages PostgreSQL directly, the postgresql unit
# may be inactive even though the database is running under Patroni

# Check the Patroni process:
ssh node-1 "systemctl status patroni"

# Check PostgreSQL connectivity:
psql -h node-1 -U postgres -c "SELECT pg_is_in_recovery();"
psql -h node-2 -U postgres -c "SELECT pg_is_in_recovery();"

# Expected:
# - Exactly one node returns false (the primary)
# - The others return true (replicas)
# If all return true, no primary exists

# Check Patroni logs on each node:
ssh node-1 "journalctl -u patroni -n 50"
```
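The `pg_is_in_recovery()` expectation can be checked mechanically. A minimal sketch; the node names in the usage comment are assumptions:

```shell
#!/bin/bash
# Count how many nodes report pg_is_in_recovery() = false ('f').
# Exactly one 'f' means a primary exists; zero confirms the no-leader
# state; more than one indicates a split-brain.
count_primaries() {
  # stdin: one pg_is_in_recovery() result ('t' or 'f') per node
  grep -c '^f$'
}
# Collect results from a live cluster:
# for n in node-1 node-2 node-3; do
#   psql -h "$n" -U postgres -Atc "SELECT pg_is_in_recovery();"
# done | count_primaries
```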
## Step 4: Force Leader Election

```bash
# If the nodes are healthy but no leader exists, force an election:

# Method 1: Trigger a manual failover to a healthy candidate
# (switchover requires a running leader; with no leader, use failover):
patronictl failover postgres-cluster --candidate node-1 --force

# Method 2: Remove a stale leader key and let an election happen:
# For etcd:
etcdctl del /service/postgres-cluster/leader

# For Consul:
consul kv delete service/postgres-cluster/leader

# For Kubernetes (Patroni tracks the leader in the <scope>-leader object):
kubectl delete configmap postgres-cluster-leader

# Wait 10-30 seconds for the election, then check:
patronictl list
# Should now show a leader

# Method 3: Restart Patroni on all nodes to restart the election loop:
systemctl restart patroni
```
## Step 5: Check Node Connectivity

```bash
# Check the network between nodes:
ssh node-1 "ping -c 3 node-2"
ssh node-1 "ping -c 3 node-3"

# Check the PostgreSQL port:
ssh node-1 "nc -zv node-2 5432"
ssh node-1 "nc -zv node-3 5432"

# Check the Patroni API port:
ssh node-1 "nc -zv node-2 8008"

# Check the firewall:
ssh node-1 "iptables -L -n | grep 5432"

# Allow the PostgreSQL and Patroni API ports:
iptables -I INPUT -p tcp --dport 5432 -j ACCEPT
iptables -I INPUT -p tcp --dport 8008 -j ACCEPT

# Check for a network partition:
# Nodes in different partitions cannot elect a leader;
# every node must be able to reach every other node and the DCS
```
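The partition check can be automated as a full-mesh probe. A minimal sketch; the node names and the ssh/nc-based `probe()` are assumptions you can swap for your own reachability test:

```shell
#!/bin/bash
# Full-mesh port check: a leader election needs every node to reach
# every other node's PostgreSQL (5432) and Patroni REST (8008) ports.
NODES="${NODES:-node-1 node-2 node-3}"

probe() {  # $1=src $2=dst $3=port; default: ssh + nc reachability check
  ssh -o BatchMode=yes -o ConnectTimeout=2 "$1" "nc -z -w 2 $2 $3"
}

check_mesh() {
  local src dst port
  for src in $NODES; do
    for dst in $NODES; do
      [ "$src" = "$dst" ] && continue
      for port in 5432 8008; do
        probe "$src" "$dst" "$port" >/dev/null 2>&1 \
          || echo "BLOCKED: $src -> $dst:$port"
      done
    done
  done
}
# Usage: check_mesh   (no output means the mesh is fully connected)
```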
## Step 6: Check Raft Consensus

```bash
# If using Patroni's built-in Raft DCS (Patroni 2.x):

# Check the Raft configuration:
grep -A 10 raft /etc/patroni/patroni.yml

# Example:
# raft:
#   self_addr: 10.0.0.1:2222
#   partner_addrs: ['10.0.0.2:2222', '10.0.0.3:2222']

# Check Raft port connectivity:
nc -zv 10.0.0.2 2222
nc -zv 10.0.0.3 2222

# The Raft leader handles DCS operations;
# if Raft fails, nodes cannot coordinate.
# Restart Patroni on all nodes:
systemctl restart patroni

# Check logs for Raft errors:
journalctl -u patroni | grep -i raft
```
## Step 7: Recover Failed Node

```bash
# If a node's PostgreSQL data directory is corrupted:

# Check the PostgreSQL data directory:
ssh node-1 "ls -la /var/lib/postgresql/data/"

# If the data is missing or corrupted, reinitialize the member from
# the leader (wipes and rebuilds its data directory; requires a
# running leader):
patronictl reinit postgres-cluster node-1

# Or manually:
ssh node-1 "systemctl stop patroni"
ssh node-1 "rm -rf /var/lib/postgresql/data/*"
ssh node-1 "pg_basebackup -h node-2 -U postgres -D /var/lib/postgresql/data -X stream"
ssh node-1 "systemctl start patroni"

# Check that the node rejoined:
patronictl list

# If the cluster state in the DCS is beyond repair, remove it and let
# Patroni bootstrap fresh (destructive; prompts for confirmation):
patronictl remove postgres-cluster
systemctl restart patroni
```
## Step 8: Check Configuration

```bash
# Verify the Patroni configuration on all nodes:
cat /etc/patroni/patroni.yml

# Key settings must be consistent across nodes:
# - scope (cluster name)
# - DCS configuration
# - PostgreSQL parameters

# Scope (cluster name) - must be identical on every node:
# scope: postgres-cluster

# PostgreSQL parameters - must allow replication:
# postgresql:
#   parameters:
#     max_connections: 200
#     wal_level: replica
#     max_wal_senders: 10

# Check for config drift between nodes:
diff <(ssh node-1 cat /etc/patroni/patroni.yml) <(ssh node-2 cat /etc/patroni/patroni.yml)

# Fix inconsistencies by copying the correct config to all nodes:
scp /etc/patroni/patroni.yml node-2:/etc/patroni/patroni.yml
ssh node-2 "systemctl restart patroni"
```
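Pairwise diffs scale poorly past two nodes; comparing checksums across all members is a compact alternative. A minimal sketch; the node names and the ssh-based `config_sum()` fetch are assumptions:

```shell
#!/bin/bash
# Detect config drift by counting distinct checksums of patroni.yml
# across all nodes. One distinct checksum means the configs match.
config_sum() {  # $1 = node name; override if you fetch configs differently
  ssh -o BatchMode=yes "$1" "md5sum /etc/patroni/patroni.yml" | awk '{print $1}'
}

drift_check() {
  local distinct
  distinct=$(for n in "$@"; do config_sum "$n"; done | sort -u | grep -c .)
  if [ "$distinct" -gt 1 ]; then
    echo "DRIFT: $distinct distinct patroni.yml versions"
    return 1
  fi
  echo "OK: configs identical"
}
# Usage: drift_check node-1 node-2 node-3
```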
## Step 9: DCS Data Recovery

```bash
# If the DCS data is corrupted:

# Check the DCS keys:
# For etcd:
etcdctl get /service/postgres-cluster --prefix --keys-only

# Expected keys:
# /service/postgres-cluster/initialize
# /service/postgres-cluster/leader
# /service/postgres-cluster/members/node-1
# /service/postgres-cluster/members/node-2
# /service/postgres-cluster/optime/leader
# /service/postgres-cluster/config

# A missing leader key is normal while no leader exists. Do NOT create
# one by hand: Patroni writes the leader key with a TTL/lease, and a
# hand-made key that never expires can block elections indefinitely.
# Instead, delete any stale leader key and let Patroni re-elect:
etcdctl del /service/postgres-cluster/leader

# If member keys are missing, Patroni recreates them on restart:
systemctl restart patroni

# If the dynamic config is corrupted, delete it;
# Patroni recreates it from the local bootstrap configuration:
etcdctl del /service/postgres-cluster/config

# For Consul:
consul kv get -recurse service/postgres-cluster
consul kv delete service/postgres-cluster/leader
```
## Step 10: Monitor Cluster Health

```bash
# Create a monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-patroni.sh
#!/bin/bash
echo "=== Patroni Cluster Status ==="
patronictl list

echo ""
echo "=== DCS Status ==="
etcdctl endpoint health

echo ""
echo "=== Leader Check ==="
LEADER=$(patronictl list | grep Leader | awk '{print $2}')
if [ -z "$LEADER" ]; then
  echo "ERROR: No leader in cluster!"
  # Send alert here
else
  echo "OK: Leader is $LEADER"
fi

echo ""
echo "=== Node Health ==="
for node in node-1 node-2 node-3; do
  curl -s "http://$node:8008/patroni" | jq '.state'
done
EOF
chmod +x /usr/local/bin/monitor-patroni.sh

# Patroni exposes Prometheus metrics:
curl http://localhost:8008/metrics

# Key metrics:
# patroni_dcs_last_seen    - last successful DCS contact
# patroni_postgres_running - whether PostgreSQL is running
# patroni_cluster_unlocked - 1 when no node holds the leader lock
# patroni_primary          - whether this node is the primary
```

Alert rule:

```yaml
- alert: PatroniNoLeader
  expr: max(patroni_cluster_unlocked) == 1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Patroni cluster has no leader"
```
## Patroni Cluster No Leader Checklist

| Check | Command | Expected |
|---|---|---|
| Cluster list | patronictl list | One Leader |
| DCS health | etcdctl endpoint health | Healthy |
| Patroni service | systemctl status patroni | Running |
| DCS leader key | etcdctl get /service/postgres-cluster/leader | Exists after election |
| Network ports | nc -zv <host> 5432 | Connected |
| Node connectivity | ping <host> | Reachable |
| Patroni config | patroni.yml | Consistent on all nodes |
## Verify the Fix

```bash
# After resolving leader election:

# 1. Check the cluster has a leader
patronictl list
# One node shows the Leader role

# 2. Verify the primary accepts writes
psql -h <leader> -U postgres -c "CREATE TABLE test (id int);"
# Table created

# 3. Check replicas are syncing
psql -h node-2 -U postgres -c "SELECT pg_last_wal_receive_lsn();"
# LSN advancing

# 4. Test that switchover works now that a leader exists
patronictl switchover postgres-cluster --candidate node-2
# New leader elected

# 5. Monitor the DCS
etcdctl get /service/postgres-cluster/leader
# Leader key updates

# 6. Check all nodes are healthy
patronictl list -e
# All nodes running
```
## Related Issues
- [Fix Etcd Leader Election Failed](/articles/fix-etcd-leader-election-failed)
- [Fix PostgreSQL WAL Archive Stuck](/articles/fix-postgresql-wal-archive-stuck)
- [Fix PostgreSQL Connection Limit Exceeded](/articles/fix-postgresql-connection-limit-exceeded)