Fix Kafka Partition Leader Election Timeout

Introduction

When a partition leader becomes unavailable, Kafka must elect a new leader from the in-sync replica set (ISR). If the election process takes too long -- due to controller overload, ISR shrinkage, or metadata propagation delays -- the partition remains leaderless, blocking all produce and consume operations for that partition.

Symptoms

Producers receive NOT_LEADER_OR_FOLLOWER or LEADER_NOT_AVAILABLE errors
Consumer fetch requests hang or fail with partition metadata errors
kafka-topics.sh --describe shows partitions with Leader: -1 (or Leader: none, depending on the Kafka version)
Controller logs show Leader election timed out or failed to complete
Partition unavailability duration exceeds the configured election timeout
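
To check for leaderless partitions from a script, a small filter over the describe output can help. This is a minimal sketch, not part of the Kafka tooling: `leaderless_partitions` is a hypothetical helper name, and it assumes each partition line carries `Topic:`, `Partition:`, and `Leader:` labels, with a leader value of `-1` or `none` marking a partition without a leader.

```bash
# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin and
# prints "<topic> <partition>" for every partition that has no leader.
# Assumption: a missing leader is shown as "-1" or "none".
leaderless_partitions() {
  awk '
    {
      topic = ""; part = ""; leader = ""
      # Each label is followed by its value as the next whitespace-separated token.
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:")     topic  = $(i + 1)
        if ($i == "Partition:") part   = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
      }
      # Topic summary lines have no "Partition:" label and are skipped.
      if (part != "" && (leader == "-1" || leader == "none")) print topic, part
    }'
}

# Example use (requires a reachable cluster):
#   kafka-topics.sh --bootstrap-server localhost:9092 --describe | leaderless_partitions
```

Empty output means every described partition currently reports a leader.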

Common Causes

Controller broker overloaded with metadata operations, delaying election processing
ISR empty or containing only the failed leader, requiring unclean election
Network latency between controller and broker nodes slowing metadata propagation
Large number of simultaneous partition failures overwhelming the controller
ZooKeeper latency affecting controller election and metadata updates
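
To tell an empty ISR apart from the other causes, it can help to compare replica and ISR counts per partition. The following is a rough sketch under assumptions: `isr_sizes` is a hypothetical helper name, and it assumes the describe output labels the relevant fields as `Replicas:` and `Isr:` with comma-separated broker ids, and that an empty ISR is printed as a label with no value.

```bash
# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin and
# prints the replica count and ISR size for each partition.
isr_sizes() {
  awk '
    {
      topic = ""; part = ""; replicas = ""; isr = ""
      for (i = 1; i <= NF; i++) {
        # The value is the next token, unless that token is itself a label
        # (which happens when a field such as Isr is empty).
        val = (i < NF && $(i + 1) !~ /:$/) ? $(i + 1) : ""
        if ($i == "Topic:")     topic    = val
        if ($i == "Partition:") part     = val
        if ($i == "Replicas:")  replicas = val
        if ($i == "Isr:")       isr      = val
      }
      if (part == "") next          # skip topic summary lines
      nr = split(replicas, r, ",")  # split returns 0 for an empty string
      ni = split(isr, s, ",")
      print topic, part, "replicas=" nr, "isr=" ni
    }'
}

# Example use (requires a reachable cluster):
#   kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic | isr_sizes
```

A line ending in `isr=0` points to the empty-ISR case, where only an unclean election can restore a leader.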

Step-by-Step Fix

1. Identify leaderless partitions: Find all partitions without a leader.
```bash
kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions
```
2. Trigger manual leader election for stuck partitions: Force a preferred replica election. This succeeds only if the preferred replica is alive and in the ISR.
```bash
kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type preferred --topic my-topic --partition 0
```
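3. Check controller broker health: Verify the controller is responsive.
```bash
# Find the active controller (ZooKeeper-based clusters)
zookeeper-shell.sh localhost:2181 <<< "get /controller" 2>/dev/null | grep brokerid
# On KRaft clusters there is no ZooKeeper; the quorum leader is the active controller
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
# Check controller broker logs
grep "Controller" /var/log/kafka/server.log | tail -30
```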
4. If the ISR is empty, perform an unclean leader election as a last resort: Accept potential data loss to restore availability.
```bash
kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type unclean --topic my-topic --partition 0
```
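5. Verify partition recovery: Confirm all partitions have elected leaders. The command prints only partitions that still lack a leader, so empty output means recovery is complete.
```bash
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic | grep -E "Leader: (-1|none)"
```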
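
When many partitions are stuck at once, running one election per partition is slow. `kafka-leader-election.sh` also accepts a `--path-to-json-file` option listing the partitions to elect. The sketch below builds that file from "topic partition" pairs; `election_json` is a hypothetical helper name, and the input format (one whitespace-separated pair per line) is an assumption made for illustration.

```bash
# Hypothetical helper: reads "<topic> <partition>" pairs on stdin and prints
# the JSON document expected by kafka-leader-election.sh --path-to-json-file.
election_json() {
  awk '
    BEGIN { printf "{\"partitions\": [" }
    NF >= 2 {
      printf "%s{\"topic\": \"%s\", \"partition\": %d}", sep, $1, $2
      sep = ", "   # separator is empty before the first entry only
    }
    END { print "]}" }'
}

# Example use (requires a reachable cluster):
#   printf 'my-topic 0\nmy-topic 3\n' | election_json > /tmp/election.json
#   kafka-leader-election.sh --bootstrap-server localhost:9092 \
#     --election-type preferred --path-to-json-file /tmp/election.json
```

Try the preferred election first and switch `--election-type` to `unclean` only for partitions whose ISR is empty, since that path can lose data.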

Prevention

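Use a replication factor of at least 3 with min.insync.replicas=2, so that acks=all writes are accepted only while at least two replicas are in sync and a clean election candidate remains after a leader failure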
Monitor controller broker CPU, memory, and request queue depth
Tune leader.imbalance.check.interval.seconds (default 300) so that leader skew is detected and corrected promptly
Distribute partition leaders evenly across brokers using auto.leader.rebalance.enable=true
For metadata-heavy clusters, run the controller on dedicated, well-resourced nodes (in KRaft mode, nodes configured with process.roles=controller)
Monitor election duration and alert when it exceeds 10 seconds
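
One rough way to approximate election duration from outside the cluster is to poll for leaderless partitions and time how long they persist. This is only a sketch: `watch_leaderless` is a hypothetical helper, the 10-second default mirrors the threshold above, and polling measures the unavailability window as seen by the client rather than the controller's internal election time.

```bash
# Hypothetical helper: watch_leaderless CHECK_CMD [THRESHOLD_SECONDS] [POLL_SECONDS]
# CHECK_CMD must print something while any partition is leaderless and
# print nothing once all partitions have leaders.
watch_leaderless() {
  check_cmd=$1
  threshold=${2:-10}
  poll=${3:-1}
  start=$(date +%s)
  while [ -n "$($check_cmd)" ]; do
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$threshold" ]; then
      echo "ALERT: partitions leaderless for ${elapsed}s (threshold ${threshold}s)" >&2
      return 1
    fi
    sleep "$poll"
  done
  echo "all partitions have leaders after $(( $(date +%s) - start ))s"
}

# Example use (requires a reachable cluster):
#   check() { kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions; }
#   watch_leaderless check 10 2
```

The non-zero exit status on timeout makes it easy to hook the function into an existing alerting script or cron job.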
