Fix Kafka ZooKeeper Session Expired Broker Deregistered

Introduction

Kafka brokers maintain a session with ZooKeeper using heartbeats. If the broker fails to send a heartbeat within the session timeout, ZooKeeper expires the session and removes the broker from the cluster. This triggers an immediate leader election for all partitions the broker was leading, causing temporary unavailability and consumer rebalancing.

Symptoms

Broker disappears from the cluster without a graceful shutdown
ZooKeeper logs show Expiring session 0x... for the broker
Consumer logs show Leader not available for previously healthy partitions
Broker logs prior to deregistration show Unable to reconnect to ZooKeeper
Sudden spike in consumer lag and producer timeout errors

Common Causes

Long GC pauses (stop-the-world) preventing ZooKeeper heartbeat transmission
Network partition between Kafka broker and ZooKeeper ensemble
ZooKeeper session timeout set too low for the network latency between nodes
System load causing thread starvation on the broker, delaying ZooKeeper client thread
ZooKeeper ensemble overloaded, failing to process heartbeats in time

Step-by-Step Fix

1.Confirm ZooKeeper session expiration from broker logs: Check the ZooKeeper client logs.
2.```bash
3.grep -E "Session expired|Unable to reconnect|KeeperState" /var/log/kafka/server.log | tail -20
4.`
5.Increase ZooKeeper session timeout: Allow more time for heartbeats under load.
6.```properties
7.zookeeper.session.timeout.ms=18000
8.zookeeper.connection.timeout.ms=18000
9.`
10.Check for GC pauses on the broker: Identify if garbage collection caused the session loss.
11.```bash
12.# Check GC logs for long pauses
13.grep "GC pause" /var/log/kafka/gc.log | awk '{print $NF}' | sort -rn | head -10
14.`
15.Verify ZooKeeper ensemble health: Check ZooKeeper server latency and availability.
16.```bash
17.echo ruok | nc zk-1 2181
18.echo ruok | nc zk-2 2181
19.echo ruok | nc zk-3 2181
20.`
21.Restart the deregistered broker and verify partition reassignment: Ensure the broker rejoins cleanly.
22.```bash
23.systemctl restart kafka
24.kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
25.`

Prevention

Set zookeeper.session.timeout.ms to at least 18000ms (18 seconds) to tolerate temporary network hiccups
Use G1GC with appropriate heap sizing to minimize stop-the-world GC pauses
Deploy ZooKeeper ensemble on the same network as Kafka brokers to minimize latency
Monitor ZooKeeper session state and alert on Expired or Disconnected events
Consider migrating to Kafka KRaft mode (KRaft replaces ZooKeeper) for Kafka 3.3+ deployments
Monitor broker GC pause duration and alert when p99 exceeds 500ms

Kafka ZooKeeper Session Expired Broker Deregistered Unexpectedly

Introduction

Symptoms

Common Causes

Step-by-Step Fix

Prevention

Share this guide

More Kafka Troubleshooting Guides

Kafka Schema Registry Backward Compatibility Check Rejecting New Version

Kafka Consumer Offset Commit Failed Group Coordinator Not Available

Kafka SASL SCRAM Authentication Failed During Credentials Rotation

Kafka Compacted Topic Log Cleanup Removing Active Keys

Kafka Producer Idempotence Lost After Broker Crash

Kafka ISR Shrinking Due to Slow Follower Replication Lag