Fix Kafka Consumer Group Rebalancing Processing Pause

Introduction

Kafka consumer group rebalancing redistributes partition ownership among consumers when members join, leave, or fail. Under the eager rebalance protocol, every consumer in the group stops processing until the rebalance completes, creating a throughput gap. When rebalances happen frequently or take too long, the cumulative pause time increases message processing latency and consumer lag.

Symptoms

Consumer processing throughput drops to zero periodically with no apparent cause
Kafka consumer group state oscillates between Stable and PreparingRebalance
Consumer logs show "The coordinator is not aware of this member" or "Rebalance in progress"
Consumer lag grows steadily despite adequate consumer capacity
Application health checks fail during extended rebalance windows
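
To turn the state-oscillation symptom into a number, one option is to sample the group state at a fixed interval and count how often it leaves Stable. The sketch below assumes the samples have already been collected (for example from the --state output of kafka-consumer-groups.sh). The function name and the sample data are illustrative and not part of any Kafka API.

```python
def count_rebalance_starts(states):
    """Count transitions from Stable into any other state in a sampled sequence."""
    starts = 0
    for previous, current in zip(states, states[1:]):
        if previous == "Stable" and current != "Stable":
            starts += 1
    return starts


# Hypothetical samples, one per polling interval. The state names are the ones
# the group coordinator reports; the sequence itself is made up for illustration.
samples = [
    "Stable", "PreparingRebalance", "CompletingRebalance", "Stable",
    "Stable", "PreparingRebalance", "Stable",
]

print(count_rebalance_starts(samples))  # 2
```

A count that keeps climbing over a fixed window, while no deployments are in progress, points at one of the causes listed in the next section.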

Common Causes

max.poll.interval.ms set too low, so slow consumers are evicted from the group and trigger a rebalance
Consumer processing takes longer than the poll interval during peak message volume
Network instability between consumers and the group coordinator causing heartbeat failures
Rolling deployments removing and adding consumers without static group membership
GC pauses in JVM-based consumers exceeding session timeout thresholds
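
The first two causes come down to simple arithmetic: if a consumer processes a full batch serially, the time between two poll() calls is roughly max.poll.records multiplied by the per-record processing time. The sketch below checks that product against max.poll.interval.ms. The 500 records and 300000 ms are the Kafka consumer defaults; the 700 ms per record is an assumed peak-load figure, and the function names are hypothetical.

```python
def worst_case_batch_ms(max_poll_records, per_record_ms):
    """Upper bound on the time between two poll() calls when a full batch is processed serially."""
    return max_poll_records * per_record_ms


def exceeds_poll_interval(max_poll_records, per_record_ms, max_poll_interval_ms):
    """True if a full batch can take longer than max.poll.interval.ms."""
    return worst_case_batch_ms(max_poll_records, per_record_ms) > max_poll_interval_ms


# Assumed peak load: 700 ms per record with the default batch size of 500.
print(worst_case_batch_ms(500, 700))              # 350000
print(exceeds_poll_interval(500, 700, 300_000))   # True: consumer is evicted
print(exceeds_poll_interval(500, 700, 600_000))   # False: fits within 10 minutes
```

Either raising max.poll.interval.ms or lowering max.poll.records brings the product back under the limit; which one is preferable depends on how quickly a genuinely stuck consumer should be detected.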

Step-by-Step Fix

1. Identify rebalance frequency and duration from broker logs: Check how often rebalances occur.
```bash
grep "Rebalance" /var/log/kafka/server.log | awk '{print $1, $2, $NF}' | tail -50
```
2. Increase max.poll.interval.ms to accommodate processing time: Ensure consumers have enough time to process their assigned batch.
```properties
max.poll.interval.ms=600000
max.poll.records=500
session.timeout.ms=45000
heartbeat.interval.ms=15000
```
3. Enable static group membership to avoid rebalances during rolling restarts: Use a stable group instance ID that is unique for each consumer instance in the group.
```properties
group.instance.id=consumer-1
```
4. Switch to cooperative sticky partition assignment: Minimize partition movement during rebalances.
```properties
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```
5. Verify rebalance stabilization: Monitor group state after applying changes.
```bash
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group --state
```
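
Before rolling out the settings from the steps above, it can help to sanity-check how they relate to each other. The sketch below parses a properties-style string and flags two things: a heartbeat interval that is not comfortably below the session timeout (the Kafka documentation recommends no more than one third), and a missing group.instance.id. The helper names and warning texts are hypothetical; the configuration values are the ones used in the steps.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_consumer_timeouts(props):
    """Return a list of warnings for the timeout and membership settings."""
    warnings = []
    session = int(props["session.timeout.ms"])
    heartbeat = int(props["heartbeat.interval.ms"])
    if heartbeat >= session:
        warnings.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat * 3 > session:
        warnings.append("heartbeat.interval.ms is above one third of session.timeout.ms")
    if "group.instance.id" not in props:
        warnings.append("group.instance.id is not set; each restart is a leave and a rejoin")
    return warnings


CONFIG = """
max.poll.interval.ms=600000
max.poll.records=500
session.timeout.ms=45000
heartbeat.interval.ms=15000
group.instance.id=consumer-1
"""

print(check_consumer_timeouts(parse_properties(CONFIG)))  # []
```

With static membership, a restarted consumer only avoids a rebalance if it rejoins before session.timeout.ms expires, so the timeout also needs to cover the expected restart time.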

Prevention

Size consumer poll intervals based on p99 processing time, not average
Use static group membership (group.instance.id) for all production consumer deployments
Implement graceful consumer shutdown that commits offsets before leaving the group
Monitor rebalance frequency as a key SLO metric, alerting when it exceeds baseline
Use the incremental cooperative rebalancing protocol (available to consumers since Kafka 2.4) to reduce pause duration
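
The first prevention point can be illustrated with a small calculation. The sketch below derives a candidate max.poll.interval.ms from measured batch processing times using a nearest-rank p99 and a headroom multiplier. The sample data and the factor of 2 are assumptions for illustration, and the function names are hypothetical.

```python
from statistics import mean


def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)  # integer ceiling of 0.99 * n
    return ordered[rank - 1]


def suggest_max_poll_interval_ms(batch_times_ms, headroom_factor=2):
    """Candidate max.poll.interval.ms: p99 batch time multiplied by a headroom factor."""
    return p99(batch_times_ms) * headroom_factor


# Hypothetical measurements: most batches take 20 s, a few slow ones take 4 min.
batch_times_ms = [20_000] * 98 + [240_000] * 2

print(mean(batch_times_ms))                          # average hides the slow batches
print(p99(batch_times_ms))                           # 240000
print(suggest_max_poll_interval_ms(batch_times_ms))  # 480000
```

Sizing from the average here would suggest an interval far below what the slow batches need, which is exactly the case where a consumer gets evicted during peak volume.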
