Introduction

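Kafka consumers commit their processed offsets to the internal __consumer_offsets topic through the group coordinator. The coordinator is the broker that currently leads the __consumer_offsets partition to which the consumer group is mapped. If that broker becomes unavailable and leadership of the partition has not yet moved to another replica, offset commits fail. Depending on how offsets are committed, consumers can keep processing without recording their progress. For example, failed auto-commits and failed commitAsync() calls do not stop the poll loop. When a consumer restarts, or its partitions are reassigned during a rebalance, consumption resumes from the last successful commit. Every message processed since that commit is then delivered again.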
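The group id determines which broker acts as coordinator. The sketch below assumes the mapping the broker applies, abs(groupId.hashCode()) % offsets.topic.num.partitions, with the default of 50 partitions. Check the partition count against your cluster's offsets.topic.num.partitions before relying on the result. The class and method names are illustrative only.

```java
/**
 * Sketch: maps a consumer group id to the __consumer_offsets partition whose
 * leader acts as the group coordinator. Assumes the broker rule
 * abs(groupId.hashCode()) % offsets.topic.num.partitions, where abs() maps
 * Integer.MIN_VALUE to 0 instead of leaving it negative.
 */
final class CoordinatorPartition {
    // Assumed default of offsets.topic.num.partitions; check your broker config.
    static final int DEFAULT_OFFSETS_PARTITIONS = 50;

    private CoordinatorPartition() {}

    // Math.abs(Integer.MIN_VALUE) is still negative, so map it to 0 explicitly.
    static int safeAbs(int n) {
        return n == Integer.MIN_VALUE ? 0 : Math.abs(n);
    }

    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        if (offsetsTopicPartitions <= 0) {
            throw new IllegalArgumentException("partition count must be positive");
        }
        return safeAbs(groupId.hashCode()) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        for (String group : new String[] {"my-consumer-group", "a"}) {
            int p = partitionFor(group, DEFAULT_OFFSETS_PARTITIONS);
            System.out.println(group + " -> __consumer_offsets-" + p);
        }
    }
}
```

For example, "a".hashCode() is 97, so a group named "a" maps to __consumer_offsets-47 when the topic has 50 partitions. The leader of that partition, as shown by kafka-topics.sh --describe --topic __consumer_offsets, is the group's coordinator.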

Symptoms

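  • Consumer logs report that the group coordinator is unavailable (for example COORDINATOR_NOT_AVAILABLE or NOT_COORDINATOR errors) and that the consumer will attempt to rediscover it
  • Offset commit latency rises until commits time out: commitSync() throws TimeoutException, and commitAsync() callbacks receive RetriableCommitFailedException
  • The leader of the __consumer_offsets partition that stores the group's offsets is unavailable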
  • Consumers continue processing but offsets are not persisted
  • When a consumer restarts or its partitions are reassigned, messages after the last committed offset are reprocessed
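After a restart, the consumer redelivers the messages between the last committed offset and its current position. Tracking that gap per partition is one way to detect failing commits before a restart turns them into reprocessing. This is a minimal sketch over hypothetical in-memory snapshots. Collecting the snapshots (for example from KafkaConsumer.position() and successful commit callbacks) is left to the application, and the alert threshold is an assumed parameter.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: per-partition count of offsets processed but not yet committed,
 * i.e. roughly what would be redelivered if the consumer restarted now.
 * Both inputs are hypothetical snapshots keyed by partition number:
 * "position" is the next offset the application will process and
 * "committed" is the last successfully committed offset. Kafka commits the
 * next offset to read, so the two values are directly comparable.
 */
final class ReprocessingWindow {
    private ReprocessingWindow() {}

    static Map<Integer, Long> uncommitted(Map<Integer, Long> position,
                                          Map<Integer, Long> committed) {
        Map<Integer, Long> gap = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : position.entrySet()) {
            Long last = committed.get(e.getKey());
            if (last == null) {
                continue; // no commit yet: the restart point depends on auto.offset.reset
            }
            gap.put(e.getKey(), Math.max(0L, e.getValue() - last));
        }
        return gap;
    }

    /** True if any partition's uncommitted window exceeds the alert threshold. */
    static boolean exceeds(Map<Integer, Long> gap, long threshold) {
        return gap.values().stream().anyMatch(g -> g > threshold);
    }

    public static void main(String[] args) {
        Map<Integer, Long> position = Map.of(0, 120L, 1, 50L, 2, 10L);
        Map<Integer, Long> committed = Map.of(0, 100L, 1, 50L);
        Map<Integer, Long> gap = uncommitted(position, committed);
        System.out.println(gap + " alert=" + exceeds(gap, 10));
    }
}
```

The difference is an upper bound on the number of records, because compaction or transaction markers can leave gaps in the offset sequence.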

Common Causes

  • Group coordinator broker crashed or was taken offline for maintenance
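  • Leader election for a __consumer_offsets partition failed, for example because no in-sync replica was available
  • Network partition between consumers and the coordinator broker
  • Broker overload causing coordinator requests to time out
  • Coordinator migration (a leadership change for the group's __consumer_offsets partition) taking too long, for example while the new coordinator loads group state during a rebalance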
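Several of these causes are visible in the leadership of __consumer_offsets itself. A failed election leaves a partition without a leader. A broker that leads many partitions also coordinates every group mapped to them, which can contribute to overload. The sketch below takes a hypothetical snapshot of partition-to-leader ids, with -1 meaning "no leader". You would build that snapshot from kafka-topics.sh --describe output or the Admin client's describeTopics(); the parsing step is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Sketch: summarizes __consumer_offsets leadership from a snapshot mapping
 * partition number -> leader broker id. The snapshot format and the -1
 * "no leader" marker are assumptions made for this example.
 */
final class OffsetsLeaderCheck {
    static final int NO_LEADER = -1;

    private OffsetsLeaderCheck() {}

    /** Partitions with no elected leader: groups mapped to them cannot commit. */
    static List<Integer> leaderless(Map<Integer, Integer> leaders) {
        return leaders.entrySet().stream()
                .filter(e -> e.getValue() == NO_LEADER)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    /** How many partitions (and therefore coordinator duties) each broker leads. */
    static Map<Integer, Integer> leadersPerBroker(Map<Integer, Integer> leaders) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int broker : leaders.values()) {
            if (broker != NO_LEADER) {
                counts.merge(broker, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> leaders = Map.of(0, 1, 1, 1, 2, 2, 3, NO_LEADER, 4, 1);
        System.out.println("leaderless=" + leaderless(leaders)
                + " perBroker=" + leadersPerBroker(leaders));
    }
}
```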

Step-by-Step Fix

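  1. Identify the group coordinator for the affected consumer group. The --state option adds a COORDINATOR (ID) column to the describe output. If the coordinator is down, the command itself may fail with a CoordinatorNotAvailableException.
     ```bash
     kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
       --describe --group my-consumer-group --state
     ```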
  2. Check coordinator broker health: confirm that every __consumer_offsets partition has a leader and that the coordinator broker responds to requests.
     ```bash
     # List __consumer_offsets partitions that currently have no leader
     kafka-topics.sh --bootstrap-server localhost:9092 \
       --describe --topic __consumer_offsets --unavailable-partitions
     # Check that the coordinator broker answers API requests
     kafka-broker-api-versions.sh --bootstrap-server coordinator-broker:9092
     ```
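  3. Restart the coordinator broker if it is down. If other in-sync replicas exist, partition leadership normally moves to one of them automatically. A restart matters most when the failed broker held the only in-sync replica.
     ```bash
     systemctl restart kafka   # the service name depends on your installation
     # Wait for the broker to reappear in the cluster's broker list
     kafka-broker-api-versions.sh --bootstrap-server localhost:9092 | grep "coordinator-broker"
     ```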
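  4. Confirm that offset commits resume once the coordinator is available. Uncommitted progress exists only in the running consumer's memory, so no CLI command can commit it on the consumer's behalf. Running --reset-offsets --to-current only rewrites the offsets that are already committed. The consumer's next successful commit records its progress, so check that CURRENT-OFFSET advances for the group. Offsets can be reset by hand only while the group has no active members.
     ```bash
     kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
       --describe --group my-consumer-group
     # Group stopped and true position known (e.g. from an external store)? Set it
     # explicitly. my-topic, partition 0 and 12345 are placeholders; use --execute to apply.
     kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
       --group my-consumer-group --topic my-topic:0 \
       --reset-offsets --to-offset 12345 --dry-run
     ```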
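  5. Use synchronous offset commits for critical consumers so that processing does not advance past a batch until its offsets are stored. With enable.auto.commit=false, commit after each processed batch (process() stands for application code):
     ```java
     // commitSync() blocks until the commit succeeds, a non-retriable error is
     // thrown, or default.api.timeout.ms expires (TimeoutException).
     ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
     records.forEach(record -> process(record));
     consumer.commitSync();
     ```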
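You can wrap the synchronous commit from step 5 in a retry loop. Then, when commitSync() gives up, the consumer retries the commit with backoff instead of moving on to the next batch. This is a minimal sketch under assumptions. The caller supplies the commit action and the retriable-error predicate, and the backoff values are illustrative rather than recommended defaults. The sketch uses no Kafka classes, so it can be exercised on its own.

```java
import java.time.Duration;
import java.util.function.Predicate;

/**
 * Sketch: retry a commit with capped exponential backoff and return only once
 * it succeeds, so the caller cannot move on with uncommitted progress.
 * "commit" is any action (for example a lambda calling consumer.commitSync());
 * which exceptions count as retriable is the caller's decision.
 */
final class BlockingCommit {
    private BlockingCommit() {}

    /** Returns the number of attempts used; rethrows non-retriable or final errors. */
    static int commitWithRetry(Runnable commit, Predicate<RuntimeException> isRetriable,
                               int maxAttempts, Duration initialBackoff, Duration maxBackoff) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMs = initialBackoff.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                commit.run();
                return attempt;
            } catch (RuntimeException e) {
                if (!isRetriable.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    e.addSuppressed(ie);
                    throw e;
                }
                backoffMs = Math.min(backoffMs * 2, maxBackoff.toMillis());
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated commit that fails twice before succeeding.
        int attempts = commitWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("coordinator not available");
            }
        }, e -> e instanceof IllegalStateException, 5, Duration.ofMillis(10), Duration.ofMillis(100));
        System.out.println("committed after " + attempts + " attempts");
    }
}
```

With a real consumer, the call might look like commitWithRetry(() -> consumer.commitSync(Duration.ofSeconds(10)), e -> e instanceof RetriableException, ...). Here RetriableException is org.apache.kafka.common.errors.RetriableException. CommitFailedException is not retriable, because the partitions may already belong to another member, so treat that batch as uncommitted. Keep the total time spent retrying well below max.poll.interval.ms, since a consumer that stops polling for longer than that is removed from the group.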

Prevention

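  • Keep offsets.topic.replication.factor=3 (the default) so the __consumer_offsets topic survives the loss of a broker; the setting applies only when the topic is first created, so raising the replication factor of an existing topic requires a partition reassignment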
  • Monitor group coordinator availability and alert on coordinator changes
  • Use enable.auto.commit=false with explicit commitSync() for critical processing pipelines
  • Implement offset tracking in an external store (database) as a backup to Kafka's internal offsets
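  • Raise offsets.retention.minutes from its default (10080, or 7 days, in Kafka 2.0 and later) to, for example, 43200 (30 days) for production consumer groups, so the committed offsets of groups that stay empty for long periods are not expired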
  • Distribute __consumer_offsets partition leaders across multiple brokers
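For the external-store recommendation, a common pattern is to save the next offset to read in the same database transaction as the processing results. Then, on partition assignment (for example in ConsumerRebalanceListener.onPartitionsAssigned), seek() each partition to the stored position. In the sketch below, the in-memory map is a hypothetical stand-in for that table, and the class, method and topic names are illustrative.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of an application-side offset store kept as a backup to Kafka's
 * committed offsets. It stores the next offset to read per partition, the same
 * convention Kafka uses. The in-memory map is a hypothetical stand-in for a
 * database table written in the same transaction as the processing results.
 */
final class ExternalOffsetStore {
    private final Map<String, Long> nextOffsets = new ConcurrentHashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    /** Records that the message at {@code offset} was processed. */
    void markProcessed(String topic, int partition, long offset) {
        // Long::max keeps a late or duplicate write from moving the position back.
        nextOffsets.merge(key(topic, partition), offset + 1, Long::max);
    }

    Optional<Long> stored(String topic, int partition) {
        return Optional.ofNullable(nextOffsets.get(key(topic, partition)));
    }

    /** Resume position: the stored offset if present, else Kafka's committed offset. */
    Optional<Long> resumeFrom(String topic, int partition, Optional<Long> kafkaCommitted) {
        Optional<Long> fromStore = stored(topic, partition);
        return fromStore.isPresent() ? fromStore : kafkaCommitted;
    }

    public static void main(String[] args) {
        ExternalOffsetStore store = new ExternalOffsetStore();
        store.markProcessed("orders", 0, 41L);
        long resume = store.resumeFrom("orders", 0, Optional.of(30L)).orElse(-1L);
        System.out.println("resume orders-0 at offset " + resume);
    }
}
```

Merging with Long::max stops a delayed or duplicated write from moving the position backwards. The trade-off is that a deliberate rewind (for example with --reset-offsets) must also update the store. Otherwise the stored position overrides the rewind on the next assignment.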