Introduction

When a consumer group loses its committed offsets -- due to offset expiration, manual reset, or configuration change -- setting auto.offset.reset=earliest causes the consumer to replay all available messages from the beginning of the log. If downstream systems are not idempotent, this replay creates duplicate records, corrupted aggregates, and incorrect business state.
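The behavior described above is controlled by the consumer's `auto.offset.reset` setting. A minimal, illustrative consumer configuration fragment (the group name is a placeholder):

```properties
group.id=my-consumer-group
# With no committed offset (expired or reset), 'earliest' replays the
# entire log; 'latest' resumes from the current head instead.
auto.offset.reset=earliest
```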

Symptoms

  • Downstream database contains duplicate records after consumer restart
  • Aggregate metrics show a sudden spike followed by a drop as duplicates first inflate counts and then normalize
  • Consumer logs show processing of events with timestamps far in the past
  • Idempotency keys conflict causing constraint violation errors in downstream databases
  • Error message: Duplicate key entry or UNIQUE constraint violation

Common Causes

  • Consumer group offsets expired due to offsets.retention.minutes being too short
  • Manual offset reset command run with --reset-offsets --to-earliest instead of --to-latest
  • New consumer group joining an existing topic with auto.offset.reset=earliest
  • Kafka consumer group coordinator failover causing offset commit loss
  • Offset topic itself compacted or deleted due to misconfigured retention

Step-by-Step Fix

  1. Check current committed offsets before any reset: document the current state to enable recovery.

     ```bash
     kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
       --describe --group my-consumer-group --offsets
     ```

  2. Stop the consumer to prevent further duplicate processing: halt processing immediately.

     ```bash
     kubectl scale deployment/consumer-service --replicas=0
     ```

  3. Reset offsets to the latest position so the group resumes from the current head and avoids replaying historical messages.

     ```bash
     kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
       --group my-consumer-group --topic my-topic \
       --reset-offsets --to-latest --execute
     ```

  4. Clean up duplicate data in downstream systems: run a deduplication query (PostgreSQL syntax shown) that keeps the most recent row per event.

     ```sql
     DELETE FROM events t1 USING events t2
     WHERE t1.id < t2.id
       AND t1.event_id = t2.event_id;
     ```

  5. Restart the consumer and verify no further duplicates: monitor the consumer for clean processing.

     ```bash
     kubectl scale deployment/consumer-service --replicas=3
     kubectl logs -l app=consumer-service --tail=100 | grep -iE "duplicate|error"
     ```
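The deduplication logic in step 4 can be sanity-checked before running it against production data. A minimal sketch using Python's built-in sqlite3 (table and column names mirror the example; since SQLite's DELETE has no USING clause, an equivalent NOT IN form expresses the same keep-the-newest-row rule):

```python
import sqlite3

# In-memory table mirroring the hypothetical events table from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event_id TEXT)")

# Simulate a replay: events 'a' and 'b' were each processed twice.
conn.executemany(
    "INSERT INTO events (event_id) VALUES (?)",
    [("a",), ("b",), ("a",), ("b",), ("c",)],
)

# Equivalent of the DELETE ... USING query: keep only the newest row
# (highest id) for each event_id, delete the rest.
conn.execute(
    "DELETE FROM events WHERE id NOT IN "
    "(SELECT MAX(id) FROM events GROUP BY event_id)"
)

remaining = conn.execute(
    "SELECT event_id, COUNT(*) FROM events GROUP BY event_id"
).fetchall()
print(sorted(remaining))  # each event_id now appears exactly once
```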

Prevention

  • Set auto.offset.reset=latest by default and only use earliest for explicit recovery scenarios
  • Implement idempotent downstream processing using unique event IDs and database upserts
  • Extend offsets.retention.minutes to at least 30 days for critical consumer groups
  • Enable periodic offset backup to an external store for disaster recovery
  • Require confirmation prompts or approval workflows for manual offset reset operations in production
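The idempotent-processing recommendation above can be sketched with Python's built-in sqlite3: a unique constraint on the event ID plus an upsert makes replays harmless. Table and event names are illustrative, not taken from any real system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_events ("
    "  event_id TEXT PRIMARY KEY,"  # unique event ID from the message
    "  payload  TEXT"
    ")"
)

def handle(event_id: str, payload: str) -> None:
    # ON CONFLICT DO NOTHING: a replayed message is silently skipped,
    # so reprocessing the same offsets cannot create duplicate rows.
    conn.execute(
        "INSERT INTO processed_events (event_id, payload) VALUES (?, ?) "
        "ON CONFLICT(event_id) DO NOTHING",
        (event_id, payload),
    )

batch = [("evt-1", "x"), ("evt-2", "y")]
for eid, p in batch:
    handle(eid, p)
for eid, p in batch:  # simulate a full replay of the same batch
    handle(eid, p)

count = conn.execute("SELECT COUNT(*) FROM processed_events").fetchone()[0]
print(count)  # still 2 after the replay
```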