Introduction
Message brokers with deduplication support use producer-supplied deduplication IDs to filter duplicate messages. When different events accidentally share the same deduplication ID, the broker silently discards legitimate messages as duplicates, causing data loss that is difficult to detect without dedicated monitoring.
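The failure mode is easy to reproduce in miniature: a broker-side deduplication filter keeps a set of recently seen IDs and drops any message whose ID repeats, regardless of payload. A minimal sketch (the `DedupFilter` class and its `accept` method are illustrative, not a real broker API):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative broker-side deduplication: any message whose dedup ID was
// already seen is silently dropped, even when the payload differs.
public class DedupFilter {
    private final Set<String> seenIds = new HashSet<>();

    // Returns true if the message is accepted, false if dropped as a duplicate.
    public boolean accept(String dedupId) {
        return seenIds.add(dedupId);
    }

    public static void main(String[] args) {
        DedupFilter filter = new DedupFilter();
        // Two different logical events that accidentally share one dedup ID:
        boolean first = filter.accept("evt-1700000000000");
        boolean second = filter.accept("evt-1700000000000");
        System.out.println("first=" + first + " second=" + second);
    }
}
```

Note that the broker reports nothing unusual here: from its perspective, dropping the second message is correct behavior.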
Symptoms
- Event counts at consumers are lower than expected with no error messages
- Deduplication ID generation uses a counter or timestamp with insufficient granularity
- Logs show duplicate detection hits for events that have different payloads
- Replayed events from upstream systems share deduplication IDs with original events
- Metrics show high deduplication drop rate relative to total message throughput
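The granularity symptom is simple to confirm: IDs derived from a millisecond timestamp collapse to a handful of distinct values when events are produced in a tight loop. A small demonstration (class and method names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Demonstrates why millisecond timestamps are unsafe as dedup IDs:
// a burst of events yields far fewer distinct IDs than events, because
// many events fall into the same millisecond.
public class TimestampCollisionDemo {
    public static int distinctIds(int events) {
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < events; i++) {
            ids.add("evt-" + System.currentTimeMillis());
        }
        return ids.size();
    }

    public static void main(String[] args) {
        int events = 10_000;
        int distinct = distinctIds(events);
        // Every collision is a legitimate message the broker would discard.
        System.out.println(distinct + " distinct IDs for " + events + " events");
    }
}
```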
Common Causes
- Deduplication ID generated from timestamp with millisecond precision, causing collisions under high throughput
- Producer reuses deduplication IDs across different logical events during retry logic
- UUID generation seeded with insufficient entropy, producing collisions at scale
- Message replay or replay-from-checkpoint reuses original deduplication IDs without suffix
- Shared deduplication ID namespace across multiple producer instances without instance identifier
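One approach that sidesteps several of these causes at once is deriving the deduplication ID deterministically from the event's identity fields, so retries of the *same* event collide intentionally while *different* events can never share an ID. A hedged sketch (the `eventType`/`eventId` fields are assumptions about your event model; requires Java 17+ for `HexFormat`):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch: derive the dedup ID from the event's identity (type + business key).
// Replays and retries of one event produce the same ID; distinct events cannot collide.
public class ContentBasedDedupId {
    public static String dedupId(String eventType, String eventId) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(
                    (eventType + ":" + eventId).getBytes(StandardCharsets.UTF_8));
            return "evt-" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

This only works when events carry a stable business key; if they do not, random IDs with an instance prefix (as in the fix below) are the safer default.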
Step-by-Step Fix
1. Audit the deduplication ID generation logic: Review how the producer creates deduplication IDs.

   ```java
   // WRONG: timestamp-based ID collides under high throughput
   // String dedupId = "evt-" + System.currentTimeMillis();

   // CORRECT: UUID-based ID with producer instance prefix
   String dedupId = "evt-" + producerId + "-" + UUID.randomUUID();
   ```
2. Add a producer instance identifier to the deduplication ID namespace: Ensure each producer instance uses a unique namespace.

   ```bash
   PRODUCER_ID=$(hostname)-$$
   export PRODUCER_ID
   ```
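The producer process can then read this identifier when constructing IDs. A minimal sketch (the class name and random fallback are illustrative):

```java
import java.util.UUID;

// Reads the PRODUCER_ID exported above; falls back to a random instance ID
// so two producers never share a namespace by accident.
public class ProducerIdentity {
    public static String producerId() {
        String fromEnv = System.getenv("PRODUCER_ID");
        return (fromEnv != null && !fromEnv.isEmpty())
                ? fromEnv
                : "producer-" + UUID.randomUUID();
    }

    public static String newDedupId() {
        return "evt-" + producerId() + "-" + UUID.randomUUID();
    }
}
```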
3. Increase the deduplication window on the broker: Extend the time window during which duplicates are tracked.
   ```properties
   # For Amazon SQS or similar
   deduplication.scope=queue
   deduplication.retention.period=300
   ```
4. Verify no events are being silently dropped: Compare producer send counts with consumer receive counts.
   ```bash
   # Compare sent vs received counts
   echo "Sent: $(grep 'message.sent' producer.log | wc -l)"
   echo "Received: $(grep 'message.received' consumer.log | wc -l)"
   echo "Deduped: $(grep 'duplicate.detected' broker.log | wc -l)"
   ```
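The three counts should reconcile: every sent message is either received or deduplicated, and anything else is silent loss. A sketch of that check (the sample log lines here are fixtures standing in for real service logs, and the log patterns are the same assumptions as in the step above):

```shell
# Fixture logs for illustration; in practice these come from the services.
printf 'message.sent\nmessage.sent\nmessage.sent\n' > producer.log
printf 'message.received\nmessage.received\n' > consumer.log
printf 'duplicate.detected\n' > broker.log

sent=$(grep -c 'message.sent' producer.log)
received=$(grep -c 'message.received' consumer.log)
deduped=$(grep -c 'duplicate.detected' broker.log)

# Every sent message should be either received or deduped.
if [ "$sent" -ne $((received + deduped)) ]; then
  echo "WARN: $((sent - received - deduped)) messages unaccounted for"
else
  echo "OK: all $sent messages accounted for (deduped=$deduped)"
fi
```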
5. Implement deduplication ID logging for audit: Log every deduplication ID with its event type for traceability.
   ```java
   logger.info("DedupId={} EventType={} EventId={}", dedupId, eventType, eventId);
   ```
Prevention
- Use cryptographically strong random identifiers (UUIDv4 or ULID) for deduplication IDs, never timestamps alone
- Include a producer instance identifier, logical event type, and sequence number in the deduplication ID
- Monitor deduplication drop rates and alert when the rate exceeds expected retry duplication levels
- Implement end-to-end event counting with a reconciliation job that compares source and sink event counts
- Test deduplication behavior under high-throughput replay scenarios in staging environments
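The recommended ID shape (instance identifier, logical event type, and sequence number) can be sketched as follows; the class name is illustrative, and the per-instance sequence guarantees uniqueness even when many events share a timestamp:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a composite dedup ID: producer instance + event type + sequence.
// AtomicLong makes the sequence safe across producer threads.
public class CompositeDedupId {
    private final String producerId;
    private final AtomicLong sequence = new AtomicLong();

    public CompositeDedupId(String producerId) {
        this.producerId = producerId;
    }

    public String next(String eventType) {
        return "evt-" + producerId + "-" + eventType + "-" + sequence.incrementAndGet();
    }
}
```

The resulting IDs are also self-describing, which makes the audit logging from the fix section far easier to correlate with producer instances.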