Introduction
Kafka log compaction retains only the latest value for each message key, deleting older records. Misconfiguration of compaction parameters can cause the cleaner to incorrectly remove segments that contain the latest (and only) values for some keys, resulting in permanent data loss for those keys.
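The retention rule can be sketched with a small simulation: a map keyed by message key in which a newer value replaces an older one and a null value (tombstone) removes the key. This is plain Java with no Kafka dependency, and the keys and values are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulates what the log cleaner retains: for each key, only the most
// recent value survives; a tombstone (null value) removes the key
// entirely once delete.retention.ms has elapsed.
public class CompactionSketch {
    public static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            String key = record[0], value = record[1];
            if (value == null) {
                latest.remove(key);     // tombstone: key is deleted
            } else {
                latest.put(key, value); // newer value replaces older one
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"user-1", "alice"},
            {"user-2", "bob"},
            {"user-1", "alice-v2"}, // supersedes the first record
            {"user-2", null}        // tombstone deletes user-2
        };
        System.out.println(compact(log)); // {user-1=alice-v2}
    }
}
```

The data loss described in this article occurs when a tombstone or overwrite lands on a key whose latest value should have been kept.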
Symptoms
- Consumers reading compacted topics find missing keys that should still exist
- Key-value store rebuilt from compacted topic has fewer entries than expected
- Compaction logs show segments being cleaned that contain only one record per key
- After compaction, `kafka-dump-log` shows keys missing from the topic
- Error message: `Key not found in compacted log for entity ID`
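One way to confirm the symptom is to diff the key set rebuilt from the compacted topic against the source of truth. A minimal sketch with plain Java sets; in a real check the observed set would come from consuming the topic to its end, and the key names here are illustrative:

```java
import java.util.Set;
import java.util.TreeSet;

public class MissingKeyCheck {
    // Returns the keys the source of truth has but the topic no longer does.
    public static Set<String> missingKeys(Set<String> expected, Set<String> observed) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(observed);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("order-1", "order-2", "order-3");
        Set<String> observed = Set.of("order-1", "order-3");
        System.out.println(missingKeys(expected, observed)); // [order-2]
    }
}
```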
Common Causes
- `min.cleanable.dirty.ratio` set too low, triggering aggressive compaction before all data is flushed
- Tombstone messages (null values) produced for keys that still need their latest value retained
- `delete.retention.ms` set too short, removing tombstones before consumers have processed them
- Compaction running on a topic that should use the `delete` cleanup policy instead
- Producer setting an incorrect or constant message key, causing unrelated records to share a key and overwrite one another during compaction
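The last cause is easy to reproduce in isolation: when every record carries the same key, compaction correctly keeps only the newest one, silently discarding the rest. A hypothetical sketch (no broker needed; names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantKeyBug {
    // Bug: a hard-coded key instead of the entity's own ID means every
    // record competes for the same slot after compaction.
    public static Map<String, String> compactWithConstantKey(String[] payloads) {
        Map<String, String> survivors = new HashMap<>();
        for (String p : payloads) {
            survivors.put("fixed-key", p); // should be the entity's real key
        }
        return survivors;
    }

    public static void main(String[] args) {
        Map<String, String> result =
            compactWithConstantKey(new String[]{"order-1", "order-2", "order-3"});
        System.out.println(result.size()); // 1 — two of three records were lost
    }
}
```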
Step-by-Step Fix
1. Verify the cleanup policy is set correctly: Confirm the topic uses compaction.

   ```bash
   kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-compacted-topic
   ```
2. Adjust compaction parameters to be less aggressive: Raise the cleaner's thresholds so records are not compacted too soon.
   ```properties
   min.cleanable.dirty.ratio=0.5
   delete.retention.ms=86400000
   min.compaction.lag.ms=60000
   ```
3. Audit tombstone production logic: Ensure tombstones are only produced when keys should actually be deleted.
   ```java
   // Only produce a tombstone when the entity is genuinely deleted
   if (entity.isDeleted()) {
       producer.send(new ProducerRecord<>("my-topic", entity.getKey(), null));
   }
   ```
4. Recover missing data from the source system: Republish the latest values for lost keys.
   ```bash
   # Republish from source database
   kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-compacted-topic \
     --property "parse.key=true" --property "key.separator=:" < recovered-keys.txt
   ```
5. Verify data integrity after recovery: Compare key counts before and after.
   ```bash
   kafka-run-class.sh kafka.tools.DumpLogSegments \
     --files /var/lib/kafka/data/my-compacted-topic-0/*.log \
     --print-data-log | grep -c "key:"
   ```
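Step 2's `min.cleanable.dirty.ratio` compares the bytes not yet compacted (dirty) against total log bytes; the cleaner becomes eligible to run once that ratio reaches the threshold. A sketch of the check, with illustrative byte counts:

```java
public class DirtyRatio {
    // Cleaner eligibility: dirty / (clean + dirty) >= min.cleanable.dirty.ratio
    public static boolean cleanable(long cleanBytes, long dirtyBytes, double minRatio) {
        double ratio = (double) dirtyBytes / (cleanBytes + dirtyBytes);
        return ratio >= minRatio;
    }

    public static void main(String[] args) {
        // 400 MB clean, 100 MB dirty -> ratio 0.2, below the 0.5 threshold
        System.out.println(cleanable(400, 100, 0.5)); // false
        // 400 MB clean, 600 MB dirty -> ratio 0.6, cleaner may run
        System.out.println(cleanable(400, 600, 0.5)); // true
    }
}
```

This is why raising the ratio to 0.5 makes compaction less eager: more of the log must be dirty before any cleaning happens.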
Prevention
- Set `min.cleanable.dirty.ratio` to 0.5 or higher to prevent overly aggressive compaction
- Configure `delete.retention.ms` to at least 24 hours to give consumers time to process tombstones
- Use `min.compaction.lag.ms` to ensure messages are not compacted immediately after production
- Regularly audit compacted topic key counts against the source of truth
- Monitor compaction rate and log a warning if key count drops significantly between compaction cycles
- Test compaction behavior in staging with production-like key distribution before applying to production topics
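The key-count monitoring suggested above reduces to a simple threshold check; a sketch, assuming counts are gathered once per compaction cycle (the method name and 5% threshold are illustrative):

```java
public class KeyCountMonitor {
    // Flags a drop larger than maxDropFraction between compaction cycles.
    public static boolean shouldWarn(long before, long after, double maxDropFraction) {
        if (before == 0) return false;
        double drop = (double) (before - after) / before;
        return drop > maxDropFraction;
    }

    public static void main(String[] args) {
        System.out.println(shouldWarn(10_000, 9_900, 0.05)); // false: 1% drop
        System.out.println(shouldWarn(10_000, 8_000, 0.05)); // true: 20% drop
    }
}
```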