Introduction

Kafka log compaction retains only the latest value for each message key, deleting older records. Misconfiguration of compaction parameters can cause the cleaner to incorrectly remove segments that contain the latest (and only) values for some keys, resulting in permanent data loss for those keys.
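The per-key retention rule can be pictured with a minimal sketch (plain Java, not the Kafka cleaner itself): replaying a log into a map keeps only the latest value for each key, and a tombstone (a record with a null value) removes the key entirely.

```java
import java.util.*;

// Minimal sketch of compaction semantics (not the real cleaner):
// each record is a {key, value} pair; a null value is a tombstone.
public class CompactionSketch {

    // Replay the log in order, keeping only the latest value per key.
    static Map<String, String> compact(String[][] records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : records) {
            if (record[1] == null) {
                latest.remove(record[0]); // tombstone: the key is deleted
            } else {
                latest.put(record[0], record[1]); // newer value wins
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"user-1", "v1"},
            {"user-2", "v1"},
            {"user-1", "v2"},   // supersedes user-1 -> v1
            {"user-2", null}    // tombstone removes user-2
        };
        System.out.println(compact(log)); // {user-1=v2}
    }
}
```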

Symptoms

  • Consumers reading compacted topics find missing keys that should still exist
  • Key-value store rebuilt from compacted topic has fewer entries than expected
  • Compaction logs show segments being cleaned that contain only one record per key
  • After compaction, kafka-dump-log shows keys missing from the topic
  • Error message: Key not found in compacted log for entity ID

Common Causes

  • min.cleanable.dirty.ratio set too low, causing the cleaner to run aggressively and compact segments almost as soon as they roll
  • Tombstone messages (null values) produced for keys that still need their latest value retained
  • delete.retention.ms set too short, removing tombstones before consumers have processed them
  • Compaction running on a topic that should use delete cleanup policy instead
  • Producer not sending proper key values: null-keyed records are rejected on compacted topics, and a hard-coded constant key causes every record to compact down to a single surviving entry
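The constant-key pitfall in the last cause can be illustrated with a short sketch (plain Java with hypothetical names, not a real producer): when every record is produced under one key, compaction keeps only the latest record and silently drops the rest.

```java
import java.util.*;

// Sketch of the constant-key misconfiguration: all records share one key,
// so compaction retains a single record regardless of how many were sent.
public class ConstantKeyPitfall {

    // Simulate compaction of records that were all produced under the same key.
    static Map<String, String> compactSameKey(String key, List<String> values) {
        Map<String, String> compacted = new LinkedHashMap<>();
        for (String value : values) {
            compacted.put(key, value); // each write overwrites the previous one
        }
        return compacted;
    }

    public static void main(String[] args) {
        Map<String, String> result =
            compactSameKey("same-key", List.of("order-1", "order-2", "order-3"));
        System.out.println(result.size());          // 1 record survives
        System.out.println(result.get("same-key")); // order-3
    }
}
```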

Step-by-Step Fix

  1. Verify the cleanup policy is set correctly: Confirm the topic uses compaction.

     ```bash
     kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-compacted-topic
     ```

  2. Adjust compaction parameters to be less aggressive: Reduce the compaction rate.

     ```properties
     min.cleanable.dirty.ratio=0.5
     delete.retention.ms=86400000
     min.compaction.lag.ms=60000
     ```

  3. Audit tombstone production logic: Ensure tombstones are only produced when keys should actually be deleted.

     ```java
     // Only produce a tombstone when the entity is genuinely deleted
     if (entity.isDeleted()) {
         producer.send(new ProducerRecord<>("my-topic", entity.getKey(), null));
     }
     ```

  4. Recover missing data from the source system: Republish the latest values for lost keys.

     ```bash
     # Republish from source database
     kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-compacted-topic \
       --property "parse.key=true" --property "key.separator=:" < recovered-keys.txt
     ```

  5. Verify data integrity after recovery: Compare key counts before and after.

     ```bash
     kafka-run-class.sh kafka.tools.DumpLogSegments \
       --files /var/lib/kafka/data/my-compacted-topic-0/*.log \
       --print-data-log | grep -c "key="
     ```
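The integrity check in the final step can be extended into a reconciliation sketch (plain Java with hypothetical names): diff the keys read back from the compacted topic against the source-of-truth key set. In practice the two sets would come from a consumer poll loop and a database query rather than literals.

```java
import java.util.*;

// Sketch of a post-recovery audit: any key present in the source of truth
// but absent from the topic was lost and still needs to be republished.
public class KeyAudit {

    static Set<String> missingKeys(Set<String> sourceOfTruth, Set<String> topicKeys) {
        Set<String> missing = new TreeSet<>(sourceOfTruth);
        missing.removeAll(topicKeys); // keys the cleaner should not have removed
        return missing;
    }

    public static void main(String[] args) {
        Set<String> db = Set.of("user-1", "user-2", "user-3"); // source of truth
        Set<String> topic = Set.of("user-1", "user-3");        // keys read back
        System.out.println(missingKeys(db, topic)); // [user-2]
    }
}
```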

Prevention

  • Set min.cleanable.dirty.ratio to 0.5 or higher to prevent overly aggressive compaction
  • Configure delete.retention.ms to at least 24 hours to give consumers time to process tombstones
  • Use min.compaction.lag.ms to ensure messages are not compacted immediately after production
  • Regularly audit compacted topic key counts against the source of truth
  • Monitor compaction rate and log a warning if key count drops significantly between compaction cycles
  • Test compaction behavior in staging with production-like key distribution before applying to production topics
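The key-count monitoring suggested above can be sketched as a simple threshold check (plain Java; the method name and threshold are hypothetical): warn when the fraction of keys lost between two compaction cycles exceeds an acceptable drop ratio.

```java
// Sketch of the key-count monitor: compare counts from consecutive
// compaction cycles and flag drops beyond a configured ratio.
public class CompactionMonitor {

    // Returns true when the key count dropped by more than maxDropRatio.
    static boolean shouldWarn(long before, long after, double maxDropRatio) {
        if (before == 0) {
            return false; // nothing to compare against
        }
        double drop = (double) (before - after) / before;
        return drop > maxDropRatio;
    }

    public static void main(String[] args) {
        // 1000 keys before compaction, 400 after: a 60% drop trips a 50% threshold.
        System.out.println(shouldWarn(1000, 400, 0.5)); // true
        System.out.println(shouldWarn(1000, 900, 0.5)); // false
    }
}
```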