Introduction

Prometheus has two retention controls: time-based (--storage.tsdb.retention.time) and size-based (--storage.tsdb.retention.size). Time-based retention removes whole blocks, so a block is deleted only once all of its data falls outside the retention window. When disk usage keeps growing past the configured retention period, it usually indicates that only one retention type is set, compaction is failing to create deletable blocks, or the retention check has not yet run.

Symptoms

  • Disk usage exceeds expected size based on retention configuration
  • Old TSDB blocks remain in the data directory past the retention period
  • Prometheus logs show no block deletion messages during compaction cycles
  • prometheus_tsdb_storage_blocks_bytes continues to grow despite retention settings
  • Error message: Retention period not yet reached, skipping block deletion

Common Causes

  • Only time-based retention configured without size-based retention limit
  • Retention period shorter than the minimum block duration (2 hours), making it ineffective
  • Compaction failing to merge head chunks into blocks, preventing retention cleanup
  • --storage.tsdb.min-block-duration set longer than the retention period
  • Disk shared with other processes that are filling it independently of Prometheus
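
The two causes that compare retention against block duration can be checked with simple arithmetic. The sketch below is illustrative only: duration_to_seconds and retention_is_effective are names made up for this example, only single-unit durations such as 15d or 2h are handled, and the "at least twice the minimum block duration" threshold is this guide's own rule of thumb rather than a check Prometheus performs.

```bash
# Convert a single-unit Prometheus duration (e.g. 15d, 2h, 30m) to seconds.
# Compound values such as 1d12h are not handled in this sketch.
duration_to_seconds() {
  local value=${1%[smhdwy]} unit=${1: -1}
  case "$unit" in
    s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    d) echo $(( value * 86400 )) ;;
    w) echo $(( value * 604800 )) ;;
    y) echo $(( value * 31536000 )) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# Succeed when retention is at least twice the minimum block duration.
# Arguments: retention time, minimum block duration (defaults to 2h).
retention_is_effective() {
  local retention min_block
  retention=$(duration_to_seconds "$1") || return 2
  min_block=$(duration_to_seconds "${2:-2h}") || return 2
  [ "$retention" -ge $(( 2 * min_block )) ]
}

retention_is_effective 15d 2h && echo "15d retention: ok"
retention_is_effective 1h 2h || echo "1h retention: shorter than 2x block duration"
```

Pass the values reported by the flags endpoint for storage.tsdb.retention.time and storage.tsdb.min-block-duration to run the same comparison against a live configuration.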

Step-by-Step Fix

  1. Verify the current retention configuration: Check what retention settings are active.
     ```bash
     # The flags endpoint wraps its result in a "data" object; keep only the retention flags
     curl -s http://localhost:9090/api/v1/status/flags | jq '.data | with_entries(select(.key | startswith("storage.tsdb.retention")))'
     ```
  2. Add size-based retention as a safety net: Cap disk usage regardless of time.
     ```bash
     # Add to Prometheus startup flags; whichever limit is reached first triggers deletion
     --storage.tsdb.retention.time=15d
     --storage.tsdb.retention.size=50GB
     ```
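  3. Check block ages and sizes: Verify blocks are being created and aged correctly.
     ```bash
     # List the block directories themselves (-d) with their modification times
     ls -ldh /var/lib/prometheus/metrics2/01* | awk '{print $6, $7, $8, $9}'
     # Show the on-disk size of each block
     du -sh /var/lib/prometheus/metrics2/01*
     ```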
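  4. Check compaction health: Prometheus compacts on its own schedule and exposes no API call that forces it (the admin snapshot endpoint only copies existing data into a snapshots directory). Confirm instead that compactions are not failing.
     ```bash
     # A value that keeps increasing means compactions fail and old blocks cannot age out
     curl -s 'http://localhost:9090/api/v1/query?query=prometheus_tsdb_compactions_failed_total' | jq '.data.result[].value[1]'
     ```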
  5. Verify disk usage stabilizes after retention applies: Monitor block deletion.
     ```bash
     # After the next compaction cycle
     du -sh /var/lib/prometheus/metrics2
     # Check logs for block deletion
     journalctl -u prometheus | grep "Deleting obsolete block"
     ```
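
File modification times are not always a reliable measure of block age. Block directory names are ULIDs, and the first 10 characters of a ULID encode a millisecond timestamp. As a rough sketch, the helper below (ulid_to_epoch_ms is a name invented for this example) decodes that timestamp. It assumes the ULID is stamped when the block is written, so it shows when the block was created, not the time range of its samples. That range is stored as minTime and maxTime in the block's meta.json. The loop also assumes GNU date and the data path used in this guide.

```bash
# Decode the 48-bit millisecond timestamp from the first 10 characters of a ULID.
# ulid_to_epoch_ms is an illustrative helper, not part of Prometheus.
ulid_to_epoch_ms() {
  local alphabet="0123456789ABCDEFGHJKMNPQRSTVWXYZ"   # Crockford base32
  local ts=${1:0:10} ms=0 i ch prefix
  [ "${#ts}" -eq 10 ] || { echo "invalid ULID: $1" >&2; return 1; }
  for (( i = 0; i < 10; i++ )); do
    ch=${ts:i:1}
    prefix=${alphabet%%"$ch"*}   # text before ch; its length is the digit value
    [ "${#prefix}" -lt 32 ] || { echo "invalid ULID: $1" >&2; return 1; }
    ms=$(( ms * 32 + ${#prefix} ))
  done
  echo "$ms"
}

# Print the creation time of every block directory (assumed path; override DATA_DIR).
DATA_DIR=${DATA_DIR:-/var/lib/prometheus/metrics2}
for dir in "$DATA_DIR"/01*; do
  [ -d "$dir" ] || continue   # skip when the glob matches nothing
  created_ms=$(ulid_to_epoch_ms "$(basename "$dir")") || continue
  # date -d @SECONDS is GNU date syntax
  echo "$(basename "$dir") created $(date -u -d "@$(( created_ms / 1000 ))" +%Y-%m-%dT%H:%M:%SZ)"
done
```

For example, ulid_to_epoch_ms 01ARYZ6S41TSV4RRFFQ69G5FAV prints 1469918176385.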

Prevention

  • Always set both --storage.tsdb.retention.time and --storage.tsdb.retention.size
  • Set size-based retention to approximately 80% of available disk capacity
  • Ensure retention time is at least 4 hours (2x the minimum block duration of 2 hours)
  • Monitor prometheus_tsdb_storage_blocks_bytes and alert when it approaches the size retention limit
  • Check compaction health regularly to ensure blocks are being created and aged properly
  • Document the expected disk growth rate and compare against actual usage monthly
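
The 80% guideline and the "approaching the limit" check come down to integer arithmetic. The following is a minimal sketch under stated assumptions: suggest_retention_size and size_limit_used_pct are hypothetical helper names, the df invocation uses GNU coreutils options, and the data path is the one used in this guide. Prometheus size units are powers of two, so 1GB is treated as 1024^3 bytes.

```bash
# Suggest a --storage.tsdb.retention.size value as a percentage of volume capacity.
# Arguments: capacity in bytes, percentage (defaults to 80).
suggest_retention_size() {
  local total_bytes=$1 pct=${2:-80}
  # Integer division rounds down to whole gigabytes (1GB = 1024^3 bytes)
  echo "$(( total_bytes * pct / 100 / 1024 / 1024 / 1024 ))GB"
}

# Report how much of the size limit is in use, as a whole percentage.
# Arguments: current block bytes (e.g. the value of
# prometheus_tsdb_storage_blocks_bytes), size limit in bytes.
size_limit_used_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Size the limit from the filesystem holding the data directory, if it exists.
DATA_DIR=${DATA_DIR:-/var/lib/prometheus/metrics2}
if [ -d "$DATA_DIR" ]; then
  # GNU df: print only the total size column, in bytes
  total=$(df -B1 --output=size "$DATA_DIR" | tail -n 1 | tr -d ' ')
  echo "Suggested flag: --storage.tsdb.retention.size=$(suggest_retention_size "$total")"
fi
```

A 100 GiB volume (107374182400 bytes) yields 80GB with the default percentage. Volumes smaller than about 1.25 GiB round down to 0GB, so a smaller unit would be needed there.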