The Problem
Prometheus stops ingesting data and logs show disk-related errors:
```text
level=error ts=2026-04-04T08:30:15.234Z caller=db.go:1245 msg="disk full, cannot write segment"
level=error ts=2026-04-04T08:30:15.235Z caller=head_wal.go:892 msg="WAL write failed" err="write /data/wal/00012345: no space left on device"
level=fatal ts=2026-04-04T08:30:15.236Z caller=main.go:456 err="TSDB closed"
```

The disk fills up because Prometheus stores time-series data on local disk. When neither retention flag is set, it keeps 15 days of data with no size cap. If the volume ingested over the retention window exceeds the free space, storage keeps growing until the disk is full.
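You can check whether a retention window fits on disk before it fills up. The Prometheus storage documentation gives a rule of thumb: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. The helper below is a minimal sketch of that arithmetic. The function name `estimate_disk_gib` and its default of 2 bytes per sample are assumptions for illustration. The ingestion rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[1h])`.

```bash
# Hypothetical helper: rough block-storage estimate from the Prometheus sizing rule of thumb
#   disk ≈ retention_seconds * samples_per_second * bytes_per_sample
# bytes_per_sample defaults to 2 (an assumption; compressed samples average ~1-2 bytes).
estimate_disk_gib() {
  local retention_days=$1
  local samples_per_sec=$2
  local bytes_per_sample=${3:-2}
  # bash arithmetic is integer-only, so awk does the floating-point math.
  # Result is in GiB (1 GiB = 1024^3 bytes), rounded to one decimal.
  awk -v d="$retention_days" -v s="$samples_per_sec" -v b="$bytes_per_sample" \
    'BEGIN { printf "%.1f\n", d * 86400 * s * b / (1024 ^ 3) }'
}
```

For example, `estimate_disk_gib 15 100000` prints `241.4`. Fifteen days at 100,000 samples/s needs roughly 240 GiB for blocks, plus room for the WAL.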
Diagnosis
Check Disk Usage
```bash
# Check free space on the filesystem holding the Prometheus data directory
df -h /var/lib/prometheus

# Check the size of each component
du -sh /var/lib/prometheus/*
du -sh /var/lib/prometheus/wal
du -sh /var/lib/prometheus/chunks_head
```
Prometheus Metrics for Storage
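```promql
# Bytes currently used by persisted blocks
prometheus_tsdb_storage_blocks_bytes{job="prometheus"}

# Number of series in the head block
prometheus_tsdb_head_series{job="prometheus"}

# WAL size
prometheus_tsdb_wal_storage_size_bytes{job="prometheus"}

# Configured size limit in bytes (0 means size-based retention is disabled)
prometheus_tsdb_retention_limit_bytes{job="prometheus"}

# How many times retention has deleted blocks (time-based and size-based)
prometheus_tsdb_time_retentions_total{job="prometheus"}
prometheus_tsdb_size_retentions_total{job="prometheus"}
```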
Identify Large Metrics
```promql
# Top 10 metric names by series count (a rough proxy for storage use; expensive on large servers)
topk(10, count by (__name__)({__name__=~".+"}))

# Number of distinct values of one label (replace label_name with the label to inspect)
count(count by (label_name)({__name__=~".+"}))
```
Solutions
1. Configure Retention Limits
Set a time-based retention period (a good starting point), optionally with a size cap:
```bash
prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```
Or configure it in systemd:
```ini
# /etc/systemd/system/prometheus.service
[Service]
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```
Restart the service:
```bash
sudo systemctl daemon-reload
sudo systemctl restart prometheus
```
2. Enable Size-Based Retention
Combine time and size limits for safety:
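```bash
prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=100GB
```
When both are set, whichever limit is reached first triggers deletion of the oldest blocks. Only persisted blocks are deleted, but the WAL and head chunks also count toward the size limit, so set it comfortably below the disk's capacity.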
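The Prometheus storage documentation states that size units are powers of 2, so `50GB` means 50 × 1024³ bytes. Before comparing the limit against `prometheus_tsdb_storage_blocks_bytes` or `df` output, it helps to convert it to bytes. The function below is a hypothetical helper. It is a sketch that handles only the B, KB, MB, GB and TB suffixes.

```bash
# Hypothetical helper: convert a Prometheus retention size (e.g. "50GB") to bytes.
# Prometheus uses base-2 units (1KB = 1024B). Only B, KB, MB, GB, TB are handled here.
retention_size_to_bytes() {
  local size=$1
  local num=${size%%[A-Z]*}     # leading number, e.g. "50"
  local unit=${size#"$num"}     # remaining suffix, e.g. "GB"
  local power
  case $unit in
    B)  power=0 ;;
    KB) power=10 ;;
    MB) power=20 ;;
    GB) power=30 ;;
    TB) power=40 ;;
    *)  echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  # Multiply by 2^power using a bit shift (bash integers are 64-bit)
  echo $(( num << power ))
}
```

For example, `retention_size_to_bytes 100GB` prints `107374182400`. This matches the `100 * 1024 * 1024 * 1024` used in the verification query.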
3. Immediate Cleanup
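If the disk is already full, you can free space by deleting the oldest blocks. This permanently discards the data in those blocks.
```bash
# Stop Prometheus first
sudo systemctl stop prometheus

# List blocks
ls -la /var/lib/prometheus/

# Blocks are directories named with ULIDs (e.g. 01HXYZ...). A ULID records when the
# block was created, which after compaction need not match the age of its data, so
# check minTime/maxTime in each block's meta.json before deleting.
sudo rm -rf /var/lib/prometheus/01HABC...   # oldest block

# Or truncate the WAL (last resort: loses recent samples not yet written to a block).
# The glob runs inside sudo so it expands even if your user cannot read the directory.
sudo sh -c 'rm -rf /var/lib/prometheus/wal/*'

# Start Prometheus
sudo systemctl start prometheus
```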
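Each block directory contains a `meta.json` file. Its `minTime` and `maxTime` fields give the block's time range in milliseconds since the epoch. Sorting on `minTime` finds the blocks holding the oldest data. The function below is a hypothetical sketch that extracts those fields with `sed` rather than a JSON parser. It therefore assumes the `"minTime": <number>` layout that Prometheus writes.

```bash
# Hypothetical helper: list TSDB blocks holding the oldest data first, using meta.json.
# Output columns: minTime(ms) maxTime(ms) block-directory
list_blocks_oldest_first() {
  local data_dir=${1:-/var/lib/prometheus}
  local meta min max
  for meta in "$data_dir"/*/meta.json; do
    [ -f "$meta" ] || continue   # no match, or a non-block dir such as wal/
    # Pull the numbers out of "minTime": N and "maxTime": N
    min=$(sed -n 's/.*"minTime": *\([0-9]*\).*/\1/p' "$meta")
    max=$(sed -n 's/.*"maxTime": *\([0-9]*\).*/\1/p' "$meta")
    printf '%s %s %s\n' "$min" "$max" "$(basename "$(dirname "$meta")")"
  done | sort -n -k1,1
}
```

For example, `sudo bash -c "$(declare -f list_blocks_oldest_first); list_blocks_oldest_first /var/lib/prometheus" | head -n 3` shows the three oldest blocks. Run it before stopping Prometheus and deleting anything.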
4. Reduce Ingestion Rate
Lower scrape frequency for non-critical targets:
```yaml
# prometheus.yml
global:
  scrape_interval: 30s  # increased from 15s

scrape_configs:
  - job_name: 'critical-services'
    scrape_interval: 10s
    # targets (static_configs or service discovery) omitted

  - job_name: 'non-critical'
    scrape_interval: 60s
    # targets omitted
```
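Ingestion rate scales inversely with scrape interval. Each scraped series produces about one sample per scrape, so a target's samples per second is roughly its series count divided by its scrape interval. The hypothetical helper below does that arithmetic under the one-sample-per-series-per-scrape assumption.

```bash
# Hypothetical helper: approximate samples/second for a target,
# assuming one sample per series per scrape.
samples_per_second() {
  local series=$1 interval_seconds=$2
  # Integer division is precise enough for capacity planning
  echo $(( series / interval_seconds ))
}
```

For 300,000 series, `samples_per_second 300000 15` prints `20000` and `samples_per_second 300000 60` prints `5000`. Moving from 15s to 60s therefore cuts that job's ingestion to about a quarter, and with it roughly a quarter of the block storage for the same retention period.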
5. Optimize Compaction
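Retention deletes whole blocks, so smaller blocks let it remove data in finer increments and keep disk usage closer to the retention limit. These are advanced, rarely changed flags. By default the maximum block duration is 10% of the retention time (capped at 31d), and smaller blocks mean more files and less efficient compaction.
```bash
prometheus \
  --storage.tsdb.min-block-duration=2h \
  --storage.tsdb.max-block-duration=6h \
  --storage.tsdb.retention.time=15d
```
6. Move Storage to Larger Disk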
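```bash
# Stop Prometheus
sudo systemctl stop prometheus

# Copy data to the new location (-a preserves ownership and permissions)
sudo rsync -a /var/lib/prometheus/ /mnt/larger-disk/prometheus/

# Point Prometheus at the new path: change --storage.tsdb.path in
# /etc/systemd/system/prometheus.service to /mnt/larger-disk/prometheus, then:
sudo systemctl daemon-reload
sudo systemctl start prometheus
```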
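Confirm the copy is complete before deleting the old data directory. A recursive `diff` compares file contents and reports anything missing on either side. The wrapper below is a hypothetical convenience function around it. It reads every file, so expect it to take a while on a large TSDB.

```bash
# Hypothetical helper: succeed only if two directory trees have identical contents.
# Run it while Prometheus is stopped, otherwise the source keeps changing.
verify_copy() {
  local src=$1 dst=$2
  if diff -rq "$src" "$dst" >/dev/null; then
    echo "copy verified: $dst matches $src"
  else
    echo "MISMATCH between $src and $dst" >&2
    return 1
  fi
}
```

For example: `sudo bash -c "$(declare -f verify_copy); verify_copy /var/lib/prometheus /mnt/larger-disk/prometheus"`.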
Verification
Check storage is properly managed:
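```promql
# Block storage as a fraction of a 100GB size limit (Prometheus size units are base 2); should stay below 0.8
prometheus_tsdb_storage_blocks_bytes{job="prometheus"} / (100 * 1024 * 1024 * 1024) < 0.8

# Retention (time- or size-based) has deleted at least one block in the last day
increase(prometheus_tsdb_time_retentions_total{job="prometheus"}[1d])
  + increase(prometheus_tsdb_size_retentions_total{job="prometheus"}[1d]) > 0
```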
Verify in logs:
```bash
# Check for compaction success
journalctl -u prometheus --since "1 hour ago" | grep -i compaction

# Check retention is working
journalctl -u prometheus --since "1 hour ago" | grep -i retention
```
Prevention
Set up alerts for disk usage:
```yaml
# alert_rules.yml
groups:
  - name: prometheus_storage
    rules:
      - alert: PrometheusStorageNearFull
        # "> 0" skips servers without --storage.tsdb.retention.size (limit is 0)
        expr: prometheus_tsdb_storage_blocks_bytes / (prometheus_tsdb_retention_limit_bytes > 0) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus storage approaching limit"

      - alert: PrometheusDiskSpaceLow
        # node_exporter metrics; assumes /var/lib/prometheus is its own mount point
        expr: node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"} / node_filesystem_size_bytes{mountpoint="/var/lib/prometheus"} < 0.15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on Prometheus storage"
```
Monitor storage growth rate:
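```promql
# Block storage growth in bytes/second (deriv() suits gauges; rate() is for counters).
# A 6h window smooths the step changes caused by compaction.
deriv(prometheus_tsdb_storage_blocks_bytes{job="prometheus"}[6h])

# Estimated days until the filesystem is full, from its own free-space trend
node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"}
  / -deriv(node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"}[6h])
  / 86400
```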
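The days-until-full figure is free space divided by growth rate, converted from seconds to days. The hypothetical helper below does the same arithmetic offline, which is handy for sanity-checking a query result by hand. Both inputs are assumed to be in bytes and bytes per second.

```bash
# Hypothetical helper: days until full = avail_bytes / growth_bytes_per_sec / 86400
days_until_full() {
  local avail_bytes=$1 growth_bytes_per_sec=$2
  awk -v a="$avail_bytes" -v g="$growth_bytes_per_sec" 'BEGIN {
    if (g <= 0) { print "not growing"; exit }   # no positive growth means no ETA
    printf "%.1f\n", a / g / 86400
  }'
}
```

For example, 200 GiB free with storage growing at 50 KiB/s gives `days_until_full 214748364800 51200`, which prints `48.5`.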