The Problem

Prometheus stops ingesting data and logs show disk-related errors:

```bash
level=error ts=2026-04-04T08:30:15.234Z caller=db.go:1245 msg="disk full, cannot write segment"
level=error ts=2026-04-04T08:30:15.235Z caller=head_wal.go:892 msg="WAL write failed" err="write /data/wal/00012345: no space left on device"
level=fatal ts=2026-04-04T08:30:15.236Z caller=main.go:456 err="TSDB closed"
```

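The disk fills up because Prometheus stores time-series data on local disk. The default retention is 15 days with no size cap, so a long retention period or a growing number of series can outgrow the volume unless limits are sized to fit it.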

Diagnosis

Check Disk Usage

```bash
# Check Prometheus data directory
df -h /var/lib/prometheus

# Check size of each component
du -sh /var/lib/prometheus/*
du -sh /var/lib/prometheus/wal
du -sh /var/lib/prometheus/chunks_head
```
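To run the same check from a script or cron job, the `df` output can be reduced to a single number. The sketch below is one way to do that; the function names and the 80% default threshold are illustrative assumptions, not part of Prometheus.

```bash
#!/usr/bin/env bash
# Sketch: warn when the filesystem holding the Prometheus data directory is nearly full.
# parse_df_pct and check_disk are hypothetical helper names.

# Read `df -P` output on stdin and print the use percentage of the first filesystem.
parse_df_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning and return 1 when usage of the given path reaches the threshold.
check_disk() {
  local path="$1" threshold="${2:-80}" pct
  pct=$(df -P "$path" | parse_df_pct)
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $path is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path is ${pct}% full"
}

# Example: check_disk /var/lib/prometheus 80
```

`df -P` is used because the POSIX output format keeps each filesystem on one line, which makes the column position of the percentage predictable.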

Prometheus Metrics for Storage

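```promql
# Disk space used by persisted blocks
prometheus_tsdb_storage_blocks_bytes{job="prometheus"}

# Total series count
prometheus_tsdb_head_series{job="prometheus"}

# WAL size
prometheus_tsdb_wal_storage_size_bytes{job="prometheus"}

# Configured retention limits (metric names from recent Prometheus releases;
# a size limit of 0 means size-based retention is disabled)
prometheus_tsdb_retention_limit_seconds{job="prometheus"}
prometheus_tsdb_retention_limit_bytes{job="prometheus"}
```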

Identify Large Metrics

```promql
# Top metrics by series count (a rough proxy for disk usage)
topk(10, count by (__name__)({__name__=~".+"}))

# Number of distinct values for one label (replace label_name with the label to inspect)
count(count by (label_name) ({__name__=~".+"}))
```

Solutions

1. Configure Retention Limits

Set a time-based limit, with a size-based limit as a backstop (recommended starting point):

```bash
prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```

Or configure in systemd:

```ini
# /etc/systemd/system/prometheus.service
[Service]
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```

Restart the service:

```bash
sudo systemctl daemon-reload
sudo systemctl restart prometheus
```
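After the restart it is worth confirming that the running process picked up the new flags. Prometheus reports its effective flags at `/api/v1/status/flags`; the sketch below pulls a single value out of that JSON. The helper name `flag_value` and the `localhost:9090` address are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: read one flag from the JSON returned by /api/v1/status/flags.
# flag_value is a hypothetical helper. It uses grep/cut instead of a JSON parser,
# which is adequate for the flat string-to-string map this endpoint returns.

flag_value() {
  local name="${1//./\\.}"   # escape dots so they match literally in the regex
  grep -o "\"${name}\":[[:space:]]*\"[^\"]*\"" | head -n 1 | cut -d'"' -f4
}

# Example (assumes Prometheus listens on localhost:9090):
#   curl -s http://localhost:9090/api/v1/status/flags | flag_value storage.tsdb.retention.time
```

If the value printed does not match the unit file, the service is probably still running with an older command line, for example because `daemon-reload` was skipped.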

2. Enable Size-Based Retention

Combine time and size limits for safety:

```bash
prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=100GB
```

When both limits are set, whichever is reached first triggers deletion of the oldest blocks.
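To pick limits that fit the disk, the Prometheus storage documentation gives a rough capacity formula: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, with 1 to 2 bytes per sample as a typical figure. The sketch below applies it with integer arithmetic; the function name and the example numbers are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: estimate the disk space needed for a retention period.
# needed_bytes = retention_seconds * samples_per_second * bytes_per_sample
# estimate_disk_bytes is a hypothetical helper; 2 bytes/sample is a conservative default.

estimate_disk_bytes() {
  local retention_days="$1" samples_per_sec="$2" bytes_per_sample="${3:-2}"
  echo $(( retention_days * 86400 * samples_per_sec * bytes_per_sample ))
}

# Example: 15 days of retention at 100,000 samples per second
bytes=$(estimate_disk_bytes 15 100000)
echo "$bytes bytes (~$(( bytes / 1024 / 1024 / 1024 )) GiB)"
# prints: 259200000000 bytes (~241 GiB)
```

The current ingestion rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`. Leaving some headroom below the volume size is sensible, since the WAL and head chunks also occupy space.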

3. Immediate Cleanup

If the disk is already full, stop Prometheus and delete the oldest blocks. This permanently discards the data in the time range those blocks cover:

```bash
# Stop Prometheus first
sudo systemctl stop prometheus

# List blocks
ls -la /var/lib/prometheus/

# Blocks are directories with ULID names like 01HXYZ...
# Delete the oldest blocks (check timestamps, e.g. minTime/maxTime in each block's meta.json)
sudo rm -rf /var/lib/prometheus/01HABC...  # oldest block

# Or remove the WAL (last resort: loses recent samples not yet written to a block)
sudo rm -rf /var/lib/prometheus/wal/*

# Start Prometheus
sudo systemctl start prometheus
```
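Picking the oldest block by eye is error-prone, because a block's ULID reflects when the block was created (compaction creates new ones), not which time range it holds. Each block directory contains a `meta.json` with `minTime` and `maxTime` in Unix milliseconds. The sketch below sorts blocks by `minTime`; the function name is an assumption, and the `sed` parsing assumes the indented, one-field-per-line layout that Prometheus normally writes.

```bash
#!/usr/bin/env bash
# Sketch: list TSDB blocks with the oldest data first.
# list_blocks_oldest_first is a hypothetical helper. It prints "<minTime> <block dir>"
# for every subdirectory that has a meta.json, so wal/ and chunks_head/ are skipped.

list_blocks_oldest_first() {
  local data_dir="$1" meta min_time
  for meta in "$data_dir"/*/meta.json; do
    [ -f "$meta" ] || continue
    # The first minTime in the file is the block's own; later ones belong to parents.
    min_time=$(sed -n 's/.*"minTime": *\([0-9][0-9]*\).*/\1/p' "$meta" | head -n 1)
    printf '%s %s\n' "${min_time:-0}" "$(basename "$(dirname "$meta")")"
  done | sort -n
}

# Example: list_blocks_oldest_first /var/lib/prometheus
```

Run it while Prometheus is stopped, and delete from the top of the list down until enough space is free.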

4. Reduce Ingestion Rate

Lower scrape frequency for non-critical targets:

```yaml
# prometheus.yml
global:
  scrape_interval: 30s  # increased from 15s

scrape_configs:
  - job_name: 'critical-services'
    scrape_interval: 10s

  - job_name: 'non-critical'
    scrape_interval: 60s
```
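The effect of a longer interval is easy to estimate: each active series produces one sample per scrape, so samples per second is roughly the series count divided by the scrape interval. A minimal sketch, with a hypothetical helper name and made-up series counts:

```bash
#!/usr/bin/env bash
# Sketch: approximate ingestion rate for a given series count and scrape interval.
# samples_per_second is a hypothetical helper; the numbers below are illustrative.

samples_per_second() {
  local series="$1" interval_seconds="$2"
  echo $(( series / interval_seconds ))
}

# Example: 900,000 series scraped every 15s versus every 30s
echo "15s: $(samples_per_second 900000 15) samples/s"   # prints: 15s: 60000 samples/s
echo "30s: $(samples_per_second 900000 30) samples/s"   # prints: 30s: 30000 samples/s
```

Doubling the interval halves the ingestion rate, and with it the rate at which new blocks grow.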

5. Optimize Compaction

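Retention removes whole blocks, so capping the maximum block duration lets expired data be deleted in smaller increments. These flags are hidden, advanced options, so test them before changing production settings:

```bash
prometheus \
  --storage.tsdb.min-block-duration=2h \
  --storage.tsdb.max-block-duration=6h \
  --storage.tsdb.retention.time=15d
```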

6. Move Storage to Larger Disk

```bash
# Stop Prometheus
sudo systemctl stop prometheus

# Copy data to new location
sudo rsync -avz /var/lib/prometheus/ /mnt/larger-disk/prometheus/

# Update config
prometheus --storage.tsdb.path=/mnt/larger-disk/prometheus

# Or update systemd service file
```

Verification

Check storage is properly managed:

```promql
# Storage size should be below limit
prometheus_tsdb_storage_blocks_bytes{job="prometheus"} / (100 * 1024 * 1024 * 1024) < 0.8

# A time retention limit is in effect (metric name from recent Prometheus releases)
prometheus_tsdb_retention_limit_seconds{job="prometheus"} > 0
```

Verify in logs:

```bash
# Check for compaction success
journalctl -u prometheus --since "1 hour ago" | grep -i compaction

# Check retention is working
journalctl -u prometheus --since "1 hour ago" | grep -i retention
```

Prevention

Set up alerts for disk usage:

```yaml
# alert_rules.yml
groups:
  - name: prometheus_storage
    rules:
      - alert: PrometheusStorageNearFull
        # The limit metric is 0 when no size limit is set, so guard against dividing by zero
        expr: |
          prometheus_tsdb_storage_blocks_bytes / prometheus_tsdb_retention_limit_bytes > 0.8
          and prometheus_tsdb_retention_limit_bytes > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus storage approaching limit"

      - alert: PrometheusDiskSpaceLow
        expr: node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"} / node_filesystem_size_bytes{mountpoint="/var/lib/prometheus"} < 0.15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on Prometheus storage"
```

Monitor storage growth rate:

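```promql
# Storage growth in bytes per second (deriv, because this metric is a gauge;
# blocks are written about every 2h, so use a window of a day or more)
deriv(prometheus_tsdb_storage_blocks_bytes[1d])

# Estimated days until full (scalar() assumes a single Prometheus instance)
node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"} / scalar(deriv(prometheus_tsdb_storage_blocks_bytes[1d])) / 86400
```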
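The "days until full" expression is a simple division: free bytes divided by growth in bytes per second, converted from seconds to days. The sketch below works through the same arithmetic outside PromQL; the function name and the example figures are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: project how many whole days remain before the disk fills.
# days_until_full is a hypothetical helper; inputs are integers (bytes, bytes/second).

days_until_full() {
  local avail_bytes="$1" growth_bytes_per_sec="$2"
  if [ "$growth_bytes_per_sec" -le 0 ]; then
    echo "inf"   # storage is flat or shrinking, so no fill date can be projected
    return 0
  fi
  echo $(( avail_bytes / growth_bytes_per_sec / 86400 ))
}

# Example: 200 GiB free, growing at 1 MiB per second
days_until_full $(( 200 * 1024 * 1024 * 1024 )) $(( 1024 * 1024 ))
# prints: 2
```

Once retention starts deleting blocks, growth flattens and the projection stops being meaningful, which is why the zero-or-negative case is handled separately.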