Fix Prometheus Storage Disk Full

The Problem

Prometheus stops ingesting data and logs show disk-related errors:

```text
level=error ts=2026-04-04T08:30:15.234Z caller=db.go:1245 msg="disk full, cannot write segment"
level=error ts=2026-04-04T08:30:15.235Z caller=head_wal.go:892 msg="WAL write failed" err="write /data/wal/00012345: no space left on device"
level=fatal ts=2026-04-04T08:30:15.236Z caller=main.go:456 err="TSDB closed"
```

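The disk fills up because Prometheus stores time-series data locally and, by default, enforces only a 15-day time limit with no size limit. If the data written within the retention window outgrows the volume (for example, after the series count grows), storage keeps growing until writes fail.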

Diagnosis

Check Disk Usage

```bash
# Check free space on the Prometheus data directory
# (adjust the path to match --storage.tsdb.path)
df -h /var/lib/prometheus

# Check size of each component
du -sh /var/lib/prometheus/*
du -sh /var/lib/prometheus/wal
du -sh /var/lib/prometheus/chunks_head
```
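To see at a glance which component is taking the space, the listing can be sorted by size. This is a minimal sketch: `top_storage_entries` is a hypothetical helper name, and the `/var/lib/prometheus` default is an assumption that should match your `--storage.tsdb.path`.

```bash
# Hypothetical helper: list the N largest entries (blocks, wal, chunks_head)
# under a Prometheus data directory, biggest first, sizes in KiB.
top_storage_entries() {
  dir="${1:-/var/lib/prometheus}"   # assumed default; match --storage.tsdb.path
  n="${2:-10}"
  # du -sk prints "<KiB><TAB><path>"; sort numerically, largest first
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, `top_storage_entries /var/lib/prometheus 5` puts block directories (ULID names), `wal`, and `chunks_head` side by side, which shows whether old blocks or recent data dominate.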

Prometheus Metrics for Storage

```promql
# Bytes used by persisted blocks (excludes the WAL)
prometheus_tsdb_storage_blocks_bytes{job="prometheus"}

# Total series count
prometheus_tsdb_head_series{job="prometheus"}

# WAL size
prometheus_tsdb_wal_storage_size_bytes{job="prometheus"}

# Configured retention.size limit in bytes (0 when no size limit is set)
prometheus_tsdb_retention_limit_bytes{job="prometheus"}
```

Identify Large Metrics

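```promql
# Top 10 metric names by series count (a rough proxy for disk usage;
# this query is expensive on large servers)
topk(10, count by (__name__)({__name__=~".+"}))

# Number of distinct values of one label (replace label_name with a real label)
count(count by (label_name) ({label_name!=""}))
```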

Solutions

1. Configure Retention Limits

Set a time limit and a size limit when starting Prometheus (a reasonable starting point):

```bash
prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```

Or configure in systemd:

```ini
# /etc/systemd/system/prometheus.service
[Service]
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```

Restart the service:

```bash
sudo systemctl daemon-reload
sudo systemctl restart prometheus
```
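The flag help for `--storage.tsdb.retention.size` lists the units B, KB, MB, GB, TB, PB and EB and treats them as powers of 2, so `50GB` means 50 × 1024³ bytes. Converting the flag the same way makes it easy to compare against `prometheus_tsdb_storage_blocks_bytes`. The function below is a hypothetical sketch of that conversion, not code taken from Prometheus, and it accepts integer values only.

```bash
# Hypothetical helper: convert a retention.size string such as "50GB" to bytes,
# using base-2 multipliers (1KB = 1024B). Integer values only in this sketch.
retention_size_to_bytes() {
  value=$(printf '%s' "$1" | sed 's/[A-Za-z]*$//')   # numeric part, e.g. 50
  unit=$(printf '%s' "$1" | sed 's/^[0-9]*//')        # unit part, e.g. GB
  case "$value" in
    ''|*[!0-9]*) echo "not an integer size: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    B)  exp=0 ;;
    KB) exp=1 ;;
    MB) exp=2 ;;
    GB) exp=3 ;;
    TB) exp=4 ;;
    PB) exp=5 ;;
    EB) exp=6 ;;
    *)  echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  bytes=$value
  i=0
  # Multiply by 1024 once per unit step above bytes
  while [ "$i" -lt "$exp" ]; do
    bytes=$((bytes * 1024))
    i=$((i + 1))
  done
  echo "$bytes"
}
```

For example, `retention_size_to_bytes 50GB` prints 53687091200.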

2. Enable Size-Based Retention

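Combine time and size limits for safety:

```bash
prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=100GB
```

When both are set, whichever limit is reached first triggers deletion. Prometheus deletes only whole persisted blocks, oldest first; the WAL and head chunks count toward the size limit but are never removed to satisfy it. Leave headroom by setting `retention.size` below the volume size (for example, around 80% of it).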
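A simplified model of how the time and size limits combine: each limit is checked independently, and a block is deleted as soon as either one flags it. This is an illustration, not Prometheus source code; the input format (one line per block, `maxTime_ms size_bytes name`, newest first) and the helper name `blocks_to_delete` are assumptions made for this sketch.

```bash
# Simplified model (not Prometheus source) of combined time + size retention.
# stdin: one block per line, "<maxTime_ms> <size_bytes> <name>", NEWEST first.
# $1: time retention in ms; $2: size limit in bytes (0 = no limit);
# $3: bytes already used by WAL/head, counted toward the limit but never deleted.
# Prints the names of blocks the model would delete.
blocks_to_delete() {
  awk -v ret_ms="$1" -v max_bytes="$2" -v head_bytes="$3" '
    BEGIN { total = head_bytes; over = 0 }
    NR == 1 { newest = $1 }
    {
      total += $2
      # Time limit: block ends more than ret_ms before the newest block ends
      too_old = (newest - $1 > ret_ms)
      # Size limit: once the running total passes max_bytes, this block and
      # every older one are dropped
      if (max_bytes > 0 && total > max_bytes) over = 1
      if (too_old || over) print $3
    }'
}
```

Because the size check counts from the newest data backwards, a tight size limit can remove blocks that are still well inside the time window.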

3. Immediate Cleanup

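If the disk is already full, stop Prometheus and delete the oldest blocks. This frees space immediately but permanently discards the data in those blocks:

```bash
# Stop Prometheus first
sudo systemctl stop prometheus

# List blocks
ls -la /var/lib/prometheus/

# Blocks are directories with ULID names like 01HXYZ...
# Delete oldest blocks (check timestamps)
sudo rm -rf /var/lib/prometheus/01HABC...   # oldest block

# Or truncate WAL (last resort: discards recent samples not yet in a block).
# Run the glob as root so it expands even if the directory is not readable.
sudo sh -c 'rm -rf /var/lib/prometheus/wal/*'

# Start Prometheus
sudo systemctl start prometheus
```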
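Block directory names are ULIDs, and each block's `meta.json` records `minTime` and `maxTime` in milliseconds, so the oldest data can be identified without guessing from file timestamps. The helper below is a hypothetical sketch; it assumes the block-level `minTime`/`maxTime` are the first such keys in `meta.json` (later ones belong to compaction parents).

```bash
# Hypothetical helper: print "<minTime_ms> <maxTime_ms> <block_dir>" for every
# block under a data directory, oldest data first.
list_blocks_oldest_first() {
  dir="${1:-/var/lib/prometheus}"   # assumed default; match --storage.tsdb.path
  for meta in "$dir"/*/meta.json; do
    [ -f "$meta" ] || continue      # skip if the glob matched nothing
    # Take the first minTime/maxTime keys (the block's own range)
    min=$(grep -o '"minTime": *[0-9][0-9]*' "$meta" | head -n 1 | grep -o '[0-9][0-9]*$')
    max=$(grep -o '"maxTime": *[0-9][0-9]*' "$meta" | head -n 1 | grep -o '[0-9][0-9]*$')
    echo "$min $max $(dirname "$meta")"
  done | sort -n
}
```

Run it with Prometheus stopped, for example `list_blocks_oldest_first /var/lib/prometheus | head -n 3`, and remove directories from the top of the list.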

4. Reduce Ingestion Rate

Lower scrape frequency for non-critical targets:

```yaml
# prometheus.yml
global:
  scrape_interval: 30s  # increased from 15s

scrape_configs:
  - job_name: 'critical-services'
    scrape_interval: 10s

  - job_name: 'non-critical'
    scrape_interval: 60s
```
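The Prometheus storage documentation gives a capacity formula, needed_disk_space = retention_time_seconds × ingested_samples_per_second × bytes_per_sample, with roughly 1 to 2 bytes per sample. Solving it for retention shows what a longer scrape interval buys. Approximating ingestion as active series divided by scrape interval is an assumption of this sketch (it ignores recording rules and per-job intervals), and `max_retention_days` is a hypothetical helper; a measured rate is available from `rate(prometheus_tsdb_head_samples_appended_total[1h])`.

```bash
# Hypothetical helper based on the documented capacity formula, solved for
# retention. Samples/second is approximated as active_series / interval.
# Args: disk bytes, active series, scrape interval (s), bytes/sample (default 2)
max_retention_days() {
  disk_bytes=$1; series=$2; interval_s=$3; bytes_per_sample=${4:-2}
  awk -v d="$disk_bytes" -v s="$series" -v i="$interval_s" -v b="$bytes_per_sample" \
    'BEGIN { printf "%.1f\n", d / ((s / i) * b * 86400) }'
}
```

With 1,000,000 active series, 2 bytes per sample and a 100 GiB volume (107374182400 bytes), a 15s interval fits about 9.3 days and a 30s interval about 18.6 days.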

5. Optimize Compaction

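Prometheus removes data only in whole blocks, so capping the maximum block duration lets retention reclaim space in smaller steps. Both flags are hidden, advanced options and the defaults suit most setups; smaller blocks also mean more block directories and index files to manage:

```bash
prometheus \
  --storage.tsdb.min-block-duration=2h \
  --storage.tsdb.max-block-duration=6h \
  --storage.tsdb.retention.time=15d
```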
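When `--storage.tsdb.max-block-duration` is left unset, Prometheus 2.x derives it from the time retention: 10% of `retention.time`, capped at 31 days (based on the flag handling in `cmd/prometheus/main.go`; check it against your version). Since retention deletes whole blocks, data can outlive `retention.time` by up to about one block span. This hypothetical helper computes that default in whole hours.

```bash
# Hypothetical helper: default max block duration in hours for a given
# retention.time in hours, i.e. min(retention / 10, 31d); 31d if retention is 0.
# Integer hours only; sub-hour results round down.
default_max_block_hours() {
  retention_h=$1
  cap_h=$((31 * 24))              # 31 days
  tenth_h=$((retention_h / 10))
  if [ "$retention_h" -gt 0 ] && [ "$tenth_h" -lt "$cap_h" ]; then
    echo "$tenth_h"
  else
    echo "$cap_h"
  fi
}
```

With `retention.time=15d` (360h) the default is 36h, so `max-block-duration=6h` lets retention trim data in 6-hour steps instead of 36-hour ones.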

6. Move Storage to Larger Disk

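```bash
# Stop Prometheus
sudo systemctl stop prometheus

# Copy data to new location (-a preserves ownership and permissions)
sudo rsync -av /var/lib/prometheus/ /mnt/larger-disk/prometheus/

# Update config
prometheus --storage.tsdb.path=/mnt/larger-disk/prometheus

# Or update --storage.tsdb.path in the systemd service file, then:
sudo systemctl daemon-reload
sudo systemctl start prometheus
```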
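Before copying, it is worth confirming the new volume can hold the data with room to grow. This is a minimal sketch: `fits_on_target` is a hypothetical helper, the target directory must already exist, and the 20% headroom default is an arbitrary assumption.

```bash
# Hypothetical pre-flight check: succeed only if the target filesystem has
# room for the source directory plus a headroom percentage.
fits_on_target() {
  src=$1; dst=$2; headroom_pct=${3:-20}   # 20% headroom is an assumed default
  need_kb=$(du -sk "$src" | awk '{print $1}')
  # df -Pk: POSIX format in 1024-byte blocks; field 4 of line 2 is "Available"
  avail_kb=$(df -Pk "$dst" | awk 'NR == 2 {print $4}')
  [ $((need_kb * (100 + headroom_pct) / 100)) -lt "$avail_kb" ]
}
```

For example: `sudo mkdir -p /mnt/larger-disk/prometheus && fits_on_target /var/lib/prometheus /mnt/larger-disk/prometheus && echo ok`.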

Verification

Check storage is properly managed:

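```promql
# Block storage should stay below 80% of the 100GB size limit from step 2
prometheus_tsdb_storage_blocks_bytes{job="prometheus"} / (100 * 1024 * 1024 * 1024) < 0.8

# Retention is deleting blocks (non-zero once data ages past the limits)
increase(prometheus_tsdb_time_retentions_total{job="prometheus"}[1d]) + increase(prometheus_tsdb_size_retentions_total{job="prometheus"}[1d]) > 0
```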

Verify in logs:

```bash
# Check for compaction activity ("compact" matches "compact blocks" and "compaction")
journalctl -u prometheus --since "1 hour ago" | grep -i compact

# Check retention is working (exact log wording varies by version)
journalctl -u prometheus --since "1 hour ago" | grep -i retention
```

Prevention

Set up alerts for disk usage:

```yaml
# alert_rules.yml
groups:
  - name: prometheus_storage
    rules:
      - alert: PrometheusStorageNearFull
        # Only evaluates when retention.size is set (limit > 0)
        expr: prometheus_tsdb_storage_blocks_bytes / (prometheus_tsdb_retention_limit_bytes > 0) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus storage approaching limit"

      # Requires node_exporter and a dedicated mount at /var/lib/prometheus
      - alert: PrometheusDiskSpaceLow
        expr: node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"} / node_filesystem_size_bytes{mountpoint="/var/lib/prometheus"} < 0.15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on Prometheus storage"
```

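Monitor storage growth rate:

```promql
# Block storage growth in bytes per hour (deriv() suits gauges; rate() is for counters)
deriv(prometheus_tsdb_storage_blocks_bytes{job="prometheus"}[1h]) * 3600

# Estimated days until the volume is full (same series on both sides, only when free space is shrinking)
node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"} / -deriv(node_filesystem_avail_bytes{mountpoint="/var/lib/prometheus"}[6h]) / 86400 > 0
```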
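The same projection can be checked by hand from two readings of free space (for example from `df` or `node_filesystem_avail_bytes`). This sketch uses a two-point slope, a simplification of the least-squares fit behind PromQL's `deriv()`; `days_until_full` is a hypothetical helper.

```bash
# Hypothetical helper: two-point linear estimate of days until a filesystem
# fills. Args: avail bytes at the earlier reading, avail bytes at the later
# reading, seconds between the two readings.
days_until_full() {
  awk -v a1="$1" -v a2="$2" -v dt="$3" 'BEGIN {
    slope = (a2 - a1) / dt            # bytes per second (negative when filling)
    if (slope >= 0) { print "not filling"; exit }
    printf "%.1f\n", (a2 / -slope) / 86400
  }'
}
```

For example, free space dropping from 100 GB to 90 GB over one day gives `days_until_full 100000000000 90000000000 86400`, which prints 9.0.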
