VictoriaMetrics is running out of storage space, causing data ingestion to slow down or fail. You're seeing disk space warnings or errors in logs, and metrics collection is impacted. Let's diagnose and fix storage capacity issues.
Understanding Storage Issues
VictoriaMetrics uses a compressed storage format but still requires adequate disk space. Storage problems manifest as:
- "Disk is nearly full" warnings
- Ingestion rate drops
- Query performance degradation
- "Out of space" errors in logs
- Missing recent metrics
Error patterns:

```
cannot create new partition: no space left on device
storage: insufficient disk space
write failed: disk quota exceeded
```

Initial Diagnosis
Check storage status and metrics:
```bash
# Check VictoriaMetrics logs
docker logs victoriametrics 2>&1 | grep -iE "space|disk|storage|error"
# Or for systemd
journalctl -u victoriametrics | grep -iE "space|disk"

# Check disk usage
df -h /victoriametrics-data

# Check storage metrics on the /metrics endpoint
curl -s http://localhost:8428/metrics | grep -E "vm_data_size_bytes|vm_free_disk_space_bytes"

# Check index size (indexdb entries of vm_data_size_bytes)
curl -s http://localhost:8428/metrics | grep 'vm_data_size_bytes{type="indexdb'

# Check active time series count
curl -s http://localhost:8428/metrics | grep 'hour_metric_ids'

# Check cardinality statistics
curl -s 'http://localhost:8428/api/v1/status/tsdb?topN=10' | jq '.data'

# Check the configured retention (non-default flags are exported as `flag` metrics)
curl -s http://localhost:8428/metrics | grep 'flag{name="retentionPeriod"'
```
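For a quick one-shot overview, you can roll these checks into a single script. A minimal sketch; the URL and data path are assumptions, so adjust them for your deployment:

```bash
#!/bin/bash
# One-shot VictoriaMetrics storage health summary.
# VM_URL and DATA_DIR are assumptions; adjust for your deployment.
VM_URL=http://localhost:8428
DATA_DIR=/victoriametrics-data

echo "== Disk =="
df -h "$DATA_DIR" | tail -1

echo "== Free space as seen by VictoriaMetrics =="
curl -s "$VM_URL/metrics" | grep '^vm_free_disk_space_bytes'

echo "== Active series =="
curl -s "$VM_URL/metrics" | grep 'hour_metric_ids'

echo "== Top 5 metrics by series count =="
curl -s "$VM_URL/api/v1/status/tsdb?topN=5" | jq '.data.seriesCountByMetricName'
```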
Common Cause 1: Disk Space Exhaustion
The underlying disk has run out of space.
Error pattern:
```
cannot write data: no space left on device
```
Diagnosis:
```bash
# Check disk usage
df -h /victoriametrics-data

# Check VictoriaMetrics data directory size
du -sh /victoriametrics-data
du -sh /victoriametrics-data/*

# Check largest directories
du -h --max-depth=2 /victoriametrics-data | sort -rh | head -20

# Check available disk space (KB)
df -k /victoriametrics-data | awk '{print $4}' | tail -1

# Check if other directories are consuming disk
du -sh /* 2>/dev/null | sort -rh | head -10

# Monitor disk usage
watch -n 5 'df -h /victoriametrics-data'
```
Solution:
Free disk space:
```bash
# Option 1: Delete old monthly partitions (stop VictoriaMetrics first)
systemctl stop victoriametrics
# Partition directories live under data/small and data/big, named YYYY_MM
ls -la /victoriametrics-data/data/small/

# Remove old partitions (be careful; remove the same month from both dirs)
rm -rf /victoriametrics-data/data/small/2023_01 /victoriametrics-data/data/big/2023_01
rm -rf /victoriametrics-data/data/small/2023_02 /victoriametrics-data/data/big/2023_02
systemctl start victoriametrics

# Option 2: Reduce the retention period and restart
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -retentionPeriod=30d

# For container
docker run victoriametrics/victoria-metrics:latest \
  -retentionPeriod=30d \
  -storageDataPath=/data

# Option 3: Clean up other disk usage (old logs, package caches)
rm -rf /var/log/archive/*.log.gz
apt-get clean    # Debian/Ubuntu
yum clean all    # RHEL/CentOS

# Option 4: Move to larger storage
systemctl stop victoriametrics
mv /victoriametrics-data /mnt/larger-storage/victoriametrics-data
ln -s /mnt/larger-storage/victoriametrics-data /victoriametrics-data
systemctl start victoriametrics
```
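Before choosing between these options, it helps to know how much runway you have. A rough headroom estimate, assuming roughly linear growth (the one-hour sampling window and data path are assumptions):

```bash
#!/bin/bash
# Estimate days until the disk fills, assuming linear growth.
# DATA_DIR is an assumption; point it at your -storageDataPath.
DATA_DIR=/victoriametrics-data

SIZE_NOW=$(du -sb "$DATA_DIR" | awk '{print $1}')
sleep 3600   # sample growth over one hour; longer windows smooth spikes
SIZE_LATER=$(du -sb "$DATA_DIR" | awk '{print $1}')

FREE=$(df -B1 --output=avail "$DATA_DIR" | tail -1)
GROWTH_PER_DAY=$(( (SIZE_LATER - SIZE_NOW) * 24 ))

if [ "$GROWTH_PER_DAY" -gt 0 ]; then
  echo "Approx. days until disk full: $(( FREE / GROWTH_PER_DAY ))"
else
  echo "No measurable growth during the sample window"
fi
```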
Common Cause 2: Retention Period Too Long
Long retention consumes excessive storage.
Error pattern:
```
storage growing beyond configured limits
```
Diagnosis:
```bash
# Check the configured retention (logged at startup, and exported
# as a `flag` metric when changed from the default)
journalctl -u victoriametrics | grep -i retention
curl -s http://localhost:8428/metrics | grep 'flag{name="retentionPeriod"'

# Check ingestion counters
curl -s http://localhost:8428/metrics | grep vm_rows_inserted_total

# Calculate daily storage growth
# Compare storage size from yesterday to today
du -sh /victoriametrics-data

# Estimate storage needs from the active series count
curl -s http://localhost:8428/metrics | grep 'hour_metric_ids'
```
Solution:
Reduce retention period:
```bash
# Set appropriate retention (requires a restart)
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -retentionPeriod=30d

# Per-series retention (-retentionFilter) is available in the Enterprise build only

# Monitor after the change
watch -n 60 'du -sh /victoriametrics-data'

# Partitions outside the new retention are dropped automatically;
# restarting triggers the retention check immediately
systemctl restart victoriametrics
```
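If you run the Enterprise build, per-series retention can be set with `-retentionFilter`. A sketch only; the label filters below are made-up examples and the flag is not available in the open-source build:

```bash
# Enterprise-only sketch: keep dev metrics 7 days, platform-team
# metrics 30 days, everything else for the global retention period.
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -retentionPeriod=12 \
  -retentionFilter='{env="dev"}:7d' \
  -retentionFilter='{team="platform"}:30d'
```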
Common Cause 3: High Cardinality Time Series
Too many unique time series consume excessive storage.
Error pattern:
```
high cardinality detected: millions of unique series
```
Diagnosis:
```bash
# Check active series count
curl -s http://localhost:8428/metrics | grep 'hour_metric_ids'

# Get top metrics by series count
curl -s 'http://localhost:8428/api/v1/status/tsdb?topN=20' | \
  jq '.data.seriesCountByMetricName'

# Find labels with the most unique values
curl -s 'http://localhost:8428/api/v1/status/tsdb?topN=20' | \
  jq '.data.labelValueCountByLabelName'

# List all label names
curl -s 'http://localhost:8428/api/v1/labels' | jq '.data'

# List metric names
curl -s 'http://localhost:8428/api/v1/label/__name__/values' | jq '.data[]' | head -50

# View series churn rate
curl -s http://localhost:8428/metrics | grep vm_new_timeseries_created_total
```
Solution:
Reduce cardinality:
```bash
# Identify problematic metrics with a MetricsQL query
curl -s --data-urlencode 'query=topk(20, count by (__name__) ({__name__!=""}))' \
  http://localhost:8428/api/v1/query | jq '.data.result'

# Drop high-cardinality metrics at ingestion
# In the Prometheus scrape config:
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'high_cardinality_metric.*'
    action: drop

# Or apply relabeling in VictoriaMetrics itself via -relabelConfig
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -relabelConfig=/etc/victoriametrics/relabel.yml

# Enforce cardinality limits at ingestion
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -storage.maxHourlySeries=1000000 \
  -storage.maxDailySeries=5000000 \
  -maxLabelsPerTimeseries=20
```
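A minimal relabeling file for the `-relabelConfig` flag might look like the sketch below; the dropped labels (`pod_name`, `container_id`) are illustrative and may not match your data:

```bash
# Sketch: write a relabeling config that strips two high-churn labels,
# then start VictoriaMetrics pointing at it.
cat > /etc/victoriametrics/relabel.yml <<'EOF'
- action: labeldrop
  regex: "pod_name|container_id"
EOF

victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -relabelConfig=/etc/victoriametrics/relabel.yml
```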
Common Cause 4: Index Bloat
VictoriaMetrics index has grown too large.
Error pattern:
```
index size exceeds memory limits
```
Diagnosis:
```bash
# Check index size metrics
curl -s http://localhost:8428/metrics | grep 'vm_data_size_bytes{type="indexdb'

# Check process memory usage (RSS, in KB)
ps aux | grep '[v]ictoriametrics' | awk '{print $6}'

# Check index and memory metrics
curl -s http://localhost:8428/metrics | grep -E "vm_indexdb|process_resident_memory_bytes"

# Compare index size to data size on disk
du -sh /victoriametrics-data/indexdb
du -sh /victoriametrics-data/data
```
Solution:
Reduce index size and increase memory:
```bash
# Drop unnecessary series to shrink the index
# (deletion is asynchronous; space is reclaimed by background merges)
curl -s 'http://localhost:8428/api/v1/admin/tsdb/delete_series?match[]={__name__=~"debug_.*"}'

# Restart VictoriaMetrics to rebuild in-memory caches
systemctl restart victoriametrics

# Allow VictoriaMetrics to use more of the host's memory
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -memory.allowedPercent=80

# For container, increase the memory limit
docker update --memory="8g" victoriametrics
```
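To judge whether the index is disproportionately large, compare it to the data directory. The 30% threshold below is a rule of thumb, not a documented limit:

```bash
#!/bin/bash
# Rough index-to-data size ratio check. Paths assume the default
# layout under -storageDataPath; the 30% threshold is a heuristic.
DATA_DIR=/victoriametrics-data

INDEX_BYTES=$(du -sb "$DATA_DIR/indexdb" | awk '{print $1}')
DATA_BYTES=$(du -sb "$DATA_DIR/data" | awk '{print $1}')

RATIO=$(( INDEX_BYTES * 100 / DATA_BYTES ))
echo "indexdb is ${RATIO}% of the data size"
if [ "$RATIO" -gt 30 ]; then
  echo "WARNING: unusually large index; check series cardinality and churn"
fi
```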
Common Cause 5: Ingestion Rate Too High
Incoming data volume exceeds processing capacity.
Error pattern:
```
ingestion rate limited: too many rows per second
```
Diagnosis:
```bash
# Check ingestion counters
curl -s http://localhost:8428/metrics | grep vm_rows_inserted_total

# Calculate the ingestion rate per second with MetricsQL
curl -s --data-urlencode 'query=sum(rate(vm_rows_inserted_total[5m]))' \
  http://localhost:8428/api/v1/query | jq '.data.result'

# Check for slow inserts, a sign of overload
curl -s http://localhost:8428/metrics | grep vm_slow_row_inserts_total

# Monitor ingestion during peak times
watch -n 5 'curl -s http://localhost:8428/metrics | grep vm_rows_inserted_total'
```
Solution:
Limit or optimize ingestion:
```bash
# Cap concurrent insert requests so spikes queue instead of
# overwhelming storage
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -maxConcurrentInserts=16

# Reduce scrape frequency in Prometheus
global:
  scrape_interval: 30s  # Increase from 15s

# Drop unnecessary metrics
scrape_configs:
  - job_name: 'apps'
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*'
        action: drop

# Buffer ingestion through vmagent instead of writing directly
# (downsampling via -downsampling.period is Enterprise-only)
vmagent \
  -remoteWrite.url=http://victoriametrics:8428/api/v1/write \
  -remoteWrite.tmpDataPath=/vmagent-buffer

# Reserve minimum free disk space; VictoriaMetrics switches to
# read-only mode when free space drops below this threshold
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -storage.minFreeDiskSpaceBytes=1GB
```
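If producers cannot be tuned, vmagent can also rate-limit the stream it forwards. A sketch; the 10 MB/s limit and paths are assumptions to tune for your workload:

```bash
# Sketch: scrape via vmagent, buffer to local disk, and cap the
# bytes per second sent to VictoriaMetrics. All values are examples.
vmagent \
  -promscrape.config=/etc/vmagent/prometheus.yml \
  -remoteWrite.url=http://victoriametrics:8428/api/v1/write \
  -remoteWrite.tmpDataPath=/vmagent-buffer \
  -remoteWrite.rateLimit=10000000
```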
Common Cause 6: No Automatic Cleanup Running
VictoriaMetrics cleanup process not triggering.
Error pattern: Storage keeps growing despite retention settings.
Diagnosis:
```bash
# Check the configured retention (logged at startup)
journalctl -u victoriametrics | grep -i retention

# Check for retention/cleanup activity in logs
grep -iE "retention|partition|drop" /var/log/victoriametrics.log

# Check partition ages (monthly partitions named YYYY_MM)
ls -la /victoriametrics-data/data/small/

# Compare partition count to the expected retention
find /victoriametrics-data/data/small -maxdepth 1 -type d -name "*_*" | wc -l
```
Solution:
Force cleanup and verify retention:
```bash
# Verify the retention flag on the running process
ps aux | grep '[v]ictoriametrics' | grep -o 'retentionPeriod=[^ ]*'

# Restart to trigger the retention check immediately
systemctl restart victoriametrics

# Inspect partitions after the restart
sleep 60
ls -la /victoriametrics-data/data/small/

# Delete specific series via the delete API (deletes whole series;
# VictoriaMetrics does not support deleting arbitrary time ranges)
curl -s 'http://localhost:8428/api/v1/admin/tsdb/delete_series?match[]={__name__=~"debug_.*"}'

# Configure explicit retention (restart required; retention cannot
# be changed through a runtime API)
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -retentionPeriod=30d
```
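To confirm cleanup is keeping up, compare the oldest partition name against the retention window. A hedged sketch that assumes monthly `YYYY_MM` partition naming; note that a partition is removed only once its entire month falls outside retention, so allow one month of slack:

```bash
#!/bin/bash
# Warn if the oldest monthly partition predates the retention window.
# RETENTION_DAYS and DATA_DIR are assumptions; adjust for your setup.
RETENTION_DAYS=30
DATA_DIR=/victoriametrics-data/data/small

OLDEST=$(ls "$DATA_DIR" | sort | head -1)           # e.g. 2024_01
CUTOFF=$(date -d "-${RETENTION_DAYS} days" +%Y_%m)  # oldest allowed month

if [[ "$OLDEST" < "$CUTOFF" ]]; then
  echo "WARNING: partition $OLDEST predates the retention cutoff ($CUTOFF)"
else
  echo "OK: oldest partition $OLDEST is within retention"
fi
```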
Common Cause 7: Concurrent Query Pressure
Heavy querying impacts storage operations.
Error pattern:
```
storage operations blocked by query load
```
Diagnosis:
```bash
# Check concurrent query load
curl -s http://localhost:8428/metrics | grep vm_concurrent_select

# Check for slow queries
curl -s http://localhost:8428/metrics | grep vm_slow_queries_total

# See the heaviest and most frequent queries
curl -s 'http://localhost:8428/api/v1/status/top_queries' | jq '.'

# Check query request counters
curl -s http://localhost:8428/metrics | grep 'path="/api/v1/query'
```
Solution:
Optimize query handling:
```bash
# Tune query concurrency and per-query series limits
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -search.maxConcurrentRequests=16 \
  -search.maxUniqueTimeseries=100000

# Reduce query cost by limiting the time range and resolution
curl -s 'http://localhost:8428/api/v1/query_range?query=up&start=-6h&step=5m'

# Cap points per series and series returned per query
victoriametrics \
  -storageDataPath=/victoriametrics-data \
  -search.maxPointsPerTimeseries=10000 \
  -search.maxSeries=5000

# Schedule heavy queries (reports, batch dashboards) during low-traffic periods
```
Verification
After fixing storage issues:
```bash
# Check free disk space as seen by VictoriaMetrics
curl -s http://localhost:8428/metrics | grep vm_free_disk_space_bytes

# Verify disk space at the OS level
df -h /victoriametrics-data

# Check ingestion is working
curl -s 'http://localhost:8428/api/v1/query?query=up' | jq '.data.result | length'

# Monitor storage growth and active series
watch -n 60 "du -sh /victoriametrics-data && \
  curl -s http://localhost:8428/metrics | grep hour_metric_ids"

# Verify retention is working (no partitions older than expected)
ls -la /victoriametrics-data/data/small/ | head -20

# Check for recent data
curl -s 'http://localhost:8428/api/v1/query_range?query=up&start=-5m&step=1m' | \
  jq '.data.result[].values | length'
```
Prevention
Set up storage monitoring:
```yaml
groups:
  - name: victoriametrics_storage
    rules:
      - alert: VictoriaMetricsStorageLow
        # Thresholds are examples; tune them to your disk size
        expr: vm_free_disk_space_bytes < 20 * 1024 * 1024 * 1024
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VictoriaMetrics free disk space below 20GiB ({{ $value }} bytes)"

      - alert: VictoriaMetricsStorageCritical
        expr: vm_free_disk_space_bytes < 5 * 1024 * 1024 * 1024
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "VictoriaMetrics storage critically low"

      - alert: VictoriaMetricsHighCardinality
        expr: vm_cache_entries{type="storage/hour_metric_ids"} > 10000000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "VictoriaMetrics cardinality exceeding 10M series"

      - alert: VictoriaMetricsIngestionRateHigh
        expr: sum(rate(vm_rows_inserted_total[5m])) > 1000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VictoriaMetrics ingestion rate very high"
```
Regular maintenance script:
```bash
#!/bin/bash
# victoriametrics-maintenance.sh

VM_URL=http://localhost:8428
DATA_DIR=/victoriametrics-data

# Free disk space as reported by VictoriaMetrics
FREE_BYTES=$(curl -s "$VM_URL/metrics" | awk '/^vm_free_disk_space_bytes/ {print $2; exit}')
echo "Free disk space: $FREE_BYTES bytes"

# On-disk size of the data directory
echo "Data directory size: $(du -sh "$DATA_DIR" | awk '{print $1}')"

# Active series count
ACTIVE_SERIES=$(curl -s "$VM_URL/metrics" | \
  awk '/^vm_cache_entries.*hour_metric_ids/ {print $2; exit}')
echo "Active series: $ACTIVE_SERIES"

# Alert if free space drops below 20GiB (threshold is an example;
# awk handles values exported in scientific notation)
echo "$FREE_BYTES" | awk '$1 + 0 < 20 * 1024 * 1024 * 1024 {
  print "WARNING: free disk space below 20GiB"
}'
```
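To run the script on a schedule, a cron entry along these lines works; the install path and log file are assumptions:

```bash
# Run the maintenance check hourly and append output to a log.
(crontab -l 2>/dev/null; echo "0 * * * * /opt/scripts/victoriametrics-maintenance.sh >> /var/log/vm-maintenance.log 2>&1") | crontab -
```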
VictoriaMetrics storage issues stem from disk exhaustion, retention misconfiguration, or cardinality problems. Check disk space first, then verify retention settings and analyze time series cardinality to identify the root cause.