The Problem
You're seeing Prometheus crash with memory-related errors, or the OOM killer is terminating the process. The logs might show:
```
level=error ts=2026-04-04T09:15:32.789Z caller=db.go:892 msg="out of memory"
level=fatal ts=2026-04-04T09:15:32.790Z caller=main.go:345 err="runtime error: out of memory"
```

Or from the kernel:
```
Apr  4 09:15:32 monitoring kernel: Out of memory: Kill process 1842 (prometheus) score 890 or sacrifice child
Apr  4 09:15:32 monitoring kernel: Killed process 1842 (prometheus) total-vm:8388608kB, anon-rss:7864320kB
```

This happens when Prometheus needs more memory than is available, typically because of high-cardinality queries, heavy series churn, or a head block holding too many in-memory series.
Diagnosis
Check Current Memory Usage
```promql
# Process memory usage
process_resident_memory_bytes{job="prometheus"}

# Go memory stats
go_memstats_heap_inuse_bytes{job="prometheus"}
go_memstats_heap_alloc_bytes{job="prometheus"}
```

Note that Prometheus does not export its own memory limit as a metric; the effective limit comes from systemd, the container runtime, or the host, and has to be checked there.
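If Prometheus runs under systemd or in a container, a quick way to read the effective limit is from the unit properties or the cgroup files; a sketch, assuming a `prometheus.service` unit and cgroup v2 (the cgroup path is an assumption, adjust to your layout):

```shell
# Effective limit as systemd sees it ("infinity" means no limit)
systemctl show prometheus --property=MemoryMax

# Or read the cgroup v2 file directly ("max" means no limit)
cat /sys/fs/cgroup/system.slice/prometheus.service/memory.max
```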
Identify High Cardinality Series
```promql
# Top 10 metrics by series count
topk(10, count by (__name__)({__name__=~".+"}))

# Targets exposing the most series
topk(10, count by (job, instance)({__name__=~".+"}))
```
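The same cardinality breakdown is also available from the TSDB stats endpoint without running expensive matchers over the whole head; a sketch, assuming Prometheus listens on `localhost:9090` and `jq` is installed:

```shell
# Top series counts by metric name, straight from the TSDB head
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByMetricName'

# Label/value pairs that contribute the most series
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByLabelValuePair'
```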
Check Head Block Memory
```promql
# Head series count
prometheus_tsdb_head_series{job="prometheus"}

# Head chunks count
prometheus_tsdb_head_chunks{job="prometheus"}

# Series churn: rate at which new series are created
rate(prometheus_tsdb_head_series_created_total{job="prometheus"}[5m])
```
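For data already compacted to disk, `promtool` (shipped with Prometheus) can analyze block cardinality offline; a sketch, assuming the default data directory `/var/lib/prometheus/data`:

```shell
# Analyze the most recent block: top metrics, labels, and churn contributors
promtool tsdb analyze /var/lib/prometheus/data
```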
Solutions
1. Increase Memory Limit
Edit your Prometheus startup configuration:
```yaml
# prometheus.yml - not directly related, but keep the global config minimal
global:
  scrape_interval: 15s
  evaluation_interval: 15s
```

Update the systemd service or Docker configuration:
```ini
# Systemd - /etc/systemd/system/prometheus.service
[Service]
MemoryMax=8G
MemoryHigh=7G
```

```bash
# Docker
docker run -d \
  --name prometheus \
  --memory="8g" \
  --memory-swap="8g" \
  prom/prometheus:latest
```
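If Prometheus runs on Kubernetes instead, the equivalent limit goes in the pod spec; a minimal sketch (container name, image tag, and sizes are assumptions to adapt):

```yaml
# Fragment of a Deployment/StatefulSet pod spec
containers:
  - name: prometheus
    image: prom/prometheus:latest
    resources:
      requests:
        memory: "6Gi"
      limits:
        memory: "8Gi"
```

Keeping the request close to the limit avoids scheduling Prometheus onto a node that cannot actually sustain its working set.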
2. Reduce Retention

Lower how much data is kept on disk. Note that these flags bound disk usage and query range, not the in-memory head block, which always holds roughly the most recent two to three hours:

```bash
prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```

3. Limit Series and Samples
Enforce hard limits to prevent runaway cardinality:
```bash
prometheus \
  --storage.tsdb.max-block-duration=2h \
  --storage.tsdb.wal-segment-size=50MB \
  --query.max-samples=50000000
```

Add sample limits in the scrape configuration:
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  scrape_timeout: 10s

# Limit samples and labels per scrape
scrape_configs:
  - job_name: 'kubernetes-pods'
    sample_limit: 5000
    label_limit: 30
    label_name_length_limit: 200
    label_value_length_limit: 200
```
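When a single label drives the cardinality, dropping it at scrape time is often more effective than raising limits; a sketch, where `request_id` stands in for a hypothetical high-cardinality label in your workload:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    sample_limit: 5000
    metric_relabel_configs:
      # Drop a hypothetical high-cardinality label before ingestion
      - action: labeldrop
        regex: request_id
```

Series that differ only in the dropped label collapse into one, so make sure the label carries no information your dashboards depend on.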
4. Optimize Recording Rules
Replace expensive queries with recording rules:
```yaml
# recording_rules.yml
groups:
  - name: memory_optimization
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      - record: instance:memory_usage:percentage
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
```
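Before reloading, validate the rule file with `promtool`, which ships alongside Prometheus (the file path is an assumption):

```shell
promtool check rules /etc/prometheus/recording_rules.yml
```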
5. Tune Go Garbage Collection

Prometheus does not need a manual memory ballast; instead, tune the Go runtime. Lowering GOGC makes garbage collection run more often, trading CPU for a smaller heap, and on Go 1.19+ a soft heap limit can be set via GOMEMLIMIT:

```bash
# Run GC more aggressively (default GOGC=100)
export GOGC=50
# Soft heap limit: GC works harder before the OOM killer gets involved
export GOMEMLIMIT=7GiB
prometheus --config.file=prometheus.yml
```

Under systemd, set these persistently with `Environment=GOGC=50` in the `[Service]` section.

Verification
After applying changes, verify memory stability:
```promql
# Memory should stay below 80% of the limit (8 GiB in the examples above)
process_resident_memory_bytes{job="prometheus"} < 0.8 * 8 * 1024 * 1024 * 1024

# Head series should be stable
delta(prometheus_tsdb_head_series[1h]) < 10000
```

And confirm there are no new OOM events in the logs:

```bash
journalctl -u prometheus --since "1 hour ago" | grep -i "out of memory"
```
Prevention
Set up alerting for memory pressure:
```yaml
# alert_rules.yml
groups:
  - name: prometheus_memory
    rules:
      - alert: PrometheusMemoryHigh
        expr: process_resident_memory_bytes{job="prometheus"} > 6 * 1024 * 1024 * 1024
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus memory usage high"
          description: "Memory usage is {{ $value | humanize1024 }}B"

      - alert: PrometheusMemoryCritical
        expr: process_resident_memory_bytes{job="prometheus"} > 7 * 1024 * 1024 * 1024
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus approaching memory limit"
```
Also monitor cardinality growth regularly:

```promql
# Alert on cardinality growth
delta(prometheus_tsdb_head_series[1h]) > 50000
```