Introduction

Prometheus stores one time series per unique combination of metric name and label values. When a metric includes a label with unbounded values -- such as user IDs, request IDs, IP addresses, or URLs -- each unique value creates a new time series. This cardinality explosion quickly exhausts Prometheus memory and disk, causing scrape failures and out-of-memory crashes.
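The multiplicative effect is easy to underestimate: the worst-case series count is the product of the distinct-value counts of every label on a metric. A quick sketch with hypothetical label counts:

```python
# Worst-case series count is the product of distinct values per label.
bounded = {"method": 5, "status": 8, "path": 40}   # bounded, enumerated labels
unbounded = {**bounded, "user_id": 100_000}        # one unbounded label added

def series_count(labels):
    total = 1
    for distinct_values in labels.values():
        total *= distinct_values
    return total

print(series_count(bounded))    # 1600 series: manageable
print(series_count(unbounded))  # 160000000 series: explosion
```

A single unbounded label turns 1,600 series into 160 million; this is why one bad label can take down an otherwise healthy Prometheus.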

Symptoms

  • Prometheus memory usage grows rapidly, eventually triggering OOM killer
  • prometheus_tsdb_head_series metric shows exponential time series growth
  • Scrape duration increases as the number of time series overwhelms the TSDB
  • Query performance degrades, with simple queries taking tens of seconds
  • Error message: mmap: Cannot allocate memory or TSDB head chunk mmap: no space left on device
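To confirm the second symptom quickly, you can read the `prometheus_tsdb_head_series` gauge straight out of Prometheus's own `/metrics` endpoint. A minimal sketch of parsing the text exposition format (in practice you would fetch the body from `http://localhost:9090/metrics`; the sample below is illustrative):

```python
# Extract the prometheus_tsdb_head_series gauge from a scraped /metrics body.
def head_series(exposition: str) -> float:
    for line in exposition.splitlines():
        if line.startswith("prometheus_tsdb_head_series "):
            return float(line.split()[1])
    raise ValueError("prometheus_tsdb_head_series not found")

sample = """\
# HELP prometheus_tsdb_head_series Total number of series in the head block.
# TYPE prometheus_tsdb_head_series gauge
prometheus_tsdb_head_series 2.413561e+06
"""
print(head_series(sample))  # 2413561.0
```

Sampling this value periodically and plotting it is the fastest way to see whether series growth is linear (normal churn) or runaway.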

Common Causes

  • Metric exposing user_id, request_id, or client_ip as a label value
  • Application generating unique labels per HTTP request path with query parameters
  • Exporter not filtering high-cardinality labels from upstream metrics
  • Label values containing timestamps or UUIDs, creating a new series per event
  • Missing label allowlist/denylist on exporters that forward all labels
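Request paths with embedded IDs are a frequent offender in the list above; the fix is to normalize each path to its route template before using it as a label value. A sketch (the patterns and placeholders are illustrative, not a complete ruleset):

```python
import re

# Collapse high-cardinality path segments to placeholders before the
# path is used as a label value. Order matters: match UUIDs before digits.
NORMALIZERS = [
    (re.compile(r"/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"), "/:uuid"),
    (re.compile(r"/\d+"), "/:id"),
]

def normalize_path(path: str) -> str:
    path = path.split("?", 1)[0]  # query parameters are never label-safe
    for pattern, placeholder in NORMALIZERS:
        path = pattern.sub(placeholder, path)
    return path

print(normalize_path("/users/12345/orders?page=2"))  # /users/:id/orders
```

Many HTTP frameworks expose the matched route template directly (e.g. `/users/{id}`); prefer that over regex normalization when available.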

Step-by-Step Fix

  1. Identify the highest-cardinality metrics: find which metrics are creating the most series.

     ```bash
     curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByMetricName[:10]'
     ```

  2. Check which labels are causing the explosion: examine label cardinality.

     ```bash
     curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.labelValueCountByLabelName[:10]'
     ```

  3. Drop the problematic label using metric relabeling: remove the high-cardinality label at scrape time.

     ```yaml
     metric_relabel_configs:
       # Drop the entire metric
       - source_labels: [__name__]
         regex: "http_request_duration.*"
         action: drop
       # Or drop just the problematic labels
       - regex: "user_id|request_id|client_ip"
         action: labeldrop
     ```

  4. Fix the application instrumentation: remove the high-cardinality label at the source.

     ```java
     // WRONG: high cardinality -- one series per user
     Counter.builder("http_requests_total")
         .tag("user_id", userId)
         .register(registry);

     // CORRECT: use bounded labels
     Counter.builder("http_requests_total")
         .tag("method", method)
         .tag("status", status)
         .register(registry);
     ```
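The JSON returned by the status endpoint in steps 1 and 2 can also be post-processed without jq, which is handy in automation. A sketch that ranks labels by distinct-value count (the payload below is abbreviated sample data; the live endpoint already returns entries sorted, but sorting defensively costs nothing):

```python
import json

# Rank labels by distinct-value count from /api/v1/status/tsdb output.
def top_labels(payload: str, n: int = 10):
    stats = json.loads(payload)["data"]["labelValueCountByLabelName"]
    return sorted(stats, key=lambda entry: entry["value"], reverse=True)[:n]

sample = json.dumps({
    "status": "success",
    "data": {"labelValueCountByLabelName": [
        {"name": "user_id", "value": 184321},
        {"name": "instance", "value": 42},
        {"name": "path", "value": 9150},
    ]},
})
print(top_labels(sample, 2))  # user_id and path lead by a wide margin
```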

  5. Set series limits to prevent future explosions: cap how many series a misbehaving target can introduce.

     ```yaml
     # Prometheus has no global series limit, but you can:
     #  - set sample_limit per scrape job so oversized scrapes fail fast
     #  - use recording rules to aggregate before series counts grow
     #  - enforce limits in a remote-write backend (Cortex/Thanos)
     # Alert on series count growth rate:
     - alert: HighCardinalityGrowth
       expr: rate(prometheus_tsdb_head_series[1h]) > 100
     ```
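The recording-rule mitigation deserves a concrete shape: the rule below (names are illustrative) pre-aggregates requests down to bounded dimensions, so dashboards and downstream remote-write can use the cheap aggregate instead of the raw high-cardinality metric.

```yaml
groups:
  - name: cardinality_aggregation
    rules:
      # Keep only bounded dimensions; all other labels are summed away.
      - record: job:http_requests_total:rate5m
        expr: sum by (job, method, status) (rate(http_requests_total[5m]))
```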

Prevention

  • Define and enforce a label cardinality policy: only use labels with bounded, enumerated values
  • Use labeldrop in metric_relabel_configs to strip known high-cardinality labels at scrape time
  • Monitor prometheus_tsdb_head_series growth rate and alert on sudden increases
  • Review all new metric instrumentation in code reviews for cardinality risk
  • Use histograms with explicit bucket boundaries instead of per-request latency labels
  • Implement automated cardinality testing in CI that flags metrics with unbounded label values
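The CI check in the last bullet can be a simple lint over a test scrape of `/metrics`: flag label values that look like UUIDs, timestamps, or long numeric IDs. A minimal sketch (the patterns, thresholds, and line format handled are illustrative, not exhaustive):

```python
import re

# Label values matching these patterns are almost certainly unbounded.
SUSPECT = [
    re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-"),  # UUID prefix
    re.compile(r"^\d{10,}$"),                  # epoch timestamp / numeric ID
]

def risky_labels(exposition: str):
    """Return (metric, label, value) triples whose values look unbounded."""
    findings = []
    # Matches lines like: http_requests_total{user_id="123",method="GET"} 3
    for metric, labels in re.findall(r'^(\w+)\{([^}]*)\}', exposition, re.M):
        for label, value in re.findall(r'(\w+)="([^"]*)"', labels):
            if any(p.match(value) for p in SUSPECT):
                findings.append((metric, label, value))
    return findings

sample = 'http_requests_total{user_id="1700000000123",method="GET"} 3\n'
print(risky_labels(sample))
```

Run this against a scrape of the application in CI and fail the build when it returns any findings, so unbounded labels are caught before they reach production Prometheus.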