Introduction
Prometheus stores one time series per unique combination of metric name and label values. When a metric includes a label with unbounded values -- such as user IDs, request IDs, IP addresses, or URLs -- each unique value creates a new time series. This cardinality explosion quickly exhausts Prometheus memory and disk, causing scrape failures and out-of-memory crashes.
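A quick back-of-the-envelope check makes the multiplication concrete; the label counts below are illustrative, not from any real system:

```bash
# Series count is roughly the product of distinct values per label.
methods=5; statuses=6; endpoints=20
echo "bounded: $(( methods * statuses * endpoints )) series"    # 600

user_ids=100000   # one unbounded label dwarfs everything else
echo "unbounded: $(( methods * statuses * user_ids )) series"   # 3000000
```

With bounded labels the series count stays in the hundreds; a single unbounded label multiplies it into the millions.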
Symptoms
- Prometheus memory usage grows rapidly, eventually triggering OOM killer
- `prometheus_tsdb_head_series` metric shows exponential time series growth
- Scrape duration increases as the number of time series overwhelms the TSDB
- Query performance degrades, with simple queries taking tens of seconds
- Error messages: `mmap: Cannot allocate memory` or `TSDB head chunk mmap: no space left on device`
Common Causes
- Metric exposing `user_id`, `request_id`, or `client_ip` as a label value
- Application generating unique labels per HTTP request path with query parameters
- Exporter not filtering high-cardinality labels from upstream metrics
- Label values containing timestamps or UUIDs, creating a new series per event
- Missing label allowlist/denylist on exporters that forward all labels
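One common remedy for the request-path cause above is to normalize paths before using them as label values; a minimal sketch using `sed` (the URL and patterns here are hypothetical):

```bash
# Strip query strings, then collapse numeric IDs into a placeholder
raw='/users/12345/orders/98765?session=abc123'
normalized=$(echo "$raw" | sed -E 's/\?.*$//; s/[0-9]+/:id/g')
echo "$normalized"   # /users/:id/orders/:id
```

The same normalization belongs in the application's instrumentation layer, so every request path maps to one of a small, fixed set of label values.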
Step-by-Step Fix
1. **Identify the highest-cardinality metrics:** Find which metrics are creating the most series.

   ```bash
   curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByMetricName[:10]'
   ```

2. **Check which labels are causing the explosion:** Examine label cardinality. The API already returns these lists sorted by count, descending.

   ```bash
   curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.labelValueCountByLabelName[:10]'
   ```

3. **Drop the problematic label using metric relabeling:** Remove the high-cardinality label (or the whole metric) at scrape time.

   ```yaml
   metric_relabel_configs:
     # Drop the entire metric
     - source_labels: [__name__]
       regex: "http_request_duration.*"
       action: drop
     # Or drop just the problematic labels
     - regex: "user_id|request_id|client_ip"
       action: labeldrop
   ```

4. **Fix the application instrumentation:** Remove the high-cardinality label at the source.

   ```java
   // WRONG: high cardinality -- one series per user
   Counter.builder("http_requests_total")
       .tag("user_id", userId)
       .register(registry);

   // CORRECT: use bounded labels only
   Counter.builder("http_requests_total")
       .tag("method", method)
       .tag("status", status)
       .register(registry);
   ```

5. **Set series limits to prevent future explosions:** Cap the number of time series. Prometheus has no global series cap, so combine recording rules that aggregate before series counts grow, remote write with series limits (Cortex/Thanos), and an alert on series growth rate:

   ```yaml
   - alert: HighCardinalityGrowth
     expr: rate(prometheus_tsdb_head_series[1h]) > 100
   ```
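Prometheus also offers per-scrape guardrails: the `sample_limit` field in a scrape config makes Prometheus fail any scrape whose sample count exceeds the limit, so one bad exporter cannot blow up the TSDB. A minimal sketch (the job name and target are hypothetical):

```yaml
scrape_configs:
  - job_name: "app"
    static_configs:
      - targets: ["localhost:8080"]  # hypothetical target
    # Reject the whole scrape if the target exposes more than
    # this many samples in a single scrape.
    sample_limit: 10000
```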
Prevention
- Define and enforce a label cardinality policy: only use labels with bounded, enumerated values
- Use `labeldrop` in `metric_relabel_configs` to strip known high-cardinality labels at scrape time
- Monitor the `prometheus_tsdb_head_series` growth rate and alert on sudden increases
- Review all new metric instrumentation in code reviews for cardinality risk
- Use histograms with explicit bucket boundaries instead of per-request latency labels
- Implement automated cardinality testing in CI that flags metrics with unbounded label values
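The CI check in the last bullet can be as simple as grepping a saved scrape of `/metrics` for label names that should never appear; a sketch (the denylist and sample data are illustrative):

```bash
# Sample scrape output -- the second line carries a forbidden label
cat > /tmp/metrics.txt <<'EOF'
http_requests_total{method="GET",status="200"} 42
http_requests_total{method="GET",user_id="12345"} 1
EOF

# Fail the build when any denylisted label name appears
if grep -Eq '(user_id|request_id|client_ip)="' /tmp/metrics.txt; then
  echo "FAIL: high-cardinality label found"
fi
```

In a real pipeline, the `if` branch would `exit 1` to fail the build; extending the denylist is a one-line change.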