## Introduction
Prometheus high-cardinality errors occur when the number of unique time series grows beyond manageable limits, causing memory exhaustion, slow queries, compaction failures, and potentially Prometheus server crashes. Cardinality in Prometheus is the number of unique label combinations (time series) for a metric.

High cardinality typically manifests as "out of bounds" errors, "sample limit exceeded" warnings, compaction taking too long, failures persisting TSDB head chunks, remote write backlog growth, and OOM kills. Common causes include high-cardinality labels (user IDs, email addresses, IP addresses, request IDs, pod names with unique identifiers), unbounded label values (timestamps, random strings, UUIDs), metric joins that multiply series, service discovery creating ephemeral labels, container or instance IDs used as labels, query parameters exported as labels, and client libraries building dynamic labels from request attributes.

The fix requires identifying high-cardinality metrics, implementing cardinality limits, relabeling to drop expensive labels, aggregating metrics at the source, and configuring appropriate resource limits. This guide provides production-proven strategies for managing Prometheus cardinality across single-server and federated deployments.
## Symptoms
- Prometheus logs show "cardinality limit exceeded" or "series limit exceeded"
- Memory usage grows continuously until OOM kill
- TSDB compaction takes excessively long (>30 minutes)
- Remote write backlog grows unbounded
- Queries timeout or return "query execution cancelled"
- `prometheus_tsdb_head_series` metric shows rapid growth
- `prometheus_tsdb_storage_blocks_bytes` grows faster than expected
- Scrape targets show "context deadline exceeded"
- `prometheus_target_scrapes_exceeded_sample_limit_total` increases
- Prometheus becomes unresponsive during compaction
- `rate()` and `histogram_quantile()` queries are extremely slow
## Common Causes
- High-cardinality labels (user_id, email, ip_address, session_id)
- Unbounded label values (UUIDs, timestamps, random strings)
- Container ID or pod UID as label without aggregation
- Request ID or trace ID as metric label
- Query parameters exported as labels
- Joining metrics causing series explosion (many-to-many)
- Service discovery adding dynamic labels per scrape
- Client library auto-instrumentation capturing request details
- Histogram buckets too granular for label combinations
- Summary quantiles with many label combinations
- Metric relabeling creating new high-cardinality labels
- Federation aggregating without label deduplication
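The explosion is multiplicative: a metric's series count is the product of the distinct values of each label. A quick sketch (the label-value counts below are illustrative assumptions, not measurements):

```python
from math import prod

# Series count for a metric is the product of the number of
# distinct values per label. Counts here are made-up examples.
def series_count(label_value_counts: dict[str, int]) -> int:
    return prod(label_value_counts.values())

bounded = {"method": 8, "status": 10, "endpoint": 50}
print(series_count(bounded))  # 4000

# Add one unbounded label (e.g. user_id) and the count multiplies
unbounded = dict(bounded, user_id=100_000)
print(series_count(unbounded))  # 400000000
```

One unbounded label turns a harmless 4,000-series metric into 400 million potential series, which is why the fixes below focus on labels rather than metrics.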
## Step-by-Step Fix
### 1. Diagnose high cardinality metrics
Check current cardinality:
```bash
# Connect to the Prometheus API
PROMETHEUS_URL="http://localhost:9090"

# Top metrics by series count (TSDB status endpoint)
curl -s "$PROMETHEUS_URL/api/v1/status/tsdb" | \
  jq '.data.seriesCountByMetricName'

# Labels with the most unique values
curl -s "$PROMETHEUS_URL/api/v1/status/tsdb" | \
  jq '.data.labelValueCountByLabelName'

# Or use promtool for local analysis
promtool tsdb analyze /path/to/prometheus/data

# Check series count per metric (-g disables curl URL globbing for match[])
curl -s "$PROMETHEUS_URL/api/v1/label/__name__/values" | jq -r '.data[]' | \
  while read -r metric; do
    count=$(curl -sg "$PROMETHEUS_URL/api/v1/series?match[]=$metric" | jq '.data | length')
    echo "$count $metric"
  done | sort -rn | head -20

# Query series count directly (expensive on large datasets)
# Run in the Prometheus UI or with Thanos Query:
#   count by (__name__) ({__name__=~".+"})
```
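When the endpoint output gets long, it can help to post-process a saved response. A minimal sketch that ranks metrics from a captured `/api/v1/status/tsdb` payload (field names match the real API; the values in the sample are made up):

```python
import json

# Offline analysis of a saved TSDB status response, e.g.:
#   curl -s "$PROMETHEUS_URL/api/v1/status/tsdb" > tsdb_status.json
# Sample payload below mirrors the endpoint's shape; values are fake.
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {
    "seriesCountByMetricName": [
      {"name": "node_cpu_seconds_total", "value": 4000},
      {"name": "http_requests_total", "value": 120000},
      {"name": "up", "value": 350}
    ]
  }
}
""")

def top_metrics(status: dict, n: int = 10) -> list[tuple[str, int]]:
    """Return the n metrics with the highest series counts."""
    entries = status["data"]["seriesCountByMetricName"]
    ranked = sorted(entries, key=lambda e: e["value"], reverse=True)
    return [(e["name"], e["value"]) for e in ranked[:n]]

print(top_metrics(SAMPLE, 2))
# [('http_requests_total', 120000), ('node_cpu_seconds_total', 4000)]
```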
Check memory usage:
```bash
# Prometheus memory-related metrics
curl -sG "$PROMETHEUS_URL/api/v1/query" --data-urlencode 'query=prometheus_tsdb_head_series' | jq
curl -sG "$PROMETHEUS_URL/api/v1/query" --data-urlencode 'query=prometheus_tsdb_head_chunks' | jq
curl -sG "$PROMETHEUS_URL/api/v1/query" --data-urlencode 'query=prometheus_tsdb_head_chunks_storage_size_bytes' | jq
curl -sG "$PROMETHEUS_URL/api/v1/query" --data-urlencode 'query=go_memstats_alloc_bytes{job="prometheus"}' | jq

# Memory per series (rough rule of thumb): on the order of a few
# kilobytes of head memory per active series, i.e. roughly 1-3 GB
# per million series depending on label sizes and churn

# Check Prometheus process memory (RSS in MB, then command)
ps aux | grep '[p]rometheus' | awk '{print $6/1024, $11}'

# Or with systemd
systemctl status prometheus | grep Memory
```
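To turn the rule of thumb above into capacity numbers, a small helper is enough. The bytes-per-series figure is an assumption (a commonly cited range is roughly 1-3 KiB per active series); validate it against your own server's `process_resident_memory_bytes` before planning around it:

```python
# Order-of-magnitude head-memory estimate from an active-series count.
# bytes_per_series is an assumed rule-of-thumb value, not a constant
# guaranteed by Prometheus - measure your own deployment.

def estimate_head_memory_bytes(active_series: int, bytes_per_series: int = 2048) -> int:
    """Return a rough head memory estimate in bytes."""
    return active_series * bytes_per_series

if __name__ == "__main__":
    for series in (100_000, 1_000_000, 5_000_000):
        gib = estimate_head_memory_bytes(series) / 2**30
        print(f"{series:>9} series -> about {gib:.2f} GiB")
```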
Identify problematic labels:
```promql
# Find labels with many unique values. Note: label_values() is a
# Grafana templating function, not PromQL - run the queries below
# in the Prometheus UI or via the API instead.

# Number of unique values for a label on a metric
count(count by (method) (http_requests_total))
count(count by (status) (http_requests_total))
count(count by (endpoint) (http_requests_total))

# Total series for one metric
count(http_requests_total)

# Series count per label value (spot which values explode)
count by (endpoint) (http_requests_total)
```
Use promtool for TSDB analysis:
```bash
# Analyze the most recent TSDB block
promtool tsdb analyze /var/lib/prometheus/data

# Analyze a specific block (the block ID is a positional argument)
promtool tsdb analyze /var/lib/prometheus/data <block_id>

# Output includes:
# - highest cardinality labels
# - highest cardinality metric names
# - label names with highest cumulative label value length
# - most common label pairs

# Show more entries per section
promtool tsdb analyze --limit 30 /var/lib/prometheus/data

# Check for label explosion patterns
promtool tsdb analyze /var/lib/prometheus/data | grep -A 20 "Highest cardinality labels"
```
### 2. Configure cardinality limits
Set series limits:
```yaml
# prometheus.yml - global limits (supported in the global block
# since Prometheus 2.45; older versions only support them per job)
global:
  # Maximum samples accepted per scrape (0 = no limit)
  sample_limit: 10000

  # Maximum number of targets per scrape config
  target_limit: 5000

# Per-job limits override the global values
scrape_configs:
  - job_name: 'api-service'
    # Limit samples per scrape for this job
    sample_limit: 5000

    # Maximum number of labels per series
    label_limit: 128

    # Maximum length of a label name
    label_name_length_limit: 64

    # Maximum length of a label value
    label_value_length_limit: 256
```
Configure TSDB limits:
```yaml
# Command-line flags for the prometheus server
--storage.tsdb.retention.time=15d        # Reduce retention if needed
--storage.tsdb.retention.size=50GB       # Size-based retention
--storage.tsdb.min-block-duration=2h     # Smaller blocks = faster compaction
--storage.tsdb.max-block-duration=2h
--storage.tsdb.head-chunks-write-queue-size=10000
--storage.tsdb.max-block-chunk-segment-size=256MB

# Out-of-order ingestion is configured in prometheus.yml, not a flag:
# storage:
#   tsdb:
#     out_of_order_time_window: 5m

# For Thanos deployments, point the sidecar at the same data directory
# thanos sidecar --tsdb.path /var/lib/prometheus/data
```
Configure scrape timeout and interval:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    # Reduce scrape frequency for high-cardinality targets
    scrape_interval: 60s

    # Timeout must be less than or equal to the interval
    scrape_timeout: 30s

# Split large scrape configs into separate files
# (top-level option, not per-job)
scrape_config_files:
  - /etc/prometheus/jobs/*.yml
```
### 3. Implement metric relabeling
Drop high-cardinality labels:
```yaml
scrape_configs:
  - job_name: 'api-service'
    static_configs:
      - targets: ['api:8080']

    metric_relabel_configs:
      # Drop labels that cause cardinality explosion. labeldrop
      # matches label NAMES against regex; it takes no source_labels.
      - regex: 'user_id'
        action: labeldrop

      # Drop labels matching a pattern
      - regex: 'request_id|session_id'
        action: labeldrop

      # Drop an entire metric when it carries an expensive label
      - source_labels: [__name__, endpoint]
        regex: '(http_requests_total|http_response_time).*;.+'
        action: drop

      # Keep only the labels you need. labelkeep drops everything
      # else - list job and instance or they disappear too.
      - regex: '__name__|job|instance|method|status|le'
        action: labelkeep

      # Replace a high-cardinality label with an aggregated value
      - source_labels: [instance]
        regex: '(.+)-[a-f0-9-]+(.*)'   # Match pod-uuid pattern
        target_label: instance
        replacement: '${1}${2}'        # Removes UUID from instance name
```
Aggregate labels at scrape time:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod

    metric_relabel_configs:
      # Aggregate pod names (strip ReplicaSet/pod random suffixes)
      - source_labels: [pod]
        regex: '(.+)-[a-z0-9]+(-[a-z0-9]+)?'
        target_label: pod
        replacement: '${1}'

      # Truncate container IDs to the first 12 characters
      - source_labels: [container_id]
        regex: '(.{12}).*'
        target_label: container_id
        replacement: '${1}'

      # Drop ephemeral labels
      - regex: 'revision|pod_template_hash|controller_revision_hash'
        action: labeldrop

      # Normalize endpoint paths: /api/v1/users/123 -> /api/v1/users/{id}
      - source_labels: [endpoint]
        regex: '/api/v1/users/[^/]+'
        target_label: endpoint
        replacement: '/api/v1/users/{id}'
```
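The same path normalization is cheaper if done in the application before the path ever becomes a label value. A minimal sketch (the route patterns are illustrative assumptions; extend them for your own API):

```python
import re

# Collapse unbounded URL segments into placeholders BEFORE using the
# path as a metric label. Patterns below are examples, not a complete
# route table - add one per parameterized route in your service.
_PATTERNS = [
    (re.compile(r"/users/[^/]+"), "/users/{id}"),
    (re.compile(r"/orders/\d+"), "/orders/{id}"),
    (re.compile(
        r"/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    ), "/{uuid}"),
]

def normalize_path(path: str) -> str:
    """Return a bounded-cardinality version of a request path."""
    for pattern, replacement in _PATTERNS:
        path = pattern.sub(replacement, path)
    return path

print(normalize_path("/api/v1/users/123"))  # /api/v1/users/{id}
print(normalize_path("/orders/42/items"))   # /orders/{id}/items
```

Normalizing in the client keeps the raw series from ever being created, whereas scrape-time relabeling only hides them from storage while the client library still holds one child per raw path.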
Drop entire metrics:
```yaml
scrape_configs:
  - job_name: 'api-service'

    metric_relabel_configs:
      # Drop verbose metrics
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop     # Drop all Go runtime metrics if not needed

      # Drop histogram buckets you don't use
      - source_labels: [__name__, le]
        regex: 'http_request_duration_seconds_bucket;(0\.001|0\.005)'
        action: drop     # Drop sub-10ms buckets if not analyzed

      # Drop high quantiles from summaries (summary samples carry the
      # base metric name plus a quantile label)
      - source_labels: [__name__, quantile]
        regex: 'http_request_duration_seconds;(0\.99|0\.999)'
        action: drop
```
### 4. Fix application-level cardinality
Instrument code correctly:
```go
// BAD: high cardinality - user_id as label
var httpRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "status", "user_id"}, // user_id is unbounded!
)

// GOOD: fixed cardinality - only bounded labels
var httpRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "status", "endpoint"}, // endpoint is bounded
)

// For user-specific tracking, use a separate approach:
// - structured logging
// - an external analytics system
// - sampled, pre-aggregated metrics

// BAD: request ID as label
labels["request_id"] = request.ID // UUID - millions of values

// GOOD: put the request ID in logs, not metrics
log.WithField("request_id", request.ID).Info("request completed")

// For per-request detail, use distributed tracing
// (Jaeger, Zipkin, OpenTelemetry), not Prometheus metrics
```
Python application fixes:
```python
import logging

from prometheus_client import Counter

# BAD: dynamic labels taken from request attributes
request_counter = Counter(
    'http_requests_total', 'HTTP Requests',
    ['method', 'status', 'user_agent', 'ip_address'],  # High cardinality!
)

def handle_request(request, response):
    # This creates a new series per user agent and per client IP
    request_counter.labels(
        method=request.method,
        status=response.status,
        user_agent=request.headers.get('User-Agent', 'unknown'),
        ip_address=request.remote_addr,
    ).inc()

# GOOD: bounded labels only
request_counter = Counter(
    'http_requests_total', 'HTTP Requests',
    ['method', 'status', 'service'],
)

def handle_request(request, response):
    request_counter.labels(
        method=request.method,
        status=response.status,
        service='api',
    ).inc()
    # Log high-cardinality data separately
    logging.info(
        "Request from %s with %s",
        request.remote_addr,
        request.headers.get('User-Agent'),
    )
```
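As a defensive layer, the application can also cap how many distinct values a label is ever allowed to take. This `LabelGuard` is an illustrative sketch, not part of `prometheus_client`: once the cap is reached, new values collapse into a single `"other"` bucket.

```python
# Client-side cardinality guard (hypothetical helper, assumed design):
# bound the number of distinct values a label may take; overflow
# values all map to "other" so the series count stays fixed.

class LabelGuard:
    def __init__(self, max_values: int = 100):
        self.max_values = max_values
        self.seen: set[str] = set()

    def bound(self, value: str) -> str:
        """Return value if within budget, else the overflow bucket."""
        if value in self.seen:
            return value
        if len(self.seen) < self.max_values:
            self.seen.add(value)
            return value
        return "other"

endpoint_guard = LabelGuard(max_values=2)
print(endpoint_guard.bound("/health"))  # /health
print(endpoint_guard.bound("/login"))   # /login
print(endpoint_guard.bound("/x9f3a"))   # other
```

Use it at the call site, e.g. `request_counter.labels(endpoint=endpoint_guard.bound(path), ...)`, so a buggy or malicious client generating random paths cannot mint unbounded series.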
Fix histogram bucket explosion:
```go
// BAD: many buckets combined with many labels
var requestDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "HTTP request duration",
        Buckets: prometheus.DefBuckets,
        // DefBuckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
        // 11 buckets + the implicit +Inf bucket + _sum + _count
        // = 14 series per label combination
    },
    []string{"method", "endpoint", "status", "service", "team"}, // 5 labels
)
// Cardinality: 14 series * 10 methods * 100 endpoints * 10 statuses
// * 5 services * 5 teams = 3.5 MILLION series

// GOOD: fewer buckets, fewer labels
var requestDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "HTTP request duration",
        Buckets: []float64{0.1, 0.5, 1, 2.5, 5, 10}, // 6 buckets
    },
    []string{"method", "status"}, // Only essential labels
)
// Cardinality: (6 buckets + +Inf + _sum + _count) = 9 series
// * 10 methods * 10 statuses = 900 series
```
### 5. Configure remote write and federation
Remote write cardinality control:
```yaml
remote_write:
  - url: "https://cortex.example.com/api/v1/push"

    # Queue configuration
    queue_config:
      capacity: 10000
      max_shards: 50
      max_samples_per_send: 5000
      batch_send_deadline: 5s
      min_backoff: 100ms
      max_backoff: 5s

    # Relabel before sending to reduce cardinality
    write_relabel_configs:
      # Drop high-cardinality metrics
      - source_labels: [__name__]
        regex: 'expensive_metric_.*'
        action: drop

      # Keep only specific metrics
      - source_labels: [__name__]
        regex: 'http_requests_total|node_.*|container_.*'
        action: keep

      # Drop specific labels
      - regex: 'pod_template_hash|controller_revision_hash'
        action: labeldrop

      # Aggregate instance labels
      - source_labels: [instance]
        regex: '(.+)-[a-f0-9-]+'
        target_label: instance
        replacement: '${1}'
```
Federation configuration:
```yaml
# Federating Prometheus - scrape aggregate series from other servers
scrape_configs:
  - job_name: 'federate'
    honor_labels: true   # Preserve labels from the source
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="api-service"}'
        - '{__name__=~"node_.*"}'
    static_configs:
      - targets:
          - 'prometheus-1:9090'
          - 'prometheus-2:9090'

    # Relabel to prevent cardinality explosion
    metric_relabel_configs:
      # Collapse per-source instance labels
      - source_labels: [instance, prometheus_instance]
        regex: '(.+);(.+)'
        target_label: instance
        replacement: '${2}'
```
### 6. Monitor cardinality
Set up cardinality monitoring:
```yaml
# Alert rules for cardinality monitoring
# /etc/prometheus/rules/cardinality-alerts.yml

groups:
  - name: prometheus-cardinality
    rules:
      - alert: PrometheusHighCardinality
        expr: prometheus_tsdb_head_series > 1000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus series count high"
          description: "{{ $value }} series in TSDB head"

      - alert: PrometheusVeryHighCardinality
        expr: prometheus_tsdb_head_series > 5000000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus series count critical"
          description: "{{ $value }} series - investigate immediately"

      - alert: PrometheusMetricHighCardinality
        expr: |
          count by (__name__) ({__name__=~".+"}) > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High cardinality metric {{ $labels.__name__ }}"
          description: "Metric has {{ $value }} series"

      - alert: PrometheusScrapeSampleLimitExceeded
        expr: |
          rate(prometheus_target_scrapes_exceeded_sample_limit_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Scrape sample limit exceeded"
          description: "Target {{ $labels.job }} exceeding sample limit"

      - alert: PrometheusRemoteWriteBacklog
        expr: |
          prometheus_remote_storage_highest_timestamp_in_seconds -
            prometheus_remote_storage_queue_highest_sent_timestamp_seconds > 300
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Remote write backlog growing"
          description: "Backlog of {{ $value }} seconds"

      - alert: PrometheusMemoryHigh
        expr: |
          process_resident_memory_bytes{job="prometheus"} > 8 * 1024 * 1024 * 1024
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus memory usage high"
          description: "Using {{ $value | humanize1024 }}B of memory"
```
Cardinality dashboard queries:
```promql
# Series count over time
prometheus_tsdb_head_series

# Samples ingested per second
rate(prometheus_tsdb_head_samples_appended_total[5m])

# Memory usage
process_resident_memory_bytes{job="prometheus"}

# Compaction duration
prometheus_tsdb_compaction_duration_seconds

# Top 10 metrics by series count
topk(10, count by (__name__) ({__name__=~".+"}))

# Unique values of one label on a metric (that label's cardinality)
count(count by (endpoint) (http_requests_total))

# Series growth rate
deriv(prometheus_tsdb_head_series[1h])
```
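The last query, `deriv()`, is a per-second least-squares slope. The same calculation is easy to reproduce offline against exported samples, e.g. to flag sustained growth in a report script (the sample data below is made up):

```python
# Least-squares slope of series count over time - the same idea as
# PromQL's deriv() - converted to series per hour.

def series_growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """samples: (unix_seconds, series_count) pairs; returns series/hour."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return (num / den) * 3600  # per-second slope -> per-hour

# One hour of samples growing by 100k series
samples = [(0, 1_000_000), (1800, 1_050_000), (3600, 1_100_000)]
print(round(series_growth_per_hour(samples)))  # 100000
```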
## Prevention
- Design metrics with bounded labels only (no user IDs, emails, IPs)
- Implement cardinality limits in CI/CD before deploying new metrics
- Use metric relabeling to drop unnecessary labels at scrape time
- Aggregate high-cardinality data in application before export
- Monitor series count and set up alerting at 50%, 75%, 90% of limit
- Document cardinality budget per service (e.g., 1000 series per service)
- Use exemplars for tracing instead of labels for request IDs
- Implement metric naming and labeling standards
- Review new metrics in code review with cardinality checklist
- Use Prometheus cardinality profiler in staging environment
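The CI/CD check from the list above can start very small. This is a hypothetical lint sketch, not a standard tool: it scans Go source for `CounterVec`-style label slices and rejects label names banned by policy.

```python
import re

# CI-style lint sketch (illustrative, not an existing tool): flag
# metric definitions whose label lists contain banned label names.

BANNED_LABELS = {"user_id", "email", "ip_address", "session_id",
                 "request_id", "trace_id", "uuid"}

# Matches Go label slices like []string{"method", "status", "user_id"}
LABEL_LIST = re.compile(r'\[\]string\{([^}]*)\}')

def find_banned_labels(source: str) -> set[str]:
    """Return banned label names found in Go metric label slices."""
    found = set()
    for match in LABEL_LIST.finditer(source):
        labels = {s.strip().strip('"') for s in match.group(1).split(",")}
        found |= labels & BANNED_LABELS
    return found

snippet = '[]string{"method", "status", "user_id"}'
print(find_banned_labels(snippet))  # {'user_id'}
```

Wired into CI (fail the build when the returned set is non-empty), this catches the most common cardinality mistakes before they reach production.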
## Related Errors
- **Prometheus TSDB compaction failed**: Block compaction taking too long
- **Prometheus remote write backlog**: Remote storage cannot keep up
- **Prometheus OOM killed**: Memory exhausted from high cardinality
- **Prometheus scrape timeout**: Too many samples per scrape
- **Prometheus query timeout**: Queries too slow due to series count