The Problem
Prometheus logs show sample limit exceeded errors, and metrics are being dropped:
```
level=warn ts=2026-04-04T06:45:22.123Z caller=scrape.go:1456 component="scrape manager" scrape_pool=kubernetes-pods target=http://10.0.0.5:8080/metrics msg="Scrape failed" err="sample_limit exceeded (10000 > 5000)"
```

You might also see in the UI:

```
Error: sample_limit (5000) exceeded
```

This occurs when a single scrape returns more samples than the configured limit, a safeguard that protects Prometheus from cardinality explosions.
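The limit in the error (5000 here) comes from a `sample_limit` set on the scrape job. As an illustrative fragment (job name assumed to match the error above):

```yaml
# prometheus.yml (illustrative)
scrape_configs:
  - job_name: kubernetes-pods
    sample_limit: 5000   # the limit the failing scrape is hitting
```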
Diagnosis
Check Current Sample Counts
```promql
# Samples scraped per target
scrape_samples_scraped

# Targets exceeding sample limits
scrape_samples_scraped > 5000

# Top targets by sample count
topk(10, scrape_samples_scraped)

# Samples per job
sum by (job) (scrape_samples_scraped)
```
Identify High Cardinality Metrics
```promql
# Count series per metric name
count by (__name__) ({__name__=~".+"})

# Top 20 highest-cardinality metrics
topk(20, count by (__name__) ({__name__=~".+"}))

# Cardinality growth over the last hour (note the subquery syntax [1h:])
delta(count by (__name__) ({__name__=~".+"})[1h:])
```
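Instead of PromQL, head-block cardinality can also be read from the `/api/v1/status/tsdb` endpoint (available since Prometheus 2.15). A minimal parsing sketch; the JSON below is a hand-written, abridged example of the documented response shape, not real data:

```python
import json

# Abridged example of the shape returned by GET /api/v1/status/tsdb.
# In practice you would fetch this from the Prometheus HTTP API.
payload = json.loads("""
{
  "status": "success",
  "data": {
    "seriesCountByMetricName": [
      {"name": "http_request_duration_seconds_bucket", "value": 85000},
      {"name": "http_requests_total", "value": 12000}
    ]
  }
}
""")

# Print the highest-cardinality metric names first
for entry in payload["data"]["seriesCountByMetricName"]:
    print(f'{entry["name"]}: {entry["value"]} series')
```

This is handy when the `{__name__=~".+"}` queries above are themselves too expensive to run.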
Find Problem Labels
```promql
# Number of distinct values for a given label (here: pod)
count(count by (pod) ({__name__=~".+"}))

# Top series counts per (pod, container) pair
topk(10, count by (pod, container) ({__name__=~".+"}))
```
Solutions
1. Increase Sample Limit
Quick fix for legitimate high-cardinality targets:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'high-cardinality-app'
    sample_limit: 50000  # raise the limit for this job; 0 disables it entirely (not recommended)
    static_configs:
      - targets: ['app:9090']
```

Apply and reload:
```bash
# Reload the config (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Or restart
systemctl restart prometheus
```
2. Reduce Metric Cardinality
Drop unnecessary labels at scrape time:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # __meta_* labels exist only at target-relabel time (they are dropped
      # before metric relabeling), so copy them here, not below
      - source_labels: [__meta_kubernetes_pod_label_version]
        target_label: version
        action: replace
    metric_relabel_configs:
      # Drop high-cardinality labels
      - action: labeldrop
        regex: '(pod_template_hash|deployment_kubernetes_io|pod_template_generation)'

      # Keep only the metrics you need
      - action: keep
        source_labels: [__name__]
        regex: '(http_requests_total|http_request_duration_seconds|process_.+)'
```
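As a rough mental model, a `labeldrop` rule removes any label whose name fully matches the regex (Prometheus anchors relabel regexes). A plain-Python sketch, not Prometheus code:

```python
import re

def labeldrop(labels: dict, regex: str) -> dict:
    """Remove labels whose NAME fully matches the regex (values untouched)."""
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}

series = {"__name__": "http_requests_total", "pod": "web-1",
          "pod_template_hash": "abc123"}
cleaned = labeldrop(series, "pod_template_hash|pod_template_generation")
print(cleaned)  # pod_template_hash is gone; __name__ and pod remain
```

Note that dropping a label can merge previously distinct series, which is exactly how it reduces cardinality.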
3. Drop Unwanted Metrics
Exclude metrics you don't need:
```yaml
scrape_configs:
  - job_name: 'myapp'
    metric_relabel_configs:
      # Drop all metrics with an unwanted prefix
      - action: drop
        source_labels: [__name__]
        regex: 'unwanted_metric_.+'

      # Drop specific high-cardinality metrics
      - action: drop
        source_labels: [__name__]
        regex: '(http_request_duration_seconds_bucket|grpc_server_handled_total)'
```
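For intuition, the `drop` action discards a whole series when the source label values (here just `__name__`) fully match the regex. An illustrative Python sketch, not Prometheus internals:

```python
import re

def keep_series(labels: dict, drop_regex: str) -> bool:
    """Return False when the series would be dropped by the rule."""
    return re.fullmatch(drop_regex, labels.get("__name__", "")) is None

series = [
    {"__name__": "unwanted_metric_foo"},
    {"__name__": "http_requests_total"},
]
kept = [s for s in series if keep_series(s, "unwanted_metric_.+")]
print(kept)  # only http_requests_total survives
```

Unlike `labeldrop` (which matches label names), `drop` matches label values and removes the entire sample.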
4. Aggregate High Cardinality Metrics
Use recording rules to pre-aggregate:
```yaml
# recording_rules.yml
groups:
  - name: cardinality_reduction
    interval: 30s
    rules:
      # Aggregate away high-cardinality labels
      - record: http_requests_total:by_method
        expr: sum without (pod, container, endpoint) (http_requests_total)

      # Bucket aggregation
      - record: http_request_duration_seconds:bucket:by_service
        expr: sum without (pod, instance) (http_request_duration_seconds_bucket)
```
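What `sum without (...)` does to the series set can be sketched as follows (illustrative Python; the sample values are made up):

```python
from collections import defaultdict

def sum_without(series, dropped):
    """Series that differ only in the dropped labels collapse into one sum."""
    totals = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in dropped))
        totals[key] += value
    return dict(totals)

samples = [
    ({"method": "GET", "pod": "web-1"}, 5.0),
    ({"method": "GET", "pod": "web-2"}, 3.0),
    ({"method": "POST", "pod": "web-1"}, 2.0),
]
result = sum_without(samples, {"pod", "container", "endpoint"})
print(result)  # two output series: one per method
```

Three input series become two output series; queries then hit the cheap recorded series instead of the raw high-cardinality ones.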
5. Configure Label Limits
Limit labels per sample:
```yaml
scrape_configs:
  - job_name: 'myapp'
    label_limit: 20
    label_name_length_limit: 64
    label_value_length_limit: 128
    static_configs:
      - targets: ['app:9090']
```

6. Use Histogram Buckets Wisely
Reduce histogram cardinality:
```yaml
# In your application's metrics configuration
histogramBuckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
# Instead of:
# histogramBuckets: [0.001, 0.002, 0.003, ... many buckets]
```

Each bucket adds one series per label combination: even the eleven buckets above yield 14 series (11 buckets + `+Inf` + `_sum` + `_count`) for every label set.

Verification
Monitor sample counts after changes:
```promql
# Should be below the limit
scrape_samples_scraped{job="myapp"} < 50000

# Scrapes rejected for exceeding the sample limit
rate(prometheus_target_scrapes_exceeded_sample_limit_total[5m])
```

Verify no limit errors remain in the logs:

```bash
journalctl -u prometheus --since "1 hour ago" | grep "sample_limit"
```
Prevention
Add alerts for cardinality issues:
```yaml
groups:
  - name: cardinality_alerts
    rules:
      - alert: HighSampleCount
        expr: scrape_samples_scraped > 20000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} exposing many samples"
          description: "Target is exposing {{ $value }} samples, consider reducing cardinality"

      - alert: SampleLimitApproaching
        expr: scrape_samples_scraped / 50000 > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Sample limit approaching for {{ $labels.instance }}"

      - alert: HighCardinalityMetric
        expr: count by (__name__) ({__name__=~".+"}) > 10000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Metric {{ $labels.__name__ }} has high cardinality"
```
Monitor cardinality growth:
```promql
# Cardinality growth over the last hour (subquery syntax)
delta(count by (__name__) ({__name__=~".+"})[1h:]) > 1000

# Total series count across all targets
sum(scrape_samples_scraped)
```