## What's Actually Happening

Grafana Tempo rejects traces when the ingestion rate exceeds its configured limits. Applications sending trace data receive HTTP 429 rate limit errors.
## The Error You'll See

Rate limit error:

```bash
$ curl -X POST http://tempo:14268/api/traces -d @trace.json

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "ingestion rate limit exceeded",
  "tenant": "anonymous"
}
```

OTLP rejection:

```
ERROR: trace data rejected: rate limit exceeded for tenant
```

Tempo logs:

```bash
$ journalctl -u tempo | grep rate

WARN ingestion rate limit exceeded for tenant anonymous, rejecting traces
level=warn msg="pusher failed to consume trace data" err="rate limit exceeded"
```
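To gauge how often this happens, count the warnings over a recent window:

```bash
# Count rate limit warnings from the last hour
journalctl -u tempo --since "1 hour ago" | grep -c "rate limit exceeded"
```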
## Why This Happens

1. Trace volume spike - a sudden increase in trace generation
2. Limit too low - the default limits are insufficient for the load
3. Missing tenant - the anonymous tenant falls back to default limits
4. Ingester overload - not enough ingester capacity
5. Large traces - individual traces are too big
6. High cardinality - too many unique trace IDs
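A quick way to narrow down the cause is to compare the live ingest rate against the configured limit. A minimal sketch, assuming Tempo's HTTP API on localhost:3200 and falling back to the 15MB/s default if the config endpoint is unavailable:

```bash
#!/bin/bash
# Sample the bytes-received counter twice, 60s apart, and compare the
# per-second rate against the configured ingestion limit.

metric() {
  curl -s http://localhost:3200/metrics \
    | awk '/^tempo_distributor_bytes_received_total/ {sum += $2} END {printf "%.0f", sum}'
}

before=$(metric)
sleep 60
after=$(metric)
rate=$(( (after - before) / 60 ))

# Fall back to the 15MB default if /config cannot be parsed
limit=$(curl -s http://localhost:3200/config | jq -r '.limits.ingestion_rate_limit_bytes // 15000000' 2>/dev/null)

echo "ingest rate: ${rate} bytes/s, configured limit: ${limit:-unknown} bytes/s"
```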
## Step 1: Check Tempo Status

```bash
# Check Tempo is running
systemctl status tempo

# Check Tempo metrics
curl http://localhost:3200/metrics | grep -E "tempo_(ingester|distributor)"

# Check current rate limits
curl http://localhost:3200/config | jq '.limits'

# Check ingester ring
curl http://localhost:3200/ingester/ring

# Check distributor ring
curl http://localhost:3200/distributor/ring

# Check rejection rate
curl http://localhost:3200/metrics | grep tempo_distributor_spans_rejected
```
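To judge severity, express rejections as a share of received spans. A minimal sketch that sums the two counters above across label sets (they are cumulative since process start, so treat the result as a rough aggregate):

```bash
#!/bin/bash
# Rough rejection ratio: rejected spans / received spans
metrics=$(curl -s http://localhost:3200/metrics)
received=$(echo "$metrics" | awk '/^tempo_distributor_spans_received_total/ {s += $2} END {printf "%.0f", s}')
rejected=$(echo "$metrics" | awk '/^tempo_distributor_spans_rejected_total/ {s += $2} END {printf "%.0f", s}')

if [ "${received:-0}" -gt 0 ]; then
  awk -v r="$rejected" -v t="$received" \
    'BEGIN {printf "rejected %s of %s spans (%.2f%%)\n", r, t, 100 * r / t}'
else
  echo "no spans received yet"
fi
```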
## Step 2: Check Rate Limit Configuration

```bash
# Check current config
grep -A 20 limits /etc/tempo/tempo.yaml
```

Default limits (byte values are plain integers in tempo.yaml):

```yaml
limits:
  ingestion_rate_strategy: global
  ingestion_rate_limit_bytes: 15000000   # 15MB per tenant
  ingestion_burst_size_bytes: 20000000   # 20MB
  max_bytes_per_trace: 5000000           # 5MB
  max_traces_per_user: 10000
  max_spans_per_user: 0                  # Unlimited
```

Per-tenant overrides:

```yaml
overrides:
  tenant1:
    ingestion_rate_limit_bytes: 50000000  # 50MB
    max_bytes_per_trace: 10000000         # 10MB
```
## Step 3: Increase Rate Limits

```yaml
# In tempo.yaml
limits:
  # Global rate limit
  ingestion_rate_strategy: global
  ingestion_rate_limit_bytes: 50000000   # 50MB, up from the 15MB default
  ingestion_burst_size_bytes: 75000000   # 75MB burst allowance

  # Per-trace limits
  max_bytes_per_trace: 10000000          # 10MB, up from the 5MB default
  max_traces_per_user: 50000
  max_spans_per_user: 0                  # Unlimited spans

  # Search limits
  max_search_duration: 4h
  max_search_bytes_per_trace: 0

# Per-tenant override
overrides:
  my-tenant:
    ingestion_rate_limit_bytes: 100000000  # 100MB
    ingestion_burst_size_bytes: 150000000  # 150MB
    max_bytes_per_trace: 20000000          # 20MB
```

```bash
# Restart Tempo
systemctl restart tempo

# Or for Kubernetes
kubectl rollout restart deployment/tempo
```
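Before restarting, a quick syntax check saves a crash loop. A minimal sketch using Python's YAML parser (assumes PyYAML is installed; this validates syntax only, not Tempo's schema):

```bash
# Catch indentation and syntax mistakes before they take Tempo down
python3 -c "import yaml; yaml.safe_load(open('/etc/tempo/tempo.yaml')); print('YAML OK')" \
  || echo "YAML syntax error - fix before restarting"
```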
## Step 4: Check Trace Volume

```bash
# Check trace ingestion rate
curl http://localhost:3200/metrics | grep tempo_distributor_spans_received_total

# Check bytes received
curl http://localhost:3200/metrics | grep tempo_distributor_bytes_received_total

# Calculate the rate: sample the counter twice, 60 seconds apart
curl -s http://localhost:3200/metrics | grep tempo_distributor_bytes_received_total | awk '{print $2}'
sleep 60
curl -s http://localhost:3200/metrics | grep tempo_distributor_bytes_received_total | awk '{print $2}'

# Check active tenants
curl http://localhost:3200/metrics | grep tempo_ingester_active_tenants

# Check traces per tenant
curl http://localhost:3200/metrics | grep tempo_ingester_traces_created_total

# Identify high-volume tenants
curl http://localhost:3200/distributor/ring | jq
```
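If these metrics are scraped into Prometheus, the per-tenant breakdown is easier to read there. A minimal sketch against the Prometheus HTTP API, assuming Prometheus at localhost:9090 and a tenant label on the counter:

```bash
# Top 5 tenants by ingest rate over the last 5 minutes
curl -s --get http://localhost:9090/api/v1/query \
  --data-urlencode 'query=topk(5, sum by (tenant) (rate(tempo_distributor_bytes_received_total[5m])))' \
  | jq -r '.data.result[] | "\(.metric.tenant)\t\(.value[1]) bytes/s"'
```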
## Step 5: Configure Multi-Tenancy

Enable multi-tenancy so rate limits apply per tenant:

```yaml
# In tempo.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

# Enable multi-tenancy (older Tempo releases call this setting auth_enabled)
multitenancy_enabled: true

# Tenant limits
overrides:
  tenant-a:
    ingestion_rate_limit_bytes: 50000000   # 50MB
    max_bytes_per_trace: 10000000          # 10MB
  tenant-b:
    ingestion_rate_limit_bytes: 100000000  # 100MB
    max_bytes_per_trace: 20000000          # 20MB
```

Clients identify their tenant with the `X-Scope-OrgID` header:

```bash
# With OTLP over HTTP
curl -X POST http://tempo:4318/v1/traces \
  -H "X-Scope-OrgID: tenant-a" \
  -H "Content-Type: application/json" \
  -d @trace.json
```
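To confirm the per-tenant limits actually apply, push the same payload as each tenant and compare status codes. A minimal sketch, assuming a small OTLP payload in trace.json:

```bash
#!/bin/bash
# A 2xx status means accepted; 429 means that tenant hit its rate limit
for tenant in tenant-a tenant-b; do
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST http://tempo:4318/v1/traces \
    -H "X-Scope-OrgID: ${tenant}" \
    -H "Content-Type: application/json" \
    -d @trace.json)
  echo "${tenant}: HTTP ${status}"
done
```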
## Step 6: Optimize Trace Size

```bash
# Check trace sizes
curl http://localhost:3200/metrics | grep tempo_distributor_bytes_per_trace
```

Large traces consume the rate limit faster. Reduce trace size in the application:

1. Limit attributes - don't attach high-cardinality data as span attributes; use structured logging instead.

2. Batch spans with the OpenTelemetry Collector batch processor:

```yaml
processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
```

3. Sample traces, either head-based or tail-based. Head-based probabilistic sampling:

```yaml
processors:
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 10   # Sample 10% of traces
```

4. Filter out noisy spans:

```yaml
processors:
  filter:
    spans:
      exclude:
        match_type: strict
        span_names:
          - healthcheck
          - metrics
```
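A rough average trace size (cumulative bytes divided by cumulative traces) shows whether oversized traces are the problem; compare the result against max_bytes_per_trace. A minimal sketch:

```bash
#!/bin/bash
# Lifetime average trace size from two cumulative counters
metrics=$(curl -s http://localhost:3200/metrics)
bytes=$(echo "$metrics" | awk '/^tempo_distributor_bytes_received_total/ {s += $2} END {printf "%.0f", s}')
traces=$(echo "$metrics" | awk '/^tempo_ingester_traces_created_total/ {s += $2} END {printf "%.0f", s}')

if [ "${traces:-0}" -gt 0 ]; then
  awk -v b="$bytes" -v t="$traces" \
    'BEGIN {printf "average trace size: %.0f bytes across %s traces\n", b / t, t}'
else
  echo "no traces created yet"
fi
```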
## Step 7: Scale Ingesters

For Tempo microservices mode, scale out the ingesters:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo-ingester
spec:
  replicas: 3   # Increase from 1
  template:
    spec:
      containers:
        - name: tempo
          args:
            - -target=ingester
          resources:
            limits:
              memory: 16Gi
              cpu: 4
            requests:
              memory: 8Gi
              cpu: 2
```

```bash
# Check the ring
curl http://tempo:3200/ingester/ring
```

Run a minimum of 3 ingesters for replication:

```yaml
ingester:
  lifecycler:
    ring:
      replication_factor: 3
      kvstore:
        store: memberlist

# Distributor config
distributor:
  ring:
    kvstore:
      store: memberlist
```
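After scaling, confirm every ingester joined the ring and is ACTIVE. A minimal sketch; the ring JSON layout varies between Tempo versions, so adjust the jq paths if the fields differ:

```bash
# Count ring members and flag any that are not ACTIVE
ring=$(curl -s http://tempo:3200/ingester/ring)
echo "ring members: $(echo "$ring" | jq '.shards | length')"
echo "$ring" | jq -r '.shards[] | select(.state != "ACTIVE") | "\(.id): \(.state)"'
```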
## Step 8: Check Storage Performance

```bash
# Check storage flush latency
curl http://localhost:3200/metrics | grep tempo_ingester_flush_duration_seconds

# For S3 storage
aws s3api head-bucket --bucket my-tempo-bucket

# Check block list
curl http://localhost:3200/api/blocks
```

Storage config:

```yaml
storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      endpoint: s3.us-east-1.amazonaws.com
      region: us-east-1
    blocklist_poll: 5m
    blocklist_poll_concurrency: 10
```

Local storage for development:

```yaml
storage:
  trace:
    backend: local
    local:
      path: /var/lib/tempo/traces
```
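Slow object storage backs up ingester flushes, which in turn backs up ingestion. A crude round-trip latency test with the AWS CLI, assuming credentials are configured and the bucket matches your config:

```bash
# Time a 1MB write/read round trip to the Tempo bucket;
# multi-second results point at storage rather than rate limits
dd if=/dev/zero of=/tmp/tempo-probe bs=1M count=1 2>/dev/null
time aws s3 cp /tmp/tempo-probe s3://my-tempo-bucket/tempo-probe
time aws s3 cp s3://my-tempo-bucket/tempo-probe /tmp/tempo-probe-back
aws s3 rm s3://my-tempo-bucket/tempo-probe
rm -f /tmp/tempo-probe /tmp/tempo-probe-back
```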
## Step 9: Configure Backoff

Configure clients to back off when rate limited.

OpenTelemetry Collector:

```yaml
exporters:
  otlp:
    endpoint: tempo:4317
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```

Application SDKs: the OTel SDK retries with backoff automatically.

Jaeger client:

```bash
# Client will retry on 429
JAEGER_ENDPOINT=http://tempo:14268/api/traces
```

On the Tempo side, the rate limiting strategy itself is set with `ingestion_rate_strategy` in the limits block (see Step 3); there is no separate distributor backoff configuration.
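For ad-hoc scripts that push traces directly, the same idea fits in a few lines of shell. A minimal sketch of exponential backoff on HTTP 429 (endpoint and payload are illustrative):

```bash
#!/bin/bash
# Retry a trace push with exponential backoff while Tempo returns 429
delay=1
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST http://tempo:4318/v1/traces \
    -H "Content-Type: application/json" \
    -d @trace.json)
  if [ "$status" != "429" ]; then
    echo "attempt ${attempt}: HTTP ${status}"
    break
  fi
  echo "attempt ${attempt}: rate limited, retrying in ${delay}s"
  sleep "$delay"
  delay=$((delay * 2))
done
```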
## Step 10: Monitor Ingestion

```bash
# Create a monitoring script
cat << 'EOF' > /usr/local/bin/monitor-tempo.sh
#!/bin/bash

echo "=== Tempo Ingestion Rate ==="
curl -s http://localhost:3200/metrics | grep -E "tempo_distributor_(spans|bytes)_received_total"

echo ""
echo "=== Rate Limit Rejections ==="
curl -s http://localhost:3200/metrics | grep tempo_distributor_spans_rejected_total

echo ""
echo "=== Ingester Health ==="
curl -s http://localhost:3200/ingester/ring | jq '.shards | length'

echo ""
echo "=== Active Tenants ==="
curl -s http://localhost:3200/metrics | grep tempo_ingester_active_tenants

echo ""
echo "=== Trace Creation Rate ==="
curl -s http://localhost:3200/metrics | grep tempo_ingester_traces_created_total
EOF

chmod +x /usr/local/bin/monitor-tempo.sh
```

Prometheus alerts:

```yaml
- alert: TempoRateLimitExceeded
  expr: rate(tempo_distributor_spans_rejected_total[5m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Tempo rejecting traces due to rate limit"

- alert: TempoIngestionHigh
  expr: rate(tempo_distributor_bytes_received_total[5m]) > 50000000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Tempo ingestion rate high (>50MB/s)"
```
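Once the alerts are wrapped in a `groups:` block in a rules file, promtool can validate them before Prometheus loads them. A quick check, assuming the rules are saved as tempo-alerts.yml:

```bash
# Validate the alert rules file before reloading Prometheus
promtool check rules tempo-alerts.yml
```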
## Tempo Ingestion Rate Limit Checklist

| Check | Where to look | Expected |
|---|---|---|
| Rate limit | `/config` limits block | Adequate for the load |
| Rejection rate | `/metrics` spans_rejected | Zero or low |
| Trace volume | `/metrics` bytes_received | Within the limit |
| Ingester count | `/ingester/ring` | Sufficient, all ACTIVE |
| Tenant config | overrides block | Per-tenant limits set |
| Trace size | `/metrics` | Under max_bytes_per_trace |
## Verify the Fix

```bash
# After increasing limits

# 1. Send a test trace - expect an accepted (2xx) response
curl -X POST http://tempo:14268/api/traces -d @test-trace.json

# 2. Check the rejection rate - the counter should not increase
curl http://localhost:3200/metrics | grep tempo_distributor_spans_rejected_total

# 3. Monitor ingestion - the rate should stay within limits
/usr/local/bin/monitor-tempo.sh

# 4. Query a trace in Grafana - a trace ID search should find it

# 5. Test burst capacity - a burst of traces should all be accepted

# 6. Check logs - expect no new rate limit warnings
journalctl -u tempo | grep "rate limit"
```
## Related Issues
- [Fix Loki Ingestion Rate Limit](/articles/fix-loki-ingestion-rate-limit)
- [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)
- [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)