What's Actually Happening

Grafana Tempo rejects traces when ingestion rate exceeds configured limits. Applications receive rate limit errors when sending trace data.

The Error You'll See

Rate limit error:

```bash
$ curl -X POST http://tempo:14268/api/traces -d @trace.json

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "ingestion rate limit exceeded",
  "tenant": "anonymous"
}
```

OTLP rejection:

```bash
ERROR: trace data rejected: rate limit exceeded for tenant
```

Tempo logs:

```bash
$ journalctl -u tempo | grep rate

WARN ingestion rate limit exceeded for tenant anonymous, rejecting traces
level=warn msg="pusher failed to consume trace data" err="rate limit exceeded"
```

Why This Happens

1. Trace volume spike - Sudden increase in trace generation
2. Limit too low - Default limits insufficient for load
3. Missing tenant - Anonymous tenant uses default limits
4. Ingester overload - Not enough ingester capacity
5. Large traces - Individual traces too big
6. High cardinality - Too many unique trace IDs
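Before tuning anything, it helps to estimate the sustained ingestion rate your workload actually needs. A rough sizing sketch, where every number is a hypothetical placeholder you should replace with your own measurements:

```shell
# Rough capacity estimate (hypothetical numbers - measure your own):
spans_per_sec=5000       # expected span throughput
avg_span_bytes=600       # average encoded span size
headroom_pct=50          # extra headroom for bursts

required=$(( spans_per_sec * avg_span_bytes ))
with_headroom=$(( required + required * headroom_pct / 100 ))

echo "sustained rate: ${required} bytes/s"
echo "suggested ingestion_rate_limit_bytes: ${with_headroom}"
```

If the result is close to or above the default per-tenant limit (~15MB/s), raising the limit in Step 3 is the right fix; if it is far below, look at bursts, trace size, or a single noisy tenant instead.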

Step 1: Check Tempo Status

```bash
# Check Tempo is running:
systemctl status tempo

# Check Tempo metrics:
curl http://localhost:3200/metrics | grep -E "tempo_(ingester|distributor)"

# Check the running configuration, including limits:
curl http://localhost:3200/status/config

# Check ingester ring:
curl http://localhost:3200/ingester/ring

# Check distributor ring:
curl http://localhost:3200/distributor/ring

# Check rejection rate:
curl http://localhost:3200/metrics | grep tempo_discarded_spans_total
```

Step 2: Check Rate Limit Configuration

```bash
# Check the current limits block:
grep -A 20 'limits:' /etc/tempo/tempo.yaml
```

Default limits (note that the `*_bytes` settings take integer byte values):

```yaml
limits:
  ingestion_rate_strategy: global
  ingestion_rate_limit_bytes: 15000000    # ~15MB per tenant (default)
  ingestion_burst_size_bytes: 20000000    # ~20MB (default)
  max_bytes_per_trace: 5000000            # ~5MB (default)
  max_traces_per_user: 10000
  max_spans_per_user: 0                   # unlimited

# Per-tenant overrides:
overrides:
  tenant1:
    ingestion_rate_limit_bytes: 50000000
    max_bytes_per_trace: 10000000
```

Step 3: Increase Rate Limits

```yaml
# In tempo.yaml:
limits:
  # Global rate limit
  ingestion_rate_strategy: global
  ingestion_rate_limit_bytes: 50000000    # up from the 15MB default
  ingestion_burst_size_bytes: 75000000    # burst allowance

  # Per-trace limits
  max_bytes_per_trace: 10000000           # up from the 5MB default
  max_traces_per_user: 50000
  max_spans_per_user: 0                   # unlimited spans

  # Search limits
  max_search_duration: 4h
  max_search_bytes_per_trace: 0

# Per-tenant override:
overrides:
  my-tenant:
    ingestion_rate_limit_bytes: 100000000
    ingestion_burst_size_bytes: 150000000
    max_bytes_per_trace: 20000000
```

```bash
# Restart Tempo:
systemctl restart tempo

# Or for Kubernetes:
kubectl rollout restart deployment/tempo
```

Step 4: Check Trace Volume

```bash
# Check trace ingestion rate:
curl http://localhost:3200/metrics | grep tempo_distributor_spans_received_total

# Check bytes received:
curl http://localhost:3200/metrics | grep tempo_distributor_bytes_received_total

# Sample the counter twice, 60s apart, to calculate the rate:
curl -s http://localhost:3200/metrics | grep tempo_distributor_bytes_received_total | awk '{print $2}'
sleep 60
curl -s http://localhost:3200/metrics | grep tempo_distributor_bytes_received_total | awk '{print $2}'

# Check active tenants:
curl http://localhost:3200/metrics | grep tempo_ingester_active_tenants

# Check traces per tenant:
curl http://localhost:3200/metrics | grep tempo_ingester_traces_created_total

# Inspect the distributor ring:
curl http://localhost:3200/distributor/ring | jq
```
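The two counter samples above can be turned into a bytes-per-second figure by hand. A sketch using hypothetical counter values (substitute the numbers scraped from `/metrics`):

```shell
# Two hypothetical counter samples taken 60s apart:
bytes_t0=120000000
bytes_t1=168000000
interval=60

rate=$(( (bytes_t1 - bytes_t0) / interval ))
echo "ingestion rate: ${rate} bytes/s"

limit=15000000   # default ingestion_rate_limit_bytes
if [ "$rate" -gt "$limit" ]; then
  echo "over limit"
else
  echo "within limit"
fi
```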

Step 5: Configure Multi-Tenancy

```yaml
# Enable multi-tenancy so rate limits apply per tenant.

# In tempo.yaml:
multitenancy_enabled: true   # named auth_enabled in older Tempo versions

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

# Tenant limits:
overrides:
  tenant-a:
    ingestion_rate_limit_bytes: 50000000
    max_bytes_per_trace: 10000000
  tenant-b:
    ingestion_rate_limit_bytes: 100000000
    max_bytes_per_trace: 20000000
```

Clients identify themselves with the `X-Scope-OrgID` header:

```bash
curl -X POST http://tempo:4318/v1/traces \
  -H "X-Scope-OrgID: tenant-a" \
  -H "Content-Type: application/json" \
  -d @trace.json
```
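If traces are shipped through the OpenTelemetry Collector rather than sent directly, the tenant header can be attached on the exporter instead (the tenant name here is a placeholder):

```yaml
exporters:
  otlp:
    endpoint: tempo:4317
    headers:
      x-scope-orgid: tenant-a   # placeholder tenant
```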

Step 6: Optimize Trace Size

```bash
# Check trace sizes:
curl http://localhost:3200/metrics | grep tempo_distributor_bytes_per_trace
```

Large traces consume the rate limit faster. Reduce trace size in the application:

1. Limit attributes - don't include high-cardinality data as span attributes; use structured logging instead.

2. Batch spans (OpenTelemetry Collector batch processor):

```yaml
processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
```

3. Sample traces - head-based probabilistic sampling shown here; tail-based sampling is also an option:

```yaml
processors:
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 10   # sample 10% of traces
```

4. Filter spans - drop noisy span names:

```yaml
processors:
  filter:
    spans:
      exclude:
        match_type: strict
        span_names:
          - healthcheck
          - metrics
```
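Head sampling cuts ingestion roughly in proportion to the sampling percentage. A back-of-the-envelope sketch with hypothetical numbers:

```shell
# Expected ingestion after head sampling (hypothetical numbers):
current_rate=3000000      # bytes/s before sampling
sampling_percentage=10

sampled_rate=$(( current_rate * sampling_percentage / 100 ))
echo "post-sampling rate: ${sampled_rate} bytes/s"
```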

Step 7: Scale Ingester

```yaml # For Tempo microservices mode:

# Scale ingesters: apiVersion: apps/v1 kind: Deployment metadata: name: tempo-ingester spec: replicas: 3 # Increase from 1 template: spec: containers: - name: tempo args: - -target=ingester resources: limits: memory: 16Gi cpu: 4 requests: memory: 8Gi cpu: 2

# Check ring: curl http://tempo:3200/ingester/ring

# Minimum 3 ingesters for replication: ingester: lifecycler: ring: replication_factor: 3 kvstore: store: memberlist

# Distributor config: distributor: ring: kvstore: store: memberlist ```
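Note that replication multiplies write load: with `replication_factor: 3` every span is written to three ingesters, so effective capacity is total ingester throughput divided by the replication factor. A sketch, where the per-ingester throughput is a hypothetical figure:

```shell
# Effective write capacity with replication (hypothetical numbers):
ingesters=3
replication_factor=3
per_ingester_bytes_per_sec=20000000   # what one ingester can absorb

effective=$(( ingesters * per_ingester_bytes_per_sec / replication_factor ))
echo "effective capacity: ${effective} bytes/s"
```

This is why scaling from one to three ingesters while also enabling replication factor 3 does not triple capacity.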

Step 8: Check Storage Performance

```bash
# Check storage flush latency:
curl http://localhost:3200/metrics | grep tempo_ingester_flush_duration_seconds

# For S3 storage, confirm the bucket is reachable:
aws s3api head-bucket --bucket my-tempo-bucket

# Check block list:
curl http://localhost:3200/api/blocks
```

Storage configuration in tempo.yaml:

```yaml
storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      endpoint: s3.us-east-1.amazonaws.com
      region: us-east-1
    blocklist_poll: 5m
    blocklist_poll_concurrency: 10

# Alternatively, local storage for development:
storage:
  trace:
    backend: local
    local:
      path: /var/lib/tempo/traces
```

Step 9: Configure Backoff

```yaml
# Configure client backoff for when Tempo returns 429.

# In the OpenTelemetry Collector:
exporters:
  otlp:
    endpoint: tempo:4317
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```

Application SDKs generally retry with backoff automatically (the OTel SDK does this out of the box). Jaeger clients pointed at `JAEGER_ENDPOINT=http://tempo:14268/api/traces` also retry on 429.

On the Tempo side, the rate limiting strategy lives under `limits`:

```yaml
limits:
  ingestion_rate_strategy: global   # or "local" (per-distributor)
```
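With the retry settings above, the collector waits roughly `initial_interval` and then doubles up to `max_interval`. The real backoff adds randomization; this is a simplified sketch of the schedule:

```shell
# Simplified backoff schedule for initial_interval=5s, max_interval=30s:
interval=5
max=30
for attempt in 1 2 3 4 5; do
  echo "attempt ${attempt}: wait ${interval}s"
  interval=$(( interval * 2 ))
  [ "$interval" -gt "$max" ] && interval=$max
done
```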

Step 10: Monitor Ingestion

```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-tempo.sh
#!/bin/bash

echo "=== Tempo Ingestion Rate ==="
curl -s http://localhost:3200/metrics | grep -E "tempo_distributor_(spans|bytes)_received_total"

echo ""
echo "=== Rate Limit Rejections ==="
curl -s http://localhost:3200/metrics | grep tempo_discarded_spans_total

echo ""
echo "=== Ingester Health ==="
curl -s http://localhost:3200/ingester/ring | jq '.shards | length'

echo ""
echo "=== Active Tenants ==="
curl -s http://localhost:3200/metrics | grep tempo_ingester_active_tenants

echo ""
echo "=== Trace Creation Rate ==="
curl -s http://localhost:3200/metrics | grep tempo_ingester_traces_created_total
EOF

chmod +x /usr/local/bin/monitor-tempo.sh
```

Prometheus alerts:

```yaml
- alert: TempoRateLimitExceeded
  expr: rate(tempo_discarded_spans_total{reason="rate_limited"}[5m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Tempo rejecting traces due to rate limit"

- alert: TempoIngestionHigh
  expr: rate(tempo_distributor_bytes_received_total[5m]) > 50000000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Tempo ingestion rate high (>50MB/s)"
```

Tempo Ingestion Rate Limit Checklist

| Check | Command | Expected |
|-------|---------|----------|
| Rate limit | config | Adequate |
| Rejection rate | metrics | Zero/low |
| Trace volume | metrics | Within limit |
| Ingester count | ring | Sufficient |
| Tenant config | overrides | Per-tenant |
| Trace size | metrics | Reasonable |

Verify the Fix

```bash
# After increasing limits:

# 1. Send a test trace - expect HTTP 200:
curl -X POST http://tempo:14268/api/traces -d @test-trace.json

# 2. Check the rejection rate - should not increase:
curl http://localhost:3200/metrics | grep tempo_discarded_spans_total

# 3. Monitor ingestion - rate should stay within limits:
/usr/local/bin/monitor-tempo.sh

# 4. Query a trace in Grafana by trace ID - trace should be found

# 5. Test burst capacity - send a burst of traces; all should be accepted

# 6. Check logs - no rate limit warnings expected:
journalctl -u tempo | grep "rate limit"
```

- [Fix Loki Ingestion Rate Limit](/articles/fix-loki-ingestion-rate-limit)
- [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)
- [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)