What's Actually Happening

The OpenTelemetry Collector fails to export telemetry data to its backends because export requests time out. Data backs up in the sending queue and may eventually be dropped.

The Error You'll See

Exporter timeout:

```json
{
  "level": "error",
  "msg": "Export failed",
  "error": "context deadline exceeded",
  "exporter": "otlp"
}
```

Queue full:

```json
{
  "level": "error",
  "msg": "Dropping data because the queue is full",
  "queue_size": 5000,
  "dropped_items": 100
}
```

Connection failed:

```json
{
  "level": "error",
  "msg": "Failed to connect to endpoint",
  "error": "connection refused",
  "endpoint": "tempo:4317"
}
```

Why This Happens

  1. Backend unavailable - Destination service down
  2. Network issues - High latency or packet loss
  3. Timeout too short - Not enough time for the export to complete
  4. Queue overflow - Sending queue too small for the ingest rate
  5. Large batch size - Batches too big for the network path
  6. SSL issues - Certificate problems (expired, untrusted, or mismatched)
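The error messages above map loosely onto these causes. As a rough triage aid, here is a small shell sketch (the `classify` helper is hypothetical, not part of the collector) that matches a log line against the list:

```shell
#!/bin/sh
# Heuristic: map a collector error line to a likely cause.
classify() {
  case "$1" in
    *"context deadline exceeded"*) echo "timeout too short or slow backend" ;;
    *"connection refused"*)        echo "backend unavailable" ;;
    *"queue is full"*)             echo "queue overflow" ;;
    *certificate*)                 echo "SSL/certificate issue" ;;
    *)                             echo "unknown - check debug logs" ;;
  esac
}

classify 'Export failed: context deadline exceeded'   # → timeout too short or slow backend
classify 'Failed to connect: connection refused'      # → backend unavailable
```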

Step 1: Check Collector Status

```bash
# Check the collector is running:
systemctl status otelcol

# Check collector metrics:
curl http://localhost:8888/metrics

# Check collector health:
curl http://localhost:13133/health

# Check zpages (pipeline view):
curl http://localhost:55679/debug/pipelinez

# Check exporter metrics:
curl http://localhost:8888/metrics | grep otelcol_exporter

# Key metrics:
# otelcol_exporter_send_failed_spans
# otelcol_exporter_sent_spans
# otelcol_exporter_queue_size
```
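To turn those counters into a quick health signal, compute the failure ratio from a metrics scrape. A minimal sketch, with hard-coded sample values standing in for the `curl -s http://localhost:8888/metrics` output:

```shell
#!/bin/sh
# Failure ratio from exporter counters; the sample values below stand in
# for a real metrics scrape.
metrics='otelcol_exporter_sent_spans{exporter="otlp"} 9500
otelcol_exporter_send_failed_spans{exporter="otlp"} 500'

echo "$metrics" | awk '
  /_sent_spans/        { sent = $2 }
  /_send_failed_spans/ { failed = $2 }
  END { printf "failure ratio: %.2f%%\n", 100 * failed / (sent + failed) }'
# → failure ratio: 5.00%
```

Anything persistently above zero is worth investigating; a rising ratio usually precedes queue-full drops.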

Step 2: Check Exporter Configuration

```bash
# Check current config:
cat /etc/otelcol/config.yaml
```

A typical exporter configuration:

```yaml
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
    timeout: 10s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```

Step 3: Increase Timeout Settings

```yaml
# In config.yaml:
exporters:
  otlp:
    endpoint: tempo:4317
    # Increase timeout from the default 10s
    timeout: 30s

  otlphttp:
    endpoint: http://tempo:4318
    timeout: 30s
    # HTTP-specific tuning
    read_buffer_size: 0
    write_buffer_size: 524288   # 512 KiB, specified in bytes
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: 90s

  prometheusremotewrite:
    endpoint: http://cortex:8080/api/prom/push
    timeout: 30s

  elasticsearch:
    endpoints:
      - http://elasticsearch:9200
    timeout: 30s
    index: traces
```

```bash
# Restart the collector to apply changes:
systemctl restart otelcol
```
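How large should the timeout be? One hedged rule of thumb (not an official recommendation): take the backend's observed p99 export latency and multiply by a safety factor, so transient slowdowns don't trip the deadline:

```shell
#!/bin/sh
# Rough timeout sizing. The p99 figure is a hypothetical sample -
# measure your own from backend metrics.
p99_ms=4200          # observed p99 export latency
safety_factor=5      # headroom for slow requests
timeout_ms=$((p99_ms * safety_factor))
echo "suggested timeout: $((timeout_ms / 1000))s"   # → suggested timeout: 21s
```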

Step 4: Configure Retry Settings

```yaml
# Robust retry configuration:
exporters:
  otlp:
    endpoint: tempo:4317
    timeout: 30s
    retry_on_failure:
      enabled: true
      # Initial backoff
      initial_interval: 5s
      # Max backoff between retries
      max_interval: 60s
      # Total time to retry before giving up (10 minutes)
      max_elapsed_time: 600s
      # Note: there is no per-status-code setting; retryable responses
      # (e.g. HTTP 429, 502, 503, 504 on otlphttp) are retried automatically.

# For debugging:
service:
  telemetry:
    logs:
      level: debug
```

Step 5: Configure Sending Queue

```yaml
# Queue configuration:
exporters:
  otlp:
    endpoint: tempo:4317
    timeout: 30s
    sending_queue:
      enabled: true
      # Number of concurrent consumers
      num_consumers: 20
      # Queue size (items, not bytes)
      queue_size: 10000
      # blocking: true  # block instead of dropping; can cause backpressure
      # For a persistent queue (collector 0.54+):
      # storage: file_storage

# Backing store for the persistent queue; the extension must also be
# listed under service.extensions:
extensions:
  file_storage:
    directory: /var/lib/otelcol/queue
    timeout: 10s
```

```bash
# Create the queue directory:
mkdir -p /var/lib/otelcol/queue
chown otelcol:otelcol /var/lib/otelcol/queue
```
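When sizing the queue, it helps to estimate how long a full queue would take to drain. A back-of-the-envelope sketch, assuming (hypothetically) that each consumer completes one export per second:

```shell
#!/bin/sh
# Rough drain time for a full queue; the export rate is an assumed
# figure - measure your own under load.
queue_size=10000
num_consumers=20
exports_per_sec=1     # per consumer, assumed
drain_s=$((queue_size / (num_consumers * exports_per_sec)))
echo "approx drain time: ${drain_s}s"   # → approx drain time: 500s
```

If the drain time exceeds how long your backend outages typically last, incoming data will still be dropped; raise `num_consumers` or enable the persistent queue.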

Step 6: Check Backend Connectivity

```bash
# Test endpoint connectivity:
nc -zv tempo 4317

# Test with curl:
curl -v http://tempo:4318/v1/traces

# Check TLS:
openssl s_client -connect tempo:4317

# Check DNS resolution:
nslookup tempo
dig tempo

# Test round-trip time:
curl -w "@curl-format.txt" -o /dev/null -s http://tempo:4318/health
```

TLS settings in the exporter (pick one of the two `tls` blocks):

```yaml
# For insecure connections (testing only):
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
      insecure_skip_verify: true
```

```yaml
# For TLS with client certificates:
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      ca_file: /etc/ssl/certs/ca.crt
      cert_file: /etc/ssl/certs/client.crt
      key_file: /etc/ssl/private/client.key
```

Step 7: Configure Batching

```yaml
# Add batch processor:
processors:
  batch:
    # Send batch after this many items
    send_batch_size: 1024
    # Maximum batch size
    send_batch_max_size: 2048
    # Timeout before sending a partial batch
    timeout: 10s
    # Metadata keys to batch by
    metadata_keys: []
    # Metadata cardinality limit
    metadata_cardinality_limit: 1000

  # Memory limiter:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

# Pipeline with batching:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```
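A quick way to reason about these two knobs: below a certain ingest rate, batches flush on `timeout` rather than on `send_batch_size`. The crossover point is roughly:

```shell
#!/bin/sh
# Ingest rate above which the batch processor flushes on size, not timeout.
send_batch_size=1024
timeout_s=10
threshold=$((send_batch_size / timeout_s))
echo "size-based flushing above ~${threshold} items/sec"   # → ~102 items/sec
```

Below that rate, every batch waits the full `timeout` before export, which adds latency but keeps request counts low; above it, batch size dominates.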

Step 8: Add Fallback Exporters

```yaml
# Multiple exporters with fallback:
exporters:
  otlp/primary:
    endpoint: tempo-primary:4317
    timeout: 30s
    retry_on_failure:
      enabled: true

  otlp/secondary:
    endpoint: tempo-secondary:4317
    timeout: 30s

  file/fallback:
    path: /var/log/otelcol/traces.json

# Route with the routing processor (for true priority-based failover,
# see the contrib `failover` connector):
processors:
  routing:
    default_exporters:
      - otlp/primary
    table:
      - statement: 'exporter == "secondary"'
        exporters:
          - otlp/secondary

# Alternative: duplicate the data into two pipelines
service:
  pipelines:
    traces/primary:
      receivers: [otlp]
      exporters: [otlp/primary]
    traces/fallback:
      receivers: [otlp]
      exporters: [file/fallback]
```

Step 9: Monitor Exporter Performance

```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-otelcol.sh
#!/bin/bash

echo "=== Exporter Stats ==="
curl -s http://localhost:8888/metrics | grep -E "otelcol_exporter_(sent|send_failed)_"

echo ""
echo "=== Queue Size ==="
curl -s http://localhost:8888/metrics | grep otelcol_exporter_queue_size

echo ""
echo "=== Receiver Stats ==="
curl -s http://localhost:8888/metrics | grep -E "otelcol_receiver_(accepted|refused)_"

echo ""
echo "=== Processor Stats ==="
curl -s http://localhost:8888/metrics | grep otelcol_processor_

echo ""
echo "=== Memory Usage ==="
curl -s http://localhost:8888/metrics | grep otelcol_process_memory_rss
EOF

chmod +x /usr/local/bin/monitor-otelcol.sh
```

Prometheus alert rules:

```yaml
- alert: OTelExporterFailures
  expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "OpenTelemetry Collector exporter failures"

- alert: OTelQueueFull
  expr: otelcol_exporter_queue_size >= otelcol_exporter_queue_capacity
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "OpenTelemetry Collector exporter queue full"
```

Step 10: Enable Debug Telemetry

```yaml
# Enable debug logging:
service:
  telemetry:
    logs:
      level: debug
      output_paths:
        - /var/log/otelcol/otelcol.log
      error_output_paths:
        - /var/log/otelcol/error.log
    metrics:
      level: detailed
      address: 0.0.0.0:8888

# zpages extension for real-time debugging:
extensions:
  zpages:
    endpoint: 0.0.0.0:55679
```

```bash
# Check debug logs:
tail -f /var/log/otelcol/otelcol.log | grep -iE "exporter|timeout"

# Access zpages:
# http://localhost:55679/debug/tracez
# http://localhost:55679/debug/pipelinez
```

OTel Collector Exporter Timeout Checklist

| Check | Command | Expected |
|---|---|---|
| Collector running | `systemctl status` | Active |
| Backend endpoint | `nc -zv` | Connected |
| Timeout config | `config.yaml` | Adequate |
| Queue size | metrics | < capacity |
| Retry enabled | `config.yaml` | Yes |
| Batch configured | `config.yaml` | Yes |

Verify the Fix

```bash
# After fixing exporter timeout

# 1. Check collector health
curl http://localhost:13133/health
# Status: OK

# 2. Check exporter metrics
curl http://localhost:8888/metrics | grep otelcol_exporter_sent
# Should be > 0 and rising

# 3. Check for failures
curl http://localhost:8888/metrics | grep otelcol_exporter_send_failed
# Should be 0

# 4. Send test data
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d @test-trace.json
# Accepted

# 5. Check logs
tail /var/log/otelcol/otelcol.log | grep -i timeout
# No timeout errors

# 6. Monitor under load
/usr/local/bin/monitor-otelcol.sh
# All exports successful
```

  • [Fix OpenTelemetry Collector Error](/articles/fix-opentelemetry-collector-error)
  • [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)
  • [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)