What's Actually Happening
The OpenTelemetry Collector fails to export telemetry data to its backends before the exporter timeout expires. Data backs up in the sending queue and, once the queue is full, gets dropped.
The Error You'll See
Exporter timeout:

```json
{
  "level": "error",
  "msg": "Export failed",
  "error": "context deadline exceeded",
  "exporter": "otlp"
}
```

Queue full:

```json
{
  "level": "error",
  "msg": "Dropping data because the queue is full",
  "queue_size": 5000,
  "dropped_items": 100
}
```

Connection failed:

```json
{
  "level": "error",
  "msg": "Failed to connect to endpoint",
  "error": "connection refused",
  "endpoint": "tempo:4317"
}
```

Why This Happens
1. Backend unavailable - the destination service is down
2. Network issues - high latency or packet loss
3. Timeout too short - exports cannot complete before the deadline
4. Queue overflow - the sending queue is too small for the traffic
5. Large batch size - batches are too big to send within the timeout
6. TLS issues - certificate or handshake problems
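Causes 3 and 5 are two sides of the same arithmetic: whether an export fits inside the timeout depends on batch size, available bandwidth, and round-trip latency. A rough sketch (all numbers are illustrative assumptions, not measurements):

```python
# Estimate how long a single export takes: transfer time plus one RTT.
# Bandwidth and batch sizes here are assumptions for illustration.

def export_seconds(batch_bytes, bandwidth_bytes_per_s, rtt_s):
    """Approximate wall-clock time to ship one batch to the backend."""
    return batch_bytes / bandwidth_bytes_per_s + rtt_s

# An 8 MiB batch over a 1 MiB/s link with 200 ms RTT:
t = export_seconds(8 * 2**20, 1 * 2**20, 0.2)
print(f"{t:.1f}s")  # already close to the default 10s timeout
```

If the estimate lands near the configured timeout, either raise the timeout (Step 3) or shrink the batches (Step 7).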
Step 1: Check Collector Status
```bash
# Check the collector is running:
systemctl status otelcol

# Check collector metrics:
curl http://localhost:8888/metrics

# Check collector health:
curl http://localhost:13133/health

# Check zpages:
curl http://localhost:55679/debug/pipelinez

# Check exporter metrics in the pipeline:
curl http://localhost:8888/metrics | grep otelcol_exporter

# Key metrics:
#   otelcol_exporter_send_failed_spans
#   otelcol_exporter_sent_spans
#   otelcol_exporter_queue_size
```
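To go beyond eyeballing the metrics page, the failure ratio can be computed from the Prometheus text output. A minimal sketch; the sample text is fabricated for illustration, and real output also contains `# HELP`/`# TYPE` lines and more labels:

```python
# Parse Prometheus text-format metrics into {series: value} and compute
# the share of spans that failed to export.

def parse_metrics(text):
    out = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        # The value is everything after the last space on the line.
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out

sample = """\
otelcol_exporter_sent_spans{exporter="otlp"} 900
otelcol_exporter_send_failed_spans{exporter="otlp"} 100
"""
m = parse_metrics(sample)
failed = m['otelcol_exporter_send_failed_spans{exporter="otlp"}']
sent = m['otelcol_exporter_sent_spans{exporter="otlp"}']
print(f"failure ratio: {failed / (failed + sent):.1%}")  # 10.0%
```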
Step 2: Check Exporter Configuration
```yaml
# Current config (cat /etc/otelcol/config.yaml):
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
    timeout: 10s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```
Step 3: Increase Timeout Settings
```yaml
# In config.yaml:
exporters:
  otlp:
    endpoint: tempo:4317
    # Increase timeout from the default 10s
    timeout: 30s

  otlphttp:
    endpoint: http://tempo:4318
    timeout: 30s
    # HTTP-specific settings
    read_buffer_size: 0
    write_buffer_size: 524288  # 512 KiB
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: 90s

  prometheusremotewrite:
    endpoint: http://cortex:8080/api/prom/push
    timeout: 30s

  elasticsearch:
    endpoints:
      - http://elasticsearch:9200
    timeout: 30s
    index: traces

# Restart the collector afterwards:
#   systemctl restart otelcol
```
Step 4: Configure Retry Settings
```yaml
# Robust retry configuration:
exporters:
  otlp:
    endpoint: tempo:4317
    timeout: 30s
    retry_on_failure:
      enabled: true
      # Initial backoff
      initial_interval: 5s
      # Max backoff between retries
      max_interval: 60s
      # Total time to retry before giving up
      max_elapsed_time: 600s  # 10 minutes
      # Retriable failures (429 rate limited, 502/503/504 gateway errors
      # on OTLP/HTTP, UNAVAILABLE on gRPC) are retried automatically when
      # retry_on_failure is enabled.

# For debugging:
service:
  telemetry:
    logs:
      level: debug
```
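The retry settings above translate into a concrete backoff schedule. A sketch of the assumed semantics: exponential backoff with the collector's default 1.5 multiplier, capped at `max_interval`, stopping once `max_elapsed_time` would be exceeded (jitter is ignored here for clarity):

```python
# Compute the wait times between retry attempts for the config above.

def backoff_schedule(initial=5.0, max_interval=60.0, max_elapsed=600.0,
                     multiplier=1.5):
    """Return the list of backoff intervals until max_elapsed is used up."""
    waits, total, wait = [], 0.0, initial
    while total + wait <= max_elapsed:
        waits.append(wait)
        total += wait
        wait = min(wait * multiplier, max_interval)
    return waits

schedule = backoff_schedule()
print(len(schedule), "retries over", sum(schedule), "seconds")
```

With these defaults the exporter gets roughly a dozen attempts before giving up, which is why a short backend outage usually causes no data loss at all.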
Step 5: Configure Sending Queue
```yaml
# Queue configuration:
exporters:
  otlp:
    endpoint: tempo:4317
    timeout: 30s
    sending_queue:
      enabled: true
      # Number of concurrent consumers
      num_consumers: 20
      # Queue size (items, not bytes)
      queue_size: 10000
      # Enable blocking when the queue is full
      # blocking: true  # Can cause backpressure

# For a persistent queue (collector 0.54+), back the queue with the
# file_storage extension:
exporters:
  otlp:
    sending_queue:
      storage: file_storage

extensions:
  file_storage:
    directory: /var/lib/otelcol/queue
    timeout: 10s
```

```bash
# Create the queue directory:
mkdir -p /var/lib/otelcol/queue
chown otelcol:otelcol /var/lib/otelcol/queue
```
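Choosing `queue_size` is simple arithmetic: the queue has to hold everything produced while the backend is unreachable. A back-of-the-envelope sketch (the rates are illustrative assumptions, not recommendations):

```python
# Size the sending queue to survive a backend outage of a given length.
# queue_size counts items/batches, not bytes.

def required_queue_size(batches_per_sec, outage_seconds):
    """Minimum queue capacity to absorb an outage without dropping data."""
    return int(batches_per_sec * outage_seconds)

# e.g. 20 batches/s surviving a 5-minute outage:
print(required_queue_size(20, 300))  # 6000
```

If the required size is far above what memory allows, the persistent `file_storage` queue shown above is the better escape hatch.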
Step 6: Check Backend Connectivity
```bash
# Test endpoint connectivity:
nc -zv tempo 4317

# Test with curl:
curl -v http://tempo:4318/v1/traces

# Check TLS:
openssl s_client -connect tempo:4317
```

```yaml
# For insecure connections (testing only):
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
      insecure_skip_verify: true

# For TLS:
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      ca_file: /etc/ssl/certs/ca.crt
      cert_file: /etc/ssl/certs/client.crt
      key_file: /etc/ssl/private/client.key
```

```bash
# Check DNS resolution:
nslookup tempo
dig tempo

# Measure round-trip time:
curl -w "@curl-format.txt" -o /dev/null -s http://tempo:4318/health
```
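When `nc` is not installed on the host, the same reachability check takes a few lines of Python. A minimal sketch (the hostnames are whatever your exporter config points at):

```python
# Python equivalent of `nc -zv host port`: can we complete a TCP handshake?
import socket

def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_connect("tempo", 4317)
```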
Step 7: Configure Batching
```yaml
# Add a batch processor:
processors:
  batch:
    # Send a batch once it reaches this many items
    send_batch_size: 1024
    # Hard upper limit on batch size
    send_batch_max_size: 2048
    # Timeout before sending a partial batch
    timeout: 10s
    # Metadata keys to batch by
    metadata_keys: []
    # Metadata cardinality limit
    metadata_cardinality_limit: 1000

  # Memory limiter:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

# Pipeline with batching:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```
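The batch processor's flush rule can be sketched in a few lines, under the assumption that a batch is flushed either when it reaches `send_batch_size` or when `timeout` elapses on a partial batch (a simplification of the real processor):

```python
# Toy model of size-or-timeout batching.

class Batcher:
    def __init__(self, send_batch_size=1024, timeout=10.0):
        self.size, self.timeout = send_batch_size, timeout
        self.items, self.started = [], None

    def add(self, item, now):
        """Add an item; return a full batch if the size threshold is hit."""
        if not self.items:
            self.started = now
        self.items.append(item)
        if len(self.items) >= self.size:
            return self.flush()
        return None

    def tick(self, now):
        """Timer callback: flush a partial batch once the timeout elapses."""
        if self.items and now - self.started >= self.timeout:
            return self.flush()
        return None

    def flush(self):
        batch, self.items = self.items, []
        return batch
```

The timeout path is what keeps latency bounded on quiet pipelines; without it, a trickle of spans would sit in the batch indefinitely.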
Step 8: Add Fallback Exporters
```yaml
# Multiple exporters with fallback:
exporters:
  otlp/primary:
    endpoint: tempo-primary:4317
    timeout: 30s
    retry_on_failure:
      enabled: true

  otlp/secondary:
    endpoint: tempo-secondary:4317
    timeout: 30s

  file/fallback:
    path: /var/log/otelcol/traces.json

# Route between exporters with the routing processor
# (X-Tenant is an example attribute):
processors:
  routing:
    from_attribute: X-Tenant
    default_exporters:
      - otlp/primary
    table:
      - value: secondary
        exporters:
          - otlp/secondary

# Alternative: use two pipelines
service:
  pipelines:
    traces/primary:
      receivers: [otlp]
      exporters: [otlp/primary]
    traces/fallback:
      receivers: [otlp]
      exporters: [file/fallback]
```
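The fallback idea in plain code: try the primary export, and on failure persist the batch locally so nothing is lost. This is a hedged sketch, not collector internals; `send_primary` stands in for any export callable that raises on failure:

```python
# Primary-then-fallback export: write failed batches to a local file.
import json

def export_with_fallback(batch, send_primary, fallback_path):
    """Return which destination received the batch."""
    try:
        send_primary(batch)
        return "primary"
    except Exception:
        # Append one JSON line per failed batch for later replay.
        with open(fallback_path, "a") as f:
            f.write(json.dumps(batch) + "\n")
        return "fallback"
```

A replay job can later read the fallback file line by line and resend each batch once the primary backend recovers.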
Step 9: Monitor Exporter Performance
```bash
# Create a monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-otelcol.sh
#!/bin/bash

echo "=== Exporter Stats ==="
curl -s http://localhost:8888/metrics | grep -E "otelcol_exporter_(sent|send_failed)_"

echo ""
echo "=== Queue Size ==="
curl -s http://localhost:8888/metrics | grep otelcol_exporter_queue_size

echo ""
echo "=== Receiver Stats ==="
curl -s http://localhost:8888/metrics | grep -E "otelcol_receiver_(accepted|refused)_"

echo ""
echo "=== Processor Stats ==="
curl -s http://localhost:8888/metrics | grep otelcol_processor_

echo ""
echo "=== Memory Usage ==="
curl -s http://localhost:8888/metrics | grep otelcol_process_memory_rss
EOF

chmod +x /usr/local/bin/monitor-otelcol.sh
```

```yaml
# Prometheus alerts:
- alert: OTelExporterFailures
  expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "OpenTelemetry Collector exporter failures"

- alert: OTelQueueFull
  expr: otelcol_exporter_queue_size >= otelcol_exporter_queue_capacity
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "OpenTelemetry Collector exporter queue full"
```
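The first alert's `rate()` expression is easy to sanity-check by hand: the rate of a counter over a window is roughly (latest - earliest) / window seconds, so any growth in `send_failed` makes the expression positive. A sketch (counter resets are ignored for simplicity):

```python
# Hand-computed counter rate, the way the alert expression sees it.

def counter_rate(earlier, later, window_s):
    """Per-second increase of a monotonic counter over a window."""
    return (later - earlier) / window_s

# No new failures in 5 minutes -> alert stays quiet:
print(counter_rate(100, 100, 300))
# 30 new failures in 5 minutes -> expression is positive, alert fires:
print(counter_rate(100, 130, 300))
```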
Step 10: Enable Debug Telemetry
```yaml
# Enable debug logging:
service:
  telemetry:
    logs:
      level: debug
      output_paths:
        - /var/log/otelcol/otelcol.log
      error_output_paths:
        - /var/log/otelcol/error.log
    metrics:
      level: detailed
      address: 0.0.0.0:8888

# zpages extension for real-time debugging:
extensions:
  zpages:
    endpoint: 0.0.0.0:55679
```

```bash
# Watch the debug logs:
tail -f /var/log/otelcol/otelcol.log | grep -iE "exporter|timeout"

# Access zpages:
#   http://localhost:55679/debug/tracez
#   http://localhost:55679/debug/pipelinez
```
OTel Collector Exporter Timeout Checklist
| Check | Command | Expected |
|---|---|---|
| Collector running | systemctl status otelcol | Active |
| Backend endpoint | nc -zv tempo 4317 | Connected |
| Timeout config | config.yaml | Adequate (30s+) |
| Queue size | :8888/metrics | < capacity |
| Retry enabled | config.yaml | Yes |
| Batch configured | config.yaml | Yes |
Verify the Fix
```bash
# After fixing the exporter timeout:

# 1. Check collector health
curl http://localhost:13133/health
# Expect: Status OK

# 2. Check exporter metrics
curl http://localhost:8888/metrics | grep otelcol_exporter_sent
# Expect: > 0

# 3. Check for failures
curl http://localhost:8888/metrics | grep otelcol_exporter_send_failed
# Expect: 0

# 4. Send test data
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d @test-trace.json
# Expect: accepted

# 5. Check logs
tail /var/log/otelcol/otelcol.log | grep -i timeout
# Expect: no timeout errors

# 6. Monitor under load
/usr/local/bin/monitor-otelcol.sh
# Expect: all exports successful
```
Related Issues
- [Fix OpenTelemetry Collector Error](/articles/fix-opentelemetry-collector-error)
- [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)
- [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)