What's Actually Happening
The OpenTelemetry Collector fails to export telemetry data to its backends before the exporter timeout expires. Data backs up in the sending queue and, once the queue is full, gets dropped.
The Error You'll See
Exporter timeout:

```json
{
  "level": "error",
  "msg": "Export failed",
  "error": "context deadline exceeded",
  "exporter": "otlp"
}
```

Queue full:

```json
{
  "level": "error",
  "msg": "Dropping data because the queue is full",
  "queue_size": 5000,
  "dropped_items": 100
}
```

Connection failed:

```json
{
  "level": "error",
  "msg": "Failed to connect to endpoint",
  "error": "connection refused",
  "endpoint": "tempo:4317"
}
```

Why This Happens
1. Backend unavailable - the destination service is down
2. Network issues - high latency or packet loss
3. Timeout too short - exports cannot complete before the deadline
4. Queue overflow - the sending queue is too small for the traffic
5. Large batch size - batches are too big to send within the timeout
6. TLS issues - certificate or handshake problems
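Causes 3 and 5 are two sides of the same arithmetic: whether an export fits inside the timeout depends on batch size, available bandwidth, and round-trip latency. A rough sketch (all numbers are illustrative assumptions, not measurements):

```python
# Estimate how long a single export takes: transfer time plus one RTT.
# Bandwidth and batch sizes here are assumptions for illustration.

def export_seconds(batch_bytes, bandwidth_bytes_per_s, rtt_s):
    """Approximate wall-clock time to ship one batch to the backend."""
    return batch_bytes / bandwidth_bytes_per_s + rtt_s

# An 8 MiB batch over a 1 MiB/s link with 200 ms RTT:
t = export_seconds(8 * 2**20, 1 * 2**20, 0.2)
print(f"{t:.1f}s")  # already close to the default 10s timeout
```

If the estimate lands near the configured timeout, either raise the timeout (Step 3) or shrink the batches (Step 7).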
Step 1: Check Collector Status
```bash
# Check the collector is running:
systemctl status otelcol

# Check collector metrics:
curl http://localhost:8888/metrics

# Check collector health:
curl http://localhost:13133/health

# Check zpages:
curl http://localhost:55679/debug/pipelinez

# Check exporter metrics in the pipeline:
curl http://localhost:8888/metrics | grep otelcol_exporter

# Key metrics:
#   otelcol_exporter_send_failed_spans
#   otelcol_exporter_sent_spans
#   otelcol_exporter_queue_size
```
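To go beyond eyeballing the metrics page, the failure ratio can be computed from the Prometheus text output. A minimal sketch; the sample text is fabricated for illustration, and real output also contains `# HELP`/`# TYPE` lines and more labels:

```python
# Parse Prometheus text-format metrics into {series: value} and compute
# the share of spans that failed to export.

def parse_metrics(text):
    out = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        # The value is everything after the last space on the line.
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out

sample = """\
otelcol_exporter_sent_spans{exporter="otlp"} 900
otelcol_exporter_send_failed_spans{exporter="otlp"} 100
"""
m = parse_metrics(sample)
failed = m['otelcol_exporter_send_failed_spans{exporter="otlp"}']
sent = m['otelcol_exporter_sent_spans{exporter="otlp"}']
print(f"failure ratio: {failed / (failed + sent):.1%}")  # 10.0%
```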
Step 2: Check Exporter Configuration
```yaml
# Current config (cat /etc/otelcol/config.yaml):
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
    timeout: 10s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```
Step 3: Increase Timeout Settings
```yaml
# In config.yaml:
exporters:
  otlp:
    endpoint: tempo:4317
    # Increase timeout from the default 10s
    timeout: 30s

  otlphttp:
    endpoint: http://tempo:4318
    timeout: 30s
    # HTTP-specific settings
    read_buffer_size: 0
    write_buffer_size: 524288  # 512 KiB
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: 90s

  prometheusremotewrite:
    endpoint: http://cortex:8080/api/prom/push
    timeout: 30s

  elasticsearch:
    endpoints:
      - http://elasticsearch:9200
    timeout: 30s
    index: traces

# Restart the collector afterwards:
#   systemctl restart otelcol
```
Step 4: Configure Retry Settings
```yaml
# Robust retry configuration:
exporters:
  otlp:
    endpoint: tempo:4317
    timeout: 30s
    retry_on_failure:
      enabled: true
      # Initial backoff
      initial_interval: 5s
      # Max backoff between retries
      max_interval: 60s
      # Total time to retry before giving up
      max_elapsed_time: 600s  # 10 minutes
      # Retriable failures (429 rate limited, 502/503/504 gateway errors
      # on OTLP/HTTP, UNAVAILABLE on gRPC) are retried automatically when
      # retry_on_failure is enabled.

# For debugging:
service:
  telemetry:
    logs:
      level: debug
```
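The retry settings above translate into a concrete backoff schedule. A sketch of the assumed semantics: exponential backoff with the collector's default 1.5 multiplier, capped at `max_interval`, stopping once `max_elapsed_time` would be exceeded (jitter is ignored here for clarity):

```python
# Compute the wait times between retry attempts for the config above.

def backoff_schedule(initial=5.0, max_interval=60.0, max_elapsed=600.0,
                     multiplier=1.5):
    """Return the list of backoff intervals until max_elapsed is used up."""
    waits, total, wait = [], 0.0, initial
    while total + wait <= max_elapsed:
        waits.append(wait)
        total += wait
        wait = min(wait * multiplier, max_interval)
    return waits

schedule = backoff_schedule()
print(len(schedule), "retries over", sum(schedule), "seconds")
```

With these defaults the exporter gets roughly a dozen attempts before giving up, which is why a short backend outage usually causes no data loss at all.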
Step 5: Configure Sending Queue
```yaml
# Queue configuration:
exporters:
  otlp:
    endpoint: tempo:4317
    timeout: 30s
    sending_queue:
      enabled: true
      # Number of concurrent consumers
      num_consumers: 20
      # Queue size (items, not bytes)
      queue_size: 10000
      # Enable blocking when the queue is full
      # blocking: true  # Can cause backpressure

# For a persistent queue (collector 0.54+), back the queue with the
# file_storage extension:
exporters:
  otlp:
    sending_queue:
      storage: file_storage

extensions:
  file_storage:
    directory: /var/lib/otelcol/queue
    timeout: 10s
```

```bash
# Create the queue directory:
mkdir -p /var/lib/otelcol/queue
chown otelcol:otelcol /var/lib/otelcol/queue
```
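Choosing `queue_size` is simple arithmetic: the queue has to hold everything produced while the backend is unreachable. A back-of-the-envelope sketch (the rates are illustrative assumptions, not recommendations):

```python
# Size the sending queue to survive a backend outage of a given length.
# queue_size counts items/batches, not bytes.

def required_queue_size(batches_per_sec, outage_seconds):
    """Minimum queue capacity to absorb an outage without dropping data."""
    return int(batches_per_sec * outage_seconds)

# e.g. 20 batches/s surviving a 5-minute outage:
print(required_queue_size(20, 300))  # 6000
```

If the required size is far above what memory allows, the persistent `file_storage` queue shown above is the better escape hatch.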
Step 6: Check Backend Connectivity
```bash
# Test endpoint connectivity:
nc -zv tempo 4317

# Test with curl:
curl -v http://tempo:4318/v1/traces

# Check TLS:
openssl s_client -connect tempo:4317
```

```yaml
# For insecure connections (testing only):
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
      insecure_skip_verify: true

# For TLS:
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      ca_file: /etc/ssl/certs/ca.crt
      cert_file: /etc/ssl/certs/client.crt
      key_file: /etc/ssl/private/client.key
```

```bash
# Check DNS resolution:
nslookup tempo
dig tempo

# Measure round-trip time:
curl -w "@curl-format.txt" -o /dev/null -s http://tempo:4318/health
```
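When `nc` is not installed on the host, the same reachability check takes a few lines of Python. A minimal sketch (the hostnames are whatever your exporter config points at):

```python
# Python equivalent of `nc -zv host port`: can we complete a TCP handshake?
import socket

def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_connect("tempo", 4317)
```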
Step 7: Configure Batching
```yaml
# Add a batch processor:
processors:
  batch:
    # Send a batch once it reaches this many items
    send_batch_size: 1024
    # Hard upper limit on batch size
    send_batch_max_size: 2048
    # Timeout before sending a partial batch
    timeout: 10s
    # Metadata keys to batch by
    metadata_keys: []
    # Metadata cardinality limit
    metadata_cardinality_limit: 1000

  # Memory limiter:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

# Pipeline with batching:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```
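The batch processor's flush rule can be sketched in a few lines, under the assumption that a batch is flushed either when it reaches `send_batch_size` or when `timeout` elapses on a partial batch (a simplification of the real processor):

```python
# Toy model of size-or-timeout batching.

class Batcher:
    def __init__(self, send_batch_size=1024, timeout=10.0):
        self.size, self.timeout = send_batch_size, timeout
        self.items, self.started = [], None

    def add(self, item, now):
        """Add an item; return a full batch if the size threshold is hit."""
        if not self.items:
            self.started = now
        self.items.append(item)
        if len(self.items) >= self.size:
            return self.flush()
        return None

    def tick(self, now):
        """Timer callback: flush a partial batch once the timeout elapses."""
        if self.items and now - self.started >= self.timeout:
            return self.flush()
        return None

    def flush(self):
        batch, self.items = self.items, []
        return batch
```

The timeout path is what keeps latency bounded on quiet pipelines; without it, a trickle of spans would sit in the batch indefinitely.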
Step 8: Add Fallback Exporters
```yaml
# Multiple exporters with fallback:
exporters:
  otlp/primary:
    endpoint: tempo-primary:4317
    timeout: 30s
    retry_on_failure:
      enabled: true

  otlp/secondary:
    endpoint: tempo-secondary:4317
    timeout: 30s

  file/fallback:
    path: /var/log/otelcol/traces.json

# Route between exporters with the routing processor
# (X-Tenant is an example attribute):
processors:
  routing:
    from_attribute: X-Tenant
    default_exporters:
      - otlp/primary
    table:
      - value: secondary
        exporters:
          - otlp/secondary

# Alternative: use two pipelines
service:
  pipelines:
    traces/primary:
      receivers: [otlp]
      exporters: [otlp/primary]
    traces/fallback:
      receivers: [otlp]
      exporters: [file/fallback]
```
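The fallback idea in plain code: try the primary export, and on failure persist the batch locally so nothing is lost. This is a hedged sketch, not collector internals; `send_primary` stands in for any export callable that raises on failure:

```python
# Primary-then-fallback export: write failed batches to a local file.
import json

def export_with_fallback(batch, send_primary, fallback_path):
    """Return which destination received the batch."""
    try:
        send_primary(batch)
        return "primary"
    except Exception:
        # Append one JSON line per failed batch for later replay.
        with open(fallback_path, "a") as f:
            f.write(json.dumps(batch) + "\n")
        return "fallback"
```

A replay job can later read the fallback file line by line and resend each batch once the primary backend recovers.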
Step 9: Monitor Exporter Performance
```bash
# Create a monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-otelcol.sh
#!/bin/bash

echo "=== Exporter Stats ==="
curl -s http://localhost:8888/metrics | grep -E "otelcol_exporter_(sent|send_failed)_"

echo ""
echo "=== Queue Size ==="
curl -s http://localhost:8888/metrics | grep otelcol_exporter_queue_size

echo ""
echo "=== Receiver Stats ==="
curl -s http://localhost:8888/metrics | grep -E "otelcol_receiver_(accepted|refused)_"

echo ""
echo "=== Processor Stats ==="
curl -s http://localhost:8888/metrics | grep otelcol_processor_

echo ""
echo "=== Memory Usage ==="
curl -s http://localhost:8888/metrics | grep otelcol_process_memory_rss
EOF

chmod +x /usr/local/bin/monitor-otelcol.sh
```

```yaml
# Prometheus alerts:
- alert: OTelExporterFailures
  expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "OpenTelemetry Collector exporter failures"

- alert: OTelQueueFull
  expr: otelcol_exporter_queue_size >= otelcol_exporter_queue_capacity
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "OpenTelemetry Collector exporter queue full"
```
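The first alert's `rate()` expression is easy to sanity-check by hand: the rate of a counter over a window is roughly (latest - earliest) / window seconds, so any growth in `send_failed` makes the expression positive. A sketch (counter resets are ignored for simplicity):

```python
# Hand-computed counter rate, the way the alert expression sees it.

def counter_rate(earlier, later, window_s):
    """Per-second increase of a monotonic counter over a window."""
    return (later - earlier) / window_s

# No new failures in 5 minutes -> alert stays quiet:
print(counter_rate(100, 100, 300))
# 30 new failures in 5 minutes -> expression is positive, alert fires:
print(counter_rate(100, 130, 300))
```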
Step 10: Enable Debug Telemetry
```yaml
# Enable debug logging:
service:
  telemetry:
    logs:
      level: debug
      output_paths:
        - /var/log/otelcol/otelcol.log
      error_output_paths:
        - /var/log/otelcol/error.log
    metrics:
      level: detailed
      address: 0.0.0.0:8888

# zpages extension for real-time debugging:
extensions:
  zpages:
    endpoint: 0.0.0.0:55679
```

```bash
# Watch the debug logs:
tail -f /var/log/otelcol/otelcol.log | grep -iE "exporter|timeout"

# Access zpages:
#   http://localhost:55679/debug/tracez
#   http://localhost:55679/debug/pipelinez
```
OTel Collector Exporter Timeout Checklist
| Check | Command | Expected |
|---|---|---|
| Collector running | systemctl status otelcol | Active |
| Backend endpoint | nc -zv tempo 4317 | Connected |
| Timeout config | config.yaml | Adequate (30s+) |
| Queue size | :8888/metrics | < capacity |
| Retry enabled | config.yaml | Yes |
| Batch configured | config.yaml | Yes |
Verify the Fix
```bash
# After fixing the exporter timeout:

# 1. Check collector health
curl http://localhost:13133/health
# Expect: Status OK

# 2. Check exporter metrics
curl http://localhost:8888/metrics | grep otelcol_exporter_sent
# Expect: > 0

# 3. Check for failures
curl http://localhost:8888/metrics | grep otelcol_exporter_send_failed
# Expect: 0

# 4. Send test data
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d @test-trace.json
# Expect: accepted

# 5. Check logs
tail /var/log/otelcol/otelcol.log | grep -i timeout
# Expect: no timeout errors

# 6. Monitor under load
/usr/local/bin/monitor-otelcol.sh
# Expect: all exports successful
```
Related Issues
- [Fix OpenTelemetry Collector Error](/articles/fix-opentelemetry-collector-error)
- [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)
- [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)