## Introduction
OpenTelemetry in .NET uses batch export processors to collect telemetry data (traces, metrics, logs) and send it to backends such as Jaeger, Zipkin, Prometheus, or OTLP collectors. The processor accumulates telemetry items and exports them in periodic batches. When a batch send fails (network issues, collector unavailability, oversized payloads, TLS certificate errors, or an incorrect endpoint configuration), telemetry is dropped once the retry buffer is exhausted. Diagnosing these failures requires understanding the batch processor configuration, the exporter's retry logic, and collector connectivity.
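Before tuning anything, it helps to see *why* exports fail. The OpenTelemetry .NET SDK writes internal errors to a self-diagnostics log when a file named `OTEL_DIAGNOSTICS.json` exists in the process's working directory. A minimal sketch of creating one from a shell (the directory and size values are just examples):

```shell
# Enable OpenTelemetry .NET self-diagnostics: the SDK polls the working
# directory for this file and, when present, writes internal/exporter
# errors to a log file in LogDirectory (FileSize is in KiB).
cat > OTEL_DIAGNOSTICS.json <<'EOF'
{
  "LogDirectory": ".",
  "FileSize": 32768,
  "LogLevel": "Warning"
}
EOF
```

Delete the file to turn self-diagnostics back off; it is a troubleshooting aid, not something to ship enabled permanently.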
## Symptoms
- "Exporter failed to send batch" warnings in application logs
- Telemetry data not appearing in Jaeger, Grafana, or monitoring dashboard
- `HttpRequestException` when connecting to the OTLP collector endpoint
- "Payload too large" or 413 errors from the collector
- Telemetry works in development but fails in production
- Memory usage grows as telemetry accumulates in export buffer
Error output:
```
OpenTelemetry.SDK.Export.ExportProcessor: Error: Export failed.
System.Net.Http.HttpRequestException: Connection refused (localhost:4317)
--- End of inner exception stack trace ---
OpenTelemetry.OTLP.OtlpExporter: Warning: Export failed for batch 1.
Dropping 250 spans.
```

## Common Causes
- OTLP collector not running or endpoint URL incorrect
- Endpoint scheme or protocol mismatch: `http` used where the collector expects `https`, or the gRPC protocol pointed at the HTTP/protobuf port (or vice versa)
- TLS certificate validation fails for collector
- Batch payload exceeds collector's maximum receive message size
- Export queue full — telemetry generated faster than it can be sent
- Missing authentication headers for cloud-based collectors
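Most of these causes can be ruled in or out with a quick probe of the collector endpoint. A sketch using `curl` (the default endpoint is an assumption; substitute yours):

```shell
# Probe the OTLP/HTTP traces endpoint. GET is not allowed there, so:
#   405 -> collector is up and the path is right
#   404 -> collector is up but the path or port is wrong
#   000 -> connection refused, DNS failure, or TLS error
ENDPOINT="${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4318}"
STATUS=$(curl -s -o /dev/null -w '%{http_code}' "$ENDPOINT/v1/traces" || true)
STATUS=${STATUS:-000}
case "$STATUS" in
  405) echo "collector reachable" ;;
  000) echo "connection failed (collector down, wrong host, or TLS error)" ;;
  *)   echo "unexpected status: $STATUS" ;;
esac
```

A 401 or 403 here points at the missing-authentication cause rather than connectivity.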
## Step-by-Step Fix
1. Configure the OTLP exporter with the correct endpoint and protocol:

   ```csharp
   builder.Services.AddOpenTelemetry()
       .WithTracing(tracing =>
       {
           tracing
               .AddSource("MyApp")
               .AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .AddOtlpExporter(options =>
               {
                   // Default endpoint is http://localhost:4318 for HTTP/protobuf
                   // or http://localhost:4317 for gRPC
                   options.Endpoint = new Uri(
                       builder.Configuration["OTEL_EXPORTER_OTLP_ENDPOINT"]
                       ?? "http://localhost:4318");

                   // Use HTTP/protobuf (recommended for .NET)
                   options.Protocol = OtlpExportProtocol.HttpProtobuf;

                   // Add headers for authentication
                   options.Headers = builder.Configuration["OTEL_EXPORTER_OTLP_HEADERS"];

                   // Configure timeout
                   options.TimeoutMilliseconds = 10000;
               });
       });

   // Environment variables for configuration:
   // OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318
   // OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer my-token
   // OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
   ```
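A frequent misconfiguration is pairing the gRPC protocol with the HTTP/protobuf port or vice versa. A small sanity-check sketch (it reads the standard OTLP environment variables and assumes the default collector ports, 4317 for gRPC and 4318 for HTTP/protobuf):

```shell
# Warn when the configured protocol and the port in the endpoint disagree.
ENDPOINT="${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4318}"
PROTOCOL="${OTEL_EXPORTER_OTLP_PROTOCOL:-http/protobuf}"
PORT=$(printf '%s' "$ENDPOINT" | sed -E 's#.*:([0-9]+).*#\1#')
if { [ "$PROTOCOL" = "grpc" ] && [ "$PORT" = "4318" ]; } || \
   { [ "$PROTOCOL" = "http/protobuf" ] && [ "$PORT" = "4317" ]; }; then
  echo "WARNING: protocol $PROTOCOL does not match port $PORT"
else
  echo "protocol/port pairing looks consistent"
fi
```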
2. Configure the batch export processor for reliable delivery:

   ```csharp
   builder.Services.AddOpenTelemetry()
       .WithTracing(tracing =>
       {
           tracing
               .AddSource("MyApp")
               .AddAspNetCoreInstrumentation()
               .AddOtlpExporter();
       });

   // Configure the batch span processor via environment variables:
   // OTEL_BSP_MAX_QUEUE_SIZE=2048        // Max items in queue (default 2048)
   // OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512  // Max items per batch (default 512)
   // OTEL_BSP_EXPORT_TIMEOUT=30000       // Export timeout in ms (default 30000)
   // OTEL_BSP_SCHEDULE_DELAY=5000        // Interval between exports (default 5000)

   // Or configure in code:
   builder.Services.Configure<BatchExportActivityProcessorOptions>(options =>
   {
       options.MaxQueueSize = 4096;                 // Larger queue for high-throughput apps
       options.MaxExportBatchSize = 1024;           // Larger batches for efficiency
       options.ScheduledDelayMilliseconds = 2000;   // Send every 2 seconds
       options.ExporterTimeoutMilliseconds = 30000; // 30s timeout per export
   });
   ```
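To pick a `MaxQueueSize` for a real workload, multiply the expected span rate by the outage window you want to survive. A back-of-envelope sketch (the 50 spans/sec figure is an illustrative assumption; measure your own rate):

```shell
# Queue capacity needed to buffer a 10-minute collector outage
SPANS_PER_SEC=50
OUTAGE_SECONDS=$((10 * 60))
NEEDED=$((SPANS_PER_SEC * OUTAGE_SECONDS))
echo "MaxQueueSize should be at least $NEEDED"  # prints 30000
```

Note that a larger queue trades memory for durability: each queued `Activity` stays in memory until it is exported or dropped.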
3. Handle TLS and connectivity issues with the collector:

   ```csharp
   builder.Services.AddOpenTelemetry()
       .WithTracing(tracing =>
       {
           tracing.AddOtlpExporter(options =>
           {
               options.Endpoint = new Uri("https://collector.example.com:4318");
               options.Protocol = OtlpExportProtocol.HttpProtobuf;
               options.Headers = "Authorization=Bearer my-secret-token";
           });
       });

   // For self-signed certificates in development (NOT production):
   builder.Services.ConfigureHttpClientDefaults(http =>
   {
       http.ConfigurePrimaryHttpMessageHandler(() =>
       {
           var handler = new HttpClientHandler();

           // WARNING: Only for development/testing
   #if DEBUG
           handler.ServerCertificateCustomValidationCallback =
               HttpClientHandler.DangerousAcceptAnyServerCertificateValidator;
   #endif

           return handler;
       });
   });

   // For production with a custom CA: trust the CA as a custom root.
   // (Adding the CA to ClientCertificates would be wrong — that collection
   // is for client authentication, not server trust.)
   builder.Services.ConfigureHttpClientDefaults(http =>
   {
       http.ConfigurePrimaryHttpMessageHandler(() =>
       {
           var handler = new HttpClientHandler();
           var caCertificate = new X509Certificate2("collector-ca.crt");
           handler.ServerCertificateCustomValidationCallback = (request, cert, chain, errors) =>
           {
               chain.ChainPolicy.TrustMode = X509ChainTrustMode.CustomRootTrust;
               chain.ChainPolicy.CustomTrustStore.Add(caCertificate);
               return chain.Build(cert);
           };
           return handler;
       });
   });
   ```
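When the collector presents a certificate from a private CA, verify the chain outside the app first; `openssl verify` performs the same trust decision .NET's certificate validation makes. A self-contained sketch that generates a throwaway CA and leaf so the check is reproducible (all file names are placeholders; in practice you would verify the certificate your collector actually serves):

```shell
# Generate a throwaway CA and a leaf certificate signed by it, then check
# that the leaf chains to the CA file.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key \
  -out collector-ca.crt -subj "/CN=Test CA" -days 1 2>/dev/null
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj "/CN=collector.example.com" 2>/dev/null
openssl x509 -req -in leaf.csr -CA collector-ca.crt -CAkey ca.key \
  -CAcreateserial -out collector-leaf.crt -days 1 2>/dev/null
openssl verify -CAfile collector-ca.crt collector-leaf.crt  # prints: collector-leaf.crt: OK
```

Against a live collector, `openssl s_client -connect host:4318 -servername host` shows the chain actually served, which is often not the chain you expected.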
4. Add export failure logging and a telemetry health check:

   ```csharp
   // Custom exporter that logs batch sizes
   public class LoggingExporter : BaseExporter<Activity>
   {
       private readonly ILogger<LoggingExporter> _logger;

       public LoggingExporter(ILogger<LoggingExporter> logger)
       {
           _logger = logger;
       }

       protected override ExportResult Export(in Batch<Activity> batch)
       {
           var count = 0;
           foreach (var activity in batch)
           {
               count++;
           }

           if (count > 0)
           {
               _logger.LogInformation("Exported {Count} activities", count);
           }

           return ExportResult.Success;
       }
   }

   // Register the logging exporter alongside the OTLP exporter
   builder.Services.AddOpenTelemetry()
       .WithTracing(tracing =>
       {
           tracing
               .AddSource("MyApp")
               .AddAspNetCoreInstrumentation()
               .AddProcessor(sp => new SimpleActivityExportProcessor(
                   new LoggingExporter(sp.GetRequiredService<ILogger<LoggingExporter>>())))
               .AddProcessor(sp => new BatchActivityExportProcessor(
                   new OtlpTraceExporter(new OtlpExporterOptions
                   {
                       Endpoint = new Uri("http://localhost:4318")
                   })));
       });

   // Health check for telemetry connectivity
   builder.Services.AddHealthChecks()
       .AddCheck<OtelHealthCheck>("opentelemetry");

   public class OtelHealthCheck : IHealthCheck
   {
       public async Task<HealthCheckResult> CheckHealthAsync(
           HealthCheckContext context,
           CancellationToken cancellationToken = default)
       {
           try
           {
               // Test connectivity to the OTLP endpoint
               using var client = new HttpClient();
               var response = await client.GetAsync(
                   "http://localhost:4318/v1/traces", cancellationToken);

               // 405 Method Not Allowed is expected (GET is not supported on the
               // traces endpoint), but 404 or connection refused means the
               // collector is down
               if (response.StatusCode == System.Net.HttpStatusCode.MethodNotAllowed)
               {
                   return HealthCheckResult.Healthy("OTLP collector reachable");
               }

               return HealthCheckResult.Degraded(
                   $"OTLP returned unexpected status: {response.StatusCode}");
           }
           catch (Exception ex)
           {
               return HealthCheckResult.Unhealthy(
                   $"Cannot reach OTLP collector: {ex.Message}");
           }
       }
   }
   ```
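Once failure logging is in place, the resulting logs can be scanned for export failures and dropped-span counts. A sketch that builds a sample log matching the error output shown earlier (the file name `app.log` and the exact message formats are assumptions; adjust the patterns to your log sink):

```shell
# Create a sample log resembling the failure output, then extract
# failure counts and dropped-span messages from it.
cat > app.log <<'EOF'
OpenTelemetry.SDK.Export.ExportProcessor: Error: Export failed.
OpenTelemetry.OTLP.OtlpExporter: Warning: Export failed for batch 1.
Dropping 250 spans.
EOF
grep -c 'Export failed' app.log           # prints 2
grep -oE 'Dropping [0-9]+ spans' app.log  # prints: Dropping 250 spans
```

The same patterns can feed an alerting rule so export failures surface before dashboards go quiet.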
## Prevention
- Run the OTLP collector as a sidecar or local process in production
- Set `MaxQueueSize` large enough to absorb collector downtime (at least 10 minutes of telemetry)
- Use the HTTP/protobuf protocol over gRPC for better .NET compatibility
- Configure environment variables for endpoint, headers, and protocol
- Add health checks that verify collector connectivity
- Monitor queue utilization and export failure rates
- Use a local OpenTelemetry Collector agent that batches and forwards to remote backends
- Test telemetry end-to-end in staging before deploying to production
- Consider using the OpenTelemetry Collector's retry and buffering capabilities for unreliable networks