Fix gRPC Client Channel Transient Failure Retry

Introduction

gRPC runs over long-lived HTTP/2 connections, which are susceptible to transient failures: network blips, server restarts, load balancer rotations, and DNS changes. Without a retry policy, every transient failure propagates directly to the caller. The .NET gRPC client has built-in retry support, but it is off by default and must be enabled through the channel's service config, either for all methods or for specific ones.

Symptoms

RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel")
gRPC call fails during server deployment or restart
No automatic retry on transient network errors
UNAVAILABLE status not retried despite being a retryable condition
Client channel does not reconnect after server comes back online

Error output: `Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: Connection refused", DebugException="System.Net.Sockets.SocketException: Connection refused 10.0.0.1:5001")`

Common Causes

No retry policy configured on the gRPC client
Retry not enabled for UNAVAILABLE status code
Deadline too short for retry attempts to complete
Client channel not reconnecting after connection drop
Load balancer dropping connections during health checks
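
To see why a short deadline can starve retries, it helps to add up the nominal backoff waits for a given policy. The sketch below does that for the values used in this guide (4 attempts, 100 ms initial backoff, multiplier 2, 1 s cap). `NominalBackoffBudget` is a hypothetical helper written for illustration, not part of any gRPC library. It ignores jitter and the time each attempt itself takes, so treat the result as a rough planning figure only.

```csharp
using System;

// Hypothetical helper: sums the nominal wait before each retry.
// Jitter and per-attempt latency are deliberately ignored.
static TimeSpan NominalBackoffBudget(
    int maxAttempts, TimeSpan initialBackoff, TimeSpan maxBackoff, double multiplier)
{
    var total = TimeSpan.Zero;
    var current = initialBackoff;

    // maxAttempts counts the original call, so there are maxAttempts - 1 waits.
    for (var retry = 1; retry < maxAttempts; retry++)
    {
        total += current < maxBackoff ? current : maxBackoff;
        current = TimeSpan.FromMilliseconds(current.TotalMilliseconds * multiplier);
    }

    return total;
}

var budget = NominalBackoffBudget(
    4, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(1), 2.0);

// Waits are 100 ms + 200 ms + 400 ms.
Console.WriteLine($"Nominal backoff across retries: {budget.TotalMilliseconds} ms"); // 700 ms
```

The deadline has to cover these waits plus the duration of every attempt, so a 1-second deadline leaves very little room for a policy like this one.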

Step-by-Step Fix

1. **Configure a retry policy on the gRPC client:**

```csharp
using Grpc.Core;
using Grpc.Net.Client;
using Grpc.Net.Client.Configuration;

// Configure retry using service config
var serviceConfig = new ServiceConfig
{
    MethodConfigs =
    {
        new MethodConfig
        {
            Names = { MethodName.Default }, // Apply to all methods
            RetryPolicy = new RetryPolicy
            {
                MaxAttempts = 4, // Total attempts, including the original call
                InitialBackoff = TimeSpan.FromMilliseconds(100),
                MaxBackoff = TimeSpan.FromSeconds(1),
                BackoffMultiplier = 2,
                RetryableStatusCodes =
                {
                    StatusCode.Unavailable,
                    StatusCode.DeadlineExceeded,
                    StatusCode.ResourceExhausted
                }
            }
        }
    }
};

// Create a channel with retry configuration
var channelWithRetry = GrpcChannel.ForAddress("https://api.example.com", new GrpcChannelOptions
{
    ServiceConfig = serviceConfig
});

var client = new MyService.MyServiceClient(channelWithRetry);
```
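
Retries only help if the channel can establish a fresh connection. One configuration often suggested for long-lived .NET gRPC clients is to enable HTTP/2 keepalive pings on `SocketsHttpHandler`, so that dead connections are noticed sooner. The sketch below assumes .NET 5 or later and the `Grpc.Net.Client` package; the address and the timing values are illustrative, not recommendations.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using Grpc.Net.Client;

var handler = new SocketsHttpHandler
{
    // Keep idle connections in the pool instead of closing them.
    PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
    // Send an HTTP/2 PING after 60 s without activity (assumed value).
    KeepAlivePingDelay = TimeSpan.FromSeconds(60),
    // Treat the connection as dead if the PING is not answered in 30 s (assumed value).
    KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
    // Allow extra connections when one reaches its concurrent stream limit.
    EnableMultipleHttp2Connections = true
};

var channel = GrpcChannel.ForAddress("https://api.example.com", new GrpcChannelOptions
{
    HttpHandler = handler
});
```

The same `GrpcChannelOptions` instance can also carry a `ServiceConfig`, so keepalive and a retry policy can be set on one channel.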

2. **Add Polly-based retry for more control:**

```csharp
using Grpc.Core;
using Microsoft.Extensions.Logging;
using Polly;

// `logger`, `client`, and `id` are assumed to exist in the surrounding code.
// If the channel also has a RetryPolicy in its service config, both layers
// retry, so the total number of attempts multiplies.
var grpcRetryPolicy = Policy
    .Handle<RpcException>(ex =>
        ex.StatusCode == StatusCode.Unavailable ||
        ex.StatusCode == StatusCode.DeadlineExceeded)
    .WaitAndRetryAsync(
        retryCount: 3,
        // Exponential backoff: 200 ms, 400 ms, 800 ms
        sleepDurationProvider: retryAttempt =>
            TimeSpan.FromMilliseconds(Math.Pow(2, retryAttempt) * 100),
        onRetry: (exception, timespan, retryCount, context) =>
        {
            logger.LogWarning(
                "gRPC call failed (attempt {RetryCount}), retrying in {Delay}ms: {Message}",
                retryCount, timespan.TotalMilliseconds, exception.Message);
        });

// Use Polly to wrap gRPC calls
var response = await grpcRetryPolicy.ExecuteAsync(async () =>
    await client.GetDataAsync(new GetDataRequest { Id = id }));
```

3. **Set appropriate deadlines for retried calls:**

```csharp
// WRONG - no deadline, call may hang forever
// var response = await client.GetDataAsync(request);

// CORRECT - set a deadline that accounts for retries
var deadline = DateTime.UtcNow.AddSeconds(30); // 30-second total budget
var response = await client.GetDataAsync(request, deadline: deadline);

// With metadata for tracing
var headers = new Metadata { { "x-request-id", Guid.NewGuid().ToString() } };
var tracedResponse = await client.GetDataAsync(request, headers, deadline: deadline);
```
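
When retries are driven from application code, it matters where the deadline is computed: a deadline created inside the retried delegate gives every attempt a fresh budget, while one created outside is shared by all attempts. The dependency-free sketch below illustrates the shared-budget variant. `RetryWithinDeadlineAsync` is a hypothetical helper, and the fake call only simulates a transient failure; it is not how Polly or the gRPC client implement retries.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: retries `call` while keeping ONE deadline for all attempts.
static async Task<T> RetryWithinDeadlineAsync<T>(
    Func<DateTime, Task<T>> call,
    TimeSpan totalBudget,
    int maxAttempts,
    TimeSpan initialBackoff,
    Func<Exception, bool> isTransient)
{
    var deadline = DateTime.UtcNow + totalBudget; // computed once, shared by every attempt
    var backoff = initialBackoff;

    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await call(deadline);
        }
        catch (Exception ex) when (
            isTransient(ex) &&
            attempt < maxAttempts &&
            DateTime.UtcNow + backoff < deadline) // no point waiting past the deadline
        {
            await Task.Delay(backoff);
            backoff += backoff; // double the wait for the next retry
        }
    }
}

// Demo with a fake call that fails twice, then succeeds.
var seenDeadlines = new List<DateTime>();
var result = await RetryWithinDeadlineAsync(
    deadline =>
    {
        seenDeadlines.Add(deadline);
        if (seenDeadlines.Count < 3)
            throw new TimeoutException("simulated transient failure");
        return Task.FromResult("ok");
    },
    totalBudget: TimeSpan.FromSeconds(5),
    maxAttempts: 4,
    initialBackoff: TimeSpan.FromMilliseconds(10),
    isTransient: ex => ex is TimeoutException);

Console.WriteLine($"{result} after {seenDeadlines.Count} attempts"); // ok after 3 attempts
```

With a real client, the delegate would pass the shared value on to the call, for example `d => client.GetDataAsync(request, deadline: d).ResponseAsync`, and `isTransient` would inspect `RpcException.StatusCode`.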

Prevention

Configure retry policies for all production gRPC clients
Include UNAVAILABLE and DEADLINE_EXCEEDED in retryable status codes
Set deadlines that account for total retry time, not just a single call
Use health checks to detect gRPC server availability before calling
Monitor gRPC call failure rates and retry counts in telemetry
Test retry behavior by killing and restarting the gRPC server during calls
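
The health-check item above can be sketched with the standard gRPC health checking service. This assumes the server registers `grpc.health.v1.Health` and that the client references the `Grpc.HealthCheck` and `Grpc.Net.Client` packages; `IsServingAsync`, the address, and the 2-second timeout are illustrative choices, not values from this guide.

```csharp
using System;
using System.Threading.Tasks;
using Grpc.Core;
using Grpc.Health.V1;
using Grpc.Net.Client;

// Hypothetical helper: returns true only when the server reports SERVING.
static async Task<bool> IsServingAsync(ChannelBase channel, TimeSpan timeout)
{
    var health = new Health.HealthClient(channel);
    try
    {
        var reply = await health.CheckAsync(
            new HealthCheckRequest(), // empty Service name asks about the server as a whole
            deadline: DateTime.UtcNow + timeout);
        return reply.Status == HealthCheckResponse.Types.ServingStatus.Serving;
    }
    catch (RpcException)
    {
        // Unreachable server, missing health service, or deadline exceeded.
        return false;
    }
}

using var channel = GrpcChannel.ForAddress("https://api.example.com");
var serving = await IsServingAsync(channel, TimeSpan.FromSeconds(2));
Console.WriteLine(serving ? "serving" : "not serving");
```

A health probe adds a round trip and can be stale by the time the real call is made, so it complements a retry policy rather than replacing it.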
