Introduction
gRPC uses long-lived HTTP/2 connections, which are susceptible to transient failures: network blips, server restarts, load balancer rotations, and DNS changes. Without a retry policy, every transient failure propagates directly to the caller. The .NET gRPC client supports built-in retries, but they are off by default and must be enabled through a service config, either for all methods or for individual service methods.
Symptoms
- `RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel")`
- gRPC call fails during server deployment or restart
- No automatic retry on transient network errors
- `UNAVAILABLE` status not retried despite being a retryable condition
- Client channel does not reconnect after server comes back online
Error output:
```
Grpc.Core.RpcException: Status(StatusCode="Unavailable",
Detail="Error starting gRPC call. HttpRequestException:
Connection refused", DebugException="System.Net.Sockets.SocketException:
Connection refused 10.0.0.1:5001")
```
Common Causes
- No retry policy configured on the gRPC client
- Retry not enabled for `UNAVAILABLE` status code
- Deadline too short for retry attempts to complete
- Client channel not reconnecting after connection drop
- Load balancer dropping connections during health checks
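
The reconnection and dropped-connection causes above are often addressed at the HTTP handler level. The following is a minimal sketch, assuming .NET 5 or later (where `SocketsHttpHandler` exposes HTTP/2 keepalive settings); the timeout values are illustrative assumptions, not recommendations from this article, and the address is the same placeholder used in the steps below.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using Grpc.Net.Client;

// Assumption: .NET 5+ so that SocketsHttpHandler has the keepalive properties.
var handler = new SocketsHttpHandler
{
    // Keep idle pooled connections open instead of closing them after the default idle timeout.
    PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
    // Send an HTTP/2 PING after 60 s of inactivity so a dead connection is noticed.
    KeepAlivePingDelay = TimeSpan.FromSeconds(60),
    // Treat the connection as broken if the PING is not answered within 30 s.
    KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
    // Allow additional HTTP/2 connections when the concurrent stream limit is reached.
    EnableMultipleHttp2Connections = true
};

// Placeholder address; replace with the real service endpoint.
var channel = GrpcChannel.ForAddress("https://api.example.com", new GrpcChannelOptions
{
    HttpHandler = handler
});
```

Keepalive pings help the client notice a silently dropped connection sooner, so the next call opens a fresh connection instead of failing on a stale one. This complements a retry policy and does not replace it.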
Step-by-Step Fix
1. **Configure retry policy in gRPC client**:

```csharp
using System;
using System.Net.Http;
using Grpc.Core;
using Grpc.Net.Client;
using Grpc.Net.Client.Configuration; // ServiceConfig, MethodConfig, RetryPolicy

// Configure retry using service config
var serviceConfig = new ServiceConfig
{
    MethodConfigs =
    {
        new MethodConfig
        {
            Names = { MethodName.Default }, // Apply to all methods
            RetryPolicy = new RetryPolicy
            {
                MaxAttempts = 4, // Total attempts, including the original call
                InitialBackoff = TimeSpan.FromMilliseconds(100),
                MaxBackoff = TimeSpan.FromSeconds(1),
                BackoffMultiplier = 2,
                RetryableStatusCodes =
                {
                    StatusCode.Unavailable,
                    StatusCode.DeadlineExceeded,
                    StatusCode.ResourceExhausted
                }
            }
        }
    }
};

// Create a channel with the retry configuration
var channelWithRetry = GrpcChannel.ForAddress("https://api.example.com", new GrpcChannelOptions
{
    HttpClient = new HttpClient(),
    ServiceConfig = serviceConfig
});

var client = new MyService.MyServiceClient(channelWithRetry);
```
2. **Add Polly-based retry for more control**:

```csharp
using System;
using Grpc.Core;
using Microsoft.Extensions.Logging;
using Polly;

// `logger`, `client`, and `id` are assumed to come from the surrounding code.
var grpcRetryPolicy = Policy
    .Handle<RpcException>(ex =>
        ex.StatusCode == StatusCode.Unavailable ||
        ex.StatusCode == StatusCode.DeadlineExceeded)
    .WaitAndRetryAsync(
        retryCount: 3,
        // Exponential backoff: 200 ms, 400 ms, 800 ms
        sleepDurationProvider: retryAttempt =>
            TimeSpan.FromMilliseconds(Math.Pow(2, retryAttempt) * 100),
        onRetry: (exception, timespan, retryCount, context) =>
        {
            logger.LogWarning(
                "gRPC call failed (attempt {RetryCount}), retrying in {Delay}ms: {Message}",
                retryCount, timespan.TotalMilliseconds, exception.Message);
        });

// Use Polly to wrap gRPC calls
var response = await grpcRetryPolicy.ExecuteAsync(async () =>
{
    return await client.GetDataAsync(new GetDataRequest { Id = id });
});
```
3. **Set appropriate deadlines for retried calls**:

```csharp
// WRONG - no deadline, call may hang forever
var unboundedResponse = await client.GetDataAsync(request);

// CORRECT - set deadline that accounts for retries
var deadline = DateTime.UtcNow.AddSeconds(30); // 30 second total budget
var response = await client.GetDataAsync(
    request,
    deadline: deadline);

// With metadata for tracing
var headers = new Metadata
{
    { "x-request-id", Guid.NewGuid().ToString() }
};
var tracedResponse = await client.GetDataAsync(request, headers, deadline: deadline);
```
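
To judge whether a deadline leaves room for retries, it can help to add up the backoff waits. The sketch below reuses the retry values from step 1 (4 attempts, 100 ms initial backoff, multiplier 2, 1 s cap) and prints the nominal wait before each retry. The 5-second per-attempt figure is a hypothetical assumption for illustration, and the real client applies random jitter to each wait, so actual delays will differ from these nominal values.

```csharp
using System;

// Values copied from the retry policy in step 1.
const int maxAttempts = 4;
const double backoffMultiplier = 2;
var initialBackoff = TimeSpan.FromMilliseconds(100);
var maxBackoff = TimeSpan.FromSeconds(1);

// Hypothetical assumption: the longest a single attempt is expected to take.
var perAttemptBudget = TimeSpan.FromSeconds(5);

var totalBackoff = TimeSpan.Zero;
var currentBackoff = initialBackoff;

// One backoff wait precedes each retry, so there are maxAttempts - 1 waits.
for (var retry = 1; retry < maxAttempts; retry++)
{
    Console.WriteLine($"Retry {retry}: nominal wait {currentBackoff.TotalMilliseconds} ms");
    totalBackoff += currentBackoff;

    // Grow the backoff exponentially, capped at maxBackoff.
    var next = TimeSpan.FromMilliseconds(currentBackoff.TotalMilliseconds * backoffMultiplier);
    currentBackoff = next < maxBackoff ? next : maxBackoff;
}

// Nominal total: every attempt uses its full budget plus all backoff waits.
var nominalTotal = totalBackoff + perAttemptBudget * maxAttempts;
Console.WriteLine($"Total nominal backoff: {totalBackoff.TotalMilliseconds} ms");
Console.WriteLine($"Deadline should be at least: {nominalTotal.TotalSeconds} s");
```

With these numbers the nominal waits are 100, 200, and 400 ms, or 700 ms in total, so a deadline that only covers a single attempt would cut the retries short. The 30-second budget in step 3 comfortably covers four 5-second attempts plus backoff.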
Prevention
- Configure retry policies for all production gRPC clients
- Include `UNAVAILABLE` and `DEADLINE_EXCEEDED` in retryable status codes
- Set deadlines that account for total retry time, not just a single call
- Use health checks to detect gRPC server availability before calling
- Monitor gRPC call failure rates and retry counts in telemetry
- Test retry behavior by killing and restarting the gRPC server during calls
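
The health-check item above can be sketched as a short probe against the standard gRPC health service. This assumes the server registers that service and the client references the `Grpc.HealthCheck` NuGet package; the address and the 2-second probe deadline are placeholders chosen for illustration.

```csharp
using System;
using Grpc.Core;
using Grpc.Health.V1;
using Grpc.Net.Client;

// Assumption: the server exposes the standard grpc.health.v1.Health service.
using var channel = GrpcChannel.ForAddress("https://api.example.com");
var healthClient = new Health.HealthClient(channel);

try
{
    // An empty Service name asks about the server's overall health.
    var reply = await healthClient.CheckAsync(
        new HealthCheckRequest { Service = "" },
        deadline: DateTime.UtcNow.AddSeconds(2));

    var isServing = reply.Status == HealthCheckResponse.Types.ServingStatus.Serving;
    Console.WriteLine($"Server serving: {isServing}");
}
catch (RpcException ex) when (ex.StatusCode is StatusCode.Unavailable or StatusCode.DeadlineExceeded)
{
    // The server could not be reached within the probe deadline.
    Console.WriteLine($"Health probe failed: {ex.StatusCode}");
}
```

A probe like this only reports the server's state at the moment it runs, so it works best alongside retries and deadlines, not as a substitute for them.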