Introduction

gRPC runs over long-lived HTTP/2 connections, which are susceptible to transient failures: network blips, server restarts, load balancer rotations, and DNS changes. Without a retry policy, every transient failure propagates directly to the caller. The .NET gRPC client has built-in retry support, but no call is retried until a retry policy is supplied through the channel's service config, either for all methods or for specific ones.

Symptoms

  • RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel")
  • gRPC call fails during server deployment or restart
  • No automatic retry on transient network errors
  • UNAVAILABLE status not retried despite being a retryable condition
  • Client channel does not reconnect after server comes back online

Error output:

```
Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: Connection refused", DebugException="System.Net.Sockets.SocketException: Connection refused 10.0.0.1:5001")
```

Common Causes

  • No retry policy configured on the gRPC client
  • Retry not enabled for UNAVAILABLE status code
  • Deadline too short for retry attempts to complete
  • Client channel not reconnecting after connection drop
  • Load balancer dropping connections during health checks

Step-by-Step Fix

  1. **Configure a retry policy in the gRPC client:**

```csharp
using System;
using System.Net.Http;
using Grpc.Core;
using Grpc.Net.Client;
using Grpc.Net.Client.Configuration;

// Describe the retry behavior in a service config
var serviceConfig = new ServiceConfig
{
    MethodConfigs =
    {
        new MethodConfig
        {
            Names = { MethodName.Default }, // Apply to all methods
            RetryPolicy = new RetryPolicy
            {
                MaxAttempts = 4, // Counts the original attempt plus retries
                InitialBackoff = TimeSpan.FromMilliseconds(100),
                MaxBackoff = TimeSpan.FromSeconds(1),
                BackoffMultiplier = 2,
                RetryableStatusCodes =
                {
                    StatusCode.Unavailable,
                    StatusCode.DeadlineExceeded,
                    StatusCode.ResourceExhausted
                }
            }
        }
    }
};

// Create the channel with the retry configuration attached
var channelWithRetry = GrpcChannel.ForAddress("https://api.example.com", new GrpcChannelOptions
{
    HttpClient = new HttpClient(),
    ServiceConfig = serviceConfig
});

var client = new MyService.MyServiceClient(channelWithRetry);
```

  2. **Add Polly-based retry for more control:**

```csharp
using System;
using Grpc.Core;
using Microsoft.Extensions.Logging;
using Polly;

// Assumes `logger`, `client`, and `id` are already in scope
var grpcRetryPolicy = Policy
    .Handle<RpcException>(ex =>
        ex.StatusCode == StatusCode.Unavailable ||
        ex.StatusCode == StatusCode.DeadlineExceeded)
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: retryAttempt =>
            TimeSpan.FromMilliseconds(Math.Pow(2, retryAttempt) * 100),
        onRetry: (exception, timespan, retryCount, context) =>
        {
            logger.LogWarning(
                "gRPC call failed (attempt {RetryCount}), retrying in {Delay}ms: {Message}",
                retryCount, timespan.TotalMilliseconds, exception.Message);
        });

// Use Polly to wrap gRPC calls
var response = await grpcRetryPolicy.ExecuteAsync(async () =>
{
    return await client.GetDataAsync(new GetDataRequest { Id = id });
});
```
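The exponential formula in the `sleepDurationProvider` above is easy to misread, because Polly passes a 1-based retry attempt number. The following standalone sketch evaluates the same expression for the three retries; `SleepDuration` is a hypothetical helper name used only for illustration. The first wait works out to 200 ms rather than 100 ms, and the three waits add up to 1.4 seconds.

```csharp
using System;

// Same expression as the sleepDurationProvider in the Polly policy.
// SleepDuration is a hypothetical helper, not part of Polly.
static TimeSpan SleepDuration(int retryAttempt) =>
    TimeSpan.FromMilliseconds(Math.Pow(2, retryAttempt) * 100);

TimeSpan total = TimeSpan.Zero;
for (int attempt = 1; attempt <= 3; attempt++)
{
    TimeSpan delay = SleepDuration(attempt);
    total += delay;
    Console.WriteLine($"Retry {attempt}: wait {delay.TotalMilliseconds} ms");
}
Console.WriteLine($"Total wait across retries: {total.TotalMilliseconds} ms");
```

If a first wait of 100 ms is wanted, using `retryAttempt - 1` as the exponent would give 100, 200, and 400 ms.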

  3. **Set appropriate deadlines for retried calls:**

```csharp
using System;
using Grpc.Core;

// WRONG - no deadline, call may hang forever
var unboundedResponse = await client.GetDataAsync(request);

// CORRECT - set a deadline that accounts for retries
var deadline = DateTime.UtcNow.AddSeconds(30); // 30 second total budget
var response = await client.GetDataAsync(request, deadline: deadline);

// With metadata for tracing
var headers = new Metadata
{
    { "x-request-id", Guid.NewGuid().ToString() }
};
var tracedResponse = await client.GetDataAsync(request, headers, deadline: deadline);
```
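One rough way to sanity-check a deadline is to add up the nominal backoff between attempts and the time each attempt is expected to take. The sketch below does that arithmetic with the retry values from step 1 (4 attempts, 100 ms initial backoff, multiplier 2, 1 second cap). The 2 second per-attempt latency and the `NominalBackoff` helper are assumptions made for illustration, and the client applies jitter to each backoff, so the real waits will differ somewhat from the nominal figures.

```csharp
using System;

// Values mirror the retry policy from step 1.
int maxAttempts = 4;
TimeSpan initialBackoff = TimeSpan.FromMilliseconds(100);
TimeSpan maxBackoff = TimeSpan.FromSeconds(1);
double backoffMultiplier = 2;

// Assumption: how long a single attempt typically takes before it fails or succeeds.
TimeSpan expectedAttemptLatency = TimeSpan.FromSeconds(2);

TimeSpan totalBackoff = NominalBackoff(maxAttempts, initialBackoff, maxBackoff, backoffMultiplier);
TimeSpan budget = expectedAttemptLatency * maxAttempts + totalBackoff;

Console.WriteLine($"Nominal backoff: {totalBackoff.TotalMilliseconds} ms");
Console.WriteLine($"Suggested minimum deadline: {budget.TotalSeconds} s");

// Sums the nominal (un-jittered) backoff before each retry.
// With N attempts there are N - 1 retries, each capped at the maximum backoff.
static TimeSpan NominalBackoff(int attempts, TimeSpan initial, TimeSpan max, double multiplier)
{
    TimeSpan total = TimeSpan.Zero;
    double currentMs = initial.TotalMilliseconds;
    for (int retry = 1; retry < attempts; retry++)
    {
        total += TimeSpan.FromMilliseconds(Math.Min(currentMs, max.TotalMilliseconds));
        currentMs *= multiplier;
    }
    return total;
}
```

With these inputs the nominal backoff is 700 ms (100 + 200 + 400) and the estimate comes to 8.7 seconds, so a 30 second deadline leaves comfortable headroom, while a 5 second deadline could expire before the last attempt is made.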

Prevention

  • Configure retry policies for all production gRPC clients
  • Include UNAVAILABLE and DEADLINE_EXCEEDED in retryable status codes
  • Set deadlines that account for the total retry time, not just a single call
  • Use health checks to detect gRPC server availability before calling
  • Monitor gRPC call failure rates and retry counts in telemetry
  • Test retry behavior by killing and restarting the gRPC server during calls
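For the telemetry item above, one possible approach is a counter from `System.Diagnostics.Metrics` that is incremented whenever a call is retried, for example from a retry callback. This is a minimal sketch, not a complete monitoring setup: the meter name `MyApp.GrpcClient`, the instrument name `grpc.client.retries`, the `grpc.status` tag, and the `RecordRetry` helper are all hypothetical. The in-process `MeterListener` is included only to show that measurements are emitted; in production a metrics exporter would normally subscribe instead.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names; choose ones that match your telemetry conventions.
var meter = new Meter("MyApp.GrpcClient");
Counter<long> retryCounter = meter.CreateCounter<long>("grpc.client.retries");

// In-process listener used here only to demonstrate that measurements are published.
long observedRetries = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "MyApp.GrpcClient")
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>(
    (instrument, measurement, tags, state) => observedRetries += measurement);
listener.Start();

// Hypothetical helper: call this wherever a retry is detected, tagging it with the status code.
void RecordRetry(string statusCode) =>
    retryCounter.Add(1, new KeyValuePair<string, object?>("grpc.status", statusCode));

RecordRetry("Unavailable");
RecordRetry("DeadlineExceeded");
Console.WriteLine($"Retries recorded: {observedRetries}");
```

Tagging each measurement with the status code makes it possible to tell retries caused by server restarts (`Unavailable`) apart from those caused by slow responses.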