Introduction

Google Cloud load balancers send traffic only to backends that pass the configured health checks. When a backend stays unhealthy, the service can lose capacity or fail entirely even though the VM, instance group, or NEG still exists. The root cause is usually a mismatch between the health check and the actual application listener, blocked probe traffic, or a backend service configuration that no longer matches the deployed workload.
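The most common mismatch is a health check probing a path or port the application does not actually serve. The sketch below simulates this with a local HTTP server: the hypothetical application answers 200 only on /healthz, so a check configured for any other path sees a 404 and marks the backend unhealthy. The port and path here are illustrative, not Google Cloud defaults.

```python
# Sketch: a path mismatch between the health check and the listener.
# The /healthz path and local test port are hypothetical examples.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The app only answers 200 on /healthz; any other path is a 404.
        if self.path == "/healthz":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), AppHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def probe(path):
    """Issue one GET, as a health check probe would, and return the status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    conn.request("GET", path)
    status = conn.getresponse().status
    conn.close()
    return status

print(probe("/health"))   # misconfigured check path -> 404, backend stays unhealthy
print(probe("/healthz"))  # correct path -> 200, backend can become healthy
```

Either side can be fixed: point the health check at the path the application serves, or make the application serve the configured path.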

Symptoms

  • Google Cloud load balancer backends remain unhealthy in the backend service view.
  • Users receive HTTP 502 or 503 errors, or outright connection failures, through the load balancer.
  • The application works from inside the VM or cluster but is not marked healthy externally.
  • Health status changed after a port update, MIG rollout, firewall change, or backend service edit.
  • Some zones or endpoints stay healthy while others fail consistently.

Common Causes

  • The health check is probing the wrong port, path, or protocol.
  • Firewall rules do not allow probe traffic from Google health check source ranges.
  • The application listens only on localhost or on a different port than the backend service expects.
  • Backend instances are overloaded, slow to start, or return non-success responses to health checks.
  • Named ports, instance group settings, or NEG attachments do not match the deployed service.
  • Recent rollout changes left part of the backend pool serving an outdated or broken configuration.
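The firewall cause above is mechanical enough to check programmatically. This sketch verifies that a set of allowed source ranges fully covers Google's documented health-check probe ranges, 35.191.0.0/16 and 130.211.0.0/22 (the ranges used by most load balancer types; legacy network load balancers use additional ranges). The allowed_ranges arguments are stand-ins for your actual firewall rule.

```python
# Sketch: do the firewall's allowed source ranges cover Google's
# health-check probe ranges? The allowed_ranges inputs are examples.
import ipaddress

PROBE_RANGES = [ipaddress.ip_network("35.191.0.0/16"),
                ipaddress.ip_network("130.211.0.0/22")]

def probes_allowed(allowed_ranges):
    """True only if every probe range is fully inside some allowed range."""
    allowed = [ipaddress.ip_network(r) for r in allowed_ranges]
    return all(any(p.subnet_of(a) for a in allowed) for p in PROBE_RANGES)

print(probes_allowed(["10.0.0.0/8"]))                       # False: probes blocked
print(probes_allowed(["35.191.0.0/16", "130.211.0.0/22"]))  # True
```

A rule that allows only internal RFC 1918 ranges blocks the probes even though application traffic from inside the VPC still works, which matches the "works internally, unhealthy externally" symptom.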

Step-by-Step Fix

  1. Open the backend service in Google Cloud (for example, with gcloud compute backend-services get-health) and identify which backend instances, zones, or endpoints are marked unhealthy.
  2. Review the health check configuration in detail, including protocol, port selection, request path, host header, and thresholds.
  3. Confirm the application is listening on the expected interface and port. A service bound only to localhost or a mismatched port will fail external health checks.
  4. Test the health check endpoint directly from the backend host or workload and verify it returns the expected success response quickly.
  5. Inspect firewall rules to ensure Google Cloud health check source ranges (35.191.0.0/16 and 130.211.0.0/22 for most load balancer types) can reach the backend on the required port.
  6. Check named ports, MIG configuration, or NEG attachments so the backend service points to the right serving port and target instances.
  7. Review application and system logs for startup delays, crashes, high latency, or dependency failures that could make the backend intermittently unhealthy.
  8. If the issue started after a rollout, compare healthy and unhealthy instances for differences in image version, startup scripts, service config, or attached network settings.
  9. After correcting the health check path, firewall rule, or listener configuration, wait for the backend to report healthy and confirm traffic begins flowing again.
  10. Finish by testing through the load balancer and checking backend health over several probe intervals to ensure the recovery is stable.
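Step 10's "several probe intervals" matters because health state does not flip on a single probe: the check's healthy and unhealthy thresholds require consecutive agreeing results. The sketch below models that behavior with illustrative thresholds (2 consecutive successes to become healthy, 3 consecutive failures to become unhealthy); substitute the values configured on your own health check.

```python
# Sketch: how consecutive probe results drive backend health state.
# Threshold values are illustrative, not Google Cloud defaults.
class HealthState:
    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.state = "UNHEALTHY"   # treat a new backend as unhealthy until proven
        self.streak = 0            # consecutive probes contradicting current state

    def record(self, probe_ok):
        """Fold one probe result into the state and return the new state."""
        if self.state == "UNHEALTHY" and probe_ok:
            self.streak += 1
            if self.streak >= self.healthy_threshold:
                self.state, self.streak = "HEALTHY", 0
        elif self.state == "HEALTHY" and not probe_ok:
            self.streak += 1
            if self.streak >= self.unhealthy_threshold:
                self.state, self.streak = "UNHEALTHY", 0
        else:
            self.streak = 0  # probe agrees with current state; reset the streak
        return self.state

s = HealthState()
probes = [True, True, True, False, True, False, False, False]
results = [s.record(ok) for ok in probes]
print(results)
```

Note that the single failure mid-sequence does not flip the state, but three consecutive failures do; this is why a backend can look stable for a probe or two and still be on its way back to unhealthy, and why step 10 asks you to watch several intervals.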