Introduction
Google Cloud load balancers send traffic only to backends that pass the configured health checks. When a backend stays unhealthy, the service can lose capacity or fail entirely even though the VM, managed instance group (MIG), or network endpoint group (NEG) still exists. The root cause is usually a mismatch between the health check and the actual application listener, blocked probe traffic, or a backend service configuration that no longer matches the deployed workload.
Symptoms
- Google Cloud load balancer backends remain unhealthy in the backend service view.
- Users receive 502, 503, or connection-failure errors through the load balancer.
- The application works from inside the VM or cluster but is not marked healthy externally.
- Health status changed after a port update, MIG rollout, firewall change, or backend service edit.
- Some zones or endpoints stay healthy while others fail consistently.
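The unhealthy state described above can be read directly from the backend service. A minimal sketch, assuming a global backend service named `web-backend` (a placeholder name):

```shell
# Show which backends the load balancer currently considers unhealthy.
# "web-backend" is a placeholder; for a regional backend service,
# replace --global with --region=REGION.
gcloud compute backend-services get-health web-backend --global

# Show which health check the backend service actually references,
# so its port/path/protocol can be compared against the application.
gcloud compute backend-services describe web-backend --global \
    --format="value(healthChecks)"
```

Comparing the per-zone output of `get-health` against the symptom pattern (all backends down versus only some zones) narrows the cause quickly.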
Common Causes
- The health check is probing the wrong port, path, or protocol.
- Firewall rules do not allow probe traffic from Google health check source ranges.
- The application listens only on localhost or on a different port than the backend service expects.
- Backend instances are overloaded, slow to start, or return non-success responses to health checks.
- Named ports, instance group settings, or NEG attachments do not match the deployed service.
- Recent rollout changes left part of the backend pool serving an outdated or broken configuration.
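One cause above, blocked probe traffic, can be checked by confirming that the source addresses seen in packet captures on the backend fall within Google's documented health-check ranges, 35.191.0.0/16 and 130.211.0.0/22. A small POSIX-shell sketch (the function names are illustrative, not part of any tool):

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NETWORK PREFIXLEN: succeed if IP is inside NETWORK/PREFIXLEN.
in_cidr() {
  ip=$(ip_to_int "$1"); net=$(ip_to_int "$2"); bits=$3
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Succeed if the address is in one of Google's health-check probe ranges.
is_google_probe() {
  in_cidr "$1" 35.191.0.0 16 || in_cidr "$1" 130.211.0.0 22
}

is_google_probe 35.191.10.4 && echo "probe range" || echo "not a probe range"
```

If the addresses hitting the backend are in these ranges but the application never sees them, a firewall rule is the likely blocker.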
Step-by-Step Fix
- Open the backend service in Google Cloud and identify which backend instances, zones, or endpoints are marked unhealthy.
- Review the health check configuration in detail, including protocol, port selection, request path, host header, and thresholds.
- Confirm the application is listening on the expected interface and port. A service bound only to localhost or a mismatched port will fail external health checks.
- Test the health check endpoint directly from the backend host or workload and verify it returns the expected success response quickly.
- Inspect firewall rules to ensure Google Cloud health check source ranges (35.191.0.0/16 and 130.211.0.0/22 for most load balancer types) can reach the backend on the required port.
- Check named ports, MIG configuration, or NEG attachments so the backend service points to the right serving port and target instances.
- Review application and system logs for startup delays, crashes, high latency, or dependency failures that could make the backend intermittently unhealthy.
- If the issue started after a rollout, compare healthy and unhealthy instances for differences in image version, startup scripts, service config, or attached network settings.
- After correcting the health check path, firewall rule, or listener configuration, wait for the backend to report healthy and confirm traffic begins flowing again.
- Finish by testing through the load balancer and checking backend health over several probe intervals to ensure the recovery is stable.
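The listener, endpoint, and firewall checks in the steps above can be combined into a quick verification pass. A hedged sketch, assuming an HTTP health check on port 8080 with request path /healthz (both hypothetical values; substitute your own):

```shell
# 1. On the backend VM: confirm the service listens on all interfaces
#    (0.0.0.0 or the VM's address, not only 127.0.0.1) on the probed port.
ss -tlnp | grep ':8080'

# 2. Probe the health-check endpoint locally; a healthy backend should
#    return the expected success code well within the check's timeout.
curl -s -o /dev/null -w '%{http_code} in %{time_total}s\n' \
    http://localhost:8080/healthz

# 3. From a workstation: confirm a firewall rule admits Google's
#    health-check source ranges on that port.
gcloud compute firewall-rules list \
    --filter="sourceRanges:(35.191.0.0/16 130.211.0.0/22)"
```

If step 1 shows the service bound to 127.0.0.1, or step 2 returns a non-success code or takes longer than the health check's timeout, fix the application side first; if both pass but the backend stays unhealthy, the firewall or backend service configuration is the more likely culprit.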