Introduction

A failing load balancer health check can take a healthy application out of service or keep an unhealthy one in rotation. The application may appear fine when tested manually, yet the balancer still marks instances down because it expects a different path, status code, host header, or TLS behavior. The fix is to compare exactly what the probe sends with what the backend is prepared to return during startup and steady state.

Symptoms

  • Backends are marked unhealthy even though manual page checks succeed
  • Traffic drains from new instances right after deploy or autoscaling events
  • The site flaps between healthy and unavailable as nodes enter and leave rotation
  • Health checks started failing after changing ports, redirects, auth, or TLS settings
  • The balancer reports probe timeouts, wrong status codes, or handshake failures

Common Causes

  • The health check path redirects, requires auth, or depends on services that are not ready yet
  • The balancer expects a specific host header, port, or status code that the backend no longer serves
  • TLS settings differ between probe traffic and normal client traffic
  • Startup time, database readiness, or warm-up behavior exceeds the configured health threshold
  • A firewall or security layer blocks the load balancer probe source or user agent

Step-by-Step Fix

  1. Confirm the exact health check path, method, headers, port, and success criteria configured on the load balancer.
  2. Test that probe request directly against the backend instead of assuming a browser check covers the same behavior.
  3. Review whether the health endpoint redirects, enforces auth, or depends on downstream systems that may not be ready during startup.
  4. Verify host header, protocol, and TLS expectations match what the application listener is configured to accept.
  5. Check startup and readiness timing so fresh instances are not judged unhealthy before they finish warming up.
  6. Inspect balancer logs, backend logs, and firewall events for blocked probes, handshake issues, or response mismatches.
  7. Narrowly adjust the health endpoint or balancer thresholds so the check reflects true application readiness rather than incidental dependencies.
  8. Retest during rollout or instance replacement, not just on a stable node, to make sure the fix survives real transitions.
  9. Keep health check contracts documented with the service so infrastructure changes do not silently invalidate them later.