Home / Load Balancer / Load Balancer Health Check False Failure - Healthy Backend Marked Down

Load Balancer

Load Balancer Health Check False Failure - Healthy Backend Marked Down

Load balancer incorrectly marks healthy backends as unhealthy due to aggressive health check configuration.

Yesterday2 min read

Abstract illustration for load balancer troubleshooting guides.

Introduction When load balancer health checks incorrectly mark healthy backends as down, traffic is unnecessarily removed from those backends. This reduces capacity and can cause cascading failures.

Symptoms - Backend flapping between healthy and unhealthy - Healthy backend serving traffic then suddenly removed - Health check logs showing intermittent timeouts - Backend marked unhealthy during deployment rollout - Only some health check intervals failing

Common Causes - Health check timeout too short for backend response - Check interval too frequent causing backend overload - Health check endpoint depends on external services - Backend garbage collection causing pause - Health check during backend restart or deploy

Step-by-Step Fix 1. **Check current health check configuration': ```bash # AWS ALB aws elbv2 describe-target-groups --query "TargetGroups[*].{Name:TargetGroupName,Interval:HealthCheckIntervalSeconds,Timeout:HealthCheckTimeoutSeconds,Threshold:HealthyThresholdCount}" # HAProxy grep -E "inter|fall|rise" /etc/haproxy/haproxy.cfg ```

1.**Adjust health check parameters':
2.```yaml
3.# Recommended settings
4.interval: 30s # Not too frequent
5.timeout: 10s # Allow for slow responses
6.healthy_threshold: 2 # Not too many
7.unhealthy_threshold: 3 # Not too sensitive
8.`
9.**Use a lightweight health check endpoint':
10.```python
11.@app.route('/health')
12.def health():
13.return {"status": "ok"}, 200 # No DB checks, no external calls
14.`

Prevention - Use separate readiness and liveness endpoints - Set health check timeout to 3x p99 response time - Monitor health check failure rate per backend - Implement graceful shutdown with deregistration delay - Use startup probes for slow-starting backends