Introduction

Load balancers use health check endpoints to determine whether backend instances should receive traffic. A superficial health check (e.g., just returning HTTP 200) may pass even when the backend is degraded -- unable to process requests, database connections exhausted, or dependent services unreachable. This causes the load balancer to continue routing traffic to unhealthy backends, resulting in client-facing errors.
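To make the anti-pattern concrete, here is a minimal sketch (illustrative only, not taken from any particular service) of a superficial health check that passes regardless of dependency state:

```python
# Anti-pattern sketch: a health check that only proves the process is up.
def superficial_health_check():
    # No dependency verification at all -- the mere fact that this code
    # runs is the only signal, so it returns 200 even when the database
    # or downstream services are broken.
    return 200, {"status": "ok"}

status, body = superficial_health_check()
print(status, body)  # 200 {'status': 'ok'}
```

Because this endpoint can only fail if the process itself is down, the load balancer keeps routing traffic to a backend whose request path is already erroring.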

Symptoms

  • Load balancer shows all backends as healthy but clients receive 500 errors
  • Health check endpoint returns 200 OK while the main API returns errors
  • Gradual increase in error rate as more backends degrade
  • Removing and re-adding a backend to the pool temporarily improves performance
  • Error message: No load balancer error -- clients see 500/502 from the degraded backend

Common Causes

  • Health check endpoint only verifies the process is running, not service functionality
  • Health check not testing critical dependencies (database, cache, message queue)
  • Health check interval too long, not detecting degradation quickly enough
  • Health check threshold (unhealthy count) too high, tolerating too many failures
  • Backend degrading slowly (memory leak, connection pool exhaustion), staying under the health check's radar
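The interval and threshold causes above compound: a backend keeps receiving traffic for roughly (check interval) × (consecutive failures required) before it is removed. A quick sketch of that arithmetic (the numbers below are illustrative, not any load balancer's defaults):

```python
# Worst-case time to mark a backend unhealthy: roughly the check
# interval times the consecutive-failure threshold, plus one per-check
# timeout for the final probe to give up.
def worst_case_detection_seconds(interval_s, unhealthy_threshold, timeout_s):
    return interval_s * unhealthy_threshold + timeout_s

# A 30 s interval with a threshold of 5 tolerates ~2.5 minutes of errors:
print(worst_case_detection_seconds(30, 5, 5))   # 155.0 seconds
# Tightening to a 10 s interval and threshold of 2 cuts that sharply:
print(worst_case_detection_seconds(10, 2, 5))   # 25.0 seconds
```

This is why the Prevention section below recommends short intervals: the detection window scales linearly with both knobs.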

Step-by-Step Fix

  1. Check the current health check configuration: See what is being verified.

```bash
# AWS ALB example
aws elbv2 describe-target-health \
  --target-group-arn $TG_ARN
# Check health check path, interval, and thresholds

# Check what the health endpoint actually does
curl -v https://backend.example.com/health
```

  2. Implement a comprehensive health check endpoint: Verify all dependencies.

```python
# Python FastAPI health check
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
# db and redis are assumed to be the application's async database and
# Redis clients, initialized at startup.

@app.get("/health")
async def health_check():
    checks = {}

    # Database
    try:
        await db.execute("SELECT 1")
        checks["database"] = "healthy"
    except Exception as e:
        checks["database"] = f"unhealthy: {str(e)}"

    # Redis cache
    try:
        await redis.ping()
        checks["cache"] = "healthy"
    except Exception as e:
        checks["cache"] = f"unhealthy: {str(e)}"

    # Return 200 only if all critical dependencies are healthy
    status = all(v == "healthy" for v in checks.values())
    return JSONResponse(
        status_code=200 if status else 503,
        content={"status": "healthy" if status else "degraded",
                 "checks": checks},
    )
```
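The endpoint above fails the check when any dependency is unhealthy. If some dependencies are non-critical, the aggregation can be split so only critical failures return 503. A sketch of that logic (the critical/non-critical split here is illustrative):

```python
# Only *critical* dependency failures should flip the health check to
# 503; non-critical ones are reported in the body but do not fail it.
CRITICAL = {"database", "cache"}  # illustrative choice of critical deps

def aggregate(checks):
    critical_ok = all(v == "healthy"
                      for k, v in checks.items() if k in CRITICAL)
    return 200 if critical_ok else 503

# A non-critical metrics sink failing does not pull the backend out:
print(aggregate({"database": "healthy", "cache": "healthy",
                 "metrics": "unhealthy: timeout"}))        # 200
# A critical database failure does:
print(aggregate({"database": "unhealthy: refused",
                 "cache": "healthy"}))                     # 503
```

This keeps the backend in rotation during partial degradation while still surfacing the failing dependency in the response body.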


Prevention

  • Implement health checks that verify all critical dependencies, not just process liveness
  • Use separate readiness and liveness probes -- readiness for load balancer, liveness for restart
  • Set health check intervals to 10 seconds or less for rapid degradation detection
  • Monitor the gap between health check status and actual error rates
  • Implement graceful degradation that returns 503 from health checks when non-critical dependencies fail
  • Test health check behavior by simulating dependency failures in staging environments
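The readiness/liveness split recommended above can be sketched as two separate endpoints with different contracts (the function names are illustrative):

```python
# Liveness answers "is the process alive?" -- an orchestrator uses a
# failing liveness check to restart the process. Readiness answers
# "can it serve traffic?" -- the load balancer uses it for routing.
def liveness():
    # The process running and answering is all liveness asserts.
    return 200

def readiness(dependency_checks):
    # Only advertise readiness when every critical dependency passes,
    # so the load balancer stops routing without triggering a restart.
    return 200 if all(dependency_checks.values()) else 503

print(liveness())                                     # 200
print(readiness({"database": True, "cache": False}))  # 503
```

Keeping the two separate avoids restart loops: a backend whose database is down is not ready, but restarting the process will not fix the database.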