Introduction

HAProxy health checks are critical for ensuring traffic only flows to healthy backend servers. When health checks fail incorrectly, healthy servers are removed from the pool causing unnecessary capacity reduction or complete outages. Health check failures can result from endpoint misconfiguration, timing issues, response mismatches, or network problems.

Symptoms

Error messages in HAProxy logs:

bash
Health check for server web_servers/web1 failed
Check: tcp://10.0.0.1:8080 result: Layer7 wrong status
Check: tcp://10.0.0.1:8080 result: Layer4 timeout
http-check expect status 200 but got 404

Observable indicators: - Servers marked DOWN despite serving traffic correctly - Intermittent flapping between UP and DOWN states - Specific check_status values: L4TOUT, L7TOUT, L7STS, L7OK - Manual curl to backend succeeds but health check fails

Common Causes

  1. 1.Wrong health check endpoint - Checking /health but app uses /healthz
  2. 2.Incorrect expected response - Expecting 200 but app returns 201
  3. 3.Host header missing - Application requires specific Host header
  4. 4.Timeout too short - Health check endpoint slower than configured timeout
  5. 5.Check interval too aggressive - Backend overwhelmed by health checks
  6. 6.SSL/TLS mismatch - HTTP check on HTTPS endpoint or vice versa
  7. 7.Port mismatch - Checking wrong port for health endpoint

Step-by-Step Fix

Step 1: Identify Health Check Status

```bash # Check server status echo "show stat" | socat stdio /var/run/haproxy.sock | grep -E "pxname|web"

# Look for check_status and last_chk echo "show stat" | socat stdio /var/run/haproxy.sock

# Check detailed server info echo "show servers state" | socat stdio /var/run/haproxy.sock ```

Step 2: Manually Test Health Endpoint

```bash # Test exactly what HAProxy is checking curl -v http://10.0.0.1:8080/health

# Test with Host header if required curl -v -H "Host: example.com" http://10.0.0.1:8080/health

# Test with HTTPS if needed curl -vk https://10.0.0.1:8443/health

# Check response timing curl -w "Time: %{time_total}s\n" http://10.0.0.1:8080/health ```

Step 3: Review Health Check Configuration

```haproxy backend web_servers balance roundrobin

# Option 1: Simple TCP check server web1 10.0.0.1:8080 check

# Option 2: HTTP GET check option httpchk GET /health http-check expect status 200 server web1 10.0.0.1:8080 check

# Option 3: Full HTTP check with headers option httpchk GET /health HTTP/1.1\r\nHost:\ example.com\r\nConnection:\ close http-check expect status 200 server web1 10.0.0.1:8080 check ```

Step 4: Fix Common Configuration Issues

Fix: Wrong Endpoint ``haproxy backend web_servers # Change from /health to correct endpoint option httpchk GET /healthz HTTP/1.1\r\nHost:\ example.com http-check expect status 200

Fix: Multiple Acceptable Status Codes ``haproxy backend web_servers option httpchk GET /health # Accept 200-299 status codes http-check expect status 200-399 # Or accept specific codes http-check expect status 200 201 202

Fix: Response Body Check ``haproxy backend web_servers option httpchk GET /health # Check for specific string in response body http-check expect string "healthy" # Or use regex http-check expect rstatus "^[23][0-9]{2}$"

Fix: Timeout Settings ``haprorix backend web_servers option httpchk GET /health http-check expect status 200 # Increase check timeout for slow endpoints server web1 10.0.0.1:8080 check inter 10s fall 3 rise 2 timeout check 5s

Step 5: Fix SSL Health Checks

```haproxy backend web_servers balance roundrobin

# SSL health check with SNI server web1 10.0.0.1:8443 check ssl verify required ca-file /etc/ssl/certs/ca.crt sni str(backend.example.com)

# Or use HTTP check on SSL port option httpchk GET /health server web1 10.0.0.1:8443 check-ssl ssl verify none ```

Step 6: Verify the Fix

```bash # Reload configuration haproxy -c -f /etc/haproxy/haproxy.cfg systemctl reload haproxy

# Watch health check status watch -n 1 'echo "show stat" | socat stdio /var/run/haproxy.sock | grep web'

# Monitor logs tail -f /var/log/haproxy.log | grep -i health ```

Advanced Diagnosis

Debug Health Check Process

```bash # Run HAProxy in debug mode haproxy -d -f /etc/haproxy/haproxy.cfg

# Check agent health if configured echo "show sess" | socat stdio /var/run/haproxy.sock ```

External Health Check Script

haproxy
backend web_servers
    # Use external check script
    option external-check
    external-check path "/usr/local/bin:/bin:/sbin"
    external-check command /usr/local/bin/health-check.sh
    server web1 10.0.0.1:8080 check
bash
#!/bin/bash
# /usr/local/bin/health-check.sh
# $1 = address, $2 = port
curl -sf http://$1:$2/health && exit 0 || exit 1

MySQL Health Check Example

haproxy
backend mysql_servers
    mode tcp
    option mysql-check user haproxy_check
    server mysql1 10.0.0.1:3306 check

Redis Health Check Example

haproxy
backend redis_servers
    mode tcp
    option redis-check
    server redis1 10.0.0.1:6379 check

Common Pitfalls

  • Redirect loops - Health endpoint returns 301/302 instead of 200
  • Authentication required - Endpoint needs auth but check doesn't provide it
  • IPv6 vs IPv4 - Health check using wrong IP family
  • Wrong check port - check-port directive overriding server port
  • Maintenance mode - Server administratively disabled
  • Agent check conflicts - Agent and HTTP check giving different results

Best Practices

```haproxy defaults # Global health check defaults timeout check 5s default-server inter 5s fall 3 rise 2

backend web_servers balance roundrobin

# Dedicated health endpoint option httpchk GET /health HTTP/1.1\r\nHost:\ example.com\r\nConnection:\ close http-check expect status 200-399

# Appropriate intervals based on app response time default-server inter 10s fall 3 rise 2

# Multiple servers for redundancy server web1 10.0.0.1:8080 check server web2 10.0.0.2:8080 check

# Enable observe for passive health checks server web1 10.0.0.1:8080 check observe layer7 error-limit 5 on-error mark-down ```

  • HAProxy Backend Down
  • HAProxy Connection Refused
  • HAProxy SSL Handshake Failed
  • AWS ALB Health Check Failing