What's Actually Happening

Your HAProxy load balancer shows backend servers as DOWN in the statistics page, but the actual backend servers are running and healthy. Traffic isn't being distributed to these servers, causing load imbalance or service unavailability. Even after restarting HAProxy or the backend servers, they remain marked as DOWN and won't transition to UP state.

This typically indicates health check failures: HAProxy's health checks are not getting successful responses from your backend servers, even though the servers are actually working. The cause could be misconfigured health checks, network problems, firewall rules, or backend server configuration issues.
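When reading `show stat` output, the two fields that matter most for this problem are `status` (field 18) and `check_status` (field 37). A minimal sketch of pulling just those fields out of a stats line; the sample line below is hypothetical, and in practice the input comes from the stats socket:

```shell
# Hypothetical "show stat" CSV line; real input comes from:
#   echo "show stat" | socat stdio /var/run/haproxy.sock
line='web-backend,web1,0,0,0,5,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,15,3,3600,1800,,1,1,1,,0,,1,0,0,5,L4CON'

# Field 1 = pxname (backend), 2 = svname (server),
# 18 = status, 37 = check_status
echo "$line" | awk -F, '{print $1, $2, $18, $37}'
# → web-backend web1 DOWN L4CON
```

A `status` of DOWN paired with a `check_status` of L4CON points at connectivity, while L7RSP or L7STS points at the HTTP response the backend returns.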

The Error You'll See

HAProxy backend server issues manifest in various ways:

```bash
# HAProxy stats showing DOWN servers
# Via stats page or socket:
$ echo "show stat" | socat stdio /var/run/haproxy.sock
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,last_chk,last_agt,qtime,ctime,rtime,ttime,
# web-backend,web1,0,0,0,5,,0,0,0,,0,,0,0,0,0,0,DOWN,1,1,0,15,3,3600,1800,,1,1,1,0,0,,1,0,0,5,L4CON,,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,Connection refused,

# Server showing L4CON (Layer 4 connection failure)
# backend_servers,server1,0,0,0,0,,0,0,0,,0,,0,0,0,0,0,DOWN,100,1,0,120,5,7200,3600,,1,2,1,0,0,,1,0,,1,L4CON,,1,0,0,0,0,0,0,0,,,,,,,0,0,,,,,Connection refused,

# Server showing L7RSP (Layer 7 bad response)
# api-backend,api1,0,0,0,5,,10,50000,45000,,0,,0,0,0,0,0,DOWN,1,1,0,50,2,1800,900,,1,3,1,0,0,,1,0,0,10,L7RSP,503,5,0,0,0,0,50,0,0,,,,0,0,0,0,0,0,0,0,HTTP status 503 returned,

# Server in MAINT mode
# web-backend,web2,0,0,0,0,,0,0,0,,0,,0,0,0,0,0,MAINT,1,0,0,0,0,0,0,,1,1,2,0,0,,1,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,,

# Health check timeouts
# backend,server,0,0,0,0,,0,0,0,,0,,0,0,0,0,0,DOWN,1,1,0,200,10,86400,43200,,1,4,1,0,0,,1,0,0,0,L4TOUT,,2000,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,Connection timed out,

# HAProxy logs showing health check failures
$ tail -f /var/log/haproxy.log
Apr  8 10:15:23 haproxy[12345]: Health check for server web-backend/web1 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 0/3 UP.
Apr  8 10:15:24 haproxy[12345]: Health check for server api-backend/api1 failed, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 5ms, status: 1/3 UP.
Apr  8 10:15:25 haproxy[12345]: Server web-backend/web2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

# Checking via HAProxy socket
$ echo "show servers state" | socat stdio /var/run/haproxy.sock
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result
1 web-backend 1 web1 192.168.1.10 DOWN 0 1 1 3600 1 0
2 api-backend 1 api1 192.168.1.20 DOWN 0 1 1 1800 2 0

# Unable to reach backend
$ curl -H "Host: example.com" http://haproxy-server/
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
```

Additional symptoms:

- All traffic going to a subset of servers
- HAProxy stats page shows servers as DOWN
- Health check failures in the logs
- Backend servers are actually running and responding
- Manual curl to the backend server works
- Servers stuck in a transition state
- "No server is available" errors returned to clients

Why This Happens

1. Health Check Configuration Mismatch: The health check is configured to check the wrong port, the wrong URI path, or to expect a different HTTP status code than the backend returns. HAProxy marks the server DOWN because the check never gets the expected response.
2. Health Check Blocked by Firewall: HAProxy's health check packets are blocked by a firewall on either the HAProxy server or the backend server. The backend is healthy, but HAProxy can't reach it for health checks due to iptables rules, security groups, or network ACLs.
3. Wrong Backend IP or Port: The backend server's IP address or port changed but the HAProxy configuration wasn't updated, so HAProxy connects to the wrong address.
4. Backend Server Not Listening on the Expected Port: The application on the backend server isn't listening on the port HAProxy expects, or is bound to the wrong interface (127.0.0.1 instead of 0.0.0.0).
5. Health Check Interval Too Aggressive: Checks run too frequently or the timeout is too short; the backend takes longer to respond than the configured inter or timeout values allow.
6. Backend Returning the Wrong HTTP Status: The health check expects HTTP 200 but the backend returns a 301 redirect, 403 Forbidden, or 503 Unavailable, which HAProxy interprets as unhealthy.
7. Server in Maintenance Mode: The server was put into MAINT state manually or via an agent and was never returned to the ready state. HAProxy won't route traffic to MAINT servers.
8. Rise/Fall Threshold Not Met: Health checks need multiple consecutive successes (rise) to mark a server UP, or failures (fall) to mark it DOWN; the server may simply be mid-transition between states.
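Most of these causes map onto a handful of backend directives. A minimal sketch of a backend with the relevant check parameters spelled out explicitly; the addresses and the `/health` path are placeholders, not taken from any particular environment:

```
backend web-backend
    balance roundrobin
    # Causes 1 and 6: check the path and status code the backend actually serves
    option httpchk GET /health
    http-check expect status 200
    # Cause 5: give slow backends more time to answer the probe
    timeout check 5s
    # Cause 8: fall/rise thresholds govern DOWN/UP transitions
    default-server inter 5s fall 3 rise 2
    # Causes 3 and 4: the IP:port here must match what the app listens on
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
```

With `fall 3` and `inter 5s`, a server takes roughly 15 seconds of consecutive failures to go DOWN, and with `rise 2` about 10 seconds of consecutive successes to come back UP.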

Step 1: Identify Which Backend Servers Are DOWN

First, check the current state of all backend servers in HAProxy.

```bash
# Check HAProxy stats via socket
echo "show stat" | socat stdio /var/run/haproxy.sock | column -t -s,

# Check servers state
echo "show servers state" | socat stdio /var/run/haproxy.sock

# Check specific backend
echo "show stat" | socat stdio /var/run/haproxy.sock | grep "web-backend"

# If socket doesn't exist, check stats page
curl -s http://localhost:8404/stats | grep -A5 "web-backend"

# Or enable stats page in config:
cat >> /etc/haproxy/haproxy.cfg << 'EOF'
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:password
EOF

# Check HAProxy logs for health check failures
tail -100 /var/log/haproxy.log | grep -iE "health check|failed|DOWN"

# Check recent state changes
tail -100 /var/log/haproxy.log | grep -iE "changed|UP|DOWN"

# Parse servers state to show only DOWN servers
# (srv_op_state is numeric: 0 = stopped/DOWN, 2 = running)
echo "show servers state" | socat stdio /var/run/haproxy.sock | awk '$6 == 0 {print $2, $4, $5}'

# Check captured request/response errors
echo "show errors" | socat stdio /var/run/haproxy.sock

# Check current sessions
echo "show sess" | socat stdio /var/run/haproxy.sock

# View HAProxy configuration for backend
grep -A20 "backend web-backend" /etc/haproxy/haproxy.cfg
```

Step 2: Verify Backend Server Is Actually Running

Confirm the backend server is healthy and responding.

```bash
# From HAProxy server, test direct connection to backend.
# Get backend server IP from config:
grep -A20 "backend web-backend" /etc/haproxy/haproxy.cfg | grep server

# Test TCP connectivity to backend:
nc -zv 192.168.1.10 80
# Should show: succeeded!

# Test HTTP connectivity:
curl -I http://192.168.1.10/

# Test with Host header (if needed):
curl -I -H "Host: example.com" http://192.168.1.10/

# Test health check endpoint specifically:
curl -v http://192.168.1.10/health
curl -v http://192.168.1.10/healthz
curl -v http://192.168.1.10/status

# Check what HTTP status is returned:
curl -s -o /dev/null -w "%{http_code}" http://192.168.1.10/health

# Test from backend server itself:
ssh 192.168.1.10 'curl -I http://localhost:80/health'

# Check if service is running on backend:
ssh 192.168.1.10 'systemctl status nginx'
ssh 192.168.1.10 'systemctl status apache2'
ssh 192.168.1.10 'netstat -tlnp | grep :80'

# Check backend server logs:
ssh 192.168.1.10 'tail -50 /var/log/nginx/error.log'
ssh 192.168.1.10 'tail -50 /var/log/apache2/error.log'

# Check if backend is listening on correct interface:
ssh 192.168.1.10 'ss -tlnp | grep :80'
# Should show 0.0.0.0:80 or :::80, not 127.0.0.1:80
```

Step 3: Examine Health Check Configuration

Review the health check settings in HAProxy configuration.

```bash
# View backend configuration
grep -A30 "backend web-backend" /etc/haproxy/haproxy.cfg

# Typical backend with health check:
# backend web-backend
#     balance roundrobin
#     option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
#     http-check expect status 200
#     server web1 192.168.1.10:80 check inter 5s fall 3 rise 2
#     server web2 192.168.1.11:80 check inter 5s fall 3 rise 2

# Health check parameters:
# - check: enable health checking
# - inter: interval between checks (default 2s)
# - fall:  number of failures to mark DOWN (default 3)
# - rise:  number of successes to mark UP (default 2)
# - timeout check: backend-level limit on how long a check may take

# Check if the health check path exists:
curl -I http://192.168.1.10/health
# Does it return 200 OK?

# If using HTTP/1.1 with Host header, test exactly:
curl --http1.1 -H "Host: example.com" http://192.168.1.10/health

# Check for common misconfigurations:
# 1. Wrong port:
#    server web1 192.168.1.10:8080 check   # But app listening on 80

# 2. Wrong check path:
#    option httpchk GET /healthz           # But endpoint is /health

# 3. Wrong expected status:
#    http-check expect status 200-399      # But backend returns 404

# 4. SSL mismatch: server uses HTTPS but check uses HTTP

# For SSL backends, check configuration:
# server web1 192.168.1.10:443 check ssl verify none
# Or:
# option httpchk GET /health
# server web1 192.168.1.10:443 check check-ssl

# Validate HAProxy config:
haproxy -c -f /etc/haproxy/haproxy.cfg
```

Step 4: Check Network Connectivity and Firewalls

Verify HAProxy can reach backend servers for health checks.

```bash
# Test basic connectivity from HAProxy server:
ping -c 3 192.168.1.10

# Test TCP connection:
nc -zv 192.168.1.10 80
telnet 192.168.1.10 80

# Test with curl showing all details:
curl -v --connect-timeout 5 http://192.168.1.10/health

# Check firewall on HAProxy server:
sudo iptables -L -n -v | grep 192.168.1.10
sudo iptables -L OUTPUT -n -v

# Check if traffic is being blocked:
sudo iptables -L INPUT -n -v
sudo iptables -L FORWARD -n -v

# Check firewall on backend server:
ssh 192.168.1.10 'sudo iptables -L -n -v | grep -E "80|443"'

# For firewalld (CentOS/RHEL):
ssh 192.168.1.10 'sudo firewall-cmd --list-all'

# Check if port is allowed:
ssh 192.168.1.10 'sudo firewall-cmd --list-ports'

# Add port if missing:
ssh 192.168.1.10 'sudo firewall-cmd --add-port=80/tcp --permanent'
ssh 192.168.1.10 'sudo firewall-cmd --reload'

# For UFW (Ubuntu):
ssh 192.168.1.10 'sudo ufw status'
ssh 192.168.1.10 'sudo ufw allow 80/tcp'

# Check security groups (AWS):
aws ec2 describe-security-groups --group-ids sg-xxx --query 'SecurityGroups[0].IpPermissions'

# Check network ACLs (AWS):
aws ec2 describe-network-acls

# Check the network path:
traceroute 192.168.1.10
mtr -r -c 10 192.168.1.10

# Check for MTU issues:
ping -s 1472 -M do 192.168.1.10
# If this fails, suspect an MTU issue

# Run a manual health check similar to HAProxy's:
curl --connect-timeout 2 --max-time 5 -H "Host: example.com" http://192.168.1.10/health
```

Step 5: Manually Test HAProxy Health Check

Simulate HAProxy's health check exactly.

```bash
# Check what health check HAProxy is configured to do:
grep -A20 "backend web-backend" /etc/haproxy/haproxy.cfg | grep -E "option httpchk|http-check|server.*check"

# Typical HTTP check:
# option httpchk GET /health HTTP/1.1\r\nHost:\ example.com

# Test this exact request:
printf 'GET /health HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' | nc 192.168.1.10 80

# Or with curl:
curl -v --http1.1 -H "Host: example.com" http://192.168.1.10/health

# For HTTPS backend:
printf 'GET /health HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' | openssl s_client -quiet -connect 192.168.1.10:443

# Check expected status code:
grep "http-check expect" /etc/haproxy/haproxy.cfg

# If it expects 200, test:
curl -s -o /dev/null -w "%{http_code}" http://192.168.1.10/health
# Should return 200

# If it expects 200-399, make sure the response is in that range

# For TCP checks (no HTTP):
# option tcp-check
# tcp-check connect

# Test TCP connection:
nc -zv 192.168.1.10 80

# For MySQL check:
# option mysql-check user haproxy_check
# Test:
mysql -u haproxy_check -h 192.168.1.10 -e "SELECT 1"

# For Redis check:
# option redis-check
# Test:
redis-cli -h 192.168.1.10 PING

# Check inter, fall, rise values:
grep "server.*check" /etc/haproxy/haproxy.cfg | head -3
# inter 5s = check every 5 seconds
# fall 3   = 3 failures to mark DOWN
# rise 2   = 2 successes to mark UP

# Wait for health check interval + retries, then re-check:
sleep 15
echo "show stat" | socat stdio /var/run/haproxy.sock | grep web1
```

Step 6: Fix Health Check Configuration

Update HAProxy configuration to fix health check issues.

```bash
# Edit HAProxy configuration:
nano /etc/haproxy/haproxy.cfg

# Option 1: Simple TCP check (just verify the port is open)
backend web-backend
    balance roundrobin
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

# Option 2: HTTP check on specific path
backend web-backend
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

# Option 3: HTTP check with Host header
backend web-backend
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server web1 192.168.1.10:80 check inter 5s fall 3 rise 2
    server web2 192.168.1.11:80 check inter 5s fall 3 rise 2

# Option 4: HTTP check accepting multiple status codes
backend web-backend
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200-399
    server web1 192.168.1.10:80 check

# Option 5: Adjust timing for slow backends
backend web-backend
    balance roundrobin
    option httpchk GET /health
    timeout check 5s
    server web1 192.168.1.10:80 check inter 10s fall 5 rise 3

# Option 6: For HTTPS backends
backend web-backend
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.10:443 check check-ssl ssl verify none

# Option 7: Disable health check temporarily
backend web-backend
    balance roundrobin
    server web1 192.168.1.10:80
    # No 'check' keyword

# Validate configuration:
haproxy -c -f /etc/haproxy/haproxy.cfg

# Reload HAProxy:
systemctl reload haproxy
# Or:
systemctl restart haproxy

# Check status after reload:
systemctl status haproxy
tail -20 /var/log/haproxy.log
```

Step 7: Force Server State via Socket

Manually set server state using HAProxy socket.

```bash
# Enable a server that is DOWN:
# (note: 'state ready' clears an administrative MAINT/DRAIN state; a server
# failing its health check stays operationally DOWN until checks pass)
echo "set server web-backend/web1 state ready" | socat stdio /var/run/haproxy.sock

# Check server status:
echo "show stat" | socat stdio /var/run/haproxy.sock | grep web1

# Set server to drain mode:
echo "set server web-backend/web1 state drain" | socat stdio /var/run/haproxy.sock

# Set server to maintenance:
echo "set server web-backend/web1 state maint" | socat stdio /var/run/haproxy.sock

# Enable all servers in a backend:
for server in web1 web2 web3; do
    echo "set server web-backend/$server state ready" | socat stdio /var/run/haproxy.sock
done

# Set server weight:
echo "set weight web-backend/web1 100" | socat stdio /var/run/haproxy.sock

# Check server states (including agent status):
echo "show servers state" | socat stdio /var/run/haproxy.sock

# Disable health check temporarily:
echo "disable health web-backend/web1" | socat stdio /var/run/haproxy.sock

# Re-enable health check:
echo "enable health web-backend/web1" | socat stdio /var/run/haproxy.sock

# Get detailed server info for one backend:
echo "show servers state web-backend" | socat stdio /var/run/haproxy.sock

# Force server address (if IP changed):
echo "set server web-backend/web1 addr 192.168.1.20 port 8080" | socat stdio /var/run/haproxy.sock

# Clear the backend's stick table (this clears stick-table entries,
# not server errors):
echo "clear table web-backend" | socat stdio /var/run/haproxy.sock
```

Step 8: Fix Backend Server Configuration

Ensure backend server is configured correctly to respond to health checks.

```bash
# SSH to backend server:
ssh 192.168.1.10

# Check if service is running:
systemctl status nginx
systemctl status apache2
systemctl status your-app

# Check listening ports:
ss -tlnp | grep -E ":80|:443"
netstat -tlnp | grep -E ":80|:443"

# Ensure listening on 0.0.0.0 (not just 127.0.0.1):
# Nginx:
grep -A3 "listen" /etc/nginx/nginx.conf
# Should be: listen 80; or listen 0.0.0.0:80;
# Not: listen 127.0.0.1:80;

# Apache:
grep Listen /etc/apache2/ports.conf
# Should be: Listen 80
# Not: Listen 127.0.0.1:80

# Create a health check endpoint if missing.
# Nginx (add inside the existing server {} block, not at the end of the file):
# location /health {
#     access_log off;
#     return 200 "healthy\n";
#     add_header Content-Type text/plain;
# }
nginx -t && systemctl reload nginx

# Apache:
cat >> /var/www/html/.htaccess << 'EOF'
RewriteEngine On
RewriteRule ^health$ - [R=200,L]
EOF

# Application-level health endpoint (example for Node.js/Express):
# app.get('/health', (req, res) => {
#     res.status(200).send('healthy');
# });

# Test health endpoint:
curl -I http://localhost/health
# Should return 200 OK

# Check firewall allows health check traffic:
iptables -L INPUT -n | grep 80
ufw status | grep 80

# Allow port if needed:
ufw allow 80/tcp
iptables -I INPUT -p tcp --dport 80 -j ACCEPT

# Check SELinux if enabled:
getenforce
# If Enforcing, check for blocks:
ausearch -m AVC -ts recent | grep httpd

# Test from HAProxy server again:
exit
curl -I http://192.168.1.10/health
```

Step 9: Implement Proper Health Check Endpoint

Create a robust health check endpoint on backend servers.

```bash
# For Nginx backends - create a dedicated health server block:
ssh 192.168.1.10 << 'EOF'
cat > /etc/nginx/conf.d/health.conf << 'NGINX'
server {
    listen 80 default_server;

    location /health {
        access_log off;
        return 200 "healthy\n";
        add_header Content-Type text/plain;
    }

    location / {
        # Your normal config
        proxy_pass http://localhost:3000;
    }
}
NGINX
nginx -t && systemctl reload nginx
EOF

# For application-level health checks:

# Node.js Express:
# app.get('/health', (req, res) => {
#     // Check dependencies
#     const health = {
#         status: 'ok',
#         timestamp: new Date().toISOString(),
#         checks: { database: 'ok', cache: 'ok' }
#     };
#     res.status(200).json(health);
# });

# Python Flask:
# @app.route('/health')
# def health():
#     return jsonify(status='healthy'), 200

# Python Django - add to urls.py:
# path('health/', lambda r: HttpResponse('healthy')),

# Java Spring Boot - actuator is built-in:
# /actuator/health endpoint

# Go - add a simple health handler:
# http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
#     w.WriteHeader(200)
#     w.Write([]byte("healthy"))
# })

# Update the existing web-backend section in /etc/haproxy/haproxy.cfg to use
# the health check (edit it in place; appending a duplicate backend fails validation):
# backend web-backend
#     balance roundrobin
#     option httpchk GET /health
#     http-check expect status 200
#     default-server inter 5s fall 3 rise 2
#     server web1 192.168.1.10:80 check
#     server web2 192.168.1.11:80 check

systemctl reload haproxy
```

Step 10: Set Up Monitoring for Backend Health

Create monitoring to detect and alert on backend failures.

```bash
# Create HAProxy monitoring script:
cat > /usr/local/bin/check-haproxy-backends.sh << 'EOF'
#!/bin/bash
# HAProxy Backend Health Check

HAPROXY_SOCK="/var/run/haproxy.sock"
ALERT_EMAIL="ops@company.com"
LOG_FILE="/var/log/haproxy-monitor.log"

# Get list of DOWN servers (field 18 = status)
DOWN_SERVERS=$(echo "show stat" | socat stdio $HAPROXY_SOCK 2>/dev/null | \
    awk -F, 'NR>1 && $18=="DOWN" {print $1","$2}')

if [ -n "$DOWN_SERVERS" ]; then
    echo "$(date): DOWN servers detected: $DOWN_SERVERS" >> $LOG_FILE

    # Count DOWN servers
    DOWN_COUNT=$(echo "$DOWN_SERVERS" | wc -l)

    # Alert if more than half are down
    TOTAL=$(echo "show stat" | socat stdio $HAPROXY_SOCK 2>/dev/null | \
        awk -F, 'NR>1 && $2!="BACKEND" && $2!="FRONTEND"' | wc -l)

    if [ $DOWN_COUNT -gt $((TOTAL / 2)) ]; then
        echo "CRITICAL: More than half of backend servers are DOWN" | \
            mail -s "HAProxy Critical Alert" $ALERT_EMAIL
    fi

    # Log details
    echo "$DOWN_SERVERS" | while IFS=, read backend server; do
        echo "$(date): Server $backend/$server is DOWN" >> $LOG_FILE

        # Check if server is actually reachable (field 5 = srv_addr)
        SERVER_IP=$(echo "show servers state" | socat stdio $HAPROXY_SOCK | \
            grep -w "$server" | awk '{print $5}')

        if nc -z -w2 $SERVER_IP 80 2>/dev/null; then
            echo "$(date): $server is reachable on port 80 - possible health check issue" >> $LOG_FILE
        else
            echo "$(date): $server is NOT reachable on port 80" >> $LOG_FILE
        fi
    done

    exit 1
fi

# Check for servers with high connection-error counts (field 14 = econ)
ERROR_SERVERS=$(echo "show stat" | socat stdio $HAPROXY_SOCK 2>/dev/null | \
    awk -F, 'NR>1 && $18=="UP" && $14>100 {print $1","$2",errors="$14}')

if [ -n "$ERROR_SERVERS" ]; then
    echo "$(date): Servers with high error rates: $ERROR_SERVERS" >> $LOG_FILE
fi

# Check queue depth (field 3 = qcur)
QUEUED=$(echo "show stat" | socat stdio $HAPROXY_SOCK 2>/dev/null | \
    awk -F, 'NR>1 && $2=="BACKEND" && $3>10 {print $1",queued="$3}')

if [ -n "$QUEUED" ]; then
    echo "$(date): High queue depth detected: $QUEUED" >> $LOG_FILE
fi

echo "$(date): All backends healthy" >> $LOG_FILE
exit 0
EOF

chmod +x /usr/local/bin/check-haproxy-backends.sh

# Run every minute via cron:
(crontab -l; echo "*/1 * * * * /usr/local/bin/check-haproxy-backends.sh") | crontab -

# Create a simple Prometheus textfile exporter for HAProxy:
cat > /etc/haproxy/haproxy_exporter.sh << 'EOF'
#!/bin/bash
# Simple HAProxy Prometheus exporter

SOCK="/var/run/haproxy.sock"
METRICS_FILE="/var/lib/node_exporter/textfile_collector/haproxy.prom"

echo "# HELP haproxy_backend_server_status Backend server status (1=UP, 0=DOWN)" > $METRICS_FILE
echo "# TYPE haproxy_backend_server_status gauge" >> $METRICS_FILE

echo "show stat" | socat stdio $SOCK 2>/dev/null | \
    awk -F, 'NR>1 && $2!="BACKEND" && $2!="FRONTEND" {
        status = ($18 == "UP") ? 1 : 0
        gsub(/[^a-zA-Z0-9_]/, "_", $1)
        gsub(/[^a-zA-Z0-9_]/, "_", $2)
        print "haproxy_backend_server_status{backend=\"" $1 "\",server=\"" $2 "\"} " status
    }' >> $METRICS_FILE

echo "# HELP haproxy_backend_current_sessions Current sessions on backend" >> $METRICS_FILE
echo "# TYPE haproxy_backend_current_sessions gauge" >> $METRICS_FILE

echo "show stat" | socat stdio $SOCK 2>/dev/null | \
    awk -F, 'NR>1 && $2=="BACKEND" {
        gsub(/[^a-zA-Z0-9_]/, "_", $1)
        print "haproxy_backend_current_sessions{backend=\"" $1 "\"} " $5
    }' >> $METRICS_FILE
EOF

chmod +x /etc/haproxy/haproxy_exporter.sh

# Run every minute via cron:
(crontab -l; echo "* * * * * /etc/haproxy/haproxy_exporter.sh") | crontab -
```

Checklist for Fixing HAProxy Backend Server Issues

| Step | Action | Command | Status |
|------|--------|---------|--------|
| 1 | Identify DOWN servers | `echo "show stat" \| socat stdio /var/run/haproxy.sock` | |
| 2 | Verify backend server running | `curl -I http://192.168.1.10/health` | |
| 3 | Examine health check config | `grep -A20 "backend" /etc/haproxy/haproxy.cfg` | |
| 4 | Check network/firewall | `nc -zv 192.168.1.10 80` | |
| 5 | Test health check manually | `curl -v http://192.168.1.10/health` | |
| 6 | Fix health check config | Edit haproxy.cfg and reload | |
| 7 | Force server state via socket | `echo "set server backend/server state ready" \| socat ...` | |
| 8 | Fix backend server config | Create health endpoint | |
| 9 | Implement proper health endpoint | Add /health endpoint to app | |
| 10 | Set up monitoring | Create monitoring script and cron | |
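The checklist hinges on one question: is the server reachable directly even though HAProxy marks it DOWN? A small helper sketching that triage logic; the function name and its labels are illustrative, not part of HAProxy:

```shell
#!/bin/bash
# Hypothetical triage helper: given HAProxy's status for a server and the
# result of a direct TCP probe (e.g. from nc -zv), suggest which class of
# fix the checklist points at.
classify() {
    local status=$1 reachable=$2
    if [ "$status" = "DOWN" ] && [ "$reachable" = "yes" ]; then
        # Server answers directly but the check fails: inspect check config
        echo "health-check-misconfig"
    elif [ "$status" = "DOWN" ]; then
        # Server truly unreachable: inspect service, network, firewall
        echo "server-or-network"
    else
        echo "ok"
    fi
}

classify DOWN yes   # → health-check-misconfig
classify DOWN no    # → server-or-network
classify UP  yes    # → ok
```

The "reachable but DOWN" branch is the common case this article addresses: the backend works, so the fix lives in the health check configuration, not the server.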

Verify the Fix

After fixing HAProxy backend issues, verify servers are UP:

```bash
# 1. All servers show UP status
echo "show stat" | socat stdio /var/run/haproxy.sock | grep -E "web-backend|api-backend"
# All servers should show UP in the status column

# 2. No DOWN servers in output
echo "show stat" | socat stdio /var/run/haproxy.sock | awk -F, '$18=="DOWN" {print}'
# Should return empty

# 3. Health checks passing
tail -20 /var/log/haproxy.log | grep "Health check"
# Should show checks succeeding or servers going UP

# 4. Traffic is distributed (field 8 = total sessions)
echo "show stat" | socat stdio /var/run/haproxy.sock | awk -F, 'NR>1 {print $2, $8}'
# Sessions should be distributed across servers

# 5. No connection errors
curl -I http://haproxy-server/
# Should get 200 OK from a backend

# 6. Backend server responds directly
curl -I http://192.168.1.10/health
# Should return 200 OK

# 7. HAProxy logs clean
tail -50 /var/log/haproxy.log | grep -i error
# Should be minimal or no errors

# 8. Monitoring script passes
/usr/local/bin/check-haproxy-backends.sh
# Exit code 0

# 9. Prometheus metrics exported
cat /var/lib/node_exporter/textfile_collector/haproxy.prom
# Should show all servers with status 1

# 10. Load balanced correctly
for i in {1..10}; do curl -s http://haproxy-server/ | grep "server"; done
# Responses should come from different backends
```

  • [Fix Nginx 504 Upstream Timeout](/articles/fix-nginx-504-upstream) - Nginx upstream timeouts
  • [Fix Nginx 502 Bad Gateway](/articles/fix-nginx-502-bad-gateway) - Bad gateway errors
  • [Fix AWS ELB Health Check Failing](/articles/fix-aws-elb-health-check-failing) - ELB health check issues
  • [Fix HAProxy SSL Certificate Error](/articles/fix-haproxy-ssl-certificate-error) - SSL configuration issues
  • [Fix Load Balancer Session Sticky](/articles/fix-load-balancer-session-sticky) - Session persistence issues
  • [Fix HAProxy Configuration Syntax Error](/articles/fix-haproxy-configuration-syntax-error) - Config validation
  • [Fix Backend Connection Pool Exhausted](/articles/fix-backend-connection-pool-exhausted) - Connection limits