# HAProxy All Backends Down: Complete Recovery Guide

When HAProxy reports all backends as down, your entire application becomes unavailable. This is one of the most critical issues you can face, as it typically indicates a systemic problem affecting all backend servers.

Symptoms and Error Messages

You might see these indicators:

bash
[WARNING] 123/145023 (1234) : Server web_servers/web1 is DOWN, reason: Layer4 connection problem.
[WARNING] 123/145023 (1234) : Server web_servers/web2 is DOWN, reason: Layer4 connection problem.
[ALERT] 123/145023 (1234) : backend 'web_servers' has no server available!

Clients receive 503 errors:

``` HTTP/1.0 503 Service Unavailable Cache-Control: no-cache Connection: close Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1> No server is available to handle this request. </body></html> ```

Immediate Diagnosis

Check HAProxy Stats Page

Access your stats page to see the current state of all servers:

```bash # If stats are available via CLI echo "show stat" | socat stdio /var/run/haproxy.sock

# Or via HTTP if configured curl http://localhost:8404/stats ```

Look for these columns: - status - should be "UP" (down servers show "DOWN") - lastchg - time since last state change - lastsess - time since last session

Check Backend Server Health Directly

Bypass HAProxy and test your backends:

```bash # Test direct connection to backend server curl -I http://192.168.1.10:80/ curl -I http://192.168.1.11:80/

# Test TCP connectivity nc -zv 192.168.1.10 80 nc -zv 192.168.1.11 80 ```

If these fail, the problem is with the backend servers, not HAProxy.

Root Cause Analysis

Cause 1: Health Check Failures

The most common cause is that health checks are failing. HAProxy marks servers as DOWN when health checks consistently fail.

Check health check configuration:

haproxy
backend web_servers
    option httpchk GET /health
    http-check expect status 200
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

Verify the health endpoint:

bash
# Test the health check URL directly
curl -v http://192.168.1.10:80/health
curl -v http://192.168.1.11:80/health

Common health check issues:

  1. 1.Wrong endpoint path:
haproxy
# Change from
option httpchk GET /healthz
# To
option httpchk GET /health
  1. 1.Expected status mismatch:

```haproxy # If your app returns 204 instead of 200 http-check expect status 204

# Or accept a range http-check expect status 200-399 ```

  1. 1.Health check timing too aggressive:
haproxy
# Increase inter to give servers more time
server web1 192.168.1.10:80 check inter 5s fall 3 rise 2
  • inter 5s - check every 5 seconds
  • fall 3 - mark DOWN after 3 failures
  • rise 2 - mark UP after 2 successes

Cause 2: Network Connectivity Issues

HAProxy cannot reach backend servers.

Test from HAProxy's perspective:

```bash # Check if HAProxy can resolve backend hostnames getent hosts backend1.internal.example.com

# Test connectivity ping -c 3 192.168.1.10 traceroute 192.168.1.10

# Check if firewall is blocking iptables -L -n | grep 80 ```

Verify HAProxy's source IP:

If HAProxy binds to a specific source IP for health checks:

haproxy
backend web_servers
    source 10.0.0.5  # HAProxy uses this IP for outgoing connections
    server web1 192.168.1.10:80 check

Ensure that IP is configured on the HAProxy server:

bash
ip addr show | grep 10.0.0.5

Cause 3: Backend Server Overload

Servers might be rejecting connections due to resource exhaustion.

Check backend server metrics:

```bash # SSH into backend server ssh user@192.168.1.10

# Check load average uptime

# Check memory free -m

# Check connection count ss -s

# Check if service is running systemctl status nginx systemctl status apache2 ```

Common fixes for overloaded servers:

  1. 1.Increase maximum connections:
bash
# On backend server (nginx)
# /etc/nginx/nginx.conf
worker_processes auto;
events {
    worker_connections 4096;
}
  1. 1.Increase system limits:
bash
# /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535

Cause 4: SSL/TLS Mismatch

If using HTTPS health checks, certificate issues can cause failures.

Wrong configuration:

haproxy
backend web_servers
    option httpchk GET /health
    server web1 192.168.1.10:443 check ssl verify none

Problem: option httpchk sends plain HTTP but server expects HTTPS.

Correct configuration:

haproxy
backend web_servers
    option httpchk GET /health
    http-check expect status 200
    server web1 192.168.1.10:443 check ssl verify none alpn h2,http/1.1

Or use SSL for health checks:

haproxy
backend web_servers
    server web1 192.168.1.10:443 check-ssl ssl verify none

Cause 5: Firewall or Security Group Blocking

Cloud environments often have security groups blocking traffic.

Check firewall rules:

```bash # On HAProxy server iptables -L -n -v | grep DROP

# Check if backend port is accessible nc -zv 192.168.1.10 80 ```

Temporary diagnostic:

bash
# Capture traffic to see what's happening
tcpdump -i eth0 port 80 -nn

Look for: - SYN packets without SYN-ACK responses - RST packets - Connection timeouts

Emergency Recovery Steps

Step 1: Disable Health Checks Temporarily

If you need to restore service immediately:

haproxy
backend web_servers
    server web1 192.168.1.10:80 check disabled
    server web2 192.168.1.11:80 check disabled

Then reload:

bash
systemctl reload haproxy

This forces servers to be considered UP regardless of health checks.

Step 2: Use Maintenance Mode

Mark servers as in maintenance:

bash
# Via socket
echo "set server web_servers/web1 state maint" | socat stdio /var/run/haproxy.sock
echo "set server web_servers/web1 state ready" | socat stdio /var/run/haproxy.sock

Step 3: Check Backend Logs

On backend servers:

```bash # Check nginx access logs tail -f /var/log/nginx/access.log

# Check nginx error logs tail -f /var/log/nginx/error.log

# Check system logs journalctl -u nginx -f ```

Prevention and Monitoring

Implement Better Health Checks

haproxy
backend web_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server web1 192.168.1.10:80 check inter 3s fall 5 rise 2

Add Monitoring Alerts

Create a script to alert when backends go down:

```bash #!/bin/bash # /usr/local/bin/check-haproxy-backends.sh

DOWN_SERVERS=$(echo "show stat" | socat stdio /var/run/haproxy.sock | grep DOWN | wc -l)

if [ "$DOWN_SERVERS" -gt 0 ]; then echo "WARNING: $DOWN_SERVERS backend servers are DOWN" | mail -s "HAProxy Alert" admin@example.com fi ```

Add to cron:

bash
*/5 * * * * /usr/local/bin/check-haproxy-backends.sh

Configure Graceful Degradation

When backends fail, serve a maintenance page:

```haproxy backend web_servers server web1 192.168.1.10:80 check server web2 192.168.1.11:80 check

# Serve error page when all servers are down errorfile 503 /etc/haproxy/errors/503.http ```

Verification

After fixing the issue:

```bash # Check server status echo "show stat" | socat stdio /var/run/haproxy.sock | grep -E "web_servers|svname"

# Test through HAProxy curl -I http://localhost/

# Check logs tail -f /var/log/haproxy.log ```

All servers should show "UP" status and lastsess should update as traffic flows through.