## Introduction
HAProxy backend connection errors and 503 failures occur when HAProxy cannot establish connections to backend servers, typically because of health check failures, connection exhaustion, timeout misconfiguration, or server weight issues. HAProxy is a widely used open-source load balancer that provides TCP and HTTP load balancing with health checks, SSL termination, and advanced traffic management. When all backends are unavailable or connection limits are reached, HAProxy returns 503 Service Unavailable to clients.

Common causes include overly aggressive health check intervals marking healthy servers as down, an exhausted backend connection pool (maxconn reached), timeout values too short for backend response times, server weight misconfiguration causing uneven load distribution, stick table overflow breaking session persistence, SSL handshake failures to the backend, HTTP/2 protocol mismatches, backend application errors triggering health check failures, and kernel limits (somaxconn, file descriptors).

Fixing these issues requires an understanding of HAProxy's architecture, health check configuration, connection management, timeout tuning, and debugging tools. This guide provides production-proven troubleshooting for HAProxy issues across web applications, microservices, and TCP load balancing scenarios.
## Symptoms
- HAProxy returns `503 Service Unavailable` to all requests
- `No server available to handle request` in HAProxy logs
- Backend servers show `DOWN` status in HAProxy stats
- `Connection refused` or `Connection timed out` errors to backends
- Intermittent 503s affecting a subset of requests
- `Server reached maxconn` errors
- Health check failures despite the backend responding
- Session persistence broken (stick table issues)
- SSL handshake failures to backend servers
- HAProxy stats page shows all backends red/down
## Common Causes
- All backend servers marked unhealthy by health checks
- HAProxy maxconn limit reached (global or per-frontend)
- Backend server maxconn too low for traffic volume
- Timeout values (connect, client, server) misconfigured
- Health check path returning non-200 status
- Health check interval too frequent or timeout too short
- Backend connection pool exhausted
- Kernel limits: net.core.somaxconn, file descriptors
- Stick table overflow causing session loss
- SSL/TLS configuration mismatch with backend
- HTTP/2 protocol not supported by backend
## Step-by-Step Fix
### 1. Diagnose HAProxy status
Check backend health via stats:
```bash
# Enable the HAProxy stats socket in /etc/haproxy/haproxy.cfg:
#   global
#       stats socket /var/run/haproxy.sock mode 660 level admin

# Check backend status via the socket (1-indexed CSV field 18 = status)
echo "show stat" | socat stdio /var/run/haproxy.sock | cut -d',' -f1,2,18

# Example output (pxname,svname,status):
#   fe_http,FRONTEND,OPEN
#   be_app,s1,DOWN
#   be_app,s2,UP
#   be_app,BACKEND,UP

# Detailed server state
echo "show servers state" | socat stdio /var/run/haproxy.sock

# Show stick tables
echo "show table" | socat stdio /var/run/haproxy.sock

# Access the stats page (if configured)
# http://haproxy:8404/stats
```
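The `show stat` CSV can be filtered for unhealthy servers with a short script. This is a sketch that assumes the 1-indexed field layout of recent HAProxy versions (field 18 = status) and a hypothetical helper name:

```shell
#!/bin/sh
# list_down: print backend servers whose status column is not "UP".
# Field 18 (1-indexed) is "status" in HAProxy's "show stat" CSV.
list_down() {
  awk -F',' 'NR > 1 && $2 != "FRONTEND" && $2 != "BACKEND" \
             && $18 != "" && $18 !~ /^UP/ {
    printf "%s/%s: %s\n", $1, $2, $18
  }'
}

# Normally fed from the runtime socket:
#   echo "show stat" | socat stdio /var/run/haproxy.sock | list_down
```

Matching on "not UP" rather than "DOWN" also surfaces states like `MAINT` and `no check`.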
HAProxy stats page configuration:
```haproxy
# Enable stats page
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    # LOCALHOST is not predefined; declare the ACL before using it
    acl LOCALHOST src 127.0.0.1
    stats admin if LOCALHOST
    stats auth admin:password   # Optional authentication
```
Check HAProxy logs:
```bash
# HAProxy log location (rsyslog)
# /etc/rsyslog.d/49-haproxy.conf:
#   local0.* /var/log/haproxy.log

# Common error patterns:

# No backend available
# "backend be_app has no server available"

# Connection refused
# "Layer4 connection problem" / "Connection refused"

# Health check failures (Layer6 = SSL, Layer7 = HTTP)
# "Layer6 invalid response" / "Layer7 timeout"

# Maxconn reached
# "Server be_app/s1 reached maxconn"

# View logs in real time
tail -f /var/log/haproxy.log | grep -E "DOWN|error|503"
```
### 2. Fix health check configuration
Health check tuning:
```haproxy
# Backend with health checks
backend be_app
    balance roundrobin

    # Health check options
    option httpchk GET /health HTTP/1.1\r\nHost:\ localhost

    # HTTP 200-399 = healthy
    http-check expect status 200-399

    # Health check parameters
    # inter: interval between checks (default 2000ms)
    # fall:  consecutive failures before marking DOWN
    # rise:  consecutive successes before marking UP
    # addr:  health check target IP (if different from traffic)
    # port:  health check port

    server s1 10.0.1.1:8080 check inter 5000ms fall 3 rise 2
    server s2 10.0.1.2:8080 check inter 5000ms fall 3 rise 2
    server s3 10.0.1.3:8080 check inter 5000ms fall 3 rise 2 backup

# Health check parameter guide:
# - inter 5000ms: check every 5 seconds (reduces check load)
# - fall 3: mark DOWN after 3 failures (~15 seconds to detect)
# - rise 2: mark UP after 2 successes (~10 seconds to recover)

# For slow-starting applications:
#   server s1 10.0.1.1:8080 check inter 10s fall 5 rise 3 slowstart 60s
# slowstart 60s: gradually ramp weight over 60 seconds after UP
```
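The `inter`/`fall`/`rise` numbers translate directly into detection and recovery windows. A quick sketch using the example values above (5000ms / 3 / 2):

```shell
#!/bin/sh
# Worst-case time to mark a server DOWN is roughly inter * fall,
# and to bring it back UP roughly inter * rise.
inter_ms=5000
fall=3
rise=2

detect_ms=$((inter_ms * fall))
recover_ms=$((inter_ms * rise))

echo "detect DOWN after ~${detect_ms}ms, back UP after ~${recover_ms}ms"
```

Tightening `inter` detects failures faster but multiplies check load across all servers; tune the product, not just one knob.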
Advanced health checks:
```haproxy
# Health check expecting specific content
backend be_app
    option httpchk GET /health HTTP/1.1\r\nHost:\ localhost
    http-check expect status 200
    # rstring matches a regex in the body ("string" is a literal match)
    http-check expect rstring "status.*ok"

    server s1 10.0.1.1:8080 check

# Health check on a different port
backend be_db
    option mysql-check user haproxy

    server db1 10.0.2.1:3306 check port 3309   # Check on a different port
    server db2 10.0.2.2:3306 check

# TCP health check (layer 4)
backend be_tcp
    option tcp-check

    # Send data, expect a response
    tcp-check send PING\r\n
    tcp-check expect string PONG

    server t1 10.0.3.1:6379 check
    server t2 10.0.3.2:6379 check

# SSL health check
backend be_https
    option ssl-hello-chk

    server h1 10.0.4.1:443 check
    server h2 10.0.4.2:443 check
```
### 3. Fix connection exhaustion
Connection limits:
```haproxy
# Global settings
global
    maxconn 4096    # Total connections HAProxy will accept

# Frontend settings
frontend fe_http
    bind *:80
    maxconn 2000    # Max connections to this frontend
    default_backend be_app

# Backend settings (note: maxconn is not valid in a backend section;
# use fullconn plus per-server limits instead)
backend be_app
    fullconn 1000   # Load level at which minconn-scaled servers hit maxconn

    # Per-server limits
    server s1 10.0.1.1:8080 check maxconn 500 minconn 100
    server s2 10.0.1.2:8080 check maxconn 500 minconn 100

# maxconn: maximum concurrent connections sent to one server
# minconn: when set, the effective per-server limit scales dynamically
#          between minconn and maxconn depending on proxy load
```
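Per-server limits should be sanity-checked against the concurrency the pool must absorb; otherwise requests queue and eventually 503 when `timeout queue` expires. A sketch, where `peak_concurrency` is an assumed capacity target rather than an HAProxy setting:

```shell
#!/bin/sh
# Compare pool capacity (sum of per-server maxconn) against an assumed
# peak concurrency. Values mirror the example config above.
server_maxconn=500
servers=2
peak_concurrency=1000

capacity=$((server_maxconn * servers))
if [ "$capacity" -ge "$peak_concurrency" ]; then
  echo "OK: pool capacity $capacity covers peak $peak_concurrency"
else
  echo "WARNING: pool capacity $capacity below peak $peak_concurrency"
fi
```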
Connection pooling:
```haproxy
backend be_app
    # HTTP keep-alive (connection reuse toward the client)
    option http-keep-alive
    # Reuse idle server-side connections across requests
    http-reuse safe

    server s1 10.0.1.1:8080 check maxconn 500

# For HTTP/2 backends: negotiate h2 via ALPN over TLS
backend be_http2
    server s1 10.0.1.1:443 check ssl alpn h2 ca-file /etc/ssl/certs/ca.pem
```
Monitor connection usage:
```bash
# Check current session counts (1-indexed CSV fields: 5=scur, 7=slim)
echo "show stat" | socat stdio /var/run/haproxy.sock | \
  awk -F',' '{print $1, $2, $5, $7}' | column -t
# Columns: pxname, svname, current sessions, session limit

# Watch in real time
watch -n1 'echo "show stat" | socat stdio /var/run/haproxy.sock | \
  grep -E "s[123]" | cut -d"," -f1,2,5,7'
```
### 4. Fix timeout configuration
Timeout tuning:
```haproxy
# Timeouts belong in defaults (or per-proxy sections), not in global
defaults
    timeout connect 5s            # Time to establish a connection to the backend
    timeout client 30s            # Time to wait for client data
    timeout server 30s            # Time to wait for the backend response
    timeout http-request 10s      # Time allowed for a complete HTTP request
    timeout http-keep-alive 10s   # Keep-alive timeout
    timeout queue 30s             # Time spent queued waiting for a server slot

# Frontend timeouts
frontend fe_http
    timeout client 60s    # Longer for slow clients

# Backend timeouts
backend be_app
    timeout connect 10s   # Longer for distant backends
    timeout server 60s    # Longer for slow backends
    timeout tunnel 1h     # For WebSocket/tunnel connections
```
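Retries multiply the connect timeout before a request finally fails. A rough back-of-the-envelope sketch assuming the default `retries 3` (an approximation of the worst case, not HAProxy's exact accounting):

```shell
#!/bin/sh
# Worst-case delay before a connect failure surfaces: each attempt can
# burn a full "timeout connect", plus time spent in the queue.
timeout_connect_s=10
retries=3
timeout_queue_s=30

worst_case_s=$((timeout_queue_s + (retries + 1) * timeout_connect_s))
echo "worst case ~${worst_case_s}s before the request fails over"
```

If that number exceeds what clients will tolerate, shorten `timeout connect` or `retries` rather than `timeout server`.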
Timeout troubleshooting:
```bash
# Session termination flags in HAProxy logs
# (first char = which side timed out, second char = phase):
# "cR" - client timeout waiting for a full request (timeout http-request)
# "cD" - client timeout during data transfer (timeout client)
# "sC" - connect timeout to the server (timeout connect)
# "sH" - server timeout waiting for response headers (timeout server)
# "sD" - server timeout during data transfer (timeout server)
# "sQ" - timeout while queued for a server slot (timeout queue)

# Identify timeout issues
tail -f /var/log/haproxy.log | grep -E "timeout|timed out"

# Adjust based on the flags you see:
# sH/sD frequent: increase timeout server
# cR/cD frequent: increase timeout client / timeout http-request
# sC frequent:    increase timeout connect (or fix backend reachability)
```
### 5. Fix server weight and load balancing
Weight configuration:
```haproxy
backend be_app
    balance roundrobin

    # Equal weight
    server s1 10.0.1.1:8080 check weight 100
    server s2 10.0.1.2:8080 check weight 100

    # Unequal weight (a more powerful server gets more traffic)
    server s3 10.0.1.3:8080 check weight 200   # Gets 2x traffic

    # Dynamic weight (adjusted via the runtime API)
    server s4 10.0.1.4:8080 check weight 100
```

```bash
# Adjust weight at runtime (absolute value, or % of configured weight)
echo "set weight be_app/s3 50" | socat stdio /var/run/haproxy.sock
echo "set weight be_app/s3 100%" | socat stdio /var/run/haproxy.sock
```
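Under roundrobin, each server's expected share of traffic is its weight divided by the sum of weights. A quick sketch using the example weights above (100, 100, 200):

```shell
#!/bin/sh
# Expected traffic share = weight / sum(weights).
w1=100; w2=100; w3=200
total=$((w1 + w2 + w3))

share3=$((100 * w3 / total))   # integer percent for s3
echo "s3 receives ~${share3}% of requests"
```

This is why halving s3's weight at runtime (`set weight be_app/s3 50%`) shifts a quarter of total traffic back onto s1 and s2.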
Load balancing algorithms:
```haproxy
# Round robin (default) - sequential distribution
backend be_rr
    balance roundrobin
    server s1 10.0.1.1:8080 check
    server s2 10.0.1.2:8080 check

# Least connections - send to the server with the fewest connections
backend be_lc
    balance leastconn
    server s1 10.0.1.1:8080 check
    server s2 10.0.1.2:8080 check

# Source IP hash - the same client always reaches the same server
backend be_src
    balance source
    server s1 10.0.1.1:8080 check
    server s2 10.0.1.2:8080 check

# URI hash - the same URI always reaches the same server
backend be_uri
    balance uri
    server s1 10.0.1.1:8080 check
    server s2 10.0.1.2:8080 check

# First - fill the first server (up to its maxconn) before using the next
backend be_first
    balance first
    server s1 10.0.1.1:8080 check maxconn 500
    server s2 10.0.1.2:8080 check maxconn 500
```
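`balance source` works by hashing the client address onto the server list. This toy sketch (a hypothetical `pick_server` helper using POSIX `cksum` rather than HAProxy's real hash) illustrates why the mapping is deterministic until the number of usable servers changes:

```shell
#!/bin/sh
# Map a client IP onto a server index: hash the IP, modulo server count.
# Same IP + same server count -> same index, every time.
pick_server() {
  ip="$1"
  servers="$2"
  h=$(printf '%s' "$ip" | cksum | cut -d' ' -f1)
  echo $((h % servers))
}
```

The flip side of any modulo scheme: when a server goes DOWN, the divisor changes and many clients get remapped, which is why `balance source` persistence is weaker than cookie-based stickiness.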
### 6. Fix stick table session persistence
Stick table configuration:
```haproxy
# Session persistence using source IP
backend be_app
    balance roundrobin

    # Stick table definition
    stick-table type ip size 100k expire 30m store conn_rate(10s)

    # Match and store the client IP ("stick on" = match + store-request)
    stick on src

    server s1 10.0.1.1:8080 check
    server s2 10.0.1.2:8080 check

# Session persistence using a cookie (alternative backend)
backend be_app_cookie
    balance roundrobin

    # Insert a cookie carrying the server name
    cookie SERVERID insert indirect nocache

    server s1 10.0.1.1:8080 check cookie s1
    server s2 10.0.1.2:8080 check cookie s2

# Cookie modes:
# - insert:  HAProxy adds the cookie
# - prefix:  HAProxy prepends to an existing cookie
# - rewrite: HAProxy rewrites an existing cookie
```
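The `size 100k` in the stick-table example bounds entries, and once the table fills, the oldest entries are evicted and those clients lose persistence. A rough memory estimate, assuming on the order of ~50 bytes per IPv4 entry (the per-entry cost is an assumption; it varies with build and stored data types):

```shell
#!/bin/sh
# Rough memory footprint of a full "stick-table type ip size 100k".
entries=100000
bytes_per_entry=50   # assumed; adjust for your build and stored counters

total_kb=$((entries * bytes_per_entry / 1024))
echo "~${total_kb} KiB for a full table"
```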
Stick table monitoring:
```bash
# View stick table contents
echo "show table be_app" | socat stdio /var/run/haproxy.sock

# Clear the whole stick table
echo "clear table be_app" | socat stdio /var/run/haproxy.sock

# Delete a specific entry by key
echo "clear table be_app key 10.0.1.100" | socat stdio /var/run/haproxy.sock
```
### 7. Fix kernel parameter limits
System tuning:
```bash
# Check current limits
ulimit -n                          # File descriptors
cat /proc/sys/net/core/somaxconn   # Listen queue size

# Increase limits in /etc/security/limits.conf:
#   haproxy soft nofile 65536
#   haproxy hard nofile 65536

# Increase system limits in /etc/sysctl.conf:
#   net.core.somaxconn = 65536
#   net.ipv4.ip_local_port_range = 1024 65535
#   net.ipv4.tcp_tw_reuse = 1

# Apply sysctl changes
sysctl -p

# Verify HAProxy process limits (-o: oldest matching PID, the master)
cat /proc/$(pgrep -o haproxy)/limits | grep "open files"
```
HAProxy global settings:
```haproxy
global
    # Connection and file descriptor limits
    maxconn 65536
    ulimit-n 131072   # must cover ~2 fds per proxied connection

    # Process priority
    nice -5

    # Tuning parameters
    tune.ssl.default-dh-param 2048
    tune.bufsize 16384
    tune.maxrewrite 1024
```
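`maxconn` and the file descriptor limit are coupled: each proxied connection consumes roughly two descriptors (client side plus server side). A quick sketch of the minimum fd budget; the "roughly two" factor and the need for extra headroom are rules of thumb, not exact HAProxy accounting:

```shell
#!/bin/sh
# ulimit-n / nofile must exceed ~2 * maxconn, plus headroom for
# listeners, health checks, and logging sockets.
maxconn=65536

base_fds=$((2 * maxconn))
echo "maxconn $maxconn needs more than $base_fds file descriptors"
```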
### 8. Debug SSL/TLS issues to backend
SSL backend configuration:
```haproxy
# SSL to backend with certificate verification
backend be_https
    server s1 10.0.1.1:443 check ssl verify required ca-file /etc/ssl/certs/ca-bundle.crt
    server s2 10.0.1.2:443 check ssl verify required ca-file /etc/ssl/certs/ca-bundle.crt

# SSL without verification (not recommended for production)
backend be_https_noverify
    server s1 10.0.1.1:443 check ssl verify none

# SSL with a client certificate
backend be_https_mtls
    server s1 10.0.1.1:443 check ssl crt /etc/ssl/certs/client.pem ca-file /etc/ssl/certs/ca-bundle.crt

# SNI to backend
backend be_https_sni
    server s1 10.0.1.1:443 check ssl sni req.hdr(host) ca-file /etc/ssl/certs/ca-bundle.crt
```
SSL error debugging:
```bash
# Test the SSL connection to a backend
openssl s_client -connect 10.0.1.1:443 -servername example.com </dev/null 2>&1 | \
  grep -E "Verify|Certificate|error"

# Common errors:
# - verify error:num=20:unable to get local issuer certificate
# - verify error:num=21:unable to verify the first certificate
# - ssl3_get_record:wrong version number (plain HTTP on an SSL port)

# Fix: update the CA bundle
update-ca-certificates   # Debian/Ubuntu
```
### 9. Monitor HAProxy with Prometheus
HAProxy exporter:
```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: 'haproxy'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['haproxy:8404']

# Key metrics (names from HAProxy's built-in exporter):
# haproxy_frontend_status
# haproxy_backend_status
# haproxy_backend_current_sessions
# haproxy_server_status
# haproxy_server_current_sessions
# haproxy_server_http_responses_total{code="5xx"}
```
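The scrape config above assumes HAProxy itself serves `/metrics` on port 8404. On HAProxy 2.0+ built with the bundled Prometheus exporter, the stats listener can serve both the stats page and metrics; a sketch:

```haproxy
# Serve Prometheus metrics from the stats listener (HAProxy 2.0+,
# compiled with the prometheus-exporter service).
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    http-request use-service prometheus-exporter if { path /metrics }
```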
Prometheus alerting rules:
```yaml
groups:
  - name: haproxy
    rules:
      - alert: HAProxyBackendDown
        expr: haproxy_backend_status == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "HAProxy backend is down"

      - alert: HAProxyNoServers
        expr: haproxy_backend_active_servers == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "HAProxy backend has no available servers"

      - alert: HAProxyHigh5xx
        expr: rate(haproxy_server_http_responses_total{code="5xx"}[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HAProxy backend returning 5xx errors"
```
## Prevention
- Configure health checks with appropriate intervals and thresholds
- Set connection limits based on backend capacity
- Tune timeouts based on application response patterns
- Monitor backend health and HAProxy metrics
- Use connection pooling for HTTP backends
- Implement proper SSL verification for HTTPS backends
- Document HAProxy configuration and tuning parameters
- Test failover scenarios regularly
## Related Errors
- **503 Service Unavailable**: No healthy backends available
- **502 Bad Gateway**: Backend connection failed
- **504 Gateway Timeout**: Backend response timeout
- **Connection Refused**: Backend not accepting connections
- **SSL Handshake Failed**: TLS negotiation error with backend