## Introduction
A load balancer returns 502 Bad Gateway when, acting as a reverse proxy, it receives an invalid response from the upstream backend server and passes the failure on to the client. Unlike 503 Service Unavailable (no healthy backends), a 502 means the load balancer reached a backend, but the response was malformed, incomplete, or protocol-violating. Typical triggers include malformed HTTP responses, SSL/TLS handshake failures between the load balancer and the backend, backend processing that outlasts the load balancer's timeout, prematurely closed connections, HTTP/2 or HTTP/3 protocol mismatches, oversized headers, response bodies exceeding limits, application crashes mid-request, and network interruptions during response transmission. Fixing a 502 requires understanding proxy-to-backend communication: timeout configuration, SSL/TLS settings, HTTP protocol requirements, and systematic debugging of the upstream connection. This guide provides production-proven troubleshooting for 502 errors across AWS ALB/NLB, NGINX, HAProxy, F5, Azure Load Balancer, and GCP Cloud Load Balancing.
## Symptoms
- HTTP 502 Bad Gateway returned to clients
- Load balancer access logs show `upstream_response_invalid` or similar
- AWS ALB: `Target.ResponseCode` shows 502 or 0
- NGINX: `upstream prematurely closed connection` in error logs
- HAProxy: `server backend/app1 DOWN` with 502 errors
- Backend server logs show request received but response not sent
- Intermittent 502s during high traffic periods
- 502 occurs only for specific endpoints (large responses, long processing)
- SSL-related 502: `SSL_do_handshake() failed`
- Connection reset during response: `Connection reset by peer`
## Common Causes
- Backend server crashed or returned malformed response
- Load balancer timeout shorter than backend processing time
- SSL/TLS certificate mismatch or expired certificate on backend
- Backend sending HTTP response with invalid status line
- Response headers exceed load balancer limits (typically 8KB-16KB)
- Backend connection pool exhausted, connections timing out
- HTTP protocol version mismatch (HTTP/2 backend, HTTP/1.1 LB)
- Backend firewall terminating idle connections
- Network packet loss or MTU issues causing truncated responses
- Backend application throwing unhandled exception mid-response
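Several of these causes reduce to the same failure: the proxy receives bytes it cannot parse as a valid HTTP response. A minimal sketch (names illustrative, not any proxy's real parser) of the first check a proxy performs — validating the status line — shows what "malformed" means in practice:

```python
import re

# A proxy's first parsing step: the status line must look like
# "HTTP/<major>.<minor> <3-digit status> <reason>\r\n" -- anything
# else is treated as malformed and surfaces to clients as a 502.
STATUS_LINE = re.compile(rb"^HTTP/\d\.\d [1-5]\d\d [^\r\n]*\r\n")

def is_valid_status_line(raw_response):
    """Return True if the response starts with a parseable status line."""
    return STATUS_LINE.match(raw_response) is not None

print(is_valid_status_line(b"HTTP/1.1 200 OK\r\n"))  # True
print(is_valid_status_line(b"200 OK\r\n"))           # False: version missing
```

A crashed application that flushes a partial buffer, or a debug print written to the socket before the headers, fails exactly this check.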
## Step-by-Step Fix
### 1. Diagnose 502 error source
Check load balancer logs:
```bash
# AWS ALB - Check access logs in S3
# Look for elb_status_code 502
aws s3 ls s3://alb-access-logs-prefix/AWSLogs/<account-id>/elasticloadbalancing/

# Parse ALB logs for 502s
aws s3 cp s3://bucket/prefix/logfile.gz . --quiet
zcat logfile.gz | awk '$9 == 502' | head -20

# Key fields in ALB logs:
# - elb_status_code: 502 = LB returned error
# - target_status_code: 0 = no response from target
# - target_processing_time: Time backend took

# NGINX error logs
tail -100 /var/log/nginx/error.log | grep -E "502|upstream"

# Common NGINX 502 messages:
# - upstream prematurely closed connection
# - upstream sent too big header
# - upstream sent invalid response
# - connect() failed (111: Connection refused)
```
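Beyond grepping with awk, the same tally can be scripted for dashboards or cron checks. A hedged sketch, assuming the documented space-delimited ALB access-log format (elb_status_code and target_status_code are the 9th and 10th fields, which sit before any quoted fields, so a naive split is safe here):

```python
from collections import Counter

def tally_status_codes(log_lines):
    """Count (elb_status, target_status) pairs from ALB access-log lines.

    Only the first ten space-delimited fields are used; quoted fields
    (request line, user agent) come later, so naive splitting is fine.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split(" ")
        if len(fields) < 10:
            continue  # skip malformed/truncated lines
        counts[(fields[8], fields[9])] += 1  # elb_status, target_status
    return counts

# A 502 with target_status "-" means the target never sent a valid response
sample = [
    'http 2024-01-01T00:00:00Z my-alb 1.2.3.4:55555 10.0.1.1:8080 0.001 0.250 0.000 502 - 100 200 "GET / HTTP/1.1"',
    'http 2024-01-01T00:00:01Z my-alb 1.2.3.4:55556 10.0.1.1:8080 0.001 0.030 0.000 200 200 120 512 "GET / HTTP/1.1"',
]
print(tally_status_codes(sample))
```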
Check backend health:
```bash
# AWS ALB - Check target health
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/name/xyz

# Check backend server directly
curl -v http://<backend-ip>:8080/health

# Check backend application status
systemctl status my-app
journalctl -u my-app -n 50 --no-pager
```
### 2. Fix timeout configuration
AWS ALB timeout settings:
```bash
# ALB idle timeout (default 60s, max 4000s)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xyz \
  --attributes Key=idle_timeout.timeout_seconds,Value=120

# Note: ALB target group timeout is fixed at 300s
# For longer operations, consider:
# - Async processing with status polling
# - WebSocket for long-lived connections
# - Lambda with longer timeout
```
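The async-processing option mentioned above means the request returns immediately with a job id and the client polls for the result, so no response ever has to outlive the load balancer timeout. A minimal, framework-free sketch of the submit-then-poll pattern (all names illustrative):

```python
import threading
import time
import uuid

jobs = {}  # job_id -> {"status": ..., "result": ...}; use Redis/DB in production

def submit(task, *args):
    """Return a job id immediately; run the work in a background thread."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}

    def run():
        jobs[job_id]["result"] = task(*args)
        jobs[job_id]["status"] = "done"

    threading.Thread(target=run, daemon=True).start()
    return job_id  # the client polls e.g. GET /jobs/<id> instead of waiting

job = submit(lambda: sum(range(1000)))  # stand-in for slow work
while jobs[job]["status"] != "done":    # what the polling client does
    time.sleep(0.01)
print(jobs[job]["result"])  # 499500
```

In a real service the worker would be a queue consumer (Celery, SQS, etc.) rather than a thread, but the contract to the load balancer is the same: every HTTP response is fast.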
NGINX proxy timeout configuration:
```nginx
http {
    # Global proxy timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 120s;  # Increase for slow backends

    # Buffer settings (prevent 502 from large responses)
    proxy_buffer_size 4k;
    proxy_buffers 8 16k;
    proxy_busy_buffers_size 24k;

    server {
        listen 443 ssl;
        server_name app.example.com;

        location / {
            proxy_pass http://backend;

            # Timeout settings
            proxy_connect_timeout 30s;
            proxy_send_timeout 60s;
            proxy_read_timeout 120s;  # Adjust based on backend

            # Buffer configuration
            proxy_buffering on;
            proxy_buffer_size 8k;
            proxy_buffers 16 32k;
            proxy_busy_buffers_size 64k;
            proxy_max_temp_file_size 1024m;

            # Handle upstream errors
            proxy_next_upstream error timeout http_502 http_503 http_504;
            proxy_next_upstream_tries 3;
            proxy_next_upstream_timeout 10s;
        }

        # For long-running endpoints
        location /api/long-running {
            proxy_read_timeout 300s;  # 5 minutes
            proxy_send_timeout 300s;
        }
    }
}
```
HAProxy timeout configuration:
```haproxy
defaults
    timeout connect 10s
    timeout client 30s
    timeout server 120s  # Increase for slow backends
    timeout http-request 30s
    timeout http-keep-alive 10s
    timeout queue 60s

backend app_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ app.example.com

    # Retry failed connections and 502/503/504 responses on another server
    option allbackups
    retry-on conn-failure response-timeout 502 503 504
    retries 3

    server app1 10.0.1.1:8080 check inter 5s fall 3 rise 2
    server app2 10.0.1.2:8080 check inter 5s fall 3 rise 2
```
### 3. Fix SSL/TLS issues
Backend SSL certificate validation:
```bash
# Test backend SSL certificate
openssl s_client -connect backend.example.com:443 -servername backend.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -dates

# Check certificate chain
openssl s_client -connect backend.example.com:443 -showcerts </dev/null 2>/dev/null

# Common SSL issues causing 502:
# - Certificate expired
# - Self-signed certificate not trusted by LB
# - Hostname mismatch (CN/SAN doesn't match)
# - Missing intermediate certificate
```
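The openssl expiry check above is easy to automate. A sketch using only Python's standard library (`cert_days_remaining` and `days_until` are illustrative helpers, not a known tool); the date format matches what `ssl.getpeercert()` returns in its `notAfter` field:

```python
import datetime
import socket
import ssl

def days_until(not_after, now=None):
    """Days from `now` until a notAfter string as returned by getpeercert(),
    e.g. "Jan  1 00:00:00 2030 GMT"."""
    expires = datetime.datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    return (expires - (now or datetime.datetime.utcnow())).days

def cert_days_remaining(host, port=443):
    """Connect to the backend, fetch its leaf cert, return days to expiry.

    Raises ssl.SSLError on handshake failure -- the same failures that
    show up as SSL-related 502s at the load balancer.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return days_until(tls.getpeercert()["notAfter"])
```

Run `cert_days_remaining` from cron against each backend and alert below ~30 days to catch expiring certificates before the load balancer does.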
NGINX SSL backend configuration:
```nginx
# For HTTPS backends
location / {
    proxy_pass https://backend;

    # SSL verification (disable for self-signed in internal networks)
    proxy_ssl_verify off;  # Only for internal, trusted backends
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify_depth 2;

    # SSL protocols
    proxy_ssl_protocols TLSv1.2 TLSv1.3;
    proxy_ssl_ciphers HIGH:!aNULL:!MD5;

    # SNI support
    proxy_ssl_server_name on;
}
```
AWS ALB HTTPS backend:
```bash
# For HTTPS targets, ALB encrypts traffic to the backend but does NOT
# validate the target certificate (expired/self-signed certs are accepted),
# so SSL-related 502s here mean the handshake itself failed
# Options:
# 1. Ensure the backend's TLS protocols/ciphers are ones ALB supports
# 2. Use ACM Private CA-issued certificates for internal services
# 3. Use HTTP targets inside a trusted VPC

# Check target group protocol
aws elbv2 describe-target-groups \
  --target-group-arn arn:xxx \
  --query 'TargetGroups[0].{Protocol:Protocol,Port:Port}'

# Note: a target group's protocol and port are fixed at creation; to move
# from HTTPS to HTTP targets, create a new target group and re-register
aws elbv2 create-target-group \
  --name my-targets-http \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-xxx
```
### 4. Fix response size issues
Handle large responses:
```nginx
# NGINX - Increase buffer sizes for large responses
location /api/large-response {
    proxy_pass http://backend;

    # Larger buffers
    proxy_buffer_size 32k;
    proxy_buffers 16 128k;
    proxy_busy_buffers_size 256k;

    # Allow temp files for very large responses
    proxy_max_temp_file_size 2048m;
    proxy_temp_path /var/nginx/proxy_temp;
}
```
Handle large headers:
```nginx
# NGINX - The upstream's response headers must fit in proxy_buffer_size
# (default 4k-8k); "upstream sent too big header" means they overflowed
# Increase if backend sends large cookies or headers
location / {
    proxy_pass http://backend;

    # Larger header buffer
    proxy_buffer_size 16k;
    proxy_buffers 8 32k;
}
```
AWS ALB limits:
```bash
# ALB has fixed limits:
# - Response header size: 16 KB max (not configurable)
# - Response body: no size limit (but the idle timeout applies)

# If the backend exceeds the header limit:
# 1. Reduce cookie sizes
# 2. Use session IDs instead of large cookies
# 3. Compress responses
# 4. Move large values out of headers and into the response body
```
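Large cookies eat into the 16 KB header cap faster than expected. A rough wire-size estimate (illustrative only; actual serialization also includes the status line and framing, and the limit value is the one quoted above):

```python
def header_bytes(headers):
    """Approximate wire size of a header dict: 'Name: value\\r\\n' per field
    (name + value + 4 bytes for ': ' and CRLF)."""
    return sum(len(name) + len(value) + 4 for name, value in headers.items())

ALB_HEADER_LIMIT = 16 * 1024  # 16 KB cap from the ALB limits above

# Hypothetical response headers with a bloated session cookie
headers = {
    "Set-Cookie": "session=" + "x" * 5000,
    "Content-Type": "application/json",
}
print(header_bytes(headers), "bytes of", ALB_HEADER_LIMIT)
print(header_bytes(headers) < ALB_HEADER_LIMIT)  # True, but 3 such cookies would not be
```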
### 5. Fix connection pool issues
Backend connection exhaustion:
```bash
# Check backend connection limits
# NGINX backend
netstat -an | grep :8080 | wc -l
ss -s

# Application connection pool
# Check application metrics/logs for pool exhaustion

# Linux file descriptor limits
ulimit -n
cat /proc/sys/fs/file-nr
```
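What exhaustion looks like from the application's side can be sketched with a bounded pool: once every connection is checked out, new requests wait out the acquire timeout and then fail, which the load balancer reports as a 5xx. (Illustrative code, not a real driver's pool.)

```python
import queue

class ConnectionPool:
    """Toy fixed-size pool; real pools live in DB drivers and HTTP clients."""

    def __init__(self, size):
        self._q = queue.Queue()
        for i in range(size):
            self._q.put(f"conn-{i}")  # stand-ins for real connections

    def acquire(self, timeout=0.1):
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            # This is the error that surfaces upstream as a 502/504
            raise TimeoutError("pool exhausted")

    def release(self, conn):
        self._q.put(conn)

pool = ConnectionPool(2)
a, b = pool.acquire(), pool.acquire()  # both connections checked out
try:
    pool.acquire()  # third request has nothing to use
except TimeoutError as e:
    print(e)  # pool exhausted
```

The fixes below attack both sides: raise capacity (workers, file descriptors, `maxconn`) and reuse connections (upstream keepalive) so the pool drains slower.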
Increase backend capacity:
```nginx
# Backend NGINX configuration
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    multi_accept on;
}

http {
    # Keepalive to upstream
    upstream backend_pool {
        server 127.0.0.1:3000;
        keepalive 32;
    }

    server {
        location / {
            proxy_pass http://backend_pool;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
```
HAProxy connection settings:
```haproxy
global
    maxconn 50000
    nbthread 4

defaults
    timeout connect 10s
    timeout server 120s

backend app_servers
    balance roundrobin
    option http-keep-alive
    http-reuse safe

    # Connection limits per server
    server app1 10.0.1.1:8080 check maxconn 1000
    server app2 10.0.1.2:8080 check maxconn 1000
```
### 6. Fix protocol issues
HTTP/2 configuration:
```nginx
# NGINX with HTTP/2 to clients, HTTP/1.1 to backend
server {
    listen 443 ssl http2;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;  # Backend speaks HTTP/1.1

        # Required headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
WebSocket support (prevents 502 on upgrade):
```nginx
location /ws/ {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400s;  # 24 hours for long-lived connections
    proxy_send_timeout 86400s;
}
```
### 7. Debug backend application
Application-level debugging:
```bash
# Check application logs during 502
journalctl -u my-app -f

# Look for:
# - Unhandled exceptions
# - Out of memory errors
# - Database connection timeouts
# - Deadlocks

# Profile slow endpoints
# Java: jstack <pid> > thread-dump.txt
# Node.js: clinic doctor
# Python: py-spy record

# Check resource usage during 502
top -bn1 | head -10
free -m
df -h
```
Add application health checks:
```python
# Flask health endpoint
@app.route('/health')
def health():
    """Return 200 only if app can serve requests."""
    try:
        # Check critical dependencies
        db_health = check_database()
        cache_health = check_redis()

        if db_health and cache_health:
            return jsonify({"status": "healthy"}), 200
        return jsonify({"status": "unhealthy"}), 503
    except Exception as e:
        return jsonify({"status": "unhealthy", "error": str(e)}), 503
```
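The `check_database`/`check_redis` helpers above are placeholders. One cheap way to implement them is a TCP-level probe with its own short timeout, so the health endpoint itself can never hang longer than the load balancer's health-check interval (a sketch; a real check would also run a trivial query or `PING`):

```python
import socket

def tcp_check(host, port, timeout=1.0):
    """Cheap dependency probe: can we open a TCP connection within `timeout`?

    Returns False instead of raising, so the health endpoint stays fast
    whether the dependency is down (refused) or unreachable (timeout).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical wiring for the Flask endpoint above
def check_database():
    return tcp_check("db.internal", 5432)

def check_redis():
    return tcp_check("cache.internal", 6379)
```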
## Prevention
- Monitor backend response times and set alerts for p95 > threshold
- Set the load balancer timeout to roughly 2x the p99 backend response time
- Use circuit breakers to fail fast when backend is unhealthy
- Implement proper health checks that verify dependencies
- Configure connection pools with appropriate limits
- Use compression for large responses
- Set up distributed tracing to identify slow endpoints
- Regular load testing to identify breaking points
- Implement request timeouts at application level
- Use async processing for long-running operations
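The timeout rule of thumb in the list above can be made concrete: measure p99 latency and double it. A sketch using a nearest-rank percentile (assumes you already collect per-request latency samples; helper names are illustrative):

```python
def percentile(samples, p):
    """Nearest-rank percentile of latency samples (seconds)."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[rank]

def recommended_timeout(samples):
    """2x the p99 latency, per the rule of thumb above."""
    return 2 * percentile(samples, 99)

# 98 fast requests, one slow-ish, one pathological outlier
latencies = [0.1] * 98 + [0.5, 2.0]
print(recommended_timeout(latencies))  # 1.0 (2 * the 0.5s p99)
```

Note the p99 here deliberately ignores the single 2.0s outlier; if outliers like that are legitimate (reports, exports), route them through the async pattern from Step 2 instead of raising the timeout to cover them.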
## Related Errors
- **Load balancer 503 Service Unavailable**: No healthy backends
- **Load balancer health check failure**: Backend failing health probes
- **Load balancer SSL handshake failed**: TLS configuration issues
- **Load balancer sticky session failure**: Session affinity issues