Introduction

A load balancer returns 502 Bad Gateway when, acting as a reverse proxy, it receives an invalid response from the upstream backend server and passes the error on to the client. Unlike 503 Service Unavailable (no healthy backends), a 502 means the load balancer reached a backend, but the response was malformed, incomplete, or violated the HTTP protocol. Common causes include a malformed HTTP response from the backend, an SSL/TLS handshake failure between the load balancer and backend, a backend that exceeded the load balancer's timeout, a backend connection closed prematurely, an HTTP/2 or HTTP/3 protocol mismatch, oversized response headers, a response body exceeding size limits, a backend application crash mid-request, or a network interruption during response transmission. Fixing 502s requires understanding proxy-to-backend communication, timeout configuration, SSL/TLS settings, and HTTP protocol requirements, along with systematic debugging of the upstream connection. This guide provides production-proven troubleshooting for 502 errors across AWS ALB/NLB, NGINX, HAProxy, F5, Azure Load Balancer, and GCP Cloud Load Balancing.

Symptoms

  • HTTP 502 Bad Gateway returned to clients
  • Load balancer access logs show upstream_response_invalid or similar
  • AWS ALB: Target.ResponseCode shows 502 or 0
  • NGINX: upstream prematurely closed connection in error logs
  • HAProxy: server backend/app1 DOWN with 502 errors
  • Backend server logs show request received but response not sent
  • Intermittent 502s during high traffic periods
  • 502 occurs only for specific endpoints (large responses, long processing)
  • SSL-related 502: SSL_do_handshake() failed
  • Connection reset during response: Connection reset by peer

Common Causes

  • Backend server crashed or returned malformed response
  • Load balancer timeout shorter than backend processing time
  • SSL/TLS certificate mismatch or expired certificate on backend
  • Backend sending HTTP response with invalid status line
  • Response headers exceed load balancer limits (typically 8KB-16KB)
  • Backend connection pool exhausted, connections timing out
  • HTTP protocol version mismatch (HTTP/2 backend, HTTP/1.1 LB)
  • Backend firewall terminating idle connections
  • Network packet loss or MTU issues causing truncated responses
  • Backend application throwing unhandled exception mid-response
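Several of these causes reduce to the same thing: the backend answers with bytes that are not a valid HTTP response, and the proxy has nothing sensible to forward. A minimal sketch of what the load balancer's HTTP parser sees (local throwaway server, invented status line):

```python
import http.client
import socket
import threading

def malformed_backend(server_sock):
    """Accept one connection and reply with an invalid HTTP status line."""
    conn, _ = server_sock.accept()
    conn.recv(1024)
    conn.sendall(b"BANANA/1.1 200 OK\r\n\r\n")  # not a valid HTTP response
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=malformed_backend, args=(server,), daemon=True).start()

client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
client.request("GET", "/")
try:
    client.getresponse()
    result = "ok"
except http.client.BadStatusLine:
    # A reverse proxy in this position answers its own client with 502
    result = "bad status line"
print(result)  # → bad status line
```

A real load balancer does the same parse; when it fails, the only thing it can return downstream is a 502.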

Step-by-Step Fix

### 1. Diagnose 502 error source

Check load balancer logs:

```bash
# AWS ALB - Check access logs in S3
# Look for elb_status_code 502
aws s3 ls s3://alb-access-logs-prefix/AWSLogs/<account-id>/elasticloadbalancing/

# Parse ALB logs for 502s
aws s3 cp s3://bucket/prefix/logfile.gz . --quiet
zcat logfile.gz | awk '$9 == 502' | head -20

# Key fields in ALB logs:
# - elb_status_code: 502 = LB returned error
# - target_status_code: 0 = no response from target
# - target_processing_time: Time backend took

# NGINX error logs
tail -100 /var/log/nginx/error.log | grep -E "502|upstream"

# Common NGINX 502 messages:
# - upstream prematurely closed connection
# - upstream sent too big header
# - upstream sent invalid response
# - connect() failed (111: Connection refused)
```

Check backend health:

```bash
# AWS ALB - Check target health
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/name/xyz

# Check backend server directly
curl -v http://<backend-ip>:8080/health

# Check backend application status
systemctl status my-app
journalctl -u my-app -n 50 --no-pager
```

### 2. Fix timeout configuration

AWS ALB timeout settings:

```bash
# ALB idle timeout (default 60s, max 4000s)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xyz \
  --attributes Key=idle_timeout.timeout_seconds,Value=120

# For operations longer than the idle timeout allows, consider:
# - Async processing with status polling
# - WebSocket for long-lived connections
# - Lambda with longer timeout
```

NGINX proxy timeout configuration:

```nginx
http {
    # Global proxy timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout    60s;
    proxy_read_timeout    120s;  # Increase for slow backends

    # Buffer settings (prevent 502 from large responses)
    proxy_buffer_size       4k;
    proxy_buffers           8 16k;
    proxy_busy_buffers_size 24k;

    server {
        listen 443 ssl;
        server_name app.example.com;

        location / {
            proxy_pass http://backend;

            # Timeout settings
            proxy_connect_timeout 30s;
            proxy_send_timeout    60s;
            proxy_read_timeout    120s;  # Adjust based on backend

            # Buffer configuration
            proxy_buffering          on;
            proxy_buffer_size        8k;
            proxy_buffers            16 32k;
            proxy_busy_buffers_size  64k;
            proxy_max_temp_file_size 1024m;

            # Handle upstream errors
            proxy_next_upstream error timeout http_502 http_503 http_504;
            proxy_next_upstream_tries 3;
            proxy_next_upstream_timeout 10s;
        }

        # For long-running endpoints
        location /api/long-running {
            proxy_pass http://backend;
            proxy_read_timeout 300s;  # 5 minutes
            proxy_send_timeout 300s;
        }
    }
}
```

HAProxy timeout configuration:

```haproxy
defaults
    timeout connect         10s
    timeout client          30s
    timeout server          120s  # Increase for slow backends
    timeout http-request    30s
    timeout http-keep-alive 10s
    timeout queue           60s

backend app_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ app.example.com

    # Fall through to all backup servers if primaries are down
    option allbackups

    # Retry on connection failures, timeouts, and 502/503/504
    retry-on conn-failure response-timeout 502 503 504
    retries 3

    server app1 10.0.1.1:8080 check inter 5s fall 3 rise 2
    server app2 10.0.1.2:8080 check inter 5s fall 3 rise 2
```

### 3. Fix SSL/TLS issues

Backend SSL certificate validation:

```bash
# Test backend SSL certificate
openssl s_client -connect backend.example.com:443 -servername backend.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -dates

# Check certificate chain
openssl s_client -connect backend.example.com:443 -showcerts </dev/null 2>/dev/null

# Common SSL issues causing 502:
# - Certificate expired
# - Self-signed certificate not trusted by LB
# - Hostname mismatch (CN/SAN doesn't match)
# - Missing intermediate certificate
```
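The `notAfter` date printed by the openssl command can also be checked programmatically, which is handy in a monitoring script. A small sketch using Python's `ssl` module (the expiry string is a made-up example of the OpenSSL date format):

```python
import ssl
import time

# "notAfter" as printed by `openssl x509 -noout -dates` (hypothetical value)
not_after = "Jun  1 12:00:00 2030 GMT"

# Convert the OpenSSL date format to epoch seconds
expires_at = ssl.cert_time_to_seconds(not_after)
days_left = (expires_at - time.time()) / 86400

# An expired backend certificate is a classic cause of SSL-related 502s
print(days_left > 0)
```

Wiring a check like this into your alerting catches certificate expiry before the load balancer starts returning 502s.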

NGINX SSL backend configuration:

```nginx
# For HTTPS backends
location / {
    proxy_pass https://backend;

    # SSL verification (disable only for trusted backends on internal networks)
    proxy_ssl_verify off;
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify_depth 2;

    # SSL protocols
    proxy_ssl_protocols TLSv1.2 TLSv1.3;
    proxy_ssl_ciphers HIGH:!aNULL:!MD5;

    # SNI support
    proxy_ssl_server_name on;
}
```

AWS ALB HTTPS backend:

```bash
# Note: ALB does not validate the certificate on HTTPS targets, so
# self-signed backend certificates are accepted. A failed TLS handshake
# (protocol/cipher mismatch, no TLS listener on the port) still causes 502s.

# Check target group protocol
aws elbv2 describe-target-groups \
  --target-group-arn arn:xxx \
  --query 'TargetGroups[0].{Protocol:Protocol,Port:Port}'

# Target group protocol/port cannot be changed after creation;
# create a new HTTP target group if the backend has no working TLS setup
aws elbv2 create-target-group \
  --name my-targets-http \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-xxx
```

### 4. Fix response size issues

Handle large responses:

```nginx
# NGINX - Increase buffer sizes for large responses
location /api/large-response {
    proxy_pass http://backend;

    # Larger buffers
    proxy_buffer_size       32k;
    proxy_buffers           16 128k;
    proxy_busy_buffers_size 256k;

    # Allow temp files for very large responses
    proxy_max_temp_file_size 2048m;
    proxy_temp_path /var/nginx/proxy_temp;
}
```

Handle large headers:

```nginx
# NGINX - Default header buffer is 4k-8k
# Increase if backend sends large cookies or headers
location / {
    proxy_pass http://backend;

    # Larger header buffer (fixes "upstream sent too big header" 502s)
    proxy_buffer_size 16k;
    proxy_buffers     8 32k;
}
```

AWS ALB limits:

```bash
# ALB has fixed limits:
# - Response header size: 16 KB max
# - Response body: No limit (but timeout applies)

# If backend exceeds limits:
# 1. Reduce cookie sizes
# 2. Use session IDs instead of large cookies
# 3. Compress responses
# 4. Split large headers into multiple smaller ones
```
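To see whether a response would trip a header-size cap, you can measure the serialized header bytes directly. A rough sketch (the header values are invented, and the size is an approximation of the on-the-wire form):

```python
# Hypothetical response headers from a backend
headers = {
    "Content-Type": "application/json",
    "Set-Cookie": "session=" + "x" * 20000,  # an oversized cookie
}

# Approximate on-the-wire size: "Name: value\r\n" per header
serialized = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
print(len(serialized.encode()) > 16 * 1024)  # True means over a 16 KB cap
```

Running a check like this in an integration test catches oversized cookies before they reach the load balancer.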

### 5. Fix connection pool issues

Backend connection exhaustion:

```bash
# Check backend connection counts (e.g. NGINX backend on :8080)
netstat -an | grep :8080 | wc -l
ss -s

# Application connection pool
# Check application metrics/logs for pool exhaustion

# Linux file descriptor limits
ulimit -n
cat /proc/sys/fs/file-nr
```

Increase backend capacity:

```nginx
# Backend NGINX configuration
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    multi_accept on;
}

http {
    # Keepalive to upstream
    upstream backend_pool {
        server 127.0.0.1:3000;
        keepalive 32;
    }

    server {
        location / {
            proxy_pass http://backend_pool;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
```

HAProxy connection settings:

```haproxy
global
    maxconn  50000
    nbthread 4

defaults
    timeout connect 10s
    timeout server  120s

backend app_servers
    balance roundrobin
    option http-keep-alive
    http-reuse safe

    # Connection limits per server
    server app1 10.0.1.1:8080 check maxconn 1000
    server app2 10.0.1.2:8080 check maxconn 1000
```

### 6. Fix protocol issues

HTTP/2 configuration:

```nginx
# NGINX with HTTP/2 to clients, HTTP/1.1 to backend
server {
    listen 443 ssl http2;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;  # Backend speaks HTTP/1.1

        # Required headers
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

WebSocket support (prevents 502 on upgrade):

```nginx
location /ws/ {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade    $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400s;  # 24 hours for long-lived connections
    proxy_send_timeout 86400s;
}
```

### 7. Debug backend application

Application-level debugging:

```bash
# Check application logs during 502
journalctl -u my-app -f

# Look for:
# - Unhandled exceptions
# - Out of memory errors
# - Database connection timeouts
# - Deadlocks

# Profile slow endpoints
# Java:    jstack <pid> > thread-dump.txt
# Node.js: clinic doctor
# Python:  py-spy record

# Check resource usage during 502
top -bn1 | head -10
free -m
df -h
```

Add application health checks:

```python
from flask import jsonify

# Flask health endpoint (check_database/check_redis are app-specific helpers)
@app.route('/health')
def health():
    """Return 200 only if app can serve requests."""
    try:
        # Check critical dependencies
        db_health = check_database()
        cache_health = check_redis()

        if db_health and cache_health:
            return jsonify({"status": "healthy"}), 200
        else:
            return jsonify({"status": "unhealthy"}), 503
    except Exception as e:
        return jsonify({"status": "unhealthy", "error": str(e)}), 503
```

Prevention

  • Monitor backend response times and set alerts for p95 > threshold
  • Set load balancer timeout 2x the p99 backend response time
  • Use circuit breakers to fail fast when backend is unhealthy
  • Implement proper health checks that verify dependencies
  • Configure connection pools with appropriate limits
  • Use compression for large responses
  • Set up distributed tracing to identify slow endpoints
  • Regular load testing to identify breaking points
  • Implement request timeouts at application level
  • Use async processing for long-running operations
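The timeout rule of thumb above (load balancer timeout at roughly 2x the p99 backend response time) can be computed directly from response-time samples. A sketch with synthetic latencies standing in for access-log data:

```python
import random
import statistics

# Synthetic backend response times in seconds (stand-in for access-log data)
random.seed(1)
samples = [random.lognormvariate(-1.0, 0.5) for _ in range(10_000)]

# statistics.quantiles with n=100 returns the cut points p1..p99
p99 = statistics.quantiles(samples, n=100)[-1]
recommended_timeout = 2 * p99  # rule of thumb: LB timeout = 2x p99
print(f"p99={p99:.2f}s -> suggested proxy_read_timeout {recommended_timeout:.2f}s")
```

Recompute this whenever backend latency shifts; a timeout tuned against stale latency data drifts back toward 502s under load.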
Related Errors

  • **Load balancer 503 Service Unavailable**: No healthy backends
  • **Load balancer health check failure**: Backend failing health probes
  • **Load balancer SSL handshake failed**: TLS configuration issues
  • **Load balancer sticky session failure**: Session affinity issues