Introduction

Nginx 502 Bad Gateway errors occur when Nginx, acting as a reverse proxy or load balancer, receives an invalid response from an upstream server or cannot establish a connection to it. Unlike a 504 Gateway Timeout (which indicates the upstream is slow), a 502 indicates the upstream returned a malformed response, closed the connection prematurely, or refused the connection entirely. Common causes include an upstream server (Gunicorn, uWSGI, PHP-FPM, Node.js, etc.) that has crashed or is not running, an upstream out of workers or connection slots, connections refused due to firewall rules or socket permissions, invalid HTTP responses from the upstream, keepalive connections closed by the upstream, exceeded proxy buffer sizes, FastCGI parameter misconfiguration, and upstream SSL/TLS handshake failures. The fix requires understanding Nginx proxy architecture, upstream health checking, connection management, and proper error handling. This guide provides production-proven troubleshooting for Nginx 502 errors across PHP, Python, Node.js, and other backend configurations.

Symptoms

  • Browser displays 502 Bad Gateway error page
  • Nginx error log shows upstream prematurely closed connection
  • connect() failed (111: Connection refused) while connecting to upstream
  • upstream sent too big header errors in logs
  • no live upstreams while connecting to upstream
  • Application works intermittently, fails under load
  • 502 errors correlate with high traffic spikes
  • recv() failed (104: Connection reset by peer)
  • PHP-FPM returns 502 for specific requests
  • WebSocket connections fail with 502
  • gRPC calls return 502 Bad Gateway
  • Health checks show upstream as unhealthy

Common Causes

  • Upstream service (Gunicorn, uWSGI, PHP-FPM) not running
  • Upstream worker exhaustion (all workers busy)
  • Upstream connection limit reached (max_connections)
  • Unix socket permissions wrong (nginx can't connect)
  • TCP port not listening or firewall blocking
  • Upstream crashed due to OOM or panic
  • Keepalive connections closed by upstream prematurely
  • Proxy buffer too small for upstream response headers
  • Upstream returned invalid HTTP response (malformed)
  • SSL handshake failure between Nginx and upstream
  • FastCGI parameter SCRIPT_FILENAME incorrect
  • Upstream timeout shorter than Nginx timeout
  • DNS resolution failure for upstream hostname
  • Upstream behind load balancer returning errors
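
Since each of these causes leaves a distinct signature in the Nginx error log, a small classifier can speed up triage. A minimal sketch mapping the log messages listed in this guide to their likely cause (the function name and the cause wording are illustrative):

```python
import re

# Map Nginx error-log signatures to the likely 502 cause (illustrative wording)
PATTERNS = [
    (r"connect\(\) failed \(111: Connection refused\)",
     "upstream not listening (service down, wrong port, or firewall)"),
    (r"recv\(\) failed \(104: Connection reset by peer\)",
     "upstream closed the connection unexpectedly (crash, OOM)"),
    (r"upstream prematurely closed connection",
     "upstream closed before sending a full response"),
    (r"no live upstreams",
     "all backends marked failed (max_fails exceeded)"),
    (r"upstream sent too big header",
     "response headers exceed proxy_buffer_size"),
]

def classify_502(log_line: str) -> str:
    """Return the likely cause for a 502-related error-log line."""
    for pattern, cause in PATTERNS:
        if re.search(pattern, log_line):
            return cause
    return "unknown - inspect the upstream directly"

if __name__ == "__main__":
    line = ("2026/04/01 10:00:00 [error] 1234#0: *1 connect() failed "
            "(111: Connection refused) while connecting to upstream")
    print(classify_502(line))  # upstream not listening (service down, wrong port, or firewall)
```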

Step-by-Step Fix

### 1. Diagnose 502 error source

Check Nginx error logs:

```bash
# Tail Nginx error log
tail -f /var/log/nginx/error.log

# Common 502 error patterns:

# Connection refused - upstream not listening
# connect() failed (111: Connection refused) while connecting to upstream

# Connection reset - upstream closed unexpectedly
# recv() failed (104: Connection reset by peer) while reading response header

# Premature close - upstream closed before response
# upstream prematurely closed connection while reading response header

# No live upstreams - all backends failed
# no live upstreams while connecting to upstream

# Invalid header from upstream
# upstream sent too big header while reading response header

# Check access log for 502 patterns
grep " 502 " /var/log/nginx/access.log | tail -20

# Analyze 502 rate over time
awk '$9 == 502 {print $4}' /var/log/nginx/access.log | cut -d: -f1,2 | uniq -c
```

Check upstream service status:

```bash
# Check if upstream is running (systemd services)
systemctl status gunicorn
systemctl status php-fpm
systemctl status uwsgi
systemctl status node-app

# Check listening ports
ss -tlnp | grep -E "8000|8080|9000"
netstat -tlnp | grep -E "8000|8080|9000"

# Check Unix sockets
ls -la /var/run/gunicorn.sock
ls -la /run/php-fpm/www.sock

# Test upstream directly (bypass Nginx)
curl -v http://127.0.0.1:8000/health
curl -v --unix-socket /var/run/gunicorn.sock http://localhost/health
```

### 2. Fix upstream connection refused

TCP connection issues:

```bash
# Check if port is listening
ss -tlnp | grep :8000

# If not listening, start the upstream service
systemctl start gunicorn

# Check firewall (upstream may be listening but blocked)
ufw status
iptables -L -n | grep 8000

# Allow local connections
ufw allow from 127.0.0.1 to any port 8000

# Check if upstream is bound to the correct interface
ss -tlnp | grep 8000
# 127.0.0.1:8000 - localhost only
# 0.0.0.0:8000   - all interfaces
# [::]:8000      - IPv6, all interfaces

# Fix upstream binding (example: Gunicorn)
gunicorn --bind 127.0.0.1:8000 app:app
```

Unix socket permission issues:

```bash
# Check socket exists and permissions
ls -la /var/run/gunicorn.sock
# srwxrwxrwx 1 www-data www-data - must be accessible by the nginx user

# Socket owned by wrong user?
# Fix: set correct ownership in the upstream config

# Gunicorn socket configuration
# Note: --chown-socket/--chmod-socket are uWSGI flags; Gunicorn sockets
# inherit the unit's User/Group, and --umask controls the socket mode
cat > /etc/systemd/system/gunicorn.service << 'EOF'
[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/myapp
ExecStart=/usr/bin/gunicorn --bind unix:/var/run/gunicorn.sock \
    --umask 7 \
    app:app
EOF

# Nginx user must be able to access the socket
# Alternative: add nginx to the www-data group
usermod -aG www-data nginx

# PHP-FPM socket configuration
cat > /etc/php/8.1/fpm/pool.d/www.conf << 'EOF'
[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0666
EOF

# Restart services
systemctl daemon-reload
systemctl restart gunicorn
systemctl restart php8.1-fpm
```

### 3. Fix upstream worker exhaustion

Monitor upstream capacity:

```bash
# Check Gunicorn workers
ps aux | grep gunicorn | grep -v grep

# Check worker utilization
# In Gunicorn logs, look for:
# "worker temporary failure" or "worker failed to boot"

# Check PHP-FPM process manager status
# Enable the status page in the pool config:
pm.status_path = /fpm-status

# PHP-FPM speaks FastCGI, not HTTP, so curl cannot talk to port 9000
# directly. Expose the page via an Nginx location that fastcgi_passes
# /fpm-status, then:
curl http://127.0.0.1/fpm-status

# Output:
# pool:                 www
# process manager:      dynamic
# start time:           01/Apr/2026:10:00:00 +0000
# start since:          86400
# accepted conn:        125000
# listen queue:         0
# max listen queue:     50
# listen queue len:     128
# idle processes:       5
# active processes:     20
# total processes:      25
# max active processes: 35
# max children reached: 10    <- if > 0, the limit was hit!

# If max children reached > 0, increase pm.max_children
```
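
To watch for worker exhaustion automatically, the plain-text status page can be parsed. A sketch assuming the `key: value` format shown above (function names are mine):

```python
def parse_fpm_status(text: str) -> dict:
    """Parse PHP-FPM's plain-text status page into a dict of strings."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status

def workers_exhausted(status: dict) -> bool:
    """True if the pool ever hit pm.max_children."""
    return int(status.get("max children reached", 0)) > 0

SAMPLE = """\
pool:                 www
process manager:      dynamic
active processes:     20
total processes:      25
max children reached: 10
"""

if __name__ == "__main__":
    status = parse_fpm_status(SAMPLE)
    print(workers_exhausted(status))  # True -> raise pm.max_children
```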

Tune upstream workers:

```bash
# Gunicorn worker tuning
# Formula: workers = (2 x CPU cores) + 1
# Or: workers = available_memory / worker_memory

# For a 4-core server with 8GB RAM, ~200MB per worker:
# memory allows 8GB / 200MB = ~40 workers (but capped by CPU)
# Use 9-15 workers typically

gunicorn --workers 9 \
    --worker-class sync \
    --threads 4 \
    --worker-connections 1000 \
    --timeout 30 \
    app:app

# Gunicorn config file (gunicorn.conf.py)
workers = 9
worker_class = 'sync'       # Or 'gevent', 'eventlet' for async
threads = 4
worker_connections = 1000
timeout = 30
keepalive = 5
max_requests = 1000         # Recycle workers after 1000 requests
max_requests_jitter = 50

# PHP-FPM tuning
# Edit /etc/php/8.1/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 50        # Max concurrent requests
pm.start_servers = 10       # Initial children
pm.min_spare_servers = 5    # Min idle
pm.max_spare_servers = 20   # Max idle
pm.max_requests = 500       # Recycle after 500 requests

# For high-traffic sites, use static
pm = static
pm.max_children = 100
```
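
The sizing formula above is easy to encode. A sketch (function name is mine; the `2n + 1` heuristic and memory cap are as described):

```python
def gunicorn_workers(cpu_cores: int, mem_mb: int, worker_mem_mb: int = 200) -> int:
    """Recommended worker count: (2 x cores) + 1, capped by available memory."""
    cpu_based = 2 * cpu_cores + 1
    mem_based = mem_mb // worker_mem_mb
    return max(1, min(cpu_based, mem_based))

if __name__ == "__main__":
    # The 4-core / 8GB example from above
    print(gunicorn_workers(cpu_cores=4, mem_mb=8192))  # 9
```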

### 4. Fix proxy buffer issues

Buffer configuration:

```nginx
# Nginx proxy buffer configuration
# In a server or location block

location / {
    proxy_pass http://127.0.0.1:8000;

    # Buffer sizes
    proxy_buffering on;
    proxy_buffer_size 4k;           # First part of response (headers)
    proxy_buffers 8 16k;            # Main buffers (8 x 16k = 128k)
    proxy_busy_buffers_size 24k;    # While sending to client

    # If upstream sends large headers, increase the first buffer instead:
    # proxy_buffer_size 16k;

    # Timeout settings
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;

    # Handle upstream errors
    proxy_next_upstream error timeout invalid_header http_502 http_503;
    proxy_next_upstream_tries 3;
}

# For large responses (file downloads, APIs with big payloads)
location /api/large {
    proxy_pass http://backend;
    proxy_buffering off;    # Stream directly to client
    proxy_cache off;
}

# If you see "upstream sent too big header" in error.log:
# increase proxy_buffer_size
```

FastCGI buffer configuration (PHP-FPM):

```nginx
# FastCGI buffer configuration
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/run/php/php8.1-fpm.sock;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

    # Buffer settings
    fastcgi_buffering on;
    fastcgi_buffer_size 4k;
    fastcgi_buffers 8 16k;
    fastcgi_busy_buffers_size 24k;

    # Timeout settings
    fastcgi_connect_timeout 60s;
    fastcgi_send_timeout 60s;
    fastcgi_read_timeout 60s;

    # Handle large responses
    # If PHP outputs large data or many headers
    fastcgi_max_temp_file_size 1024m;
    fastcgi_temp_file_write_size 16k;
}

# If you see "FastCGI sent too big header" errors, increase:
# fastcgi_buffer_size 16k;
# fastcgi_buffers 16 16k;
```

### 5. Fix keepalive connection issues

Keepalive configuration:

```nginx
# Upstream with keepalive
upstream backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;

    # Keepalive connections to upstream
    keepalive 32;    # Max idle connections in the pool
}

server {
    location / {
        proxy_pass http://backend;

        # Required for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Other headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# Upstream keepalive issues:
# "upstream prematurely closed keepalive connection"
# Fix: ensure the upstream supports keepalive and has adequate settings

# Gunicorn keepalive config (match or exceed Nginx's keepalive):
# gunicorn --keep-alive 5 app:app

# In Nginx, keepalive should be <= the upstream's max connections
```

Upstream keepalive tuning:

```python
# Gunicorn keepalive configuration
# gunicorn.conf.py

keepalive = 5   # Seconds to hold an idle keepalive connection open
                # Must be >= Nginx's keepalive timeout to the upstream

# Worker connections with keepalive
# Formula: worker_connections >= keepalive_connections + concurrent_requests

# For 32 keepalive connections + 100 concurrent:
worker_connections = 132
```

```ini
; PHP-FPM (PHP 7.4+) pool configuration
pm.max_requests = 500
request_terminate_timeout = 60s

; PHP-FPM doesn't use keepalive the same way:
; each request is independent.
; Focus on pm.max_children for capacity.
```

### 6. Fix upstream timeout issues

Timeout alignment:

```nginx
# Nginx timeouts should be SHORTER than upstream timeouts
# This ensures Nginx times out first and can retry

upstream backend {
    server 127.0.0.1:8000;
}

server {
    location / {
        proxy_pass http://backend;

        # Nginx timeouts
        proxy_connect_timeout 5s;   # Connection to upstream
        proxy_send_timeout 30s;     # Sending request
        proxy_read_timeout 30s;     # Waiting for response

        # Try the next upstream on timeout
        proxy_next_upstream timeout;
        proxy_next_upstream_tries 2;
    }
}
```

```python
# Gunicorn timeout configuration (gunicorn.conf.py)
# Should be LONGER than Nginx's timeouts

timeout = 60            # Worker timeout (kill worker if a request takes longer)
keepalive = 5
graceful_timeout = 30   # Time to finish in-flight requests on reload

# Nginx:    proxy_read_timeout 30s
# Gunicorn: timeout 60s
# Result: Nginx times out first while Gunicorn is still processing
```

```ini
; PHP-FPM timeout configuration
; In pool config (/etc/php/8.1/fpm/pool.d/www.conf)

request_terminate_timeout = 60s   ; Max request time

; In php.ini
max_execution_time = 60
max_input_time = 60

; Nginx's fastcgi_read_timeout should be < request_terminate_timeout
```
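
A quick sanity check for this alignment rule (Nginx shorter than upstream) can be automated in a deploy script. A sketch with hypothetical timeout values; the function name is mine:

```python
def check_timeout_alignment(nginx_read_s: float, upstream_timeout_s: float) -> list:
    """Warn when Nginx would NOT time out before the upstream kills the request."""
    warnings = []
    if nginx_read_s >= upstream_timeout_s:
        warnings.append(
            f"proxy_read_timeout {nginx_read_s}s >= upstream timeout "
            f"{upstream_timeout_s}s: the upstream may kill the worker first, "
            "yielding a 502 instead of a clean 504"
        )
    return warnings

if __name__ == "__main__":
    # The example pairing from above: Nginx 30s, Gunicorn 60s
    print(check_timeout_alignment(30, 60))  # [] - correctly aligned
```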

### 7. Fix SSL upstream issues

SSL between Nginx and upstream:

```nginx
# Upstream with SSL
upstream backend {
    server 127.0.0.1:8443;
}

server {
    location / {
        proxy_pass https://backend;

        # SSL verification
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
        proxy_ssl_verify_depth 2;

        # Or disable verification (internal only, not recommended)
        # proxy_ssl_verify off;

        # SSL session reuse
        proxy_ssl_session_reuse on;

        # Send SNI to the upstream
        proxy_ssl_server_name on;
    }
}

# For self-signed certificates (internal use only):
# proxy_ssl_verify off;
# Or add the certificate to the trusted CA bundle
```

Debug SSL issues:

```bash
# Test SSL connection to upstream
openssl s_client -connect 127.0.0.1:8443 -servername localhost </dev/null

# Check certificate validity dates
openssl s_client -connect 127.0.0.1:8443 </dev/null 2>/dev/null | openssl x509 -noout -dates

# Check if upstream SSL is working end to end
curl -v https://127.0.0.1:8443/health

# If certificate errors:
# - Use a valid certificate
# - Or add it to the trusted store
# - Or disable verification (internal only)
```

### 8. Monitor and alert on 502 errors

Nginx log parsing:

```bash
# Count 502 errors per minute
awk '$9 == 502 {print $4}' /var/log/nginx/access.log | cut -d: -f1,2 | uniq -c | tail -20

# Extract 502 requests with upstream info
grep " 502 " /var/log/nginx/access.log | head -10

# Real-time 502 monitoring
tail -f /var/log/nginx/access.log | grep --line-buffered " 502 "

# Check upstream health
watch -n 5 'curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8000/health'
```
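
The same log analysis can run as a script, e.g. from cron feeding an alert. A sketch assuming the default combined log format, where the status code is the ninth whitespace-separated field (matching the `$9` used in the awk commands above):

```python
def rate_502(log_lines) -> float:
    """Fraction of requests that returned 502 (combined log format)."""
    total = bad = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip malformed lines
        total += 1
        if fields[8] == "502":  # field 9: HTTP status code
            bad += 1
    return bad / total if total else 0.0

if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [01/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl"',
        '1.2.3.4 - - [01/Apr/2026:10:00:01 +0000] "GET /api HTTP/1.1" 502 157 "-" "curl"',
    ]
    print(rate_502(sample))  # 0.5
```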

Prometheus metrics:

```bash
# Nginx exporter
# https://github.com/nginxinc/nginx-prometheus-exporter

# Note: with open source Nginx, stub_status only yields aggregate metrics
# (connections, total request count) - no per-status-code counters.
# Per-status metrics such as nginx_http_requests_total{status="502"} or
# upstream state require the NGINX Plus API or a log-based exporter.

# Install exporter
docker run -d --name nginx-exporter \
    -p 9113:9113 \
    nginx/nginx-prometheus-exporter:latest \
    -nginx.scrape-uri="http://127.0.0.1:8080/stub_status"
```

```nginx
# Enable stub_status in Nginx
server {
    listen 8080;
    location /stub_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
```

Grafana alert rules:

```yaml
groups:
  - name: nginx_502
    rules:
      - alert: Nginx502RateHigh
        expr: |
          sum(rate(nginx_http_requests_total{status="502"}[5m]))
          /
          sum(rate(nginx_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 502 error rate above 5%"
          description: "{{ $value | humanizePercentage }} of requests returning 502"

      - alert: NginxUpstreamDown
        expr: nginx_upstream_server_state == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Nginx upstream server is down"
          description: "Upstream {{ $labels.upstream }} server {{ $labels.server }} is unhealthy"

      - alert: NginxUpstreamConnectionsFailed
        expr: rate(nginx_upstream_connections_failed_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High upstream connection failures"
          description: "{{ $value }} failed connections per second"
```

Health check configuration:

```nginx
# Active health checks (NGINX Plus only)
upstream backend {
    zone backend 64k;

    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    location / {
        proxy_pass http://backend;

        # health_check belongs in a location, not the upstream block
        health_check interval=5s fails=3 passes=2 uri=/health;
    }
}

# Open source Nginx: passive health checks
upstream backend {
    server 127.0.0.1:8000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8002 max_fails=3 fail_timeout=30s;
}

# max_fails:    mark server down after N failures
# fail_timeout: window for counting failures, and how long to keep it down
```
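
The passive max_fails/fail_timeout behavior is easier to reason about as a small state machine. This is a simplified model of the semantics described above, not Nginx's actual implementation, and all names are illustrative:

```python
class PassiveHealth:
    """Simplified model of Nginx's max_fails / fail_timeout accounting."""

    def __init__(self, max_fails: int = 3, fail_timeout: float = 30.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.window_start = 0.0
        self.down_until = 0.0

    def record_failure(self, now: float) -> None:
        # Failures are counted within a fail_timeout-long window
        if now - self.window_start > self.fail_timeout:
            self.window_start = now
            self.fails = 0
        self.fails += 1
        if self.fails >= self.max_fails:
            self.down_until = now + self.fail_timeout  # mark server down

    def record_success(self) -> None:
        self.fails = 0

    def is_up(self, now: float) -> bool:
        return now >= self.down_until

if __name__ == "__main__":
    h = PassiveHealth(max_fails=3, fail_timeout=30)
    for t in (0, 1, 2):
        h.record_failure(t)
    print(h.is_up(3))   # False - down after 3 failures
    print(h.is_up(40))  # True - fail_timeout elapsed, retried
```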

Prevention

  • Monitor upstream health with active or passive checks
  • Set appropriate worker counts based on capacity testing
  • Align Nginx timeouts to be shorter than upstream timeouts
  • Use keepalive connections with proper sizing
  • Configure proxy buffers for expected response sizes
  • Implement circuit breakers for upstream failures
  • Set up alerting for 502 rate increases
  • Use multiple upstream instances for redundancy
  • Gracefully drain upstreams during deployments
  • Document runbooks for common 502 causes

Related Errors

  • **504 Gateway Timeout**: Upstream too slow or not responding in time
  • **503 Service Unavailable**: No upstream available, all marked down
  • **500 Internal Server Error**: Error inside the application or Nginx itself
  • **501 Not Implemented**: Request method not supported
  • **505 HTTP Version Not Supported**: Protocol version mismatch