Introduction

Nginx 502 Bad Gateway errors occur when Nginx, acting as a reverse proxy or load balancer, receives an invalid response from an upstream server or cannot establish a connection to it. Unlike a 504 Gateway Timeout (which indicates the upstream is slow), a 502 indicates the upstream returned a malformed response, closed the connection prematurely, or refused the connection entirely. Common causes include an upstream server (Gunicorn, uWSGI, PHP-FPM, Node.js, etc.) that has crashed or is not running, an upstream out of workers or connection slots, connections refused due to firewall rules or socket permissions, invalid HTTP responses from the upstream, keepalive connections closed by the upstream, exceeded proxy buffer sizes, FastCGI parameter misconfiguration, and upstream SSL/TLS handshake failures. The fix requires understanding Nginx proxy architecture, upstream health checking, connection management, and proper error handling. This guide provides production-proven troubleshooting for Nginx 502 errors across PHP, Python, Node.js, and other backend configurations.

Symptoms

  • Browser displays 502 Bad Gateway error page
  • Nginx error log shows upstream prematurely closed connection
  • connect() failed (111: Connection refused) while connecting to upstream
  • upstream sent too big header errors in logs
  • no live upstreams while connecting to upstream
  • Application works intermittently, fails under load
  • 502 errors correlate with high traffic spikes
  • recv() failed (104: Connection reset by peer)
  • PHP-FPM returns 502 for specific requests
  • WebSocket connections fail with 502
  • gRPC calls return 502 Bad Gateway
  • Health checks show upstream as unhealthy

Common Causes

  • Upstream service (Gunicorn, uWSGI, PHP-FPM) not running
  • Upstream worker exhaustion (all workers busy)
  • Upstream connection limit reached (max_connections)
  • Unix socket permissions wrong (nginx can't connect)
  • TCP port not listening or firewall blocking
  • Upstream crashed due to OOM or panic
  • Keepalive connections closed by upstream prematurely
  • Proxy buffer too small for upstream response headers
  • Upstream returned invalid HTTP response (malformed)
  • SSL handshake failure between Nginx and upstream
  • FastCGI parameter SCRIPT_FILENAME incorrect
  • Upstream timeout shorter than Nginx timeout
  • DNS resolution failure for upstream hostname
  • Upstream behind load balancer returning errors
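
Since each of these causes leaves a distinct signature in the Nginx error log, a small classifier can speed up triage. A minimal sketch mapping the log messages listed in this guide to their likely cause (the function name and the cause wording are illustrative):

```python
import re

# Map Nginx error-log signatures to the likely 502 cause (illustrative wording)
PATTERNS = [
    (r"connect\(\) failed \(111: Connection refused\)",
     "upstream not listening (service down, wrong port, or firewall)"),
    (r"recv\(\) failed \(104: Connection reset by peer\)",
     "upstream closed the connection unexpectedly (crash, OOM)"),
    (r"upstream prematurely closed connection",
     "upstream closed before sending a full response"),
    (r"no live upstreams",
     "all backends marked failed (max_fails exceeded)"),
    (r"upstream sent too big header",
     "response headers exceed proxy_buffer_size"),
]

def classify_502(log_line: str) -> str:
    """Return the likely cause for a 502-related error-log line."""
    for pattern, cause in PATTERNS:
        if re.search(pattern, log_line):
            return cause
    return "unknown - inspect the upstream directly"

if __name__ == "__main__":
    line = ("2026/04/01 10:00:00 [error] 1234#0: *1 connect() failed "
            "(111: Connection refused) while connecting to upstream")
    print(classify_502(line))  # upstream not listening (service down, wrong port, or firewall)
```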

Step-by-Step Fix

### 1. Diagnose 502 error source

Check Nginx error logs:

```bash
# Tail Nginx error log
tail -f /var/log/nginx/error.log

# Common 502 error patterns:

# Connection refused - upstream not listening
# connect() failed (111: Connection refused) while connecting to upstream

# Connection reset - upstream closed unexpectedly
# recv() failed (104: Connection reset by peer) while reading response header

# Premature close - upstream closed before response
# upstream prematurely closed connection while reading response header

# No live upstreams - all backends failed
# no live upstreams while connecting to upstream

# Invalid header from upstream
# upstream sent too big header while reading response header

# Check access log for 502 patterns
grep " 502 " /var/log/nginx/access.log | tail -20

# Analyze 502 rate over time
awk '$9 == 502 {print $4}' /var/log/nginx/access.log | cut -d: -f1,2 | uniq -c
```

Check upstream service status:

```bash
# Check if upstream is running (systemd services)
systemctl status gunicorn
systemctl status php-fpm
systemctl status uwsgi
systemctl status node-app

# Check listening ports
ss -tlnp | grep -E "8000|8080|9000"
netstat -tlnp | grep -E "8000|8080|9000"

# Check Unix sockets
ls -la /var/run/gunicorn.sock
ls -la /run/php-fpm/www.sock

# Test upstream directly (bypass Nginx)
curl -v http://127.0.0.1:8000/health
curl -v --unix-socket /var/run/gunicorn.sock http://localhost/health
```

### 2. Fix upstream connection refused

TCP connection issues:

```bash
# Check if port is listening
ss -tlnp | grep :8000

# If not listening, start the upstream service
systemctl start gunicorn

# Check firewall (upstream may be listening but blocked)
ufw status
iptables -L -n | grep 8000

# Allow local connections
ufw allow from 127.0.0.1 to any port 8000

# Check if upstream is bound to the correct interface
ss -tlnp | grep 8000
# 127.0.0.1:8000 - localhost only
# 0.0.0.0:8000   - all interfaces
# [::]:8000      - IPv6, all interfaces

# Fix upstream binding (example: Gunicorn)
gunicorn --bind 127.0.0.1:8000 app:app
```

Unix socket permission issues:

```bash
# Check socket exists and permissions
ls -la /var/run/gunicorn.sock
# srwxrwxrwx 1 www-data www-data - must be accessible by the nginx user

# Socket owned by wrong user?
# Fix: set correct ownership in the upstream config

# Gunicorn socket configuration
# Note: --chown-socket/--chmod-socket are uWSGI flags; Gunicorn sockets
# inherit the unit's User/Group, and --umask controls the socket mode
cat > /etc/systemd/system/gunicorn.service << 'EOF'
[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/myapp
ExecStart=/usr/bin/gunicorn --bind unix:/var/run/gunicorn.sock \
    --umask 7 \
    app:app
EOF

# Nginx user must be able to access the socket
# Alternative: add nginx to the www-data group
usermod -aG www-data nginx

# PHP-FPM socket configuration
cat > /etc/php/8.1/fpm/pool.d/www.conf << 'EOF'
[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0666
EOF

# Restart services
systemctl daemon-reload
systemctl restart gunicorn
systemctl restart php8.1-fpm
```

### 3. Fix upstream worker exhaustion

Monitor upstream capacity:

```bash
# Check Gunicorn workers
ps aux | grep gunicorn | grep -v grep

# Check worker utilization
# In Gunicorn logs, look for:
# "worker temporary failure" or "worker failed to boot"

# Check PHP-FPM process manager status
# Enable the status page in the pool config:
pm.status_path = /fpm-status

# PHP-FPM speaks FastCGI, not HTTP, so curl cannot talk to port 9000
# directly. Expose the page via an Nginx location that fastcgi_passes
# /fpm-status, then:
curl http://127.0.0.1/fpm-status

# Output:
# pool:                 www
# process manager:      dynamic
# start time:           01/Apr/2026:10:00:00 +0000
# start since:          86400
# accepted conn:        125000
# listen queue:         0
# max listen queue:     50
# listen queue len:     128
# idle processes:       5
# active processes:     20
# total processes:      25
# max active processes: 35
# max children reached: 10    <- if > 0, the limit was hit!

# If max children reached > 0, increase pm.max_children
```
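
To watch for worker exhaustion automatically, the plain-text status page can be parsed. A sketch assuming the `key: value` format shown above (function names are mine):

```python
def parse_fpm_status(text: str) -> dict:
    """Parse PHP-FPM's plain-text status page into a dict of strings."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status

def workers_exhausted(status: dict) -> bool:
    """True if the pool ever hit pm.max_children."""
    return int(status.get("max children reached", 0)) > 0

SAMPLE = """\
pool:                 www
process manager:      dynamic
active processes:     20
total processes:      25
max children reached: 10
"""

if __name__ == "__main__":
    status = parse_fpm_status(SAMPLE)
    print(workers_exhausted(status))  # True -> raise pm.max_children
```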

Tune upstream workers:

```bash
# Gunicorn worker tuning
# Formula: workers = (2 x CPU cores) + 1
# Or: workers = available_memory / worker_memory

# For a 4-core server with 8GB RAM, ~200MB per worker:
# memory allows 8GB / 200MB = ~40 workers (but capped by CPU)
# Use 9-15 workers typically

gunicorn --workers 9 \
    --worker-class sync \
    --threads 4 \
    --worker-connections 1000 \
    --timeout 30 \
    app:app

# Gunicorn config file (gunicorn.conf.py)
workers = 9
worker_class = 'sync'       # Or 'gevent', 'eventlet' for async
threads = 4
worker_connections = 1000
timeout = 30
keepalive = 5
max_requests = 1000         # Recycle workers after 1000 requests
max_requests_jitter = 50

# PHP-FPM tuning
# Edit /etc/php/8.1/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 50        # Max concurrent requests
pm.start_servers = 10       # Initial children
pm.min_spare_servers = 5    # Min idle
pm.max_spare_servers = 20   # Max idle
pm.max_requests = 500       # Recycle after 500 requests

# For high-traffic sites, use static
pm = static
pm.max_children = 100
```
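
The sizing formula above is easy to encode. A sketch (function name is mine; the `2n + 1` heuristic and memory cap are as described):

```python
def gunicorn_workers(cpu_cores: int, mem_mb: int, worker_mem_mb: int = 200) -> int:
    """Recommended worker count: (2 x cores) + 1, capped by available memory."""
    cpu_based = 2 * cpu_cores + 1
    mem_based = mem_mb // worker_mem_mb
    return max(1, min(cpu_based, mem_based))

if __name__ == "__main__":
    # The 4-core / 8GB example from above
    print(gunicorn_workers(cpu_cores=4, mem_mb=8192))  # 9
```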

### 4. Fix proxy buffer issues

Buffer configuration:

```nginx
# Nginx proxy buffer configuration
# In a server or location block

location / {
    proxy_pass http://127.0.0.1:8000;

    # Buffer sizes
    proxy_buffering on;
    proxy_buffer_size 4k;           # First part of response (headers)
    proxy_buffers 8 16k;            # Main buffers (8 x 16k = 128k)
    proxy_busy_buffers_size 24k;    # While sending to client

    # If upstream sends large headers, increase the first buffer instead:
    # proxy_buffer_size 16k;

    # Timeout settings
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;

    # Handle upstream errors
    proxy_next_upstream error timeout invalid_header http_502 http_503;
    proxy_next_upstream_tries 3;
}

# For large responses (file downloads, APIs with big payloads)
location /api/large {
    proxy_pass http://backend;
    proxy_buffering off;    # Stream directly to client
    proxy_cache off;
}

# If you see "upstream sent too big header" in error.log:
# increase proxy_buffer_size
```

FastCGI buffer configuration (PHP-FPM):

```nginx
# FastCGI buffer configuration
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/run/php/php8.1-fpm.sock;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

    # Buffer settings
    fastcgi_buffering on;
    fastcgi_buffer_size 4k;
    fastcgi_buffers 8 16k;
    fastcgi_busy_buffers_size 24k;

    # Timeout settings
    fastcgi_connect_timeout 60s;
    fastcgi_send_timeout 60s;
    fastcgi_read_timeout 60s;

    # Handle large responses
    # If PHP outputs large data or many headers
    fastcgi_max_temp_file_size 1024m;
    fastcgi_temp_file_write_size 16k;
}

# If you see "FastCGI sent too big header" errors, increase:
# fastcgi_buffer_size 16k;
# fastcgi_buffers 16 16k;
```

### 5. Fix keepalive connection issues

Keepalive configuration:

```nginx
# Upstream with keepalive
upstream backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;

    # Keepalive connections to upstream
    keepalive 32;    # Max idle connections in the pool
}

server {
    location / {
        proxy_pass http://backend;

        # Required for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Other headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# Upstream keepalive issues:
# "upstream prematurely closed keepalive connection"
# Fix: ensure the upstream supports keepalive and has adequate settings

# Gunicorn keepalive config (match or exceed Nginx's keepalive):
# gunicorn --keep-alive 5 app:app

# In Nginx, keepalive should be <= the upstream's max connections
```

Upstream keepalive tuning:

```python
# Gunicorn keepalive configuration
# gunicorn.conf.py

keepalive = 5   # Seconds to hold an idle keepalive connection open
                # Must be >= Nginx's keepalive timeout to the upstream

# Worker connections with keepalive
# Formula: worker_connections >= keepalive_connections + concurrent_requests

# For 32 keepalive connections + 100 concurrent:
worker_connections = 132
```

```ini
; PHP-FPM (PHP 7.4+) pool configuration
pm.max_requests = 500
request_terminate_timeout = 60s

; PHP-FPM doesn't use keepalive the same way:
; each request is independent.
; Focus on pm.max_children for capacity.
```

### 6. Fix upstream timeout issues

Timeout alignment:

```nginx
# Nginx timeouts should be SHORTER than upstream timeouts
# This ensures Nginx times out first and can retry

upstream backend {
    server 127.0.0.1:8000;
}

server {
    location / {
        proxy_pass http://backend;

        # Nginx timeouts
        proxy_connect_timeout 5s;   # Connection to upstream
        proxy_send_timeout 30s;     # Sending request
        proxy_read_timeout 30s;     # Waiting for response

        # Try the next upstream on timeout
        proxy_next_upstream timeout;
        proxy_next_upstream_tries 2;
    }
}
```

```python
# Gunicorn timeout configuration (gunicorn.conf.py)
# Should be LONGER than Nginx's timeouts

timeout = 60            # Worker timeout (kill worker if a request takes longer)
keepalive = 5
graceful_timeout = 30   # Time to finish in-flight requests on reload

# Nginx:    proxy_read_timeout 30s
# Gunicorn: timeout 60s
# Result: Nginx times out first while Gunicorn is still processing
```

```ini
; PHP-FPM timeout configuration
; In pool config (/etc/php/8.1/fpm/pool.d/www.conf)

request_terminate_timeout = 60s   ; Max request time

; In php.ini
max_execution_time = 60
max_input_time = 60

; Nginx's fastcgi_read_timeout should be < request_terminate_timeout
```
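
A quick sanity check for this alignment rule (Nginx shorter than upstream) can be automated in a deploy script. A sketch with hypothetical timeout values; the function name is mine:

```python
def check_timeout_alignment(nginx_read_s: float, upstream_timeout_s: float) -> list:
    """Warn when Nginx would NOT time out before the upstream kills the request."""
    warnings = []
    if nginx_read_s >= upstream_timeout_s:
        warnings.append(
            f"proxy_read_timeout {nginx_read_s}s >= upstream timeout "
            f"{upstream_timeout_s}s: the upstream may kill the worker first, "
            "yielding a 502 instead of a clean 504"
        )
    return warnings

if __name__ == "__main__":
    # The example pairing from above: Nginx 30s, Gunicorn 60s
    print(check_timeout_alignment(30, 60))  # [] - correctly aligned
```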

### 7. Fix SSL upstream issues

SSL between Nginx and upstream:

```nginx
# Upstream with SSL
upstream backend {
    server 127.0.0.1:8443;
}

server {
    location / {
        proxy_pass https://backend;

        # SSL verification
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
        proxy_ssl_verify_depth 2;

        # Or disable verification (internal only, not recommended)
        # proxy_ssl_verify off;

        # SSL session reuse
        proxy_ssl_session_reuse on;

        # Send SNI to the upstream
        proxy_ssl_server_name on;
    }
}

# For self-signed certificates (internal use only):
# proxy_ssl_verify off;
# Or add the certificate to the trusted CA bundle
```

Debug SSL issues:

```bash
# Test SSL connection to upstream
openssl s_client -connect 127.0.0.1:8443 -servername localhost </dev/null

# Check certificate validity dates
openssl s_client -connect 127.0.0.1:8443 </dev/null 2>/dev/null | openssl x509 -noout -dates

# Check if upstream SSL is working end to end
curl -v https://127.0.0.1:8443/health

# If certificate errors:
# - Use a valid certificate
# - Or add it to the trusted store
# - Or disable verification (internal only)
```

### 8. Monitor and alert on 502 errors

Nginx log parsing:

```bash
# Count 502 errors per minute
awk '$9 == 502 {print $4}' /var/log/nginx/access.log | cut -d: -f1,2 | uniq -c | tail -20

# Extract 502 requests with upstream info
grep " 502 " /var/log/nginx/access.log | head -10

# Real-time 502 monitoring
tail -f /var/log/nginx/access.log | grep --line-buffered " 502 "

# Check upstream health
watch -n 5 'curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8000/health'
```
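
The same log analysis can run as a script, e.g. from cron feeding an alert. A sketch assuming the default combined log format, where the status code is the ninth whitespace-separated field (matching the `$9` used in the awk commands above):

```python
def rate_502(log_lines) -> float:
    """Fraction of requests that returned 502 (combined log format)."""
    total = bad = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip malformed lines
        total += 1
        if fields[8] == "502":  # field 9: HTTP status code
            bad += 1
    return bad / total if total else 0.0

if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [01/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl"',
        '1.2.3.4 - - [01/Apr/2026:10:00:01 +0000] "GET /api HTTP/1.1" 502 157 "-" "curl"',
    ]
    print(rate_502(sample))  # 0.5
```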

Prometheus metrics:

```bash
# Nginx exporter
# https://github.com/nginxinc/nginx-prometheus-exporter

# Note: with open source Nginx, stub_status only yields aggregate metrics
# (connections, total request count) - no per-status-code counters.
# Per-status metrics such as nginx_http_requests_total{status="502"} or
# upstream state require the NGINX Plus API or a log-based exporter.

# Install exporter
docker run -d --name nginx-exporter \
    -p 9113:9113 \
    nginx/nginx-prometheus-exporter:latest \
    -nginx.scrape-uri="http://127.0.0.1:8080/stub_status"
```

```nginx
# Enable stub_status in Nginx
server {
    listen 8080;
    location /stub_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
```

Grafana alert rules:

```yaml
groups:
  - name: nginx_502
    rules:
      - alert: Nginx502RateHigh
        expr: |
          sum(rate(nginx_http_requests_total{status="502"}[5m]))
          /
          sum(rate(nginx_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 502 error rate above 5%"
          description: "{{ $value | humanizePercentage }} of requests returning 502"

      - alert: NginxUpstreamDown
        expr: nginx_upstream_server_state == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Nginx upstream server is down"
          description: "Upstream {{ $labels.upstream }} server {{ $labels.server }} is unhealthy"

      - alert: NginxUpstreamConnectionsFailed
        expr: rate(nginx_upstream_connections_failed_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High upstream connection failures"
          description: "{{ $value }} failed connections per second"
```

Health check configuration:

```nginx
# Active health checks (NGINX Plus only)
upstream backend {
    zone backend 64k;

    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    location / {
        proxy_pass http://backend;

        # health_check belongs in a location, not the upstream block
        health_check interval=5s fails=3 passes=2 uri=/health;
    }
}

# Open source Nginx: passive health checks
upstream backend {
    server 127.0.0.1:8000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8002 max_fails=3 fail_timeout=30s;
}

# max_fails:    mark server down after N failures
# fail_timeout: window for counting failures, and how long to keep it down
```
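
The passive max_fails/fail_timeout behavior is easier to reason about as a small state machine. This is a simplified model of the semantics described above, not Nginx's actual implementation, and all names are illustrative:

```python
class PassiveHealth:
    """Simplified model of Nginx's max_fails / fail_timeout accounting."""

    def __init__(self, max_fails: int = 3, fail_timeout: float = 30.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.window_start = 0.0
        self.down_until = 0.0

    def record_failure(self, now: float) -> None:
        # Failures are counted within a fail_timeout-long window
        if now - self.window_start > self.fail_timeout:
            self.window_start = now
            self.fails = 0
        self.fails += 1
        if self.fails >= self.max_fails:
            self.down_until = now + self.fail_timeout  # mark server down

    def record_success(self) -> None:
        self.fails = 0

    def is_up(self, now: float) -> bool:
        return now >= self.down_until

if __name__ == "__main__":
    h = PassiveHealth(max_fails=3, fail_timeout=30)
    for t in (0, 1, 2):
        h.record_failure(t)
    print(h.is_up(3))   # False - down after 3 failures
    print(h.is_up(40))  # True - fail_timeout elapsed, retried
```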

Prevention

  • Monitor upstream health with active or passive checks
  • Set appropriate worker counts based on capacity testing
  • Align Nginx timeouts to be shorter than upstream timeouts
  • Use keepalive connections with proper sizing
  • Configure proxy buffers for expected response sizes
  • Implement circuit breakers for upstream failures
  • Set up alerting for 502 rate increases
  • Use multiple upstream instances for redundancy
  • Gracefully drain upstreams during deployments
  • Document runbooks for common 502 causes

Related Errors

  • **504 Gateway Timeout**: Upstream too slow or not responding in time
  • **503 Service Unavailable**: No upstream available, all marked down
  • **500 Internal Server Error**: Error inside the application or Nginx itself
  • **501 Not Implemented**: Request method not supported
  • **505 HTTP Version Not Supported**: Protocol version mismatch