## Introduction
Nginx 502 Bad Gateway errors occur when Nginx, acting as a reverse proxy or load balancer, receives an invalid response from an upstream server or cannot establish a connection to it. Unlike 504 Gateway Timeout (which indicates the upstream is slow), a 502 means the upstream returned a malformed response, closed the connection prematurely, or refused the connection entirely. Common causes include a crashed or stopped upstream server (Gunicorn, uWSGI, PHP-FPM, Node.js, etc.), worker or connection-slot exhaustion, connections refused due to firewall rules or socket permissions, invalid HTTP responses from the upstream, keepalive connections closed by the upstream, exceeded proxy buffer sizes, FastCGI parameter misconfiguration, and SSL/TLS handshake failures between Nginx and the upstream. The fix requires understanding Nginx proxy architecture, upstream health checking, connection management, and proper error handling. This guide provides production-proven troubleshooting for Nginx 502 errors across PHP, Python, Node.js, and other backend configurations.
## Symptoms
- Browser displays `502 Bad Gateway` error page
- Nginx error log shows `upstream prematurely closed connection`
- `connect() failed (111: Connection refused) while connecting to upstream` errors in logs
- `upstream sent too big header` errors in logs
- `no live upstreams while connecting to upstream` errors in logs
- Application works intermittently, fails under load
- 502 errors correlate with high traffic spikes
- `recv() failed (104: Connection reset by peer)` in error log
- PHP-FPM returns 502 for specific requests
- WebSocket connections fail with 502
- gRPC calls return 502 Bad Gateway
- Health checks show upstream as unhealthy
## Common Causes
- Upstream service (Gunicorn, uWSGI, PHP-FPM) not running
- Upstream worker exhaustion (all workers busy)
- Upstream connection limit reached (max_connections)
- Unix socket permissions wrong (nginx can't connect)
- TCP port not listening or firewall blocking
- Upstream crashed due to OOM or panic
- Keepalive connections closed by upstream prematurely
- Proxy buffer too small for upstream response headers
- Upstream returned invalid HTTP response (malformed)
- SSL handshake failure between Nginx and upstream
- FastCGI parameter `SCRIPT_FILENAME` incorrect
- Upstream timeout shorter than Nginx timeout
- DNS resolution failure for upstream hostname
- Upstream behind load balancer returning errors
## Step-by-Step Fix
### 1. Diagnose 502 error source
Check Nginx error logs:
```bash
# Tail Nginx error log
tail -f /var/log/nginx/error.log

# Common 502 error patterns:

# Connection refused - upstream not listening
# connect() failed (111: Connection refused) while connecting to upstream

# Connection reset - upstream closed unexpectedly
# recv() failed (104: Connection reset by peer) while reading response header

# Premature close - upstream closed before response
# upstream prematurely closed connection while reading response header

# No live upstreams - all backends failed
# no live upstreams while connecting to upstream

# Invalid header from upstream
# upstream sent too big header while reading response header

# Check access log for 502 patterns
grep " 502 " /var/log/nginx/access.log | tail -20

# Analyze 502 rate over time
awk '$9 == 502 {print $4}' /var/log/nginx/access.log | cut -d: -f1,2 | uniq -c
```
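These log signatures map fairly directly to root causes, so triage can be scripted. A minimal sketch (the `classify_502` helper and its cause labels are illustrative, not an Nginx tool):

```shell
#!/bin/sh
# Bucket an Nginx error-log line by the likely 502 root cause.
classify_502() {
  case "$1" in
    *"Connection refused"*)            echo "upstream-not-listening" ;;
    *"Connection reset by peer"*)      echo "upstream-reset-connection" ;;
    *"prematurely closed connection"*) echo "upstream-closed-early" ;;
    *"no live upstreams"*)             echo "all-backends-marked-down" ;;
    *"too big header"*)                echo "proxy-buffer-too-small" ;;
    *)                                 echo "unknown" ;;
  esac
}

classify_502 "connect() failed (111: Connection refused) while connecting to upstream"
# -> upstream-not-listening
classify_502 "upstream sent too big header while reading response header"
# -> proxy-buffer-too-small
```

Piping `tail -f error.log` through a loop calling this function gives a live cause breakdown during an incident.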
Check upstream service status:
```bash
# Check if upstream is running (systemd services)
systemctl status gunicorn
systemctl status php-fpm
systemctl status uwsgi
systemctl status node-app

# Check listening ports
ss -tlnp | grep -E "8000|8080|9000"
netstat -tlnp | grep -E "8000|8080|9000"

# Check Unix sockets
ls -la /var/run/gunicorn.sock
ls -la /run/php-fpm/www.sock

# Test upstream directly (bypass Nginx)
curl -v http://127.0.0.1:8000/health
curl -v --unix-socket /var/run/gunicorn.sock http://localhost/health
```
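When probing the upstream directly like this, curl's exit code already narrows the failure mode. A sketch interpreting curl's documented exit codes (`upstream_verdict` is a hypothetical helper name):

```shell
#!/bin/sh
# Interpret curl's exit code after probing the upstream directly.
# curl exit codes: 7=couldn't connect, 28=timeout, 52=empty reply, 56=recv error
upstream_verdict() {
  case "$1" in
    0)  echo "reachable" ;;
    7)  echo "connection-refused" ;;   # upstream not listening / firewalled
    28) echo "timeout" ;;              # upstream hung
    52) echo "empty-reply" ;;          # upstream closed before responding
    56) echo "connection-reset" ;;     # upstream crashed mid-request
    *)  echo "other-failure" ;;
  esac
}

# Usage:
# curl -s --max-time 5 http://127.0.0.1:8000/health >/dev/null; upstream_verdict $?
upstream_verdict 7
# -> connection-refused
```

An exit code of 7 points at step 2 below; 52 or 56 point at worker exhaustion or crashes (step 3).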
### 2. Fix upstream connection refused
TCP connection issues:
```bash
# Check if port is listening
ss -tlnp | grep :8000

# If not listening, start upstream service
systemctl start gunicorn

# Check firewall (upstream may be listening but blocked)
ufw status
iptables -L -n | grep 8000

# Allow local connections
ufw allow from 127.0.0.1 to any port 8000

# Check if upstream bound to correct interface
ss -tlnp | grep 8000
# 127.0.0.1:8000 - localhost only
# 0.0.0.0:8000   - all interfaces
# [::]:8000      - IPv6 all interfaces

# Fix upstream binding (example: Gunicorn)
gunicorn --bind 127.0.0.1:8000 app:app
```
Unix socket permission issues:
```bash
# Check socket exists and permissions
ls -la /var/run/gunicorn.sock
# srw-rw---- 1 www-data www-data - must be accessible by the nginx user

# Socket owned by wrong user?
# Fix: run the upstream under a user/group that nginx can share

# Gunicorn socket configuration
# Note: --chown-socket/--chmod-socket are uWSGI options, not Gunicorn's.
# With Gunicorn, control socket permissions via User/Group and umask.
cat > /etc/systemd/system/gunicorn.service << 'EOF'
[Unit]
Description=Gunicorn application server

[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/myapp
# umask 007 -> socket created with mode 660 (group-accessible)
ExecStart=/usr/bin/gunicorn --bind unix:/var/run/gunicorn.sock \
    --umask 007 \
    app:app

[Install]
WantedBy=multi-user.target
EOF

# Nginx user must be able to access the socket
# Alternative: add nginx to the www-data group
usermod -aG www-data nginx

# PHP-FPM socket configuration
cat > /etc/php/8.1/fpm/pool.d/www.conf << 'EOF'
[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
EOF

# Restart services
systemctl daemon-reload
systemctl restart gunicorn
systemctl restart php8.1-fpm
```
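Since Nginx typically reaches the socket through its group permissions, a quick check of the group bits in the mode string confirms the fix. A hypothetical helper, assuming the nginx user shares the socket's group:

```shell
#!/bin/sh
# Check whether the group read/write bits are set on a mode string
# as printed by `ls -l` or `stat -c %A`, e.g. "srw-rw----".
socket_group_rw() {
  case "$(printf '%s' "$1" | cut -c5-6)" in
    rw) echo "group-ok" ;;
    *)  echo "group-missing-rw" ;;
  esac
}

socket_group_rw "srw-rw----"   # mode 660
# -> group-ok
socket_group_rw "srw-------"   # mode 600: nginx cannot connect
# -> group-missing-rw
```

In practice you would feed it `$(stat -c %A /var/run/gunicorn.sock)` (GNU stat).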
### 3. Fix upstream worker exhaustion
Monitor upstream capacity:
```bash
# Check Gunicorn workers
ps aux | grep gunicorn | grep -v grep

# Check worker utilization
# In Gunicorn logs, look for:
# "worker temporary failure" or "worker failed to boot"

# Check PHP-FPM process manager status
# Enable status page in pool config:
pm.status_path = /fpm-status

# Expose it via an Nginx location (PHP-FPM speaks FastCGI, so you cannot
# curl port 9000 directly):
# location = /fpm-status { fastcgi_pass 127.0.0.1:9000; include fastcgi_params; allow 127.0.0.1; deny all; }
curl http://127.0.0.1/fpm-status

# Output:
# pool:                 www
# process manager:      dynamic
# start time:           01/Apr/2026:10:00:00 +0000
# start since:          86400
# accepted conn:        125000
# listen queue:         0
# max listen queue:     50
# listen queue len:     128
# idle processes:       5
# active processes:     20
# total processes:      25
# max active processes: 35
# max children reached: 10   # If > 0, the limit was hit!

# If "max children reached" > 0, increase pm.max_children
```
Tune upstream workers:
```bash
# Gunicorn worker tuning
# Formula: workers = (2 x CPU cores) + 1
# Or: workers = available_memory / worker_memory

# For a 4-core server with 8GB RAM, ~200MB per worker:
# workers = 8GB / 200MB = ~40 (but capped by CPU)
# Use 9-15 workers typically

gunicorn --workers 9 \
    --worker-class sync \
    --threads 4 \
    --worker-connections 1000 \
    --timeout 30 \
    app:app

# Gunicorn config file (gunicorn.conf.py)
workers = 9
worker_class = 'sync'     # Or 'gevent', 'eventlet' for async
threads = 4
worker_connections = 1000
timeout = 30
keepalive = 5
max_requests = 1000       # Recycle workers after 1000 requests
max_requests_jitter = 50

# PHP-FPM tuning
# Edit /etc/php/8.1/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 50       # Max concurrent requests
pm.start_servers = 10      # Initial children
pm.min_spare_servers = 5   # Min idle
pm.max_spare_servers = 20  # Max idle
pm.max_requests = 500      # Recycle after 500 requests

# For high-traffic sites, use static
pm = static
pm.max_children = 100
```
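The two sizing formulas above (CPU-based and memory-based) combine by taking the smaller result. A quick calculator sketch (`workers_for` is illustrative; the 200MB-per-worker figure is this guide's example assumption):

```shell
#!/bin/sh
# Worker count = min(2*cores + 1, available_mem / per_worker_mem)
workers_for() {
  cores=$1; mem_mb=$2; worker_mb=$3
  cpu_cap=$(( 2 * cores + 1 ))
  mem_cap=$(( mem_mb / worker_mb ))
  if [ "$mem_cap" -lt "$cpu_cap" ]; then echo "$mem_cap"; else echo "$cpu_cap"; fi
}

workers_for 4 8192 200   # 4 cores, 8GB RAM, ~200MB/worker
# -> 9 (CPU-bound: memory alone would allow 40)
workers_for 8 1024 200   # memory-constrained box
# -> 5
```

The same arithmetic applies to `pm.max_children` for PHP-FPM: divide the RAM you can spare by the average per-child RSS.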
### 4. Fix proxy buffer issues
Buffer configuration:
```nginx
# Nginx proxy buffer configuration
# In server or location block

location / {
    proxy_pass http://127.0.0.1:8000;

    # Buffer sizes
    proxy_buffering on;
    proxy_buffer_size 4k;          # First part of response (headers)
    proxy_buffers 8 16k;           # Main buffers (8 x 16k = 128k)
    proxy_busy_buffers_size 24k;   # While sending to client

    # If upstream sends large headers, increase the first buffer instead:
    # proxy_buffer_size 16k;

    # Timeout settings
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;

    # Handle upstream errors
    proxy_next_upstream error timeout invalid_header http_502 http_503;
    proxy_next_upstream_tries 3;
}

# For large responses (file downloads, APIs with big payloads)
location /api/large {
    proxy_pass http://backend;
    proxy_buffering off;   # Stream directly to client
    proxy_cache off;
}

# Log buffer issues
# If you see "upstream sent too big header" in error.log:
# increase proxy_buffer_size
```
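When raising these values, remember that each busy proxied connection can hold roughly `proxy_buffer_size` plus the full `proxy_buffers` allocation in RAM. A sketch of that arithmetic (helper name illustrative):

```shell
#!/bin/sh
# Approximate per-connection proxy buffer memory in kB:
# proxy_buffer_size + (count x size) from proxy_buffers.
proxy_buffer_kb() {
  first_kb=$1; count=$2; each_kb=$3
  echo $(( first_kb + count * each_kb ))
}

proxy_buffer_kb 4 8 16    # proxy_buffer_size 4k; proxy_buffers 8 16k;
# -> 132 (kB per buffered connection)
# At 1000 concurrently buffered responses that is ~132MB of proxy buffers.
```

This is why blindly bumping `proxy_buffers` to fix one endpoint can cause memory pressure under load; prefer a larger `proxy_buffer_size` in the specific location that needs it.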
FastCGI buffer configuration (PHP-FPM):
```nginx
# FastCGI buffer configuration
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/run/php/php8.1-fpm.sock;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

    # Buffer settings
    fastcgi_buffering on;
    fastcgi_buffer_size 4k;
    fastcgi_buffers 8 16k;
    fastcgi_busy_buffers_size 24k;

    # Timeout settings
    fastcgi_connect_timeout 60s;
    fastcgi_send_timeout 60s;
    fastcgi_read_timeout 60s;

    # Handle large responses
    # If PHP outputs large data or many headers
    fastcgi_max_temp_file_size 1024m;
    fastcgi_temp_file_write_size 16k;
}

# If "FastCGI sent too big header" errors, increase:
# fastcgi_buffer_size 16k;
# fastcgi_buffers 16 16k;
```
### 5. Fix keepalive connection issues
Keepalive configuration:
```nginx
# Upstream with keepalive
upstream backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;

    # Keepalive connections to upstream
    keepalive 32;   # Max idle connections in pool
}

server {
    location / {
        proxy_pass http://backend;

        # Required for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Other headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# Upstream keepalive issues:
# "upstream prematurely closed keepalive connection"
# Fix: ensure the upstream supports keepalive and has adequate settings

# Gunicorn keepalive config (match or exceed Nginx keepalive):
# gunicorn --keep-alive 5 app:app

# In Nginx, keepalive should be <= upstream max connections
```
Upstream keepalive tuning:
```python
# Gunicorn keepalive configuration
# gunicorn.conf.py

keepalive = 5   # Seconds to wait for a keepalive request
                # Must be >= Nginx's keepalive idle time

# Worker connections with keepalive
# Formula: worker_connections >= keepalive + concurrent_requests

# For 32 keepalive connections + 100 concurrent:
worker_connections = 132
```
```php
# PHP-FPM keepalive (PHP 7.4+)
# In pool configuration
pm.max_requests = 500
request_terminate_timeout = 60s

# PHP-FPM doesn't use keepalive the same way:
# each request is independent.
# Focus on pm.max_children for capacity.
```
### 6. Fix upstream timeout issues
Timeout alignment:
```nginx
# Nginx timeouts should be SHORTER than upstream timeouts
# This ensures Nginx times out first and can retry

upstream backend {
    server 127.0.0.1:8000;
}

server {
    location / {
        proxy_pass http://backend;

        # Nginx timeouts
        proxy_connect_timeout 5s;   # Connection to upstream
        proxy_send_timeout 30s;     # Sending request
        proxy_read_timeout 30s;     # Waiting for response

        # Try next upstream on timeout
        proxy_next_upstream timeout;
        proxy_next_upstream_tries 2;
    }
}
```
```python
# Gunicorn timeout configuration
# Should be LONGER than Nginx timeouts

timeout = 60            # Worker timeout (kill worker if request takes longer)
keepalive = 5
graceful_timeout = 30   # Time to finish requests on reload

# Nginx:    proxy_read_timeout 30s
# Gunicorn: timeout 60s
# Result: Nginx times out first, Gunicorn still processing
```
```php
# PHP-FPM timeout configuration
# In pool config (/etc/php/8.1/fpm/pool.d/www.conf)

request_terminate_timeout = 60s   # Max request time

# In php.ini
max_execution_time = 60
max_input_time = 60

# Nginx fastcgi_read_timeout should be < request_terminate_timeout
```
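The alignment rule in this step reduces to one comparison: Nginx's read timeout must be strictly below the upstream's kill timeout, or the upstream may sever the connection first and produce a 502 instead of a clean 504. A sketch (`timeouts_aligned` is a hypothetical helper):

```shell
#!/bin/sh
# $1: nginx proxy_read_timeout / fastcgi_read_timeout (seconds)
# $2: upstream kill timeout, e.g. Gunicorn `timeout` or
#     PHP-FPM `request_terminate_timeout` (seconds)
timeouts_aligned() {
  if [ "$1" -lt "$2" ]; then
    echo "ok"      # Nginx times out first and can retry another upstream
  else
    echo "risky"   # upstream may kill the request first -> 502
  fi
}

timeouts_aligned 30 60   # -> ok
timeouts_aligned 60 60   # -> risky (equal timeouts race each other)
```

Running this over each location's timeout pair during config review catches misalignment before it ships.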
### 7. Fix SSL upstream issues
SSL between Nginx and upstream:
```nginx
# Upstream with SSL
upstream backend {
    server 127.0.0.1:8443;
}

server {
    location / {
        proxy_pass https://backend;

        # SSL verification
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
        proxy_ssl_verify_depth 2;

        # Or disable verification (internal only, not recommended)
        # proxy_ssl_verify off;

        # SSL session reuse
        proxy_ssl_session_reuse on;

        # Note: there is no proxy_ssl_handshake_timeout directive in
        # ngx_http_proxy_module; use proxy_connect_timeout to bound
        # connection establishment
    }
}

# For self-signed certificates (internal use only):
# proxy_ssl_verify off;
# Or add the certificate to the trusted CA bundle
```
Debug SSL issues:
```bash
# Test SSL connection to upstream
openssl s_client -connect 127.0.0.1:8443 -servername localhost

# Check certificate validity dates
openssl s_client -connect 127.0.0.1:8443 2>/dev/null </dev/null | openssl x509 -noout -dates

# Check if upstream SSL is working
curl -v https://127.0.0.1:8443/health

# If certificate errors:
# - Use a valid certificate
# - Or add it to the trusted store
# - Or disable verification (internal only)
```
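When the handshake fails because a certificate expired, the `-dates` output above tells you by how much. A small sketch of the day arithmetic, assuming GNU `date` (BSD/macOS `date` uses different flags):

```shell
#!/bin/sh
# Days between two dates (GNU date). Feed the notAfter value from
# `openssl x509 -noout -enddate` as $2 with today's date as $1 to get
# days until expiry (negative means already expired).
days_between() {
  a=$(date -ud "$1" +%s)
  b=$(date -ud "$2" +%s)
  echo $(( (b - a) / 86400 ))
}

days_between "2026-04-01" "2026-05-01"
# -> 30
```

Wiring this into a cron job that alerts below a 30-day threshold prevents the expired-cert class of upstream SSL 502s entirely.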
### 8. Monitor and alert on 502 errors
Nginx log parsing:
```bash
# Count 502 errors per minute
awk '$9 == 502 {print $4}' /var/log/nginx/access.log | cut -d: -f1,2 | uniq -c | tail -20

# Extract 502 requests with upstream info
grep " 502 " /var/log/nginx/access.log | head -10

# Real-time 502 monitoring
tail -f /var/log/nginx/access.log | grep --line-buffered " 502 "

# Check upstream health
watch -n 5 'curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8000/health'
```
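For an error-budget number, the 502 rate can be computed from the access log itself (combined log format, status in field 9; `rate_502` is an illustrative helper):

```shell
#!/bin/sh
# Percentage of requests with status 502, combined log format on stdin.
rate_502() {
  awk '{ total++; if ($9 == 502) bad++ }
       END { if (total) printf "%.1f\n", 100 * bad / total; else print "0.0" }'
}

# Demo on synthetic log lines (real use: rate_502 < /var/log/nginx/access.log)
printf '%s\n' \
  '10.0.0.1 - - [01/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512' \
  '10.0.0.1 - - [01/Apr/2026:10:00:01 +0000] "GET / HTTP/1.1" 502 157' \
  '10.0.0.1 - - [01/Apr/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 512' \
  '10.0.0.1 - - [01/Apr/2026:10:00:03 +0000] "GET / HTTP/1.1" 502 157' | rate_502
# -> 50.0
```

The same percentage is what the Prometheus alert below computes continuously; this gives the number ad hoc when metrics are unavailable.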
Prometheus metrics:
```bash
# Nginx exporter metrics
# https://github.com/nginxinc/nginx-prometheus-exporter

# Key metrics:
# nginx_http_requests_total - request count (stub_status exposes totals only;
#   per-status labels such as {status="502"} require the NGINX Plus API or a
#   log-based exporter)

# Install exporter
docker run -d --name nginx-exporter \
    -p 9113:9113 \
    nginx/nginx-prometheus-exporter:latest \
    -nginx.scrape-uri="http://127.0.0.1:8080/stub_status"
```

```nginx
# Enable stub_status in Nginx
server {
    listen 8080;
    location /stub_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
```
Grafana alert rules:
```yaml
groups:
  - name: nginx_502
    rules:
      - alert: Nginx502RateHigh
        expr: |
          sum(rate(nginx_http_requests_total{status="502"}[5m]))
          / sum(rate(nginx_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 502 error rate above 5%"
          description: "{{ $value | humanizePercentage }} of requests returning 502"

      - alert: NginxUpstreamDown
        expr: nginx_upstream_server_state == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Nginx upstream server is down"
          description: "Upstream {{ $labels.upstream }} server {{ $labels.server }} is unhealthy"

      - alert: NginxUpstreamConnectionsFailed
        expr: rate(nginx_upstream_connections_failed_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High upstream connection failures"
          description: "{{ $value }} failed connections per second"
```
Health check configuration:
```nginx
# Active health checks (NGINX Plus only)
upstream backend {
    zone backend 64k;
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    location / {
        proxy_pass http://backend;
        # health_check goes in the proxying location, not the upstream block
        health_check interval=5s fails=3 passes=2 uri=/health;
    }
}

# Open source Nginx: passive health checks
upstream backend {
    server 127.0.0.1:8000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8002 max_fails=3 fail_timeout=30s;
}

# max_fails:    mark server down after N failures
# fail_timeout: window for counting failures, and how long the server
#               stays marked down before being retried
```
## Prevention
- Monitor upstream health with active or passive checks
- Set appropriate worker counts based on capacity testing
- Align Nginx timeouts to be shorter than upstream timeouts
- Use keepalive connections with proper sizing
- Configure proxy buffers for expected response sizes
- Implement circuit breakers for upstream failures
- Set up alerting for 502 rate increases
- Use multiple upstream instances for redundancy
- Gracefully drain upstreams during deployments
- Document runbooks for common 502 causes
## Related Errors
- **504 Gateway Timeout**: Upstream too slow, not responding
- **503 Service Unavailable**: No upstream available, all down
- **500 Internal Server Error**: Error inside Nginx or the application itself
- **501 Not Implemented**: Request method not supported
- **505 HTTP Version Not Supported**: Protocol version mismatch