Introduction

Nginx 502 Bad Gateway errors occur when Nginx cannot establish a connection to the upstream server or receives an invalid response. The error indicates a connectivity issue between Nginx and the backend (application server, FastCGI process, or proxy), not a client-side problem. Common causes include upstream server down, connection refused due to backlog, keepalive connection limits exceeded, worker connection exhaustion, or FastCGI socket permission issues. The error appears in Nginx error logs as connect() failed, upstream prematurely closed connection, or no live upstreams.

Symptoms

  • HTTP 502 Bad Gateway response to clients
  • Nginx error log shows connect() failed (111: Connection refused) or Connection reset by peer
  • Issue intermittent during normal operation, constant during outages
  • Error rate increases during traffic spikes
  • Upstream server healthy but Nginx cannot connect
  • Issue appears after deploy, configuration change, or upstream restart

Common Causes

  • Upstream server (PHP-FPM, Gunicorn, Node.js) not running or crashed
  • Upstream backlog queue full (listen backlog exceeded)
  • Nginx worker_connections limit exhausted
  • Keepalive connections to upstream failing after upstream restart
  • FastCGI socket permissions incorrect or socket path changed
  • Upstream timeout too short for slow requests
  • Upstream server binding to wrong address (localhost vs 0.0.0.0)

Step-by-Step Fix

### 1. Check Nginx error logs for upstream details

Nginx error logs reveal the specific failure reason:

```bash
# Tail Nginx error log
tail -f /var/log/nginx/error.log

# Common 502 error messages:
# connect() failed (111: Connection refused) while connecting to upstream
# upstream prematurely closed connection while reading response header
# no live upstreams while connecting to upstream
# connect() failed (113: No route to host) while connecting to upstream

# Filter for 502-related errors
grep -E "502|connect\(\)|upstream|Connection" /var/log/nginx/error.log | tail -50

# Check error log level for more details
# Add to nginx.conf if not present:
# error_log /var/log/nginx/error.log debug;
```

Error code meanings:

  • **111: Connection refused**: Upstream not listening on expected port/socket
  • **113: No route to host**: Firewall or routing issue
  • **104: Connection reset by peer**: Upstream closed connection unexpectedly
  • **110: Connection timed out**: Upstream too slow or network issue
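These numbers are standard kernel errno values, so they can be mapped programmatically when scripting log triage. A minimal sketch using Python's `errno` module (the numeric values shown are the Linux ones; `describe` is a hypothetical helper):

```python
import errno
import os

# Map the errno values seen in Nginx 502 log lines to short explanations.
# The numeric values (111, 113, 104, 110) are Linux-specific.
CODES = {
    errno.ECONNREFUSED: "Connection refused: upstream not listening",    # 111
    errno.EHOSTUNREACH: "No route to host: firewall or routing issue",   # 113
    errno.ECONNRESET: "Connection reset by peer: upstream closed early", # 104
    errno.ETIMEDOUT: "Connection timed out: upstream too slow",          # 110
}

def describe(code: int) -> str:
    """Return a short description for an errno seen in an Nginx error log."""
    return CODES.get(code, os.strerror(code))

print(describe(111))  # Connection refused: upstream not listening
```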

### 2. Verify upstream server is running

Check that the backend service is active:

```bash
# For PHP-FPM
sudo systemctl status php-fpm
sudo systemctl status php8.1-fpm  # Version-specific

# For Gunicorn
sudo systemctl status myapp
ps aux | grep gunicorn

# For Node.js/PM2
pm2 status
pm2 logs

# For uWSGI
sudo systemctl status uwsgi
ps aux | grep uwsgi

# Check listening ports
sudo netstat -tlnp | grep :9000  # PHP-FPM default
sudo netstat -tlnp | grep :8000  # Common Gunicorn port
sudo netstat -tlnp | grep :3000  # Common Node.js port

# Or use ss (faster alternative)
ss -tlnp | grep -E "9000|8000|3000"
```

If the service is not running:

```bash
# Start service
sudo systemctl start php-fpm
sudo systemctl start myapp

# Enable on boot
sudo systemctl enable php-fpm
sudo systemctl enable myapp

# Check why it failed
sudo journalctl -u php-fpm -n 50 --no-pager
```
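If systemctl reports the service as active but Nginx still logs `111: Connection refused`, a direct TCP probe confirms whether anything is actually accepting connections on the expected address. A minimal sketch (the `upstream_reachable` helper and the port list are assumptions, not from any standard tool):

```python
import socket

def upstream_reachable(host: str = "127.0.0.1", port: int = 9000,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ECONNREFUSED, EHOSTUNREACH, timeout -- the same failures Nginx sees
        return False

# Probe the common upstream ports mentioned above
for port in (9000, 8000, 3000):
    print(port, "up" if upstream_reachable(port=port) else "DOWN")
```

A `True` here with 502s still occurring points at configuration (wrong upstream address in Nginx) rather than a dead backend.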

### 3. Check upstream connection configuration

Verify Nginx upstream block matches backend configuration:

```nginx
# /etc/nginx/conf.d/upstream.conf

# TCP upstream (application server)
upstream backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001 backup;

    # Keepalive connections
    keepalive 32;
    keepalive_timeout 60s;
    keepalive_requests 1000;
}

# FastCGI upstream (PHP-FPM)
# Unix socket (preferred for local)
upstream php-fpm {
    server unix:/run/php/php8.1-fpm.sock;
    keepalive 8;
}

# Or TCP connection
upstream php-fpm {
    server 127.0.0.1:9000;
    keepalive 8;
}
```

Server block configuration:

```nginx
server {
    listen 80;
    server_name example.com;

    # Proxy to application server
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # FastCGI for PHP
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass php-fpm;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_connect_timeout 60s;
        fastcgi_send_timeout 60s;
        fastcgi_read_timeout 60s;
    }
}
```

### 4. Increase worker_connections limit

Nginx worker exhaustion causes connection failures:

```bash
# Check current limits
nginx -T | grep worker_connections

# Check active connections
curl http://localhost/nginx_status  # If stub_status enabled

# Or check with netstat
netstat -an | grep :80 | wc -l
```

Increase worker connections:

```nginx
# /etc/nginx/nginx.conf
events {
    worker_connections 4096;  # Default often 1024, increase for high traffic
    use epoll;                # Linux optimal
    multi_accept on;          # Accept all new connections at once
}

# Calculate max clients:
# max_clients = worker_processes * worker_connections
# Example: 4 workers * 4096 = 16384 concurrent connections
```
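The capacity arithmetic above deserves one caveat: when Nginx acts as a reverse proxy, each in-flight request typically holds two connections (one to the client, one to the upstream), so a commonly cited rule of thumb halves the effective client capacity. A sketch of both calculations (`max_clients` is a hypothetical helper, and the halving rule is an assumption stated here, not part of the original text):

```python
def max_clients(worker_processes: int, worker_connections: int,
                reverse_proxy: bool = True) -> int:
    """Rough upper bound on concurrent clients for an Nginx instance.

    As a reverse proxy, each request holds two connections (client +
    upstream), so the total is halved; serving static files directly,
    one connection per client applies.
    """
    total = worker_processes * worker_connections
    return total // 2 if reverse_proxy else total

print(max_clients(4, 4096, reverse_proxy=False))  # 16384, as in the comment above
print(max_clients(4, 4096))                       # 8192 when proxying
```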

Check system file descriptor limits:

```bash
# Check current limits
ulimit -n

# Check Nginx process limits
cat /proc/$(cat /run/nginx.pid)/limits | grep "open files"

# Increase system-wide
# /etc/security/limits.conf
#   nginx soft nofile 65535
#   nginx hard nofile 65535

# Or in systemd service
# /etc/systemd/system/nginx.service.d/override.conf
#   [Service]
#   LimitNOFILE=65535

sudo systemctl daemon-reload
sudo systemctl restart nginx
```

### 5. Fix keepalive connection issues

Keepalive connections can fail after upstream restart:

```nginx
# WRONG: Keepalive without proper headers
upstream backend {
    server 127.0.0.1:8000;
    keepalive 32;
}

location / {
    proxy_pass http://backend;
    # Missing: proxy_http_version and Connection header
}

# CORRECT: Full keepalive configuration
upstream backend {
    server 127.0.0.1:8000;
    keepalive 32;
    keepalive_timeout 60s;
    keepalive_requests 1000;
}

location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
}
```

For FastCGI:

```nginx
# PHP-FPM with keepalive
upstream php-fpm {
    server unix:/run/php/php8.1-fpm.sock;
    keepalive 8;
}

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass php-fpm;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

    # Required for keepalive
    fastcgi_keep_conn on;
}
```

### 6. Check FastCGI socket permissions

Unix socket permission problems cause connect() failures (typically 13: Permission denied, or 2: No such file or directory if the socket path changed):

```bash
# Check socket exists and permissions
ls -la /run/php/

# Expected:
# srw-rw---- 1 www-data www-data 0 Mar 30 12:00 php8.1-fpm.sock

# Socket must be readable/writable by the Nginx user
# Nginx typically runs as www-data or nginx

# Check Nginx user
ps aux | grep nginx | head -1

# Fix permissions in the PHP-FPM pool config
# /etc/php/8.1/fpm/pool.d/www.conf
#   listen.owner = www-data
#   listen.group = www-data
#   listen.mode = 0660

# Or use TCP instead of a socket
# /etc/php/8.1/fpm/pool.d/www.conf
#   listen = 127.0.0.1:9000

# Restart PHP-FPM
sudo systemctl restart php8.1-fpm
```
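The `ls -la` check above can be scripted for monitoring. The sketch below verifies that a path exists, is actually a Unix socket, and carries the expected mode; the `socket_ok` helper and the default 0660 mode are assumptions based on the Debian/Ubuntu defaults shown above:

```python
import os
import stat

def socket_ok(path: str, expected_mode: int = 0o660) -> bool:
    """Check that path exists, is a Unix socket, and has the expected mode."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False  # socket path changed or FPM not running
    if not stat.S_ISSOCK(st.st_mode):
        return False  # a regular file at the socket path is also a failure
    return stat.S_IMODE(st.st_mode) == expected_mode

print(socket_ok("/run/php/php8.1-fpm.sock"))
```

This does not check ownership; extend it with `st.st_uid`/`st.st_gid` if the Nginx user is known.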

### 7. Increase upstream backlog queue

When the connection queue is full, new connections are refused:

```bash
# Check current backlog
ss -tlnp | grep :9000
# Output: LISTEN 0 128 127.0.0.1:9000 ...
# Send-Q column (128) = backlog queue size

# Check for dropped connections
netstat -s | grep -i listen
# Look for: "times the listen queue of a socket overflowed"
# And: "SYNs to LISTEN sockets dropped"
```

Increase backlog in application:

```python
# Gunicorn, on the command line:
#   gunicorn --bind 127.0.0.1:8000 --backlog 2048 myapp:app

# Or in the config file gunicorn.conf.py:
bind = "127.0.0.1:8000"
backlog = 2048
```

```ini
; PHP-FPM
; /etc/php/8.1/fpm/pool.d/www.conf
listen.backlog = 2048
```

Increase system limit:

```bash
# Check current limit
sysctl net.core.somaxconn

# Increase temporarily
sudo sysctl -w net.core.somaxconn=2048

# Increase permanently
echo "net.core.somaxconn = 2048" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```
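The reason both the application and the kernel must be raised: the kernel silently caps any `listen()` backlog at `net.core.somaxconn`, so increasing only the application setting has no effect. A sketch that computes the backlog actually in effect (the `effective_backlog` helper and the Linux `/proc` path are assumptions):

```python
def effective_backlog(requested: int,
                      somaxconn_path: str = "/proc/sys/net/core/somaxconn") -> int:
    """Return the backlog the kernel will actually grant.

    listen(fd, backlog) is silently capped at net.core.somaxconn, so the
    effective queue depth is the smaller of the two values.
    """
    with open(somaxconn_path) as f:
        somaxconn = int(f.read().strip())
    return min(requested, somaxconn)

# If somaxconn is still at an old default of 128, asking Gunicorn or
# PHP-FPM for 2048 yields only 128:
print(effective_backlog(2048))
```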

### 8. Configure upstream health checks

Remove unhealthy upstreams from rotation:

```nginx
# Passive health checks (built-in)
upstream backend {
    # Mark a server as failed after 5 consecutive failures
    # and remove it from rotation for 30 seconds
    server backend1.example.com max_fails=5 fail_timeout=30s;
    server backend2.example.com max_fails=5 fail_timeout=30s;
    server backend3.example.com max_fails=5 fail_timeout=30s;
}

location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
    proxy_next_upstream_timeout 30s;
}
```
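The `proxy_next_upstream` behavior (try the next server on failure, give up after N tries) can also be mimicked in application code when calling services directly. A minimal sketch with hypothetical backend callables; the `try_upstreams` helper is an illustration, not an Nginx API:

```python
def try_upstreams(backends, request, max_tries=3):
    """Call each backend in turn, moving to the next when one fails.

    Mirrors proxy_next_upstream: an error or timeout on one server
    triggers a retry against the next; exhausting all tries maps to
    Nginx's 'no live upstreams' (a hard 502).
    """
    last_error = None
    for backend in backends[:max_tries]:
        try:
            return backend(request)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc  # passive failure detection, like max_fails
    raise RuntimeError("no live upstreams") from last_error

# Hypothetical backends: the first two are down, the third answers.
def down(req):
    raise ConnectionError("connect() failed (111: Connection refused)")

def up(req):
    return "200 OK: " + req

print(try_upstreams([down, down, up], "GET /"))  # 200 OK: GET /
```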

Active health checks (Nginx Plus only):

```nginx
upstream backend {
    zone backend_zone 64k;
    server backend1.example.com;
    server backend2.example.com;
}

# health_check is declared in the proxied location, not the upstream block
location / {
    proxy_pass http://backend;
    health_check interval=10s fails=3 passes=2;
}
```

An open-source alternative is lua-resty-upstream-healthcheck (requires OpenResty or the ngx_lua module):

```lua
-- healthcheck.lua
local healthcheck = require "resty.upstream.healthcheck"

healthcheck.spawn_checker({
    shm = "healthcheck",
    upstream = "backend",
    type = "http",
    http_req = "GET /health HTTP/1.1\r\nHost: localhost\r\n\r\n",
    interval = 1000,  -- 1 second
    fall = 3,
    rise = 2,
})
```

### 9. Check for upstream timeout issues

Slow requests may time out before the upstream finishes responding:

```nginx
# Increase timeouts
location / {
    proxy_pass http://backend;

    # Connection establishment
    proxy_connect_timeout 60s;

    # Transmitting the request to the upstream
    proxy_send_timeout 60s;

    # Waiting for the upstream response
    proxy_read_timeout 120s;  # Increase for slow endpoints
}

# For FastCGI
location ~ \.php$ {
    fastcgi_pass php-fpm;
    fastcgi_connect_timeout 60s;
    fastcgi_send_timeout 60s;
    fastcgi_read_timeout 120s;
}
```

Check upstream application timeout settings:

```python
# Flask/Gunicorn
# gunicorn.conf.py
timeout = 120   # Worker timeout
keepalive = 5   # Keepalive timeout
```

```ini
; PHP-FPM
; /etc/php/8.1/fpm/pool.d/www.conf
request_terminate_timeout = 120
```
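The interplay between these limits determines which error the client sees on a request slower than both: if the application kills its worker before Nginx's read timeout fires, Nginx observes the connection drop and returns 502 ("upstream prematurely closed connection"); if Nginx gives up waiting first, the client gets 504 instead. A sketch encoding that reasoning (the `timeout_failure_mode` helper is an illustration; equal values are a race and treated as the 504 case here):

```python
def timeout_failure_mode(app_timeout: float, proxy_read_timeout: float) -> str:
    """Predict the error for a request slower than both timeout limits."""
    if app_timeout < proxy_read_timeout:
        # Worker killed first; Nginx sees the connection close -> 502
        return "502 upstream prematurely closed connection"
    # Nginx stops waiting first (or ties) -> 504
    return "504 Gateway Timeout"

print(timeout_failure_mode(app_timeout=60, proxy_read_timeout=120))   # -> 502 ...
print(timeout_failure_mode(app_timeout=180, proxy_read_timeout=120))  # -> 504 ...
```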

### 10. Enable upstream monitoring and alerting

Set up monitoring for early detection:

```nginx
# Enable stub_status for monitoring
server {
    listen 127.0.0.1:8080;

    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
```

```bash
# Query the status page
curl http://127.0.0.1:8080/nginx_status
# Output:
# Active connections: 50
# server accepts handled requests
#  100000 100000 500000
# Reading: 5 Writing: 40 Waiting: 5
```
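The stub_status output is plain text and easy to parse for ad-hoc dashboards or alerts. A sketch using the sample output above (the `parse_stub_status` helper is an illustration, not part of Nginx):

```python
import re

def parse_stub_status(text: str) -> dict:
    """Parse the nginx stub_status plain-text format into a dict of ints."""
    m = {}
    m["active"] = int(re.search(r"Active connections:\s+(\d+)", text).group(1))
    # The counters line: accepts, handled, requests
    accepts, handled, requests = re.search(
        r"\n\s*(\d+)\s+(\d+)\s+(\d+)", text).groups()
    m.update(accepts=int(accepts), handled=int(handled), requests=int(requests))
    rw = re.search(r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text)
    m.update(reading=int(rw.group(1)), writing=int(rw.group(2)),
             waiting=int(rw.group(3)))
    return m

SAMPLE = """Active connections: 50
server accepts handled requests
 100000 100000 500000
Reading: 5 Writing: 40 Waiting: 5
"""
print(parse_stub_status(SAMPLE))
```

A gap between `accepts` and `handled` indicates connections Nginx could not handle, often a resource-limit symptom worth alerting on.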

Prometheus metrics with nginx-prometheus-exporter:

```yaml
# docker-compose.yml
nginx-exporter:
  image: nginx/nginx-prometheus-exporter
  command:
    - -nginx.scrape-uri=http://127.0.0.1:8080/nginx_status
  ports:
    - "9113:9113"
```

Key metrics to alert on:

  • nginx_connections_waiting above a threshold
  • nginx_http_requests_total rate increase
  • nginx_upstream_state non-200 responses

Prevention

  • Monitor upstream health with active checks
  • Set appropriate timeouts for application patterns
  • Use Unix sockets for local upstream (lower latency)
  • Configure keepalive connections with proper headers
  • Set worker_connections based on expected traffic
  • Implement circuit breaker pattern in application
  • Enable access logging with upstream response time

```nginx
# Log format with upstream timing
log_format upstream '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    'upstream: $upstream_addr '
                    'connect_time: $upstream_connect_time '
                    'response_time: $upstream_response_time';

access_log /var/log/nginx/access.log upstream;
```
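With the timing variables in place, slow or unreachable upstreams can be pulled straight from the access log. A sketch parsing the trailing fields emitted by the `upstream` log_format above; the regex assumes that exact format, and `upstream_timing` is a hypothetical helper:

```python
import re

# Matches the trailing fields added by the 'upstream' log_format above.
LOG_RE = re.compile(
    r"upstream: (?P<addr>\S+) "
    r"connect_time: (?P<connect>\S+) "
    r"response_time: (?P<response>\S+)"
)

def upstream_timing(line: str):
    """Extract upstream address and timings from an access-log line.

    Nginx logs '-' for the times when it never reached the upstream
    (e.g. a hard 502), which is mapped to None here.
    """
    m = LOG_RE.search(line)
    if not m:
        return None
    def num(v):
        return None if v == "-" else float(v)
    return m["addr"], num(m["connect"]), num(m["response"])

line = ('10.0.0.1 - - [30/Mar/2024:12:00:00 +0000] "GET / HTTP/1.1" 502 157 '
        '"-" "curl/8.0" upstream: 127.0.0.1:8000 connect_time: - '
        'response_time: -')
print(upstream_timing(line))  # ('127.0.0.1:8000', None, None)
```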

Related Errors

  • **504 Gateway Timeout**: Upstream responded too slowly
  • **503 Service Unavailable**: No upstream available or maintenance mode
  • **500 Internal Server Error**: Upstream returned invalid response