Introduction

Nginx 504 Gateway Timeout errors occur when Nginx (acting as a reverse proxy) does not receive a timely response from the upstream server within the configured timeout period. The error indicates that Nginx successfully connected to the backend (unlike 502 Bad Gateway), but the backend took too long to respond. This commonly affects PHP-FPM, application servers (Node.js, Java, Python), and database-backed applications. The fix requires systematic diagnosis across Nginx proxy configuration, upstream server performance, network latency, resource constraints, and application-level bottlenecks. This guide provides production-proven troubleshooting for Nginx 504 scenarios including timeout tuning, proxy buffering optimization, FastCGI configuration, backend performance analysis, and monitoring strategies.

Symptoms

  • Browser displays 504 Gateway Timeout error page
  • Nginx error log shows upstream timed out (110: Connection timed out)
  • Application logs show requests completing successfully but users see errors
  • 504 errors occur intermittently under load or for specific endpoints
  • Long-running requests (reports, exports, batch operations) consistently time out
  • Nginx access log shows $upstream_response_time exceeding timeout threshold
  • Backend server shows requests eventually completing after Nginx has timed out
  • Error rate increases during peak traffic or batch processing windows
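
When the error log shows `upstream timed out`, the phrase that follows it identifies which phase failed. A minimal shell sketch of that triage — the sample log line is hard-coded for illustration; in practice pipe real lines from /var/log/nginx/error.log:

```shell
# Classify a timeout line: a read timeout means the backend accepted the
# request but responded too slowly; a connect timeout means Nginx never
# reached it at all (usually surfacing as 502/504 with different tuning).
line='2026/03/31 10:15:23 [error] 1234#1234: *56789 upstream timed out (110: Connection timed out) while reading response header from upstream'

case "$line" in
  *"while reading response header"*) echo "read timeout: backend slow" ;;
  *"while connecting to upstream"*)  echo "connect timeout: backend unreachable" ;;
  *)                                 echo "other" ;;
esac
# → read timeout: backend slow
```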

Common Causes

  • proxy_read_timeout or fastcgi_read_timeout set too low for workload
  • Backend application slow due to database queries, external API calls, or processing
  • Backend worker/process pool exhausted, requests queued waiting for available worker
  • Database queries taking too long (missing indexes, locks, large datasets)
  • External API dependencies responding slowly or timing out
  • Backend server under-provisioned (CPU, memory, I/O constraints)
  • Network latency between Nginx and upstream server (cross-region, VPN)
  • Proxy buffering disabled, causing slow client to block upstream connection
  • Backend garbage collection or maintenance pauses
  • Rate limiting or connection limits causing request queuing

Step-by-Step Fix

### 1. Confirm 504 diagnosis

Check Nginx error logs for timeout details:

```bash
# Check Nginx error log for upstream timeouts
tail -100 /var/log/nginx/error.log | grep -i "timed out"

# Typical 504 error entry:
# 2026/03/31 10:15:23 [error] 1234#1234: *56789 upstream timed out (110: Connection timed out)
# while reading response header from upstream,
# client: 192.168.1.100, server: example.com,
# request: "GET /api/reports/monthly HTTP/1.1",
# upstream: "http://127.0.0.1:8080/api/reports/monthly",
# host: "example.com"

# Check access log for 504 status
tail -100 /var/log/nginx/access.log | grep " 504 "

# Enable upstream response time logging to identify slow endpoints
# Add to nginx.conf http block:
# log_format upstream_time '$remote_addr - $remote_user [$time_local] '
#                          '"$request" $status $body_bytes_sent '
#                          '"$http_referer" "$http_user_agent" '
#                          'upstream_response_time=$upstream_response_time';

# Query to find slowest upstream responses
awk '$9 == 504 {print $0}' /var/log/nginx/access.log | \
  grep -o 'upstream_response_time=[0-9.]*' | \
  cut -d= -f2 | sort -rn | head -20
```

Identify which endpoints are timing out:

```bash
# Find URIs causing 504 errors
awk '$9 == 504 {print $7}' /var/log/nginx/access.log | \
  sort | uniq -c | sort -rn | head -20

# Output shows which endpoints need optimization:
#     156 /api/reports/monthly
#      89 /api/export/csv
#      45 /admin/bulk-import
```

### 2. Check current timeout configuration

Identify which timeout is being hit:

```bash
# Check Nginx configuration for timeout settings
nginx -T 2>/dev/null | grep -E "timeout|buffer"

# Key timeouts to check:
# proxy_read_timeout    - For HTTP upstreams (proxy_pass)
# proxy_send_timeout    - For sending request to upstream
# proxy_connect_timeout - For initial connection to upstream
# fastcgi_read_timeout  - For FastCGI (PHP-FPM)
# fastcgi_send_timeout  - For sending to FastCGI
# uwsgi_read_timeout    - For uWSGI (Python)
# grpc_read_timeout     - For gRPC upstreams

# Check site-specific configuration
grep -E "timeout|buffer" /etc/nginx/sites-enabled/example.com.conf

# Default timeouts (often too low):
# proxy_read_timeout 60s   - May be too low for reports/exports
# fastcgi_read_timeout 60s - May be too low for complex PHP scripts
```

Timeout hierarchy (most specific wins):

```nginx
# http block (global default)
http {
    proxy_read_timeout 60s;

    # server block (overrides http)
    server {
        proxy_read_timeout 90s;

        # location block (most specific, overrides server)
        location /api/ {
            proxy_read_timeout 300s;   # This wins for /api/
        }
    }
}
```

### 3. Increase timeout values appropriately

Set timeouts based on actual workload requirements:

```nginx
# /etc/nginx/sites-enabled/example.com.conf

server {
    listen 80;
    server_name example.com;

    # Default timeout for most endpoints
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
    proxy_connect_timeout 10s;

    # Long-running endpoints need higher timeouts
    location /api/reports/ {
        proxy_read_timeout 300s;    # 5 minutes for reports
        proxy_send_timeout 300s;
        proxy_connect_timeout 30s;
    }

    location /api/export/ {
        proxy_read_timeout 600s;    # 10 minutes for exports
        proxy_send_timeout 600s;
        proxy_connect_timeout 30s;
    }

    location /admin/bulk-operations/ {
        proxy_read_timeout 900s;    # 15 minutes for bulk ops
        proxy_send_timeout 900s;
        proxy_connect_timeout 30s;
    }

    # WebSocket endpoints (effectively indefinite timeout)
    location /ws/ {
        proxy_read_timeout 86400s;  # 24 hours
        proxy_send_timeout 86400s;
        proxy_connect_timeout 10s;

        # WebSocket requires these headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

For PHP-FPM (FastCGI):

```nginx
# /etc/nginx/sites-enabled/example.com.conf

server {
    listen 80;
    server_name example.com;

    # Default FastCGI timeouts
    fastcgi_read_timeout 60s;
    fastcgi_send_timeout 60s;

    # Long-running scripts
    location /admin/reports/ {
        fastcgi_read_timeout 300s;
        fastcgi_send_timeout 300s;
        fastcgi_connect_timeout 30s;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass unix:/run/php/php8.2-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
```

Also configure PHP-FPM to match:

```ini
; /etc/php/8.2/fpm/php.ini

; Should be >= the largest Nginx fastcgi_read_timeout in use
max_execution_time = 300
max_input_time = 300
memory_limit = 512M

; Note: under FPM, request_terminate_timeout in the pool config
; (/etc/php/8.2/fpm/pool.d/www.conf) is the hard kill switch and
; overrides max_execution_time when set.

; For CLI scripts (if using PHP CLI)
; max_execution_time = 0 ; Unlimited
```
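
A quick sanity check of the two limits against each other — the numbers here are the example values above; substitute your own:

```shell
# If PHP's execution limit is below Nginx's fastcgi_read_timeout, PHP
# typically kills the script first and the client sees a 500/502 rather
# than a 504, which can mask the real timeout problem.
nginx_fastcgi_read_timeout=300
php_max_execution_time=300

if [ "$php_max_execution_time" -ge "$nginx_fastcgi_read_timeout" ]; then
  echo "ok: limits are consistent"
else
  echo "warn: raise max_execution_time to >= ${nginx_fastcgi_read_timeout}s"
fi
# → ok: limits are consistent
```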

### 4. Enable and tune proxy buffering

Proxy buffering allows Nginx to buffer upstream responses, freeing upstream connections:

```nginx
# /etc/nginx/sites-enabled/example.com.conf

location /api/ {
    proxy_pass http://backend;

    # Enable buffering
    proxy_buffering on;

    # Buffer configuration
    proxy_buffer_size 4k;            # Buffer for response header
    proxy_buffers 8 16k;             # 8 buffers of 16k each (128k total)
    proxy_busy_buffers_size 32k;     # Max size sent to client while still buffering

    # Buffer limits
    proxy_max_temp_file_size 1024m;  # Max temp file size (0 disables temp files)
    proxy_temp_file_write_size 16k;  # Write size to temp file

    # Timeouts with buffering
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}
```

How buffering helps:

```
Without buffering:
Client (slow) <---> Nginx <---> Upstream
- Client downloads slowly; the upstream connection is held the entire time
- Slow client = blocked upstream worker
- Can exhaust the upstream worker pool

With buffering:
Client (slow) <---> Nginx [buffered] <---> Upstream (fast)
- Upstream sends the response quickly and the connection is released
- Nginx buffers the response and serves the slow client at its own pace
- Upstream workers are freed for new requests
```

Buffering for different content types:

```nginx
# Buffer API responses (usually small)
location /api/ {
    proxy_buffering on;
    proxy_buffers 4 8k;
    proxy_buffer_size 4k;
}

# Buffer static content aggressively
location /static/ {
    proxy_buffering on;
    proxy_buffers 16 32k;
    proxy_buffer_size 16k;
    proxy_max_temp_file_size 0;   # No temp files
}

# Disable buffering for streaming/large files
location /downloads/ {
    proxy_buffering off;
    proxy_cache off;
}

# Alternative /api/ configuration: selective buffering based on response
# size. Small responses stay in the in-memory buffers; larger ones spill
# to a temp file capped at 100m, beyond which Nginx reads from the
# upstream synchronously as the client consumes data.
location /api/ {
    proxy_buffering on;
    proxy_buffers 8 16k;
    proxy_max_temp_file_size 100m;
}
```
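
Buffer sizes apply per connection, so it is worth estimating the worst-case memory the settings above imply. A back-of-envelope sketch (the concurrency figure is a hypothetical, not from the configuration):

```shell
# Per-connection buffer memory for "proxy_buffers 8 16k" plus the 4k
# header buffer, multiplied by an assumed number of concurrently
# buffered connections:
per_conn_kb=$(( 8 * 16 + 4 ))
concurrent=1000
echo "per connection: ${per_conn_kb}k"
echo "at ${concurrent} buffered connections: $(( per_conn_kb * concurrent / 1024 ))M"
# → per connection: 132k
# → at 1000 buffered connections: 128M
```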

### 5. Configure upstream keepalive connections

Keepalive connections reduce connection overhead and improve performance:

```nginx
# Define upstream with keepalive
upstream backend {
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;

    # Idle keepalive connections cached per worker process
    keepalive 32;

    # Keepalive timeout (should not exceed the upstream server's own)
    keepalive_timeout 60s;

    # Number of requests per connection before it is closed
    keepalive_requests 1000;
}

server {
    location /api/ {
        proxy_pass http://backend;

        # Required for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Standard proxy headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

For PHP-FPM:

```nginx
upstream php-fpm {
    server unix:/run/php/php8.2-fpm.sock;

    # Keepalive for FastCGI
    keepalive 16;
}

location ~ \.php$ {
    fastcgi_pass php-fpm;

    # Required for keepalive
    fastcgi_keep_conn on;

    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}
```
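
Note that `keepalive` is a per-worker-process setting, so the ceiling on idle upstream connections is workers × keepalive — worth checking against the backend's own connection limit. A quick estimate (the worker count is an illustrative assumption):

```shell
# Upper bound on idle upstream connections Nginx may hold open:
worker_processes=4     # from nginx.conf (often "auto" = CPU count)
keepalive=32           # from the upstream block above
echo "max idle upstream connections: $(( worker_processes * keepalive ))"
# → max idle upstream connections: 128
```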

### 6. Diagnose backend performance issues

Check if backend is the bottleneck:

```bash
# Check backend application logs for slow requests
# Example for Node.js
tail -100 /var/log/app/error.log | grep -i "slow\|timeout"

# Check PHP-FPM slow log
# Enable in /etc/php/8.2/fpm/pool.d/www.conf:
#   request_slowlog_timeout = 5s
#   slowlog = /var/log/php-fpm/slow.log

tail -100 /var/log/php-fpm/slow.log

# Typical slow log entry:
# [31-Mar-2026 10:15:23] [pool www] pid 1234
# script_filename = /var/www/html/api/reports/monthly.php
# [0x00007f1234567890] PDO::query()
# [0x00007f1234567900] ReportGenerator::generateMonthlyReport()
# [0x00007f1234567a00] require /var/www/html/api/reports/monthly.php

# Check backend resource usage (top -p wants comma-separated PIDs)
top -p "$(pgrep -d, -f 'node|python|java|php-fpm')"

# Check for processes in uninterruptible sleep (I/O wait)
ps aux | awk '$8 ~ /D/ {print}'
```

Database query analysis:

```sql
-- Enable slow query log (MySQL)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries > 1 second

-- Check slow queries (when log_output includes TABLE)
SELECT * FROM mysql.slow_log;

-- Or check the slow log file from a shell:
--   tail -100 /var/log/mysql/slow.log

-- Find queries without proper indexes
EXPLAIN SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';

-- Look for:
--   type: ALL (full table scan)             - should be ref or range
--   rows: high number                       - needs an index
--   Extra: Using filesort, Using temporary  - needs optimization

-- Check for locks blocking queries
-- (MySQL 5.7 schema shown; on MySQL 8.0 use sys.innodb_lock_waits)
SELECT r.trx_id              waiting_trx_id,
       r.trx_mysql_thread_id waiting_thread,
       r.trx_query           waiting_query,
       b.trx_id              blocking_trx_id,
       b.trx_mysql_thread_id blocking_thread,
       b.trx_query           blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
```

### 7. Check upstream server capacity

Backend worker exhaustion causes queuing:

```bash
# Check PHP-FPM pool status
# Enable the status page in /etc/php/8.2/fpm/pool.d/www.conf:
#   pm.status_path = /status
# then expose /status via an Nginx location that fastcgi_passes to the
# pool (port 9000 speaks FastCGI, not HTTP, so it cannot be curled directly)

curl "http://localhost/status?full"

# Key metrics:
# pool:                 www
# process manager:      dynamic
# start time:           31/Mar/2026:10:00:00 +0000
# start since:          54000
# accepts:              123456
# listen queue:         0     # Should be 0; >0 means requests are queuing
# max listen queue:     100   # Peak queue size
# listen queue len:     128   # Kernel backlog
# idle processes:       5
# active processes:     20
# total processes:      25
# max active processes: 30
# max children reached: 5     # >0 means pm.max_children is too low
```

Tune the PHP-FPM pool accordingly:

```ini
; /etc/php/8.2/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 50        ; Increase if "max children reached" > 0
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.max_requests = 500       ; Recycle workers after 500 requests (prevents leaks)

; For high-traffic sites, use the static process manager instead:
; pm = static
; pm.max_children = 50
```

Check Node.js cluster workers:

```javascript
// app.js
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  const numCPUs = os.cpus().length;
  for (let i = 0; i < numCPUs * 2; i++) {
    cluster.fork();  // 2 workers per CPU
  }
}
```

Check Gunicorn workers (Python):

```python
# gunicorn.conf.py
workers = 4                 # (2 x num_cores) + 1 is the usual rule
worker_class = 'sync'
worker_connections = 1000   # Only used by eventlet/gevent workers
timeout = 30
```

Check Tomcat connector threads:

```xml
<!-- server.xml -->
<Connector port="8080"
           maxThreads="200"
           minSpareThreads="25"
           connectionTimeout="20000"
           redirectPort="8443" />
```
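
The worker counts above follow common sizing rules; the Gunicorn figure, for instance, is usually derived as (2 × cores) + 1 for sync workers. A sketch with an assumed core count:

```shell
# Gunicorn's commonly cited sizing rule for sync workers:
cores=4    # substitute: getconf _NPROCESSORS_ONLN
echo "suggested workers: $(( 2 * cores + 1 ))"
# → suggested workers: 9
```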

### 8. Implement request timeouts at application level

Add application-level timeouts to prevent hanging:

```python
# Python Flask with a per-request timeout.
# Note: SIGALRM works only in the main thread of the main interpreter,
# so this approach suits single-threaded workers.
import signal

from flask import Flask

app = Flask(__name__)

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException("Request timeout")

@app.before_request
def before_request():
    # Set a timeout for all requests
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(30)  # 30 second timeout

@app.after_request
def after_request(response):
    # Cancel the alarm
    signal.alarm(0)
    return response

@app.route('/api/heavy-task')
def heavy_task():
    # Raises TimeoutException after 30 seconds
    return do_heavy_work()
```

Java Spring Boot:

```java
@Configuration
public class TimeoutConfig {

    @Bean
    public FilterRegistrationBean<TimeoutFilter> timeoutFilter() {
        FilterRegistrationBean<TimeoutFilter> registration = new FilterRegistrationBean<>();
        TimeoutFilter filter = new TimeoutFilter(30000); // 30 seconds
        registration.setFilter(filter);
        registration.addUrlPatterns("/api/*");
        return registration;
    }
}

public class TimeoutFilter implements Filter {
    private final long timeout;

    public TimeoutFilter(long timeout) {
        this.timeout = timeout;
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Submit as a Callable (returning null) so the lambda may throw
        // doFilter's checked exceptions through the Future
        Future<?> future = executor.submit(() -> {
            chain.doFilter(request, response);
            return null;
        });

        try {
            future.get(timeout, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            ((HttpServletResponse) response).sendError(504, "Request timeout");
        } catch (InterruptedException | ExecutionException e) {
            throw new ServletException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Node.js Express:

```javascript
const express = require('express');
const app = express();

// Request timeout middleware
app.use((req, res, next) => {
  req.setTimeout(30000, () => {
    const err = new Error('Request timeout');
    err.status = 504;
    next(err);
  });
  res.setTimeout(30000, () => {
    const err = new Error('Response timeout');
    err.status = 504;
    next(err);
  });
  next();
});

// Individual route timeouts
app.get('/api/heavy-task', (req, res, next) => {
  // Override the default timeout for this route
  req.setTimeout(300000);  // 5 minutes
  res.setTimeout(300000);

  doHeavyTask((err, result) => {
    if (err) return next(err);
    res.json(result);
  });
});
```

### 9. Add circuit breaker for external dependencies

Prevent cascading timeouts from slow external services:

```python
# Python with a circuit breaker (pip install circuitbreaker)
import requests
from circuitbreaker import circuit

@circuit(failure_threshold=5, recovery_timeout=30)
def call_external_api():
    response = requests.get('https://api.external.com/data', timeout=5)
    return response.json()

# After 5 failures the circuit opens:
# requests fail immediately for 30 seconds (recovery timeout),
# then a half-open state allows a test request through
```

Java with Resilience4j:

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.timelimiter.TimeLimiter;
import io.github.resilience4j.timelimiter.TimeLimiterConfig;

@Service
public class ExternalService {

    private final CircuitBreaker circuitBreaker;
    private final TimeLimiter timeLimiter;

    public ExternalService() {
        circuitBreaker = CircuitBreaker.ofDefaults("externalApi");
        // The TimeLimiter carries its timeout in its own config
        timeLimiter = TimeLimiter.of(TimeLimiterConfig.custom()
                .timeoutDuration(Duration.ofSeconds(5))
                .build());
    }

    public String callWithTimeout() throws Exception {
        return timeLimiter.executeFutureSupplier(
            () -> CompletableFuture.supplyAsync(
                () -> circuitBreaker.executeSupplier(() -> externalApi.call())
            )
        );
    }
}
```

### 10. Monitor upstream response times

Set up monitoring for early detection:

```nginx
# Custom log format with upstream timing
# /etc/nginx/nginx.conf

http {
    log_format upstream_log '$remote_addr - $remote_user [$time_local] '
                            '"$request" $status $body_bytes_sent '
                            '"$http_referer" "$http_user_agent" '
                            'upstream=$upstream_addr '
                            'upstream_status=$upstream_status '
                            'upstream_connect_time=$upstream_connect_time '
                            'upstream_header_time=$upstream_header_time '
                            'upstream_response_time=$upstream_response_time';

    access_log /var/log/nginx/upstream.log upstream_log;
}
```

Analyze upstream timing logs:

```bash
# Find slow upstream responses
awk -F'upstream_response_time=' '{print $2}' /var/log/nginx/upstream.log | \
  awk '{print $1}' | \
  sort -rn | head -20

# Calculate percentile response times
grep -oP 'upstream_response_time=\K[0-9.]+' /var/log/nginx/upstream.log | \
  sort -n | \
  awk '
    {
      times[NR] = $1
      sum += $1
    }
    END {
      print "Count:", NR
      print "Average:", sum/NR
      print "P50:", times[int(NR*0.50)]
      print "P95:", times[int(NR*0.95)]
      print "P99:", times[int(NR*0.99)]
      print "Max:", times[NR]
    }'
```

Prometheus metrics with nginx-prometheus-exporter:

```yaml
# docker-compose.yml
version: '3'
services:
  nginx_exporter:
    image: nginx/nginx-prometheus-exporter:latest
    ports:
      - "9113:9113"
    command:
      # Scrapes the Nginx stub_status endpoint (must be enabled at /status)
      - -nginx.scrape-uri=http://nginx:80/status

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
```

Prometheus alerting rules:

```yaml
# alerting_rules.yml
groups:
  - name: nginx_upstream
    rules:
      - alert: NginxUpstreamSlow
        expr: histogram_quantile(0.95, rate(nginx_upstream_response_time_seconds_bucket[5m])) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Nginx upstream P95 response time above 5 seconds"
          description: "Upstream {{ $labels.upstream }} P95 response time is {{ $value }}s"

      - alert: NginxUpstreamTimeouts
        expr: rate(nginx_upstream_response_timeout_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Nginx upstream timeouts occurring"
          description: "{{ $value }} timeouts per second on {{ $labels.upstream }}"

      - alert: Nginx504Errors
        expr: rate(nginx_http_responses_total{status="504"}[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 504 Gateway Timeout errors"
          description: "{{ $value }} 504 errors per second"
```

Prevention

  • Set timeouts based on actual workload requirements, not defaults
  • Enable proxy buffering to decouple client and upstream connections
  • Configure upstream keepalive to reduce connection overhead
  • Monitor upstream response times with P95 and P99 metrics
  • Set up alerts for timeout rate and 504 error rate
  • Implement application-level timeouts for all external calls
  • Use circuit breakers for external service dependencies
  • Load test endpoints to identify timeout requirements
  • Document timeout requirements for each endpoint category
  • Review and tune timeouts after each major deployment
Related Errors

  • **502 Bad Gateway**: Upstream unavailable or returned an invalid response
  • **503 Service Unavailable**: No upstream server available
  • **500 Internal Server Error**: Upstream returned error
  • **408 Request Timeout**: Client didn't send request in time
  • **499 Client Closed Request**: Client disconnected before response