Introduction

An HTTP 503 Service Unavailable error from a reverse proxy (Nginx, HAProxy, Apache) indicates that the proxy cannot get a response from the upstream application server (Gunicorn, uWSGI, Node.js, Tomcat) within the configured timeout period. The application server may be overloaded, stuck in a deadlock, performing a long-running operation, or completely unresponsive. Unlike a 502 Bad Gateway (the proxy reached the backend but the connection was refused or the response was invalid), a 503 accompanied by an upstream timeout means the proxy connected but the backend did not answer in time. Note that some proxies report the same condition as a 504 Gateway Timeout, depending on configuration.

Symptoms

  • Browser shows 503 Service Unavailable
  • Nginx error log shows upstream timed out (110: Connection timed out)
  • HAProxy logs show termination flags such as sH (server-side timeout while waiting for response headers) or sC (server-side timeout while establishing the connection)
  • Application server process is running but not responding to requests
  • Site works for some endpoints but times out on specific slow endpoints

Common Causes

  • Application server worker threads all blocked on slow database queries
  • Deadlock in application code causing requests to hang indefinitely
  • Reverse proxy timeout too short for legitimately slow operations
  • Garbage collection pause in JVM/Go runtime freezing all request processing
  • Application server process count insufficient for current traffic
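Whatever the cause, the first distinction to draw is between a backend that is slow and one that is hung. A small probe with a hard deadline makes that visible from the shell; the URL and 5-second budget below are illustrative, not prescribed by this guide:

```shell
#!/bin/sh
# Sketch: probe an endpoint with a hard deadline so a hung backend
# surfaces as a distinct exit code instead of an indefinite wait.
probe() {
  url=$1; budget=$2
  # --max-time caps the whole request; curl exits 28 when it hits the deadline
  if curl -sf -o /dev/null --max-time "$budget" "$url"; then
    echo "OK: $url answered within ${budget}s"
  else
    rc=$?
    if [ "$rc" -eq 28 ]; then
      echo "TIMEOUT: $url exceeded ${budget}s (backend likely hung)"
    else
      echo "FAIL: curl exit $rc for $url"
    fi
  fi
}

probe "http://localhost:8080/health" 5
```

Run the probe both through the proxy and directly against the application port: a timeout in both places implicates the backend, while a timeout only through the proxy implicates the proxy configuration.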

Step-by-Step Fix

  1. Check the reverse proxy error logs:

     ```bash
     # Nginx
     sudo tail -50 /var/log/nginx/error.log | grep "upstream timed out"

     # HAProxy (termination flags starting with "s" indicate a server-side timeout)
     sudo tail -50 /var/log/haproxy.log | grep -E " s[CHD]"
     ```

  2. Check whether the application server is responsive:

     ```bash
     # Direct request to the application server (bypassing the proxy)
     curl -v --max-time 10 http://localhost:8080/health

     # If this also times out, the application server is the problem
     ```

  3. Check application server worker status:

     ```bash
     # Gunicorn
     sudo systemctl status gunicorn
     ps aux | grep gunicorn

     # Node.js (PM2)
     pm2 status
     pm2 monit

     # Check for stuck threads
     ps -T -p $(pgrep -f "gunicorn|node|java") -o pid,tid,stat,wchan
     ```
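In the `ps` output above, a `STAT` value beginning with `D` marks a thread in uninterruptible sleep, the classic signature of a worker blocked on I/O. A small sketch of how that check could be scripted on Linux (the target PID here, `$$`, is only a stand-in for your application server's PID):

```shell
#!/bin/sh
# Sketch: count threads of a process stuck in uninterruptible sleep
# ("D" at the start of the STAT column). A persistently non-zero count
# while requests time out points at blocked I/O (disk, NFS, database).
stuck_threads() {
  pid=$1
  # stat= suppresses the header; awk tolerates any column padding
  ps -T -p "$pid" -o stat= | awk '$1 ~ /^D/ { n++ } END { print n+0 }'
}

echo "threads in D state for PID $$: $(stuck_threads $$)"
```

A healthy, idle process should report 0 here; rerun the check a few times, since threads pass through `D` briefly during normal disk I/O.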

  4. Restart the application server:

     ```bash
     sudo systemctl restart gunicorn
     # Or for PM2:
     pm2 reload all
     # Or for an app managed directly by systemd:
     sudo systemctl restart myapp
     ```
  5. Increase the upstream timeout if the application legitimately needs more time:

     ```nginx
     # Nginx
     proxy_read_timeout 120s;
     proxy_connect_timeout 30s;
     proxy_send_timeout 30s;
     ```
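If HAProxy fronts the backend instead, the equivalent knobs live in the `defaults` or `backend` section; the values below simply mirror the Nginx example and should likewise track your slowest legitimate endpoint:

```haproxy
backend app
    # How long to wait for the server to send its response
    timeout server 120s
    # How long to wait for the TCP connection to the server to establish
    timeout connect 30s
```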
  6. Check for slow database queries causing the timeout:

     ```bash
     # PostgreSQL
     sudo -u postgres psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC LIMIT 10;"

     # MySQL (column 6 is Time, in seconds)
     mysql -e "SHOW FULL PROCESSLIST;" | sort -k6 -n -r | head -10
     ```

Prevention

  • Implement health check endpoints and monitor application server responsiveness
  • Set proxy timeouts based on the 99th percentile of response times, not the average
  • Configure circuit breakers in the application to fail fast instead of hanging
  • Use connection pooling to prevent database connection exhaustion
  • Monitor upstream response times and alert when approaching proxy timeout limits
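Sizing timeouts from the 99th percentile, as recommended above, can be done straight from access logs. A sketch, assuming a log format whose last field is Nginx's `$request_time` in seconds (adjust the first `awk` expression for your actual `log_format`):

```shell
#!/bin/sh
# Sketch: compute a p99 response time from an access log whose last
# field is the request duration in seconds (e.g. Nginx $request_time).
p99() {
  awk '{ print $NF }' "$1" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1          # empty log: nothing to report
      idx = int(NR * 0.99)         # index of the 99th-percentile sample
      if (idx < 1) idx = 1
      printf "%.3f\n", v[idx]
    }'
}

# Usage (path is the Nginx default, adjust as needed):
# p99 /var/log/nginx/access.log
```

Set `proxy_read_timeout` comfortably above this number so only genuinely pathological requests hit the limit, and re-check after traffic patterns change.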