Introduction
Gunicorn's master process kills a worker with a SIGABRT signal when the worker fails to update its heartbeat within the configured timeout period (30 seconds by default). This typically happens when a worker is stuck processing a slow request -- a long-running database query, an external API call with no timeout, or a CPU-bound operation blocking the worker. In production, worker timeouts cause intermittent 502 Bad Gateway errors for users, reduced throughput as workers are recycled, and potential cascading failures when the replacement workers inherit the same slow workload.
Symptoms
In Gunicorn error logs:
[2024-03-15 14:23:01 +0000] [12] [CRITICAL] WORKER TIMEOUT (pid:456)
[2024-03-15 14:23:01 +0000] [456] [INFO] Worker exiting (pid: 456)
[2024-03-15 14:23:02 +0000] [12] [INFO] Booting worker with pid: 789
Nginx access logs show intermittent 502 errors:
10.0.1.50 - - [15/Mar/2024:14:23:01 +0000] "POST /api/reports/generate HTTP/1.1" 502 166 "-" "python-requests/2.28.0"
Gunicorn access log shows the slow request:
10.0.1.50 - - [15/Mar/2024:14:22:35 +0000] "POST /api/reports/generate HTTP/1.1" 200 45231 "-" "python-requests/2.28.0" 28.543
The request took 28.5 seconds -- dangerously close to the 30-second timeout.
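To gauge how widespread the problem is, the 502s can be aggregated per endpoint straight from the Nginx access log. A sketch assuming the default combined log format (field 9 is the status code, field 7 the request path -- adjust if your format differs):

```shell
# Count 502 responses per request path, most-affected endpoints first
awk '$9 == 502 { print $7 }' access.log | sort | uniq -c | sort -rn | head
```

If one endpoint dominates the output, start the investigation there.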
Common Causes
- External API call without timeout: requests.get() without a timeout parameter hangs indefinitely
- Slow database query: unindexed query or large result set processing
- CPU-bound operation in sync worker: Data processing, image resizing, or CSV generation blocking the worker
- Deadlock in application code: Database lock or threading deadlock prevents worker from responding to heartbeat
- Timeout value too low: Default 30-second timeout insufficient for legitimate long-running requests
- Worker class mismatch: Using sync workers for I/O-bound workloads that should use gevent workers
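Several of these causes can be reproduced locally with a deliberately slow endpoint. A minimal sketch (the module and route names are hypothetical) of a WSGI app that blocks a sync worker past the default timeout:

```python
import time

def app(environ, start_response):
    # A request to /slow blocks this sync worker for 35 seconds --
    # longer than the default 30s timeout, so the master logs
    # WORKER TIMEOUT and replaces the worker mid-request.
    if environ.get("PATH_INFO") == "/slow":
        time.sleep(35)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done"]
```

Run it with gunicorn repro:app, then curl localhost:8000/slow: the client sees the connection drop and the master logs WORKER TIMEOUT.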
Step-by-Step Fix
Step 1: Identify the slow request path
Enable request timing to find the problematic endpoint:
# Start gunicorn with access log format including response time
gunicorn myapp:app \
--access-logfile - \
--access-logformat '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'
The %(D)s field shows request time in microseconds. Filter for requests over 20 seconds:
grep -E '" [0-9]+ [0-9]+ "[^"]*" "[^"]*" [2-9][0-9]{7}' access.log
Step 2: Increase timeout for legitimately slow endpoints
If the endpoint genuinely needs more than 30 seconds:
gunicorn myapp:app \
--workers 4 \
--timeout 120 \
--graceful-timeout 30 \
  --keep-alive 5
The --graceful-timeout 30 gives workers that are being restarted up to 30 seconds to finish their in-flight request before being force-killed. --keep-alive 5 holds idle connections open for 5 seconds, reducing connection overhead.
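The same settings can live in a config file instead of the command line, which keeps them under version control. A sketch of a gunicorn.conf.py (the worker count of 4 is illustrative):

```python
# gunicorn.conf.py -- load with: gunicorn -c gunicorn.conf.py myapp:app
workers = 4
timeout = 120          # kill a worker after 120s without a heartbeat
graceful_timeout = 30  # grace period for in-flight requests on restart
keepalive = 5          # seconds to hold idle keep-alive connections
```

Note the setting names differ slightly from the CLI flags (keepalive, not --keep-alive).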
Step 3: Add timeouts to all external calls
```python
import requests

# WRONG - no timeout, worker hangs forever
response = requests.get("https://api.external.com/data")

# CORRECT - timeout ensures the worker is freed
response = requests.get(
    "https://api.external.com/data",
    timeout=(3.05, 25),  # 3.05s connect, 25s read
)
```
For database queries, set statement timeout at the connection level:
```python
from sqlalchemy import event

# `engine` is your existing SQLAlchemy engine
@event.listens_for(engine, "connect")
def set_statement_timeout(dbapi_connection, connection_record):
    # PostgreSQL: abort any statement running longer than 25 seconds
    cursor = dbapi_connection.cursor()
    cursor.execute("SET statement_timeout = '25000'")  # milliseconds
    cursor.close()
```
Step 4: Offload long-running work to background tasks
```python
from celery import Celery
from flask import Flask, request

app = Flask(__name__)
celery = Celery(__name__, broker="redis://localhost:6379/0")  # example broker URL

@celery.task
def generate_report(report_id, params):
    # This runs in a Celery worker, not a Gunicorn worker
    report = build_report(params)
    save_report(report_id, report)
    return report_id

# Flask endpoint - returns immediately
@app.route("/api/reports/generate", methods=["POST"])
def start_report_generation():
    task = generate_report.delay(request.json["report_id"], request.json["params"])
    return {"task_id": task.id, "status_url": f"/api/reports/status/{task.id}"}, 202
```
Step 5: Use async workers for I/O-bound workloads
# For I/O-heavy applications with many concurrent connections
gunicorn myapp:app \
--worker-class gevent \
--workers 4 \
--timeout 120 \
  --worker-connections 1000
Gevent workers use greenlets (cooperative coroutines) to handle many concurrent connections within a single OS thread per worker process. Install the extra dependency first with pip install "gunicorn[gevent]".
Prevention
- Set --timeout to slightly more than your slowest expected request
- Add application-level request timeout middleware that logs warnings at 80% of the Gunicorn timeout
- Use APM tools like New Relic or Datadog to track p95 and p99 request latencies
- Configure alerting on Gunicorn worker restart rate
- Use --max-requests 1000 to recycle workers periodically and prevent memory leaks
- Always set explicit timeouts on external service calls -- never rely on OS-level TCP timeout defaults
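The middleware suggestion above can be sketched as a small WSGI wrapper (names are hypothetical; the 120-second value must be kept in sync with --timeout):

```python
import logging
import time

logger = logging.getLogger("slow_requests")

GUNICORN_TIMEOUT = 120                   # keep in sync with --timeout
WARN_THRESHOLD = 0.8 * GUNICORN_TIMEOUT  # warn at 80% of the timeout

class SlowRequestMiddleware:
    """Logs a warning for any request that uses more than 80% of the
    Gunicorn timeout budget, so near-timeouts surface before they 502."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        start = time.monotonic()
        try:
            return self.wsgi_app(environ, start_response)
        finally:
            elapsed = time.monotonic() - start
            if elapsed > WARN_THRESHOLD:
                logger.warning(
                    "Slow request: %s %s took %.1fs (timeout is %ds)",
                    environ.get("REQUEST_METHOD"),
                    environ.get("PATH_INFO"),
                    elapsed,
                    GUNICORN_TIMEOUT,
                )
```

With Flask, wrap the inner WSGI callable: app.wsgi_app = SlowRequestMiddleware(app.wsgi_app).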