Introduction
Celery tasks fail with ConnectionRefusedError when the broker (Redis/RabbitMQ) is unreachable, and without proper retry configuration, these failures are permanent. Even with retries enabled, linear retry intervals can overwhelm a recovering broker. The combination of connection refused errors during broker outages and unbounded retries creates thundering herd problems where thousands of tasks retry simultaneously, preventing the broker from recovering. Proper configuration requires exponential backoff with jitter, connection retry settings, and a maximum retry limit that routes permanently failed tasks to a dead letter queue.
Symptoms
```
[2026-04-09 10:00:00,000: ERROR/ForkPoolWorker-3] Task myapp.tasks.process_order[abc123] raised unexpected: ConnectionRefusedError(61, 'Connection refused')
  File "kombu/connection.py", line 275, in connect
    return self._ensure_connection()
```

Or a retry storm:
```
[2026-04-09 10:00:01,000: WARNING] Retrying task in 1 second
[2026-04-09 10:00:02,000: WARNING] Retrying task in 1 second
[2026-04-09 10:00:03,000: WARNING] Retrying task in 1 second
# All tasks retrying at the same time - the broker cannot recover
```

Common Causes
- Broker not running: Redis or RabbitMQ process crashed or not started
- No retry configuration: Tasks fail permanently on connection errors
- Linear retry interval: Fixed delay causes thundering herd on recovery
- Unlimited retries: Tasks retry forever, clogging the queue
- Connection pool exhausted: Celery workers open too many broker connections
- Task result backend unreachable: Same connection issue for storing results
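The first cause above (broker not running) is quick to rule out before touching any retry settings. A minimal reachability probe using only the standard library (`broker_reachable` is a hypothetical helper, not part of Celery; 6379 is Redis's default port):

```python
import socket

def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to the broker succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and DNS failures
        return False

# Example: broker_reachable('localhost', 6379) for a local Redis broker
```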
Step-by-Step Fix
Step 1: Configure task retry with exponential backoff
```python
from celery import Celery

app = Celery('myapp', broker='redis://localhost:6379/0')

app.conf.update(
    task_default_retry_delay=10,   # Initial delay: 10 seconds
    task_default_max_retries=5,    # Max 5 retries before giving up
    task_acks_late=True,           # Acknowledge after task completes
    worker_prefetch_multiplier=1,  # One task at a time per worker
)

@app.task(bind=True, max_retries=5)
def process_order(self, order_id):
    try:
        return send_to_external_api(order_id)
    except ConnectionError as exc:
        # Exponential backoff: 10s, 20s, 40s, 80s, 160s
        raise self.retry(exc=exc, countdown=2 ** self.request.retries * 10)
    except Exception:
        # Non-retryable errors fail immediately
        raise
```
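The introduction calls for jitter, which the fixed `2 ** retries * 10` countdown above does not provide: every task that failed at the same moment retries at the same moment. A full-jitter variant can be sketched as follows (the `backoff_with_jitter` helper, its `base` of 10 seconds, and its 300-second cap are illustrative choices following the common "full jitter" pattern, not Celery API):

```python
import random

def backoff_with_jitter(retries, base=10, cap=300):
    """Full-jitter exponential backoff: a random delay drawn from
    [0, min(cap, base * 2**retries)] so retries spread out over time."""
    return random.uniform(0, min(cap, base * 2 ** retries))
```

Use it as `countdown=backoff_with_jitter(self.request.retries)` inside the `except ConnectionError` branch. Alternatively, recent Celery versions support `autoretry_for=(ConnectionError,)` together with `retry_backoff=True` and `retry_jitter=True` directly on the task decorator.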
Step 2: Configure broker connection resilience
```python
app.conf.update(
    broker_connection_retry=True,
    broker_connection_retry_on_startup=True,
    broker_connection_max_retries=10,
    broker_connection_retry_interval=5,
    broker_pool_limit=20,   # Connection pool size
    broker_heartbeat=10,    # Detect dead connections
    broker_heartbeat_checkrate=30,
)

# For the Redis broker specifically
app.conf.update(
    redis_retry_on_timeout=True,
    redis_socket_keepalive=True,
    redis_backend_health_check_interval=30,
)
```
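The settings above cover the worker side; producers can also fail to publish when the broker is down. Celery's `apply_async` accepts `retry=True` with a `retry_policy` dict for this. The schedule helper below is a hypothetical illustration that approximates kombu's linear ramp (each wait grows by `interval_step`, capped at `interval_max`); it is not part of Celery:

```python
# Publisher-side retry policy, passed to apply_async(retry=True, retry_policy=...)
RETRY_POLICY = {
    'max_retries': 3,     # Give up publishing after 3 attempts
    'interval_start': 0,  # First retry immediately
    'interval_step': 2,   # Add 2 seconds per attempt
    'interval_max': 10,   # Never wait more than 10 seconds
}

def publish_retry_schedule(policy):
    """Approximate wait (seconds) before each publish retry attempt."""
    return [
        min(policy['interval_start'] + i * policy['interval_step'],
            policy['interval_max'])
        for i in range(policy['max_retries'])
    ]
```

Usage (assuming the `process_order` task from Step 1): `process_order.apply_async(args=[order_id], retry=True, retry_policy=RETRY_POLICY)`.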
Step 3: Route failed tasks to dead letter queue
```python
from kombu import Exchange, Queue

app.conf.task_queues = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('dead_letter', Exchange('dead_letter'), routing_key='dead_letter'),
)

@app.task(bind=True, max_retries=5)
def process_order(self, order_id):
    try:
        return send_to_external_api(order_id)
    except Exception as exc:
        if self.request.retries >= self.max_retries:
            # Route to the dead letter queue instead of retrying again
            self.app.send_task(
                'myapp.tasks.handle_dead_letter',
                args=[order_id, str(exc)],
                queue='dead_letter',
            )
            return {'status': 'moved_to_dead_letter'}
        raise self.retry(exc=exc)
```
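The `myapp.tasks.handle_dead_letter` task referenced above is not shown. Whatever it does with the message (log it, store it, alert on it), it helps to record enough context to replay the task later. A sketch of such a payload (`build_dead_letter_record` and its schema are hypothetical, not Celery API):

```python
from datetime import datetime, timezone

def build_dead_letter_record(order_id, error, retries):
    """Build the payload handed to the dead letter handler (hypothetical schema)."""
    return {
        'order_id': order_id,
        'error': str(error),          # Human-readable failure reason
        'retries': retries,           # How many attempts were made
        'failed_at': datetime.now(timezone.utc).isoformat(),
    }
```

The handler task can then persist this record and increment a metric, so permanently failed orders are visible rather than silently dropped.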
Prevention
- Always set max_retries on tasks that depend on external services
- Use exponential backoff with jitter to prevent thundering herd on recovery
- Configure broker heartbeat to detect and recover from dead connections
- Monitor Celery queue length and retry rates with Flower or Prometheus
- Set task_acks_late=True to prevent message loss on worker crash
- Implement dead letter queue handling for permanently failed tasks
- Test broker failure scenarios in staging by stopping and restarting the broker