Introduction

Sidekiq automatically retries failed jobs with exponential backoff (25 attempts by default, a schedule that stretches over roughly 20 days). When all retries are exhausted, the job moves to the DeadSet, where it waits for manual intervention. Jobs end up dead because of persistent errors (e.g., missing API endpoints, database constraint violations), unhandled exceptions that fail on every attempt, or a retry count too low for transient failures to clear. Understanding Sidekiq's retry mechanism and configuring error handling properly is essential for reliable background processing.
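
For intuition, the default delay grows roughly as count**4 + 15 seconds plus random jitter (the exact jitter term varies between Sidekiq versions), which is why 25 attempts stretch to about 20 days. A quick back-of-the-envelope sketch:

```ruby
# Approximation of Sidekiq's default retry delay: count**4 + 15 seconds
# (ignoring the random jitter term, which varies by Sidekiq version).
def approx_delay(count)
  (count**4) + 15
end

total_seconds = (0...25).sum { |c| approx_delay(c) }
puts "25 retries span roughly #{(total_seconds / 86_400.0).round(1)} days"
# => 25 retries span roughly 20.4 days
```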

Symptoms

  • Jobs appear in Sidekiq Dead tab with "retry_count" equal to max
  • Failed jobs never complete despite transient errors being resolved
  • Sidekiq::JobRetry::Handled in logs for each retry attempt
  • Dead job count growing over time
  • Important business logic jobs lost to dead queue

Error output:

```
Sidekiq::DeadSet: Job JID-abc123 exhausted all 25 retries
Error: Net::HTTPFatalError: 500 "Internal Server Error"
Failed after 3 hours 22 minutes
```
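
Before bulk-retrying anything, confirm what is dying and why by inspecting the dead set from a console (Sidekiq::DeadSet and the error_class/error_message payload fields are part of Sidekiq's public API):

```ruby
require "sidekiq/api"

dead = Sidekiq::DeadSet.new
puts "Dead jobs: #{dead.size}"

# Print each dead job's worker class and error to spot patterns
dead.each do |job|
  puts "#{job.klass}: #{job["error_class"]} - #{job["error_message"]}"
end
```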

Common Causes

  • Persistent error that fails on every retry (not transient)
  • Retry count too low for intermittent failures
  • Job raises unhandled exception on every attempt
  • External API permanently unavailable
  • Database unique constraint violation on retry (see the idempotency sketch below)
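
The last cause is common with create-style jobs: the first attempt partially succeeds, and every retry then trips the unique index until the job dies. A minimal sketch of an idempotent rewrite, assuming hypothetical Invoice and Order ActiveRecord models:

```ruby
class InvoiceCreationWorker
  include Sidekiq::Job

  def perform(order_id)
    # A bare Invoice.create! would raise ActiveRecord::RecordNotUnique
    # on every retry once the row exists. find_or_create_by! makes the
    # retry a harmless no-op instead.
    Invoice.find_or_create_by!(order_id: order_id) do |invoice|
      invoice.amount = Order.find(order_id).total
    end
  end
end
```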

Step-by-Step Fix

  1. Configure retry count and error handling per worker:

```ruby
class EmailDeliveryWorker
  include Sidekiq::Job
  sidekiq_options retry: 10, # fewer retries for non-critical jobs
                  queue: :mailers

  def perform(user_id, template)
    user = User.find(user_id)
    EmailService.send(user, template)
  rescue Net::SMTPFatalError => e
    # Permanent error: tell Sidekiq it was handled so the job is not retried
    Rails.logger.error "Email permanently failed for user #{user_id}: #{e.message}"
    raise Sidekiq::JobRetry::Handled
  rescue Net::SMTPServerBusy => e
    # Transient error: re-raise so Sidekiq retries with backoff
    Rails.logger.warn "SMTP busy, will retry: #{e.message}"
    raise
  end
end

# Disable retries entirely for idempotent jobs that should fail fast
class ReportExportWorker
  include Sidekiq::Job
  sidekiq_options retry: false

  def perform(report_id)
    ReportExporter.export(report_id)
  end
end
```
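
Note the difference between the two zero-retry options in Sidekiq: with retry: 0 a failing job still lands in the dead set, while retry: false discards the failure entirely and it never appears in the Dead tab. A sketch with a hypothetical AuditPingWorker:

```ruby
class AuditPingWorker
  include Sidekiq::Job
  # retry: 0     -> no retries, but failures still go to the dead set
  # retry: false -> no retries AND failures are discarded (never go dead)
  sidekiq_options retry: 0, queue: :low

  def perform(event_id)
    AuditService.ping(event_id)
  end
end
```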

  2. Recover jobs from the dead set:

```ruby
# In a Rails console (require "sidekiq/api") or via the Sidekiq Web UI

# Retry ALL dead jobs
Sidekiq::DeadSet.new.each(&:retry)

# Retry dead jobs for a specific worker class
Sidekiq::DeadSet.new.each do |job|
  job.retry if job.klass == "EmailDeliveryWorker"
end

# Retry dead jobs that died within a time window
# (the entry's sorted-set score is the time of death, exposed as #at)
cutoff = 2.hours.ago
Sidekiq::DeadSet.new.each do |job|
  job.retry if job.at > cutoff
end

# Delete dead jobs for a worker that no longer exists
Sidekiq::DeadSet.new.each do |job|
  job.delete if job.klass == "OldDeprecatedWorker"
end
```
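
Recovery is time-sensitive because the dead set is bounded: by default Sidekiq keeps at most 10,000 dead jobs and prunes entries after about six months. Both limits are configurable; a sketch using the dead_max_jobs and dead_timeout_in_seconds options (the exact setter syntax varies across Sidekiq versions):

```ruby
Sidekiq.configure_server do |config|
  config[:dead_max_jobs] = 50_000                    # default 10_000
  config[:dead_timeout_in_seconds] = 90 * 24 * 3600  # prune after 90 days (default ~6 months)
end
```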

  3. Add custom exponential backoff with jitter via sidekiq_retry_in:

```ruby
class ApiSyncWorker
  include Sidekiq::Job
  sidekiq_options retry: 15, queue: :external_api

  # Sidekiq calls this block to compute the delay (in seconds) before the
  # next retry; count is the number of retries performed so far. Jitter
  # spreads retries out to prevent a thundering herd against the API.
  sidekiq_retry_in do |count, exception|
    case exception
    when Faraday::ConnectionFailed
      [30, 2**count].min + rand(10) # capped base plus random jitter
    end
    # (a nil return falls back to Sidekiq's default schedule)
  end

  def perform(endpoint, params)
    response = ExternalApi.call(endpoint, params)
    process_response(response)
  end
end
```
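
When the retry budget is spent, Sidekiq fires the per-class sidekiq_retries_exhausted hook just before the job moves to the dead set, which is a natural place for last-chance alerting (a sketch assuming a Rails logger):

```ruby
class ApiSyncWorker
  include Sidekiq::Job

  # Runs once, right before the job is moved to the dead set
  sidekiq_retries_exhausted do |job, exception|
    Rails.logger.error(
      "#{job["class"]} #{job["jid"]} exhausted retries: #{exception.message}"
    )
  end
end
```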

Prevention

  • Set appropriate retry counts per worker type (critical vs non-critical)
  • Use retry: false for jobs where retries make no sense
  • Rescue permanent errors and raise Sidekiq::JobRetry::Handled to skip retries
  • Monitor dead set size and alert when it grows beyond a threshold (see the death-handler sketch after this list)
  • Use a scheduler gem such as sidekiq-cron for recurring jobs instead of re-enqueuing dead jobs
  • Add idempotency keys to prevent duplicate processing on retry
  • Log the retry count of each attempt for debugging, e.g. from a sidekiq_retry_in block or server middleware (the job payload carries a retry_count field)
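
For the monitoring bullet above, Sidekiq's global death handlers fire whenever any job dies; a sketch that records each death, assuming a hypothetical Metrics client:

```ruby
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, exception) do
    # Metrics.increment is a hypothetical metrics-client call
    Metrics.increment("sidekiq.dead_jobs", tags: ["class:#{job["class"]}"])
    Rails.logger.error "Job died: #{job["class"]} #{job["jid"]} (#{exception.message})"
  end
end
```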