Introduction
Sidekiq automatically retries failed jobs with exponential backoff (25 attempts by default). When all retries are exhausted, the job moves to the DeadSet where it waits for manual intervention. Jobs end up dead due to persistent errors (e.g., missing API endpoints, database constraint violations), unhandled exceptions that always fail, or insufficient retry count for transient failures. Understanding Sidekiq's retry mechanism and properly configuring error handling is essential for reliable background processing.
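How long do those 25 attempts take? Sidekiq's documented default delay is approximately `(retry_count ** 4) + 15` seconds plus a small random jitter (the exact jitter formula varies by Sidekiq version), so a jitter-free sum shows why exhausting the default retries takes weeks:

```ruby
# Jitter-free approximation of Sidekiq's default backoff:
# delay before attempt N is roughly (N ** 4) + 15 seconds
delays = (0...25).map { |count| (count**4) + 15 }
total_days = delays.sum / 86_400.0
puts format("~%.1f days until all 25 retries are exhausted", total_days)
# => ~20.4 days until all 25 retries are exhausted
```

This is why a job can sit in the retry set for days before it ever reaches the dead set.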
Symptoms
- Jobs appear in Sidekiq Dead tab with "retry_count" equal to max
- Failed jobs never complete despite transient errors being resolved
- `Sidekiq::JobRetry::Handled` in logs for each retry attempt
- Dead job count growing over time
- Important business logic jobs lost to dead queue
Error output:
```
Sidekiq::DeadSet: Job JID-abc123 exhausted all 25 retries
Error: Net::HTTPFatalError: 500 "Internal Server Error"
Failed after 3 hours 22 minutes
```
Common Causes
- Persistent error that fails on every retry (not transient)
- Retry count too low for intermittent failures
- Job raises unhandled exception on every attempt
- External API permanently unavailable
- Database unique constraint violation on retry
Step-by-Step Fix
1. **Configure retry count and error handling per worker:**

```ruby
class EmailDeliveryWorker
  include Sidekiq::Job

  sidekiq_options retry: 10, # Fewer retries for non-critical jobs
                  queue: :mailers

  def perform(user_id, template)
    user = User.find(user_id)
    EmailService.send(user, template)
  rescue Net::SMTPFatalError => e
    # Permanent error: retrying will never succeed
    Rails.logger.error "Email permanently failed for user #{user_id}: #{e.message}"
    raise Sidekiq::JobRetry::Skip # Skip remaining retries; the job is discarded
  rescue Net::SMTPServerBusy => e
    # Transient error: re-raise so Sidekiq retries
    Rails.logger.warn "SMTP busy, will retry: #{e.message}"
    raise
  end
end

# Disable retries entirely for idempotent jobs that should fail fast
class ReportExportWorker
  include Sidekiq::Job
  sidekiq_options retry: false

  def perform(report_id)
    ReportExporter.export(report_id)
  end
end
```
2. **Recover jobs from the dead set:**

```ruby
# In a Rails console (the Sidekiq Web UI offers the same actions)

# Retry ALL dead jobs
Sidekiq::DeadSet.new.each(&:retry)

# Retry dead jobs for a specific worker class
Sidekiq::DeadSet.new.each do |job|
  job.retry if job.klass == "EmailDeliveryWorker"
end

# Retry dead jobs that died within a time window
cutoff = 2.hours.ago
Sidekiq::DeadSet.new.each do |job|
  job.retry if job.at > cutoff # job.at is the time the job died
end

# Delete permanently failed dead jobs
Sidekiq::DeadSet.new.each do |job|
  job.delete if job.klass == "OldDeprecatedWorker"
end
```
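When writing filters like the ones above, it helps to know what a dead-set entry wraps. Each entry exposes a job payload hash roughly like the following (field names follow Sidekiq's job format; the values here are made up for the example):

```ruby
# Illustrative dead-job payload; job.klass reads the "class" field
dead_job = {
  "class" => "EmailDeliveryWorker",
  "args" => [42, "welcome"],
  "jid" => "abc123def456",
  "retry_count" => 25,
  "error_class" => "Net::HTTPFatalError",
  "error_message" => "500 \"Internal Server Error\"",
  "failed_at" => 1_700_000_000.0
}

puts dead_job["class"]       # worker class name, matched via job.klass
puts dead_job["retry_count"] # attempts made before the job died
```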
3. **Add custom exponential backoff with jitter:**

```ruby
class ApiSyncWorker
  include Sidekiq::Job

  sidekiq_options retry: 15, queue: :external_api

  # Sidekiq calls this block to compute the delay (in seconds)
  # before each retry, replacing the default exponential backoff
  sidekiq_retry_in do |retry_count, _exception|
    calculate_backoff(retry_count)
  end

  def self.calculate_backoff(retry_count)
    base = [2**retry_count, 600].min # Exponential backoff capped at 10 minutes
    base + rand(10)                  # Jitter prevents a thundering herd
  end

  def perform(endpoint, params)
    response = ExternalApi.call(endpoint, params)
    process_response(response)
  rescue Faraday::ConnectionFailed
    # Transient network error: re-raise so Sidekiq schedules a retry
    raise
  end
end
```

Never `sleep` inside `perform` to implement backoff; that ties up a worker thread. `sidekiq_retry_in` lets the scheduler hold the job instead.
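To see the schedule this kind of capped exponential produces, here is a jitter-free print of the base delays for retry counts 0 through 15 (jitter would add 0-9 seconds to each):

```ruby
# Base delays for a 2**count schedule capped at 600 seconds
base_delays = (0..15).map { |count| [2**count, 600].min }
puts base_delays.inspect
# => [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 600, 600, 600, 600, 600, 600]
```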
Prevention
- Set appropriate retry counts per worker type (critical vs non-critical)
- Use `retry: false` for jobs where retries make no sense
- Rescue permanent errors and raise `Sidekiq::JobRetry::Skip` to skip retries
- Monitor dead set size and alert when it grows beyond a threshold
- Use `sidekiq-cron` for scheduled jobs instead of re-enqueuing dead jobs
- Add idempotency keys to prevent duplicate processing on retry
- Log retry attempts for debugging (each job payload carries a `retry_count` field)
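The idempotency-key advice above can be sketched in plain Ruby. This is a minimal in-memory guard (names are illustrative; production code would back the check with Redis or a database unique index so it survives process restarts):

```ruby
require "set"

# Minimal sketch of an idempotency guard: a retried job that already
# completed its work returns early instead of doing it twice
class IdempotentProcessor
  def initialize
    @seen = Set.new
    @processed = []
  end

  def perform(order_id)
    # Set#add? returns nil when the key is already present
    return :skipped unless @seen.add?(order_id)
    @processed << order_id # stand-in for the real side effect
    :processed
  end

  attr_reader :processed
end

processor = IdempotentProcessor.new
puts processor.perform(42) # => processed
puts processor.perform(42) # retry of the same job => skipped
```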