Introduction

When a Sidekiq job fails repeatedly, it exhausts its retry budget and moves to the Dead set. By default, Sidekiq performs 25 retries over roughly 20 days using exponential backoff. If the underlying issue is never fixed, the job is pruned from the dead set after 6 months (or sooner, once the set exceeds its 10,000-job cap) and is permanently lost. For payment processing, email delivery, and data synchronization jobs, this is a critical production failure mode.
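The ~20-day window follows from Sidekiq's documented default backoff, roughly `(retry_count ** 4) + 15` seconds plus random jitter. Summing it (jitter omitted) shows why 25 retries span about three weeks:

```ruby
# Approximate Sidekiq default retry delays: (count ** 4) + 15 seconds, jitter omitted.
delays = (0...25).map { |count| (count ** 4) + 15 }

total_days = delays.sum / 86_400.0
puts "Retry window: ~#{total_days.round(1)} days"
```

The first few retries land within seconds to minutes; the last few are days apart, which is why a permanently failing job lingers for weeks before dying.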

Symptoms

  • Sidekiq Web UI shows jobs in the "Dead" tab
  • Sidekiq::DeadSet contains jobs that should have been processed
  • No alerts fired when jobs moved to dead set
  • Important business logic (emails, payments) silently failed
  • Dead set grows continuously indicating systemic issues

Example error in logs:

```
2026-04-09T10:15:00.000Z pid=1234 tid=abc123 WARN: {"context":"Job raised exception","job":{"class":"ProcessPaymentJob","args":[12345],"retry":25,"queue":"default"},"error_class":"Stripe::InvalidRequestError","error_message":"No such customer: cus_abc123","failed_at":1712500000,"retry_count":25}
```

Common Causes

  • External service returns permanent error (404, invalid data)
  • Retry count too low for transient failure patterns
  • No alerting configured for dead jobs
  • Job arguments reference deleted records
  • Infinite retry loop: bug causes same failure every time
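The deleted-record cause is worth guarding against directly: a job that raises `RecordNotFound` will retry for days over data that can never come back. A minimal sketch of the guard-clause pattern (`SendInvoiceJob`, the `store` argument, and the return symbols are illustrative stand-ins for a real model lookup):

```ruby
# Hypothetical job: the store hash stands in for the database
# (e.g. Invoice.find_by(id: invoice_id) in a real app).
class SendInvoiceJob
  # include Sidekiq::Job  # in a real app
  def perform(invoice_id, store)
    invoice = store[invoice_id]
    return :skipped if invoice.nil? # record was deleted: exit cleanly, no retry
    # ... deliver the invoice here ...
    :delivered
  end
end
```

Returning instead of raising means the job completes successfully and never reaches the retry machinery at all.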

Step-by-Step Fix

1. Check and recover dead jobs:

```ruby
# In Rails console
dead_set = Sidekiq::DeadSet.new
dead_set.size # Number of dead jobs

# Find specific failed jobs
dead_set.select { |job| job.klass == 'ProcessPaymentJob' }.each do |job|
  puts "Args: #{job.args.inspect}, Error: #{job['error_message']}, Failed: #{Time.at(job['failed_at'])}"
end

# Retry all dead jobs of a specific class
dead_set.select { |job| job.klass == 'ProcessPaymentJob' }.each(&:retry)
```

2. Configure per-job retry strategies:

```ruby
class ProcessPaymentJob
  include Sidekiq::Job
  # dead: false discards exhausted jobs instead of sending them to the Dead set
  sidekiq_options queue: :payments, retry: 10, dead: false

  # Or use custom retry logic per exception type
  sidekiq_retry_in do |count, exception|
    case exception
    when Stripe::RateLimitError
      60 * count # Linear backoff for rate limits
    when Stripe::APIConnectionError
      5 * (2 ** count) # Exponential backoff for connection errors
    when Stripe::InvalidRequestError
      :discard # Don't retry invalid requests (Sidekiq 6.5+; return :kill to send straight to the Dead set instead)
    else
      15 * (2 ** count) # Default exponential backoff
    end
  end
end
```

3. Add alerting for dead jobs:

```ruby
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.death_handlers << lambda do |job, ex|
    # Send to error tracking service
    Sentry.with_scope do |scope|
      scope.set_tags(job_class: job['class'], job_id: job['jid'])
      scope.set_context('sidekiq_job', { args: job['args'] })
      Sentry.capture_exception(ex)
    end

    # Or send a Slack alert for critical job classes
    if %w[ProcessPaymentJob SendInvoiceJob].include?(job['class'])
      Slack.notify("#alerts", "CRITICAL: #{job['class']} dead after #{job['retry_count']} retries")
    end
  end
end
```
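Because a death handler is just a callable taking `(job_hash, exception)`, its routing logic can be unit-tested in plain Ruby without Sidekiq, Sentry, or Slack running. A hedged sketch (the `CRITICAL` list and message format are assumptions for illustration, not Sidekiq API):

```ruby
# Which job classes warrant an alert (illustrative list).
CRITICAL = %w[ProcessPaymentJob SendInvoiceJob].freeze

# Same shape Sidekiq passes to death_handlers: a job hash and the exception.
build_alert = lambda do |job, ex|
  return nil unless CRITICAL.include?(job['class'])
  "CRITICAL: #{job['class']} dead after #{job['retry_count']} retries (#{ex.class})"
end

msg = build_alert.call({ 'class' => 'ProcessPaymentJob', 'retry_count' => 25 },
                       RuntimeError.new('boom'))
```

Extracting the message-building lambda this way lets you assert on its output in a spec, then register the same object in `config.death_handlers`.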

4. Implement dead letter queue monitoring:

```ruby
# Rake task to monitor the dead queue
namespace :sidekiq do
  desc "Report on dead jobs"
  task dead_report: :environment do
    dead_set = Sidekiq::DeadSet.new
    puts "Dead jobs: #{dead_set.size}"

    dead_set.each do |job|
      puts "  #{job['class']}: #{job['error_message']} (#{Time.at(job['failed_at'])})"
    end
  end
end
```
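To turn a raw dead-job listing into an audit signal, tally entries by class: one class dominating the set usually points at a single systemic bug rather than many independent failures. A sketch over plain job hashes (in production you would iterate `Sidekiq::DeadSet.new` instead of the sample array):

```ruby
# Sample job hashes standing in for Sidekiq::DeadSet entries.
jobs = [
  { 'class' => 'ProcessPaymentJob' },
  { 'class' => 'ProcessPaymentJob' },
  { 'class' => 'SendInvoiceJob' }
]

# Count dead jobs per class, most frequent first.
by_class = jobs.map { |j| j['class'] }.tally.sort_by { |_, count| -count }
by_class.each { |klass, count| puts "#{count}x #{klass}" }
```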

Prevention

  • Configure dead: false for jobs that must never occupy the Dead set; note that exhausted retries are then discarded outright, so use with caution
  • Set up monitoring and alerting on dead queue size
  • Use sidekiq_retry_in to implement smart retry strategies per exception type
  • Add idempotency keys to jobs so retries are safe
  • Regularly audit dead jobs to identify systemic issues
  • Keep dead-job retention reasonable: the dead_timeout_in_seconds and dead_max_jobs options (defaults: 6 months and 10,000 jobs) control how long and how many dead jobs Sidekiq keeps
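The idempotency-key bullet can be sketched as follows. The in-memory Set here is a stand-in for shared storage; a real implementation would use Redis (e.g. `SET key NX EX ttl`) so the guard holds across processes and restarts:

```ruby
require 'set'

# Hypothetical processor: records every idempotency key it has handled
# so a retried job cannot charge the same payment twice.
class PaymentProcessor
  def initialize
    @processed = Set.new # stand-in for Redis in a real app
  end

  def charge(idempotency_key, amount_cents)
    # Set#add? returns nil if the key was already present.
    return :duplicate unless @processed.add?(idempotency_key)
    # ... call the payment gateway here ...
    :charged
  end
end
```

With this guard in place, Sidekiq's at-least-once delivery becomes safe: a retry of an already-completed charge is a no-op rather than a double bill.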