Introduction
Puma's phased restart (SIGUSR1) supports zero-downtime deployments by restarting workers one at a time while the master process keeps accepting connections. Old workers can fail to shut down, however, when they are handling long-running requests, have stuck threads, or hold database connections that never close. The result is old and new code running side by side, memory usage that keeps growing while both generations of workers stay alive, and new code that never fully takes over.
Symptoms
- `pumactl phased-restart` hangs and eventually times out
- `ps aux | grep puma` shows workers from different app versions
- Memory usage grows continuously during phased restart
- New code changes not reflected after phased restart
- Puma logs show `Old worker 1234 did not terminate, sending SIGKILL`
Check worker status:

```bash
# List all Puma processes
ps aux | grep puma

# Master process
# puma 5.6.7 (tcp://0.0.0.0:3000) [myapp]

# Workers from different times
# Worker 1 (PID 1234) - started 2 hours ago (OLD)
# Worker 2 (PID 5678) - started 2 minutes ago (NEW)
```
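Comparing worker start times by eye becomes tedious on a busy host. The following is a minimal Ruby sketch, assuming output in the format of `ps -eo pid,etimes,args` (the `etimes` column, elapsed seconds, is available in procps on Linux) and Puma's `puma: cluster worker` process title. The `stale_workers` helper and the age threshold are hypothetical names chosen for illustration.

```ruby
# Hypothetical helper: list Puma cluster workers older than max_age_seconds.
# Expects text in the format produced by `ps -eo pid,etimes,args`.
def stale_workers(ps_output, max_age_seconds)
  ps_output.each_line.filter_map do |line|
    pid, elapsed, command = line.strip.split(/\s+/, 3)
    # Skip the header row, the master process, and unrelated processes
    next unless command&.start_with?("puma: cluster worker")

    age = Integer(elapsed, exception: false)
    next if age.nil? || age <= max_age_seconds

    { pid: pid.to_i, age_seconds: age, command: command }
  end
end
```

Calling `stale_workers(%x(ps -eo pid,etimes,args), 600)` shortly after a phased restart would list any worker that has been alive for more than ten minutes and is therefore likely to belong to the previous release.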
Common Causes
- Long-running requests (file uploads, report generation) blocking shutdown
- Thread pool exhaustion preventing worker from finishing active requests
- Database connections not released during worker shutdown hook
- External HTTP calls with no timeout waiting indefinitely
- `worker_timeout` set too high or disabled
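External calls without a bounded timeout are among the easier causes to rule out. As a sketch, assuming the application uses Ruby's standard `Net::HTTP` client, explicit timeouts can be set on every connection. The `build_http_client` helper name and the timeout values are illustrative and should be chosen to fit well inside `worker_shutdown_timeout`.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build a Net::HTTP client with explicit timeouts so a
# slow upstream cannot hold a Puma thread for longer than a few seconds.
def build_http_client(url, open_timeout = 2, read_timeout = 5, write_timeout = 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout   # seconds to establish the connection
  http.read_timeout = read_timeout   # seconds to wait for each read
  http.write_timeout = write_timeout # seconds to wait for each write
  http
end
```

A request made through this client raises `Net::OpenTimeout` or `Net::ReadTimeout` once a limit is exceeded, which frees the thread so the worker can finish shutting down.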
Step-by-Step Fix
1. Configure worker timeout and shutdown behavior:

```ruby
# config/puma.rb

# Seconds a worker may go without checking in to the master
# before it is considered hung and restarted
worker_timeout 60

# Seconds a worker is given to boot
worker_boot_timeout 30

# Grace period for old workers during phased restart
# After this, old workers are force-killed
worker_shutdown_timeout 20

# Puma has no built-in setting that prunes workers by memory usage;
# enforce memory limits with an external monitor or process supervisor.
```
2. Add proper shutdown hooks for cleanup:

```ruby
# config/puma.rb
on_worker_shutdown do
  # Close database connections
  ActiveRecord::Base.connection_pool.disconnect!

  # Background job processors such as Sidekiq normally run in their own
  # processes; stop any in-process job threads here if the app starts them.

  # Close Redis connections held by the cache store
  if Rails.cache.respond_to?(:redis)
    redis = Rails.cache.redis
    redis.close if redis.respond_to?(:close)
  end

  # Flush any pending log writes
  Rails.logger.flush if Rails.logger.respond_to?(:flush)
end

on_worker_boot do
  # Reconnect database for new worker
  ActiveRecord::Base.establish_connection

  # Reconnect Redis if the cache store exposes a reconnect method
  Rails.cache.reconnect if Rails.cache.respond_to?(:reconnect)
end
```
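One caveat with shutdown hooks is that an exception raised by an early cleanup step can prevent the later steps from running. A minimal sketch of one way to guard against that follows. The `run_cleanup_steps` helper is hypothetical and not part of Puma or Rails; it only assumes that each step is a callable and that the optional logger responds to `warn`.

```ruby
# Hypothetical helper: run every cleanup step even if an earlier one raises.
# `steps` maps a descriptive name to a callable; the return value maps the
# names of failed steps to the exceptions they raised.
def run_cleanup_steps(steps, logger = nil)
  steps.each_with_object({}) do |(name, step), failures|
    begin
      step.call
    rescue StandardError => e
      failures[name] = e
      logger&.warn("cleanup step #{name} failed: #{e.class}: #{e.message}")
    end
  end
end

# Usage inside config/puma.rb (assumes a Rails app):
#
# on_worker_shutdown do
#   run_cleanup_steps(
#     "database"  => -> { ActiveRecord::Base.connection_pool.disconnect! },
#     "log flush" => -> { Rails.logger.flush if Rails.logger.respond_to?(:flush) }
#   )
# end
```

Rescuing only `StandardError` keeps signals and `SystemExit` propagating normally, so the worker still exits when Puma asks it to.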
3. Use a hot restart instead of a phased restart for a full reload:

```bash
# Phased restart: restarts workers one at a time (may leave old workers)
pumactl phased-restart

# Hot restart: re-executes the whole server, master process included
# (brief interruption while the new process boots)
pumactl restart

# For deployments where code changed significantly, use a hot restart.
# A phased restart does not reload the master process, so it only works
# when nothing loaded by the master has changed.
```
4. Force kill stuck old workers:

```bash
# Find old workers
ps aux | grep "puma: cluster worker"

# Send SIGTERM to specific old worker
kill -SIGTERM <old_worker_pid>

# If still running after worker_shutdown_timeout, force kill
kill -SIGKILL <old_worker_pid>

# Or use pumactl to check status
pumactl -F config/puma.rb stats
```
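The TERM, wait, KILL sequence above can also be scripted so that the grace period is enforced consistently. The sketch below uses only Ruby's standard `Process` module; the `terminate_with_grace` and `process_alive?` helper names are hypothetical, and the grace period passed in is assumed to match `worker_shutdown_timeout`.

```ruby
# Signal 0 checks whether a process exists without delivering a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Hypothetical helper: send TERM, wait up to grace_seconds, then send KILL.
# Returns :terminated, :killed, or :not_found.
def terminate_with_grace(pid, grace_seconds, poll_interval = 0.2)
  begin
    Process.kill("TERM", pid)
  rescue Errno::ESRCH
    return :not_found
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_seconds
  loop do
    return :terminated unless process_alive?(pid)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll_interval
  end

  Process.kill("KILL", pid)
  :killed
rescue Errno::ESRCH
  :terminated # the process exited between the last check and the KILL
end
```

For example, `terminate_with_grace(1234, 20)` gives worker 1234 twenty seconds to exit cleanly before it is force-killed. Note that a process that has exited but has not yet been reaped by its parent still counts as alive under the signal 0 check.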
5. Add a deployment script with phased restart fallback:

```bash
#!/bin/bash
# deploy.sh

echo "Deploying new release..."
cd /var/www/myapp/current || exit 1

# Try phased restart first (zero downtime)
echo "Attempting phased restart..."
if bundle exec pumactl -F config/puma.rb phased-restart 2>/dev/null; then
  echo "Phased restart successful"
else
  echo "Phased restart failed, falling back to hot restart"
  bundle exec pumactl -F config/puma.rb restart

  # Wait and verify
  sleep 5
  worker_count=$(ps aux | grep "puma: cluster worker" | grep -v grep | wc -l)
  if [ "$worker_count" -lt 2 ]; then
    echo "WARNING: Not enough workers running. Starting Puma."
    # Puma 5+ removed the -d (daemonize) flag; run in the background here,
    # or preferably start Puma through a process supervisor
    bundle exec puma -C config/puma.rb &
  fi
fi
```
Prevention
- Set `worker_shutdown_timeout` to a reasonable value (15-30 seconds)
- Add `on_worker_shutdown` hooks to release all resources
- Monitor worker memory and PID ages to detect stuck workers
- Use `pumactl stats` in health checks to verify worker count
- Configure `tag` in puma.rb to identify worker app version
- Prefer container-based deployments (Docker) with rolling restart over phased restart
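A health check built on `pumactl stats` can go beyond counting workers and confirm that none of them is left on an old phase. The sketch below assumes the JSON document has already been extracted from the stats output and that it carries the cluster-mode fields `workers`, `booted_workers`, `old_workers`, `phase`, and a `worker_status` array whose entries include `booted` and `phase`. These field names reflect Puma's cluster stats as understood here and should be verified against the installed version; `phased_restart_complete?` is a hypothetical helper.

```ruby
require "json"

# Hypothetical helper: true when every worker is booted and running on the
# same phase as the master, i.e. no worker from the previous release remains.
def phased_restart_complete?(stats_json)
  stats = JSON.parse(stats_json)
  workers = stats.fetch("worker_status", [])
  return false if workers.empty?

  stats.fetch("old_workers", 0).zero? &&
    stats["booted_workers"] == stats["workers"] &&
    workers.all? { |w| w["booted"] && w["phase"] == stats["phase"] }
end
```

A deployment script could poll this check for a bounded time after requesting the phased restart and fall back to a hot restart if it never returns true.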