Introduction

PM2 cluster mode forks the Node.js process across multiple CPU cores for load balancing. Each forked worker must complete initialization and start listening before PM2 considers it ready. If a worker hangs during startup (database connection, slow initialization, port conflict), PM2 waits indefinitely or restarts the worker in a loop, causing deployment hangs and service unavailability.

Symptoms

  • pm2 list shows workers in launching or errored state
  • pm2 logs shows no output from cluster workers
  • Deployment script hangs waiting for PM2 to report online
  • Workers keep restarting: restart_time increases continuously
  • pm2 monit shows workers consuming CPU but never going online

```
$ pm2 start ecosystem.config.js
[PM2] Spawning PM2 daemon with pm2_home=/home/user/.pm2
[PM2] App [app] launched (4 instances)
[PM2][WARN] Applications app not running, starting...
[PM2] App [app-0] starting in -cluster mode-
[PM2] App [app-1] starting in -cluster mode-
# Stuck here - workers never go online

$ pm2 list
┌────┬───────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┐
│ id │ name  │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │
├────┼───────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┤
│ 0  │ app-0 │ default     │ 1.0.0   │ cluster │ 0        │ 0      │ 15   │ launching │
└────┴───────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┘
```

Common Causes

  • Application not calling server.listen() properly in cluster mode
  • Database connection blocking startup
  • Port already in use by another process
  • PM2 listen_timeout too short for slow initialization
  • Application using cluster module directly (conflicts with PM2 cluster mode)
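
A port conflict is the easiest of these causes to confirm from inside the application, because Node.js reports it as an `EADDRINUSE` error on the server object. The following is a minimal sketch, assuming an Express or plain `http` server; `describeListenError` and `attachListenDiagnostics` are hypothetical helper names, not part of PM2 or Node.js.

```javascript
// Map common listen() failures to a readable cause.
// EADDRINUSE and EACCES are standard Node.js system error codes.
function describeListenError(err, port) {
  switch (err.code) {
    case 'EADDRINUSE':
      return `Port ${port} is already in use by another process`;
    case 'EACCES':
      return `No permission to bind port ${port}`;
    default:
      return `Listen failed: ${err.message}`;
  }
}

// Hypothetical helper: log the cause and exit with a non-zero code so the
// failure shows up in the PM2 error log and restart counter.
function attachListenDiagnostics(server, port) {
  server.on('error', (err) => {
    console.error(describeListenError(err, port));
    process.exit(1);
  });
}
```

Since `app.listen()` returns the underlying server, one way to use this is `attachListenDiagnostics(app.listen(PORT), PORT)`.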

Step-by-Step Fix

  1. Ensure proper listen pattern for PM2 cluster mode:

```javascript
// Without a listen() call, PM2 cannot detect when the server is ready
const express = require('express');
const app = express();

app.get('/', (req, res) => res.send('OK'));

// Listen on the PORT env variable supplied through the PM2 ecosystem file
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Worker ${process.env.NODE_APP_INSTANCE} listening on port ${PORT}`);
});
```
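
If the server starts listening before the application can actually serve traffic, PM2 also supports an explicit ready signal: with `wait_ready: true` in the ecosystem file, it waits for the worker to call `process.send('ready')`. The sketch below assumes that option is set; `signalReady` is a hypothetical helper name.

```javascript
// Tell PM2 this worker is ready. process.send only exists when the
// process was started with an IPC channel, so guard the call to keep
// plain `node server.js` runs working.
function signalReady(proc = process) {
  if (typeof proc.send === 'function') {
    proc.send('ready');
    return true;
  }
  return false;
}
```

A natural place to call `signalReady()` is inside the `app.listen()` callback, once every dependency the worker needs is available.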

  2. Configure PM2 ecosystem file properly:

```javascript
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'app',
    script: 'server.js',
    instances: 'max', // Or specific number: 4
    exec_mode: 'cluster',
    max_memory_restart: '500M',

    // Increase timeouts for slow-starting apps
    listen_timeout: 30000, // Default: 8000ms
    kill_timeout: 5000, // Default: 1600ms
    shutdown_with_message: true,

    // Environment
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000,
    },
    env_development: {
      NODE_ENV: 'development',
      PORT: 3001,
    },
  }],
};
```
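
With `shutdown_with_message: true`, PM2 asks a worker to stop by sending it a `'shutdown'` message instead of a signal, so the application needs a message handler to shut down cleanly. A minimal sketch of such a handler follows; `createShutdownHandler` is a hypothetical helper name, and the `exit` parameter exists only to make the function easy to test.

```javascript
// Build a message handler that closes the server when PM2 asks the
// worker to stop. `server` is any object with close(callback), such as
// the http.Server returned by app.listen().
function createShutdownHandler(server, exit = process.exit) {
  return function onMessage(msg) {
    if (msg !== 'shutdown') return false;
    // Stop accepting new connections, then exit once the server has closed.
    server.close(() => exit(0));
    return true;
  };
}
```

It would be registered with `process.on('message', createShutdownHandler(server))`. The close has to finish within `kill_timeout`, after which PM2 kills the worker.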

  3. Debug stuck workers:

```bash
# Check worker logs
pm2 logs app --lines 100

# Check specific worker
pm2 logs app-0

# Get detailed info
pm2 describe app-0

# Monitor in real-time
pm2 monit

# Check if port is in use
lsof -i :3000

# Try starting in fork mode (not cluster) for debugging
pm2 start server.js --name app-debug
# If this works, the issue is cluster-specific
```

  4. Handle graceful startup with health checks:

```javascript
const express = require('express');
const app = express();

// Assumption: './db' is your own database client module exposing
// connect() and migrate(); it is not part of PM2 or Express
const db = require('./db');

let isReady = false;

// Simulate slow initialization
async function initialize() {
  console.log('Initializing database connection...');
  await db.connect();
  console.log('Running migrations...');
  await db.migrate();
  isReady = true;
  console.log('Initialization complete');
}

// Health check endpoint
app.get('/health', (req, res) => {
  res.status(isReady ? 200 : 503).json({
    status: isReady ? 'ready' : 'initializing',
    worker: process.env.NODE_APP_INSTANCE,
  });
});

// Start initialization, then listen
initialize().then(() => {
  const PORT = process.env.PORT || 3000;
  app.listen(PORT, () => {
    console.log(`Worker ${process.env.NODE_APP_INSTANCE} ready on port ${PORT}`);
  });
}).catch((err) => {
  console.error('Initialization failed:', err);
  process.exit(1);
});
```
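
The `.catch()` above only fires if initialization rejects; a database connection that hangs forever never does, and the worker stays in the launching state. One way to turn a hang into a visible failure is to race initialization against a timer. This is a sketch, and `withTimeout` is a hypothetical helper, not a PM2 or Node.js API.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

For example, `withTimeout(initialize(), 20000, 'initialization')` in place of the bare `initialize()` call would fail the worker with a logged error if startup takes longer than 20 seconds. Keeping that value below `listen_timeout` means the application reports the problem before PM2 gives up on the worker.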

Prevention

  • Set listen_timeout to match your application's startup time
  • Use shutdown_with_message: true for cleaner cluster shutdowns
  • Add health check endpoints that your deployment tooling or load balancer can poll
  • Test cluster mode locally with pm2 start server.js -i 2 before production
  • Use pm2 reload instead of pm2 restart for zero-downtime deployments
  • Monitor worker status in CI/CD: poll pm2 jlist after deploying and fail the job if any worker is not online
  • Set max_restarts to prevent infinite restart loops:

```javascript
max_restarts: 5,
restart_delay: 3000,
```
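
Worker status can also be checked from a script, for example as a gate in a deployment pipeline. The sketch below parses the JSON printed by `pm2 jlist`; `findNotOnline` and `checkApp` are hypothetical helper names, and the code assumes the command's stdout contains only the JSON array.

```javascript
const { execSync } = require('child_process');

// Return the workers of `appName` that are not online.
// `processes` is the parsed output of `pm2 jlist`.
function findNotOnline(processes, appName) {
  return processes
    .filter((p) => p.name === appName && p.pm2_env.status !== 'online')
    .map((p) => ({
      id: p.pm_id,
      status: p.pm2_env.status,
      restarts: p.pm2_env.restart_time,
    }));
}

// Hypothetical CI gate: exit non-zero if the app has no workers
// or if any of its workers is not online.
function checkApp(appName) {
  const processes = JSON.parse(execSync('pm2 jlist', { encoding: 'utf8' }));
  const total = processes.filter((p) => p.name === appName).length;
  const stuck = findNotOnline(processes, appName);
  if (total === 0 || stuck.length > 0) {
    console.error(`Workers not online for ${appName}:`, stuck);
    process.exit(1);
  }
}
```

Calling `checkApp('app')` a few seconds after `pm2 reload` gives the pipeline a clear pass or fail instead of a hang.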