Introduction
PM2 cluster mode forks the Node.js process across multiple CPU cores for load balancing. Each forked worker must complete initialization and start listening before PM2 considers it ready. If a worker hangs during startup (database connection, slow initialization, port conflict), PM2 waits indefinitely or restarts the worker in a loop, causing deployment hangs and service unavailability.
Symptoms
- `pm2 list` shows workers in `launching` or `errored` state
- `pm2 logs` shows no output from cluster workers
- Deployment script hangs waiting for PM2 to report online
- Workers keep restarting: `restart_time` increases continuously
- `pm2 monit` shows workers consuming CPU but never going online
```
$ pm2 start ecosystem.config.js
[PM2] Spawning PM2 daemon with pm2_home=/home/user/.pm2
[PM2] App [app] launched (4 instances)
[PM2][WARN] Applications app not running, starting...
[PM2] App [app-0] starting in -cluster mode-
[PM2] App [app-1] starting in -cluster mode-
# Stuck here - workers never go online

$ pm2 list
┌────┬───────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┐
│ id │ name  │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │
├────┼───────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┤
│ 0  │ app-0 │ default     │ 1.0.0   │ cluster │ 0        │ 0      │ 15   │ launching │
```
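The same symptoms can be detected from a script rather than by reading the table. The sketch below assumes the JSON printed by `pm2 jlist` has already been parsed into an array whose entries carry `pm_id`, `name`, `pm2_env.status`, and `pm2_env.restart_time`. The helper name `findStuckWorkers` and the restart threshold of 10 are illustrative choices, not part of PM2.

```javascript
// Hypothetical helper: flags workers that are not online or that restart too often.
// `processes` is the parsed output of `pm2 jlist`.
function findStuckWorkers(processes, maxRestarts = 10) {
  return processes
    .filter((proc) => {
      const env = proc.pm2_env || {};
      const notOnline = env.status !== 'online';
      const restartLoop = (env.restart_time || 0) > maxRestarts;
      return notOnline || restartLoop;
    })
    .map((proc) => {
      const env = proc.pm2_env || {};
      return {
        id: proc.pm_id,
        name: proc.name,
        status: env.status,
        restarts: env.restart_time || 0,
      };
    });
}

// Example input shaped like two `pm2 jlist` entries (values are made up).
const sampleProcesses = [
  { pm_id: 0, name: 'app', pm2_env: { status: 'launching', restart_time: 15 } },
  { pm_id: 1, name: 'app', pm2_env: { status: 'online', restart_time: 0 } },
];

console.log(findStuckWorkers(sampleProcesses));
```

A deployment script could feed this function the parsed output of `pm2 jlist` and fail the deploy when the returned array is non-empty.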
Common Causes
- Application not calling `server.listen()` properly in cluster mode
- Database connection blocking startup
- Port already in use by another process
- PM2 `listen_timeout` too short for slow initialization
- Application using `cluster` module directly (conflicts with PM2 cluster mode)
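The port-conflict cause can be ruled out before PM2 is even started. The following is a minimal sketch using Node's built-in `net` module; `isPortFree` is a hypothetical helper name, and port 3000 is simply the value used elsewhere in this guide. It is meant to run as a standalone pre-flight check, not inside a PM2 cluster worker, where the port is shared between workers.

```javascript
const net = require('net');

// Hypothetical helper: resolves true if `port` can be bound, false if it is already in use.
function isPortFree(port) {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once('error', (err) => {
      if (err.code === 'EADDRINUSE') resolve(false);
      else reject(err); // any other bind error is surfaced to the caller
    });
    probe.once('listening', () => {
      // Release the port again before reporting it as free.
      probe.close(() => resolve(true));
    });
    probe.listen(port);
  });
}

isPortFree(3000)
  .then((free) => {
    console.log(free ? 'Port 3000 is free' : 'Port 3000 is already in use');
  })
  .catch((err) => {
    console.error('Port check failed:', err.message);
  });
```

If the check reports the port as taken, `lsof -i :3000` shows which process holds it.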
Step-by-Step Fix
1. Ensure proper listen pattern for PM2 cluster mode:

```javascript
// The worker must call listen() so PM2 can detect when the server is ready
const express = require('express');
const app = express();

app.get('/', (req, res) => res.send('OK'));

// Listen on the PORT environment variable (set in the env block of the ecosystem file)
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Worker ${process.env.NODE_APP_INSTANCE} listening on port ${PORT}`);
});
```
2. Configure PM2 ecosystem file properly:

```javascript
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'app',
    script: 'server.js',
    instances: 'max', // Or specific number: 4
    exec_mode: 'cluster',
    max_memory_restart: '500M',

    // Increase timeouts for slow-starting apps
    listen_timeout: 30000, // Default: 8000ms
    kill_timeout: 5000, // Default: 1600ms
    shutdown_with_message: true,

    // Environment
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000,
    },
    env_development: {
      NODE_ENV: 'development',
      PORT: 3001,
    },
  }],
};
```
3. Debug stuck workers:

```bash
# Check worker logs
pm2 logs app --lines 100

# Check specific worker
pm2 logs app-0

# Get detailed info
pm2 describe app-0

# Monitor in real-time
pm2 monit

# Check if port is in use
lsof -i :3000

# Try starting in fork mode (not cluster) for debugging
pm2 start server.js --name app-debug
# If this works, the issue is cluster-specific
```
4. Handle graceful startup with health checks:

```javascript
const express = require('express');
const db = require('./db'); // hypothetical module exposing connect() and migrate()
const app = express();

let isReady = false;

// Slow initialization (database connection, migrations)
async function initialize() {
  console.log('Initializing database connection...');
  await db.connect();
  console.log('Running migrations...');
  await db.migrate();
  isReady = true;
  console.log('Initialization complete');
}

// Health check endpoint
app.get('/health', (req, res) => {
  res.status(isReady ? 200 : 503).json({
    status: isReady ? 'ready' : 'initializing',
    worker: process.env.NODE_APP_INSTANCE,
  });
});

// Start initialization, then listen
initialize().then(() => {
  const PORT = process.env.PORT || 3000;
  app.listen(PORT, () => {
    console.log(`Worker ${process.env.NODE_APP_INSTANCE} ready on port ${PORT}`);
  });
}).catch((err) => {
  console.error('Initialization failed:', err);
  process.exit(1);
});
```
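Two refinements can make slow startups easier to diagnose. First, bounding the initialization with a timeout lets a worker fail fast with a clear error instead of hanging. Second, PM2 supports an explicit readiness signal: with `wait_ready: true` in the ecosystem file, it waits for the process to send a `'ready'` message rather than relying on the listen event. The sketch below uses Node's built-in `http` module instead of Express to stay self-contained. The names `withTimeout`, `signalReady`, and `startWhenReady`, and the 10-second default, are assumptions made for illustration.

```javascript
const http = require('http');

// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Tell PM2 the worker is ready. process.send exists only when the process has
// an IPC channel (as it does under PM2), so guard the call.
function signalReady(proc = process) {
  if (typeof proc.send !== 'function') return false;
  proc.send('ready');
  return true;
}

// `initialize` is whatever async startup work the app needs (e.g. a database connection).
async function startWhenReady(port, initialize, initTimeoutMs = 10000) {
  await withTimeout(initialize(), initTimeoutMs, 'initialization');
  const server = http.createServer((req, res) => res.end('OK'));
  await new Promise((resolve, reject) => {
    server.once('error', reject); // e.g. EADDRINUSE
    server.listen(port, resolve);
  });
  signalReady();
  return server;
}
```

A caller might run `startWhenReady(process.env.PORT || 3000, () => db.connect())` and call `process.exit(1)` if the returned promise rejects. Keeping the initialization timeout below `listen_timeout` is one way to make sure the application's own error appears in the logs first.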
Prevention
- Set `listen_timeout` to match your application's startup time
- Use `shutdown_with_message: true` for cleaner cluster shutdowns
- Add health check endpoints that your deployment scripts or load balancer can verify
- Test cluster mode locally with `pm2 start server.js --instances 2` before production
- Use `pm2 reload` instead of `pm2 restart` for zero-downtime deployments
- Monitor worker status in CI/CD: poll `pm2 jlist` until every worker reports `online`
- Set `max_restarts` to prevent infinite restart loops:

```javascript
max_restarts: 5,
restart_delay: 3000,
```
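With `shutdown_with_message: true`, PM2 asks a worker to stop by sending the message `'shutdown'` over the IPC channel instead of a signal, so the application needs a handler for it. The following is a minimal sketch, assuming a server object with a `close(callback)` method such as the one returned by `app.listen()`. The name `installShutdownHandler` is hypothetical, and the exit codes are one possible convention.

```javascript
// Hypothetical helper: closes `server` when PM2 sends the 'shutdown' message.
// `proc` and `exit` are injectable so the handler can be exercised without PM2.
function installShutdownHandler(server, proc = process, exit = (code) => process.exit(code)) {
  proc.on('message', (msg) => {
    if (msg !== 'shutdown') return; // ignore unrelated IPC messages
    console.log('Shutdown requested, closing server...');
    server.close((err) => {
      // Exit 0 on a clean close, 1 if closing failed.
      exit(err ? 1 : 0);
    });
  });
}
```

In the application this would be called once after the server starts listening, for example `installShutdownHandler(server)`. The shutdown work should finish within `kill_timeout`, after which PM2 force-kills the process.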