Introduction

Node.js cluster mode forks worker processes that share listening sockets so incoming connections can be distributed across them. When a worker crashes and is automatically respawned, but the replacement crashes for the same reason, the cluster enters a crash-restart loop. The loop consumes CPU through constant forking, generates massive log output, and leaves the application with fewer healthy workers than configured. The root cause is typically an unhandled exception in startup code, a resource that cannot be initialized (a database connection, a file lock), or a memory leak that pushes the worker past its memory limit shortly after starting.

Symptoms

Cluster logs show a rapid restart loop:

```bash
Worker 12345 disconnected
Worker 12346 started
Worker 12346 disconnected (code: 1, signal: null)
Worker 12347 started
Worker 12347 disconnected (code: 1, signal: null)
Worker 12348 started
... (loop continues)
```

CPU usage spikes from constant forking:

```bash
$ top -p $(pgrep -d',' node)
PID    USER   %CPU  %MEM
12345  app    180   2.1   <-- Multiple workers consuming CPU in crash loop
12346  app    45    0.5
12347  app    35    0.4
```

Or the primary process's log:

```bash
Worker 42 exited with code 1, signal null
Forking new worker to replace 42
Worker 43 forked
Worker 43 exited with code 1, signal null
Forking new worker to replace 43
... (thousands of restarts per minute)
```

Common Causes

  • Unhandled exception in startup code: Worker throws during initialization
  • Database connection fails: Worker cannot connect to database and crashes
  • Missing environment variable: Required config not passed to forked workers
  • Port already in use from previous worker: Old worker did not release the port
  • Memory limit too low: Worker OOMs immediately after starting
  • Infinite loop in worker code: Worker consumes all CPU and is killed by a health check
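Several of these causes, missing environment variables in particular, are easier to catch with a fail-fast check at the top of the worker so the first crash prints one clear error instead of scrolling in a restart loop. A minimal sketch, where `DATABASE_URL` and `PORT` are hypothetical variable names:

```javascript
// Fail fast on missing configuration. checkEnv is an illustrative helper;
// the required names come from your application's own config.
function checkEnv(required, env = process.env) {
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// Simulate a forked worker that received PORT but not DATABASE_URL.
try {
  checkEnv(['DATABASE_URL', 'PORT'], { PORT: '3000' });
} catch (err) {
  console.log(err.message);
  // Missing required environment variables: DATABASE_URL
}
```

Calling this before any other initialization makes cause three on the list above immediately visible in the very first worker's output.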

Step-by-Step Fix

Step 1: Add restart delay and limit

```javascript
const cluster = require('cluster');
const os = require('os');

if (cluster.isPrimary) {
  const maxRestartsPerMinute = 10;
  let restartCount = 0;
  let restartTimestamps = [];

  function forkWorker() {
    const now = Date.now();

    // Check the restart rate over the last minute
    restartTimestamps = restartTimestamps.filter((t) => now - t < 60000);
    if (restartTimestamps.length >= maxRestartsPerMinute) {
      console.error('Too many worker restarts. Stopping cluster.');
      process.exit(1);
    }

    const worker = cluster.fork();
    restartTimestamps.push(now);

    worker.on('exit', (code, signal) => {
      console.log(`Worker ${worker.process.pid} exited: code=${code}, signal=${signal}`);

      if (code !== 0 && signal === null) {
        // Worker crashed - wait with exponential backoff before restarting
        const delay = Math.min(1000 * Math.pow(2, restartCount), 30000);
        console.log(`Restarting worker in ${delay}ms (attempt ${restartCount + 1})`);
        restartCount++;

        setTimeout(() => {
          forkWorker();
          restartCount = Math.max(0, restartCount - 1);
        }, delay);
      } else {
        // Normal exit or signal - restart immediately
        forkWorker();
      }
    });

    return worker;
  }

  // Fork one worker per CPU core
  const numCPUs = os.cpus().length;
  for (let i = 0; i < numCPUs; i++) {
    forkWorker();
  }
}
```
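The backoff formula in the exit handler doubles the delay on each consecutive crash and caps it at 30 seconds. Extracted as a small function for a quick sanity check:

```javascript
// Exponential backoff with a 30s cap, matching the formula in Step 1.
function restartDelay(restartCount) {
  return Math.min(1000 * Math.pow(2, restartCount), 30000);
}

// Delays for the first seven consecutive crashes, in milliseconds:
console.log([0, 1, 2, 3, 4, 5, 6].map(restartDelay).join(' '));
// prints: 1000 2000 4000 8000 16000 30000 30000
```

Because `restartCount` is decremented again after each successful delayed restart, a worker that eventually stays up lets the delay decay back toward one second.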

Step 2: Handle worker startup errors gracefully

```javascript
// worker.js
// Note: db, app, and loadConfig stand in for the application's own
// database client, HTTP app, and configuration loader.
async function startWorker() {
  try {
    // Initialize the database connection
    await db.connect();

    // Load configuration
    const config = await loadConfig();

    // Start the server
    const server = app.listen(config.port, () => {
      console.log(`Worker ${process.pid} listening on port ${config.port}`);

      // Notify the primary that we are ready
      if (process.send) {
        process.send({ type: 'ready', pid: process.pid });
      }
    });

    // Handle graceful shutdown
    process.on('SIGTERM', () => {
      console.log(`Worker ${process.pid} received SIGTERM`);
      server.close(() => {
        db.disconnect();
        process.exit(0);
      });

      // Force exit if shutdown does not finish within 10 seconds
      setTimeout(() => process.exit(1), 10000);
    });
  } catch (err) {
    console.error(`Worker ${process.pid} failed to start:`, err.message);

    // Notify the primary of the failure
    if (process.send) {
      process.send({ type: 'error', error: err.message });
    }

    // Exit with a non-zero code so the primary treats this as a crash
    process.exit(1);
  }
}

startWorker();
```
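The primary in Step 1 does not yet consume the `ready`/`error` messages the worker sends. A primary-side handler can be sketched as below; `handleWorkerMessage` and `readyWorkers` are illustrative names, and the message shapes match what Step 2 sends via `process.send()`:

```javascript
// Track which workers have completed startup, based on the messages
// sent by worker.js in Step 2.
const readyWorkers = new Set();

function handleWorkerMessage(workerId, msg) {
  if (msg && msg.type === 'ready') {
    readyWorkers.add(workerId);
    return `worker ${msg.pid} ready`;
  }
  if (msg && msg.type === 'error') {
    return `worker startup failed: ${msg.error}`;
  }
  return 'ignored';
}

// In the primary, attach the handler to each forked worker:
//   const worker = cluster.fork();
//   worker.on('message', (msg) => console.log(handleWorkerMessage(worker.id, msg)));

console.log(handleWorkerMessage(1, { type: 'ready', pid: 12345 }));
console.log(handleWorkerMessage(2, { type: 'error', error: 'ECONNREFUSED' }));
```

With this in place the primary can distinguish a worker that died before ever becoming ready (a startup failure worth backing off on) from one that crashed after serving traffic.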

Prevention

  • Add exponential backoff to worker restarts to prevent rapid crash loops
  • Set a maximum restart rate and stop the cluster if exceeded
  • Log the full error stack trace before the worker exits
  • Use process.send() to communicate worker readiness to the primary
  • Implement graceful shutdown with SIGTERM handling
  • Add a health check endpoint that verifies database and dependency connectivity
  • Monitor the worker restart rate in production and alert when it exceeds a threshold