Introduction

A Node.js HTTP agent created with keepAlive: true reuses TCP connections across multiple HTTP requests, which reduces latency by avoiding a new TCP (and, for HTTPS, TLS) handshake on every request. However, keep-alive sockets can become stale: the server closes its end of the connection (because of an idle timeout or a restart), but the agent still treats the socket as usable and sends the next request on it. The request then fails with read ECONNRESET or socket hang up, even though the server itself is healthy. The problem is intermittent and hard to reproduce, which makes it one of the more frustrating networking issues in production Node.js services.

Symptoms

Intermittent connection errors:

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:211:20)

Or:

Error: socket hang up
    at connResetException (node:internal/errors:691:14)
    at Socket.socketCloseListener (node:_http_client:426:25)

The errors typically affect a small share of requests (roughly 1-5%) and tend to appear once the service has been running long enough for pooled connections to sit idle between requests.
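
To check whether these failures really come from pooled connections, it can help to log whether the failing request was sent on a reused socket. The sketch below relies on the reusedSocket property that Node.js exposes on a client request; the describeSocketError helper itself is hypothetical and only illustrates the idea.

```javascript
// Hypothetical helper (not part of Node.js): summarize a failed request so
// logs show whether the socket came from the keep-alive pool.
function describeSocketError(err, reusedSocket) {
  const code = err && err.code ? err.code : null;
  const message = err && err.message ? err.message : '';
  const looksLikeReset =
    code === 'ECONNRESET' || message.includes('socket hang up');

  return {
    code,
    reusedSocket: Boolean(reusedSocket),
    // A reset on a reused socket is the typical signature of a stale socket.
    likelyStale: looksLikeReset && Boolean(reusedSocket),
  };
}

// Usage sketch, for any http/https ClientRequest named req:
// req.on('error', (err) => {
//   console.warn(describeSocketError(err, req.reusedSocket));
// });
```

If most logged failures show likelyStale: true, the keep-alive pool is the probable culprit; resets on fresh sockets point to a different problem.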

Common Causes

  • Server closes idle connections: Server-side timeout (e.g., Nginx keepalive_timeout 65s) closes sockets
  • Agent does not detect closed sockets: Node.js agent only discovers the socket is dead when trying to use it
  • maxSockets too low: All sockets in the pool are busy, requests queue up
  • Server restart without draining: Server restarts, all existing keep-alive sockets become invalid
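  • Keep-alive timeout mismatch: The client keeps idle sockets open longer than the server's keepalive_timeout, so the server closes them first. Note that keepAliveMsecs only sets the initial delay for TCP keep-alive probes; it does not limit how long an idle socket stays in the pool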
  • Not handling socket error on reused connection: Error propagates as unhandled rejection
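
The timeout mismatch in particular can be checked mechanically. The following is a minimal sketch, assuming you know both idle timeouts in milliseconds; checkIdleTimeouts, its parameter names, and the 1000 ms default margin are all hypothetical choices made for illustration.

```javascript
// Hypothetical helper: verify that the client gives up idle sockets before
// the server does.
//   clientIdleMs      - how long the client keeps an idle socket (assumed to
//                       come from the Agent `timeout` option)
//   serverKeepAliveMs - server idle timeout, e.g. Nginx keepalive_timeout
//   marginMs          - safety margin for requests already in flight
function checkIdleTimeouts({ clientIdleMs, serverKeepAliveMs, marginMs = 1000 }) {
  if (!(clientIdleMs > 0)) {
    return { ok: false, reason: 'client never closes idle sockets' };
  }
  if (clientIdleMs + marginMs > serverKeepAliveMs) {
    return { ok: false, reason: 'server may close the socket first' };
  }
  return { ok: true, reason: 'client closes idle sockets first' };
}

console.log(checkIdleTimeouts({ clientIdleMs: 5000, serverKeepAliveMs: 65000 }));
```

A check like this could run at startup or in a configuration test, so that a change to either timeout does not silently reintroduce the race.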

Step-by-Step Fix

Step 1: Configure the HTTP agent with proper settings

```javascript
const http = require('http');
const https = require('https');

const agentConfig = {
  keepAlive: true,
  keepAliveMsecs: 30000, // Initial delay before TCP keep-alive probes are sent
  maxSockets: 50,        // Max concurrent sockets per origin
  maxFreeSockets: 10,    // Max idle sockets to keep in the pool per origin
  timeout: 5000,         // Socket inactivity timeout in ms
  scheduling: 'lifo',    // Reuse the most recently used free socket first
};

const httpAgent = new http.Agent(agentConfig);
const httpsAgent = new https.Agent({
  ...agentConfig,
  rejectUnauthorized: true,
});

// Use the agent for requests; retriesLeft bounds the number of retries
function makeRequest(url, retriesLeft = 2) {
  return new Promise((resolve, reject) => {
    const req = https.get(url, { agent: httpsAgent }, (res) => {
      let data = '';
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => resolve(data));
      res.on('error', reject);
    });

    req.on('error', (err) => {
      // Retry on socket errors, but only while retries remain
      const retryable = err.code === 'ECONNRESET' || err.code === 'ECONNREFUSED';
      if (retryable && retriesLeft > 0) {
        // The agent drops the dead socket and opens a new one on retry
        resolve(makeRequest(url, retriesLeft - 1));
      } else {
        reject(err);
      }
    });

    // Destroying with an explicit error routes the timeout through the
    // 'error' handler above, where it is rejected without a retry
    req.setTimeout(30000, () => {
      req.destroy(new Error('Request timeout'));
    });
  });
}
```

Step 2: Use a retry wrapper for resilient requests

```javascript
// Assumes httpsAgent from Step 1 is in scope
async function requestWithRetry(url, options = {}, maxRetries = 2) {
  const https = require('https');

  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      return await new Promise((resolve, reject) => {
        const req = https.get(url, { agent: httpsAgent, ...options }, (res) => {
          let data = '';
          res.on('data', (chunk) => { data += chunk; });
          res.on('end', () => resolve({ status: res.statusCode, data }));
          res.on('error', reject);
        });

        req.on('error', reject);
      });
    } catch (err) {
      const retryable =
        err.code === 'ECONNRESET' ||
        err.code === 'ECONNREFUSED' ||
        (err.message || '').includes('socket hang up');

      if (retryable && attempt <= maxRetries) {
        console.log(`Request failed (attempt ${attempt}), retrying...`);
        // Wait briefly before retrying to let the agent clean up
        await new Promise((resolve) => setTimeout(resolve, 100 * attempt));
        continue;
      }
      throw err;
    }
  }
}
```
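
Blind retries are only safe for requests that can be repeated without side effects. As a hedged sketch of a stricter policy, the helpers below retry a reset only when it happened on a reused socket and the method is idempotent, and they compute a capped, jittered backoff. The names canRetry and backoffMs, the method list, and the default delays are assumptions made for this example.

```javascript
// Methods assumed safe to repeat; adjust for your own API semantics.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE']);

// Hypothetical policy: attempt is 1-based, matching the loop in Step 2.
function canRetry({ method = 'GET', attempt, maxRetries, err, reusedSocket }) {
  if (attempt > maxRetries) return false;
  if (!IDEMPOTENT_METHODS.has(method.toUpperCase())) return false;
  // A refused connection never reached the server, so repeating it is safe.
  if (err.code === 'ECONNREFUSED') return true;
  // Retry a reset only if the socket came from the keep-alive pool.
  return err.code === 'ECONNRESET' && Boolean(reusedSocket);
}

// Exponential backoff with full jitter, capped at capMs.
// `random` is injectable so the function can be tested deterministically.
function backoffMs(attempt, baseMs = 100, capMs = 2000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}
```

Inside a retry loop, canRetry would replace the inline error check, with req.reusedSocket passed in from the failed request, and backoffMs(attempt) would replace the fixed delay.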

Step 3: Monitor agent socket pool

```javascript
function logAgentStats() {
  console.log('HTTPS Agent stats:', {
    requests: httpsAgent.requests,       // Queued requests, keyed by origin
    sockets: httpsAgent.sockets,         // Active sockets, keyed by origin
    freeSockets: httpsAgent.freeSockets, // Idle sockets, keyed by origin
  });
}

setInterval(logAgentStats, 60000);
```
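
Logging the raw agent properties prints entire socket objects, which is noisy in production logs. A possible alternative, sketched here, reduces them to counts. summarizeAgent is a hypothetical helper; it assumes only that requests, sockets, and freeSockets are objects mapping an origin key to an array, which is how the agent exposes them.

```javascript
const http = require('node:http');

// Hypothetical helper: collapse the per-origin maps on an Agent into totals.
function summarizeAgent(agent) {
  const count = (byOrigin) =>
    Object.values(byOrigin || {}).reduce((sum, list) => sum + list.length, 0);

  return {
    active: count(agent.sockets),   // sockets currently serving a request
    idle: count(agent.freeSockets), // sockets waiting in the keep-alive pool
    queued: count(agent.requests),  // requests waiting for a free socket
  };
}

// Demo with a fresh agent, which has nothing in any pool yet.
const demoAgent = new http.Agent({ keepAlive: true });
console.log(summarizeAgent(demoAgent));
```

A steadily growing queued count suggests maxSockets is too low, while a large idle count increases the number of sockets that can go stale.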

Prevention

  • Always set keepAlive: true with appropriate keepAliveMsecs and maxSockets
  • Implement retry logic for ECONNRESET and socket hang up errors
  • Set maxFreeSockets to limit idle socket count
  • Use scheduling: 'lifo' to reuse the most recently freed socket (warmest)
  • Monitor agent socket pool stats in production
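  • Keep the server-side keepalive_timeout longer than the client's idle socket timeout, so the client closes idle sockets before the server does (keepAliveMsecs controls TCP keep-alive probes, not idle lifetime)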
  • Consider using undici (the library behind Node's built-in fetch()) and its connection pool for better connection management
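
When the server is also a Node.js process, its idle timeout is set through server.keepAliveTimeout, which the Node.js documentation lists as 5 seconds by default. A stale socket arises when the server closes an idle connection that the client still intends to reuse, so one way to avoid the race is to let the server wait longer than whatever sits in front of it. The sketch below assumes a client or load balancer that drops idle connections after 60 seconds; UPSTREAM_IDLE_MS and the 5 second margin are illustrative values, not recommendations from the original text.

```javascript
const http = require('node:http');

// Assumed value: idle timeout of the client or load balancer that talks
// to this server.
const UPSTREAM_IDLE_MS = 60_000;

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Keep idle connections open longer than the upstream does, so the upstream
// closes them first and never reuses a socket this server already dropped.
server.keepAliveTimeout = UPSTREAM_IDLE_MS + 5_000;

// server.listen(3000); // the port is an arbitrary example
```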