## Introduction

Serverless and cloud-hosted databases (Aurora Serverless, Neon, Supabase, Cloud SQL) aggressively terminate idle connections to conserve resources. Applications that hold connections open through periods of inactivity will hit `connection reset by peer` or `broken pipe` errors when the next query runs.
## Symptoms

- Intermittent `connection reset by peer` or `EOF` errors on the first query after an idle period
- `FATAL: terminating connection due to administrator command` in PostgreSQL logs
- `MySQL server has gone away` (ERROR 2006) on AWS RDS or Aurora
- Connection pool reports healthy connections that fail on first use
- Errors correlate with periods of low traffic (nights, weekends)
## Common Causes

- Cloud provider idle timeout (typically 5-15 minutes) is shorter than the application pool's maximum idle time
- Load balancer or proxy (ALB, nginx) dropping idle TCP connections
- NAT gateway connection tracking timeout expiring (350 seconds on AWS)
- Application not sending TCP keepalive probes on database connections
- Connection pool not validating connections before checkout
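The causes above come down to arithmetic: keepalive probes must start before the shortest timeout anywhere on the path (e.g. AWS's 350-second NAT tracking window), and the probe schedule bounds how long a dead peer goes undetected. A minimal sketch of that calculation, using an illustrative helper name (not part of any library):

```python
def keepalive_detection_window(idle, interval, count):
    """Worst-case seconds to detect a dead peer via TCP keepalive:
    the first probe fires after `idle` seconds of silence, then
    `count` unanswered probes spaced `interval` seconds apart."""
    return idle + interval * count

# With the settings used in the fix steps below (30s idle, 10s interval,
# 6 probes), the first probe fires at 30s -- well inside AWS's 350-second
# NAT timeout -- and a dead connection is detected within:
print(keepalive_detection_window(30, 10, 6))  # 90 seconds
```

If the first probe fires *after* the NAT or load-balancer timeout, the mapping is already gone and keepalive cannot help, which is why `idle` is the critical knob.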
## Step-by-Step Fix

1. **Identify the timeout source by checking provider settings**:

   ```sql
   -- PostgreSQL: check idle and keepalive settings
   SHOW idle_in_transaction_session_timeout;
   SHOW tcp_keepalives_idle;
   SHOW tcp_keepalives_interval;
   SHOW tcp_keepalives_count;
   ```
2. **Enable TCP keepalive on the database server**:

   ```sql
   ALTER SYSTEM SET tcp_keepalives_idle = 30;      -- first probe after 30s idle
   ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- retry probe every 10s
   ALTER SYSTEM SET tcp_keepalives_count = 6;      -- give up after 6 failed probes
   SELECT pg_reload_conf();
   ```
3. **Configure the connection pool to test connections before checkout**:

   ```python
   # SQLAlchemy with pre-ping
   from sqlalchemy import create_engine

   engine = create_engine(
       "postgresql+psycopg2://user:pass@host:5432/db",
       pool_size=10,
       max_overflow=20,
       pool_pre_ping=True,  # Tests connection before each checkout
       pool_recycle=180,    # Recycle connections every 3 minutes
       connect_args={
           "keepalives": 1,
           "keepalives_idle": 30,
           "keepalives_interval": 10,
           "keepalives_count": 6,
       },
   )
   ```
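Under the hood, `pool_pre_ping` issues a cheap probe at checkout and transparently replaces connections that fail it. For pools without built-in validation, the same idea can be hand-rolled. A minimal sketch with hypothetical names (`ValidatingPool`, `FakeConn`, and `ping()` are illustrative, not a real driver API):

```python
class ValidatingPool:
    """Probe a pooled connection at checkout; discard it and open a
    fresh one if the probe fails -- what pool_pre_ping automates."""

    def __init__(self, factory):
        self._factory = factory  # callable that opens a new connection
        self._idle = []

    def checkout(self):
        while self._idle:
            conn = self._idle.pop()
            try:
                conn.ping()       # cheap probe, e.g. SELECT 1
                return conn
            except ConnectionError:
                pass              # drop the dead connection silently
        return self._factory()    # no healthy idle connection: open fresh

    def checkin(self, conn):
        self._idle.append(conn)


# Demo with stand-in connections instead of a real database:
class FakeConn:
    def __init__(self, alive=True):
        self.alive = alive

    def ping(self):
        if not self.alive:
            raise ConnectionError("stale connection")


pool = ValidatingPool(factory=FakeConn)
dead = FakeConn(alive=False)
pool.checkin(dead)
conn = pool.checkout()  # dead connection discarded, fresh one returned
```

The cost is one extra round trip per checkout; for most web workloads that is far cheaper than surfacing `connection reset by peer` to users.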
4. **For Node.js/pg, configure keepalive**:

   ```javascript
   const { Pool } = require('pg');

   const pool = new Pool({
     connectionString: process.env.DATABASE_URL,
     max: 20,
     idleTimeoutMillis: 30000,
     connectionTimeoutMillis: 5000,
     keepAlive: true,
     keepAliveInitialDelayMillis: 10000,
   });

   // Handle errors on idle clients instead of crashing the process
   pool.on('error', (err) => {
     console.error('Unexpected pool error', err);
   });
   ```
5. **Implement retry logic at the application level**:

   ```python
   import time

   from sqlalchemy import text
   from sqlalchemy.exc import DatabaseError


   def execute_with_retry(query, params, max_retries=3):
       for attempt in range(max_retries):
           try:
               with engine.connect() as conn:
                   return conn.execute(text(query), params)
           except (OSError, DatabaseError):
               if attempt == max_retries - 1:
                   raise
               engine.dispose()  # Discard the pool's stale connections
               time.sleep(0.5 * (2 ** attempt))  # Exponential backoff
   ```
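The `0.5 * (2 ** attempt)` sleep in the retry helper above is exponential backoff. A small sketch of the resulting schedule, with a cap added as an extra safeguard not present in the original helper (`backoff_delays` is an illustrative name):

```python
def backoff_delays(max_retries, base=0.5, cap=10.0):
    """Seconds slept between attempts: base * 2**attempt, capped so a
    large retry count never sleeps for minutes. The final attempt
    either succeeds or re-raises, so there are max_retries - 1 sleeps."""
    return [min(base * (2 ** attempt), cap) for attempt in range(max_retries - 1)]


print(backoff_delays(4))  # [0.5, 1.0, 2.0]
```

Adding random jitter to each delay is also worth considering when many workers retry against the same database at once, so the retries do not arrive in synchronized waves.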