Introduction

PostgreSQL connection pool exhaustion occurs when all available database connections are in use, preventing new connections from being established. Applications receive errors like `FATAL: sorry, too many clients already`, `FATAL: remaining connection slots are reserved for non-replication superuser connections`, or connection timeout errors. This indicates the connection limit has been reached, causing application failures and service degradation.

Symptoms

  • Application logs show too many clients or connection pool exhausted errors
  • New database connections hang or time out waiting for an available slot
  • pg_stat_activity shows max_connections sessions active
  • Connection wait time increases dramatically during traffic spikes
  • Application health checks fail due to database connectivity
  • Issue appears during traffic spikes, after a deploy that introduced a connection leak, or after a max_connections reduction

Common Causes

  • Application connection leak (connections not returned to pool)
  • max_connections set too low for application concurrency
  • Long-running queries holding connections for extended periods
  • Idle connections not timing out due to missing idle_in_transaction_session_timeout
  • Missing connection pooling layer (PgBouncer, Pgpool-II)
  • Connection storm after application restart (thundering herd)
  • Replication slots consuming reserved connections

Step-by-Step Fix

### 1. Check current connection usage and limits

Query PostgreSQL to understand connection state:

```sql
-- Check max_connections setting
SHOW max_connections;

-- Check current connection count
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle') AS idle,
       count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_transaction
FROM pg_stat_activity;

-- Check reserved connections (superuser_reserved_connections)
SHOW superuser_reserved_connections;

-- Calculate available connections
SELECT (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') - count(*) AS available
FROM pg_stat_activity;
```

Expected baseline:

  • max_connections: 100-500 for standalone PostgreSQL
  • Available connections: > 10% of max
  • idle_in_transaction: should be 0 or very low (a high count indicates a connection leak)
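The baseline check above can be scripted against the counts returned by the `pg_stat_activity` query. The following is a minimal sketch in plain Python (no database driver); the function name is illustrative, and the thresholds simply mirror the baseline bullets:

```python
# Hypothetical helper: evaluate connection headroom from pg_stat_activity counts.
# Thresholds (10% headroom, zero tolerance for idle-in-transaction sessions)
# mirror the expected baseline above and can be tuned.

def check_connection_baseline(max_connections, total, idle_in_transaction):
    """Return a list of warning strings; an empty list means healthy."""
    warnings = []
    available = max_connections - total
    if available < max_connections * 0.10:
        warnings.append(
            f"low headroom: {available} of {max_connections} connections free"
        )
    if idle_in_transaction > 0:
        warnings.append(
            f"{idle_in_transaction} idle-in-transaction sessions (possible leak)"
        )
    return warnings
```

For example, `check_connection_baseline(100, 95, 3)` reports both low headroom and a likely leak, which maps directly to steps 3 and 6 below.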

### 2. Identify connections by application and user

Find which applications are consuming connections:

```sql
-- Group connections by application
SELECT application_name,
       usename AS db_user,
       count(*) AS connection_count,
       count(*) FILTER (WHERE state = 'active') AS active_count,
       count(*) FILTER (WHERE state = 'idle') AS idle_count
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY application_name, usename
ORDER BY connection_count DESC;

-- Check connection details including client address
SELECT pid, usename, application_name, client_addr,
       backend_start, state, state_change, query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY backend_start;
```

Look for:

  • A single application consuming > 50% of connections
  • High idle count (connections not being released)
  • Old backend_start times (connections held for hours or days)
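The first heuristic can be applied mechanically to the grouped query's output. A sketch, assuming rows arrive as `(application_name, connection_count)` tuples (the function name and interface are illustrative):

```python
# Hypothetical helper: flag applications holding more than half of all
# connections, based on rows from the grouped pg_stat_activity query above.

def find_connection_hogs(rows, threshold=0.5):
    """rows: iterable of (application_name, connection_count) tuples.
    Returns the names of applications exceeding the share threshold."""
    total = sum(count for _, count in rows)
    if total == 0:
        return []
    return [app for app, count in rows if count / total > threshold]
```

For example, `find_connection_hogs([("api", 60), ("worker", 30), ("cron", 10)])` returns `["api"]`, pointing at the application to investigate first.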

### 3. Identify connection leaks (idle in transaction)

Connections stuck idle in transaction indicate application connection leaks:

```sql
-- Find idle in transaction connections with duration
SELECT pid, usename, application_name, client_addr, backend_start, state,
       now() - state_change AS idle_duration, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;

-- Count long-running idle in transaction sessions (> 5 minutes)
SELECT count(*) AS leaking_connections
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes';
```

Terminate leaking connections:

```sql
-- Terminate connections idle in transaction for more than 30 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '30 minutes';

-- Terminate all idle in transaction connections from one application (use carefully)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes'
  AND application_name = 'leaking-app';
```

### 4. Configure connection timeout settings

Set timeouts to automatically clean up idle connections:

```sql
-- Set idle transaction timeout (milliseconds)
ALTER SYSTEM SET idle_in_transaction_session_timeout = 300000;  -- 5 minutes

-- Set statement timeout for long-running queries
ALTER SYSTEM SET statement_timeout = 300000;  -- 5 minutes

-- Set lock timeout
ALTER SYSTEM SET lock_timeout = 60000;  -- 1 minute

-- Reload configuration
SELECT pg_reload_conf();
```

Verify settings applied:

```sql
SHOW idle_in_transaction_session_timeout;
SHOW statement_timeout;
SHOW lock_timeout;
```

### 5. Deploy PgBouncer for connection pooling

PgBouncer provides efficient connection pooling between application and PostgreSQL:

```ini
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Connection pool settings
pool_mode = transaction     # transaction, session, or statement
max_client_conn = 1000      # Max client connections
default_pool_size = 20      # Connections per user/database pair
min_pool_size = 5           # Minimum idle connections
reserve_pool_size = 5       # Extra connections under load
reserve_pool_timeout = 5    # Seconds before using reserve

# Timeout settings
idle_client_timeout = 600   # Close idle client connections
idle_server_timeout = 900   # Close idle server connections
server_lifetime = 3600      # Max server connection age
server_check_delay = 30     # Health check frequency

# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
```

PgBouncer pool modes:

  • transaction: best for most apps; the connection is released after each transaction
  • session: connection held for the entire client session (required for session-level features such as prepared statements, advisory locks, and LISTEN/NOTIFY)
  • statement: connection released after each statement (incompatible with multi-statement transactions)

### 6. Increase max_connections (with caution)

If connections are legitimately needed, increase the limit:

```sql
-- Check current setting
SHOW max_connections;

-- Check shared_buffers to ensure memory is sufficient
SHOW shared_buffers;

-- Update the setting (written to postgresql.auto.conf)
ALTER SYSTEM SET max_connections = 500;
```

Note: `pg_reload_conf()` does not apply changes to `max_connections`; a full server restart is required.

Restart PostgreSQL:

```bash
# Systemd
sudo systemctl restart postgresql

# Or for a specific version
sudo systemctl restart postgresql-15
```

Memory considerations:

  • Each connection consumes roughly 2-10 MB of RAM
  • max_connections × work_mem can exceed available RAM under load
  • Prefer PgBouncer over increasing max_connections
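The memory math behind these bullets can be made concrete. A back-of-the-envelope sketch, where the 2-10 MB per-connection overhead is the rule of thumb stated above (not a measured value) and the 4 MB `work_mem` default matches stock PostgreSQL:

```python
# Rough worst-case memory estimate for a max_connections setting.
# per_connection_mb is a rule-of-thumb overhead (2-10 MB), not a measurement;
# the worst case assumes every connection is active with one work_mem allocation.

def estimate_connection_memory_mb(max_connections, per_connection_mb=10,
                                  work_mem_mb=4):
    return max_connections * (per_connection_mb + work_mem_mb)
```

At 500 connections this yields 500 × (10 + 4) = 7000 MB in the worst case, which is why routing 500 clients through a 20-connection PgBouncer pool is usually the better answer.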

### 7. Implement application-side connection pool configuration

Configure connection pool in application code:

```python
# SQLAlchemy connection pool
from sqlalchemy import create_engine

engine = create_engine(
    'postgresql://user:pass@host:port/dbname',
    pool_size=20,        # Number of connections in pool
    max_overflow=10,     # Extra connections beyond pool_size
    pool_timeout=30,     # Seconds to wait for a connection
    pool_recycle=1800,   # Recycle connections after 30 minutes
    pool_pre_ping=True,  # Verify connection before use
)

# Psycopg2 connection pool
from psycopg2 import pool

connection_pool = pool.ThreadedConnectionPool(
    minconn=5,
    maxconn=20,
    host='localhost',
    database='dbname',
    user='user',
    password='pass',
    connect_timeout=10,
)
```
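A common source of the leaks diagnosed in step 3 is a code path that calls `getconn` but never `putconn` (for example, when an exception is raised mid-query). One defensive pattern is a context manager that guarantees the return; this sketch assumes only the `getconn`/`putconn` interface that `psycopg2.pool` exposes, so any pool object with those two methods works:

```python
from contextlib import contextmanager

@contextmanager
def pooled_connection(pool):
    """Borrow a connection and guarantee its return, even if an exception
    is raised inside the with-block."""
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)

# Usage with the ThreadedConnectionPool above:
# with pooled_connection(connection_pool) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT 1")
```

Because the `finally` block always runs, the pool's idle-in-transaction count stays at zero even when queries fail.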

HikariCP (the Spring Boot default pool) is configured in `application.yml`:

```yaml
# application.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      idle-timeout: 600000             # 10 minutes
      connection-timeout: 30000        # 30 seconds
      max-lifetime: 1800000            # 30 minutes
      leak-detection-threshold: 60000  # Log a leak if a connection is held > 1 minute
```

### 8. Kill long-running queries consuming connections

Identify and terminate long-running queries:

```sql
-- Find long-running queries (> 5 minutes)
SELECT pid, usename, application_name,
       now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY duration DESC;

-- Terminate queries running longer than 30 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '30 minutes';
```

### 9. Check for connection storm after restart

Prevent thundering herd after application restart:

```python
# Add jitter to connection pool initialization
import random
import time

def initialize_connections():
    base_delay = 0.1
    jitter = random.uniform(0, 0.5)
    time.sleep(base_delay + jitter)
    # Create connection
```

```yaml
# Kubernetes: add a startup probe with delay
startupProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 30
```

### 10. Monitor connection pool metrics

Set up monitoring for early detection:

```sql
-- Create monitoring view
CREATE VIEW connection_stats AS
SELECT datname, usename, application_name, state, count(*) AS connection_count
FROM pg_stat_activity
GROUP BY datname, usename, application_name, state;

-- Query for monitoring
SELECT * FROM connection_stats ORDER BY connection_count DESC;
```

Prometheus metrics (via postgres_exporter):

```promql
# Key conditions to alert on
pg_stat_activity_count{state="idle in transaction"} > 10
pg_settings_max_connections - pg_stat_activity_count < 5
pg_stat_activity_max_tx_duration > 300  # 5 minutes
```
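The same three conditions can be evaluated application-side from a metrics snapshot, for instance in a health-check endpoint. A sketch where the dict-based interface and metric key names are illustrative (not part of postgres_exporter) and the thresholds mirror the expressions above:

```python
# Hypothetical evaluator for the three alert conditions above, given a
# snapshot of exporter metrics as a plain dict. Thresholds mirror the
# alert expressions: >10 idle-in-transaction, <5 free slots, >300s transaction.

def evaluate_connection_alerts(metrics):
    """metrics keys: idle_in_transaction, max_connections, total_connections,
    max_tx_duration_seconds. Returns a list of firing alert names."""
    alerts = []
    if metrics["idle_in_transaction"] > 10:
        alerts.append("idle_in_transaction_high")
    if metrics["max_connections"] - metrics["total_connections"] < 5:
        alerts.append("connection_slots_nearly_exhausted")
    if metrics["max_tx_duration_seconds"] > 300:
        alerts.append("long_running_transaction")
    return alerts
```

An empty result means all three leading indicators are within bounds; any non-empty result should page before the pool is fully exhausted.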

Prevention

  • Deploy PgBouncer in front of PostgreSQL for connection pooling
  • Set idle_in_transaction_session_timeout to auto-cleanup leaks
  • Configure application connection pool with appropriate max_overflow
  • Enable pool_pre_ping to detect stale connections
  • Monitor idle in transaction count as leading indicator
  • Implement connection leak detection in application logs
  • Use statement_timeout to prevent runaway queries

Related Error Messages

  • **FATAL: sorry, too many clients already**: Connection limit reached
  • **FATAL: remaining connection slots are reserved**: Only superuser-reserved slots remain
  • **connection timeout expired**: Application pool timed out waiting for a connection