Introduction
PostgreSQL connection pool exhaustion occurs when all available database connections are in use, preventing new connections from being established. Applications receive errors such as `FATAL: sorry, too many clients already`, `FATAL: remaining connection slots are reserved for non-replication superuser connections`, or connection timeouts. These indicate the connection limit has been reached, causing application failures and service degradation.
Symptoms
- Application logs show `too many clients` or `connection pool exhausted` errors
- New database connections hang or time out waiting for an available slot
- `pg_stat_activity` shows `max_connections` sessions active
- Connection wait time increases dramatically during traffic spikes
- Application health checks fail due to database connectivity
- Issue appears during traffic spikes, after a deploy with a connection leak, or after a `max_connections` reduction
Common Causes
- Application connection leak (connections not returned to pool)
- `max_connections` set too low for application concurrency
- Long-running queries holding connections for extended periods
- Idle connections not timing out due to missing `idle_in_transaction_session_timeout`
- Missing connection pooling layer (PgBouncer, Pgpool-II)
- Connection storm after application restart (thundering herd)
- Replication slots consuming reserved connections
Step-by-Step Fix
### 1. Check current connection usage and limits
Query PostgreSQL to understand connection state:
```sql
-- Check max_connections setting
SHOW max_connections;

-- Check current connection count by state
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle') AS idle,
       count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_transaction
FROM pg_stat_activity;

-- Check reserved connections (superuser_reserved_connections)
SHOW superuser_reserved_connections;

-- Calculate available connections
SELECT (SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
       - count(*) AS available
FROM pg_stat_activity;
```
Expected baseline:
- max_connections: 100-500 for standalone PostgreSQL
- Available connections: > 10% of max
- idle_in_transaction: Should be 0 or very low (indicates connection leak)
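The baseline above can be encoded as a quick sanity check. This is a sketch with illustrative names and thresholds, taking the counts returned by the queries in this step:

```python
# Hypothetical helper: interpret the counts from pg_stat_activity.
# Thresholds mirror the baseline guidance above: > 10% headroom,
# near-zero idle-in-transaction sessions.
def connection_health(max_connections, total, idle_in_transaction):
    """Return a list of warning strings for the given counts."""
    warnings = []
    available = max_connections - total
    if available < max_connections * 0.10:
        warnings.append(f"low headroom: only {available} connections free")
    if idle_in_transaction > 0:
        warnings.append(
            f"{idle_in_transaction} idle-in-transaction sessions "
            "(possible connection leak)"
        )
    return warnings
```

For example, `connection_health(100, 95, 3)` reports both low headroom and a likely leak.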
### 2. Identify connections by application and user
Find which applications are consuming connections:
```sql
-- Group connections by application
SELECT application_name,
       usename AS db_user,
       count(*) AS connection_count,
       count(*) FILTER (WHERE state = 'active') AS active_count,
       count(*) FILTER (WHERE state = 'idle') AS idle_count
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY application_name, usename
ORDER BY connection_count DESC;

-- Check connection details including client address
SELECT pid, usename, application_name, client_addr,
       backend_start, state, state_change, query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY backend_start;
```
Look for:
- Single application consuming > 50% of connections
- High idle count (connections not being released)
- Old backend_start times (connections held for hours/days)
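The first check lends itself to automation. A small sketch (names are illustrative) that flags any application holding more than half of `max_connections`, given `(application_name, connection_count)` rows from the grouping query above:

```python
# Flag applications consuming more than `threshold` of max_connections.
# `rows` is a list of (application_name, connection_count) tuples,
# as returned by the GROUP BY query above.
def find_hogs(rows, max_connections, threshold=0.5):
    return [
        app for app, count in rows
        if count > max_connections * threshold
    ]
```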
### 3. Identify connection leaks (idle in transaction)
Connections stuck idle in transaction indicate application connection leaks:
```sql
-- Find idle in transaction connections with duration
SELECT pid, usename, application_name, client_addr,
       backend_start, state,
       now() - state_change AS idle_duration,
       query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;

-- Count long-running idle in transaction sessions (> 5 minutes)
SELECT count(*) AS leaking_connections
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes';
```
Terminate leaking connections:
```sql
-- Terminate connections idle in transaction for > 30 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '30 minutes';

-- Terminate idle-in-transaction connections from a specific application (use carefully)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes'
  AND application_name = 'leaking-app';
```
### 4. Configure connection timeout settings
Set timeouts to automatically clean up idle connections:
```sql
-- Set idle transaction timeout (milliseconds)
ALTER SYSTEM SET idle_in_transaction_session_timeout = 300000;  -- 5 minutes

-- Set statement timeout for long-running queries
ALTER SYSTEM SET statement_timeout = 300000;  -- 5 minutes

-- Set lock timeout
ALTER SYSTEM SET lock_timeout = 60000;  -- 1 minute

-- Reload configuration
SELECT pg_reload_conf();
```
Verify settings applied:
```sql
SHOW idle_in_transaction_session_timeout;
SHOW statement_timeout;
SHOW lock_timeout;
```
### 5. Deploy PgBouncer for connection pooling
PgBouncer provides efficient connection pooling between application and PostgreSQL:
```ini
# /etc/pgbouncer/pgbouncer.ini
# Note: PgBouncer does not support inline (end-of-line) comments,
# so comments go on their own lines.
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Connection pool settings
# pool_mode: transaction, session, or statement
pool_mode = transaction
# Max client connections
max_client_conn = 1000
# Server connections per user/database pair
default_pool_size = 20
# Minimum idle server connections
min_pool_size = 5
# Extra server connections under load
reserve_pool_size = 5
# Seconds to wait before using the reserve pool
reserve_pool_timeout = 5

# Timeout settings (seconds)
# Close client connections idle longer than this
client_idle_timeout = 600
# Close server connections idle longer than this
server_idle_timeout = 900
# Max server connection age
server_lifetime = 3600
# Health check frequency
server_check_delay = 30

# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
```
PgBouncer pool modes:
- `transaction`: best for most apps; the connection is released after each transaction
- `session`: connection held for the entire client session (required for session features such as session-level prepared statements, advisory locks, LISTEN/NOTIFY)
- `statement`: connection released after each statement (multi-statement transactions are not allowed)
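Sizing intuition for the settings above can be sketched as arithmetic. Assuming the `pgbouncer.ini` values in this step: in transaction mode the server-side connection count is bounded roughly by `default_pool_size` per active (user, database) pair, plus the reserve pool:

```python
# Back-of-envelope sizing for PgBouncer (an approximation, not an
# exact bound): each active (user, database) pair gets up to
# default_pool_size server connections, plus a shared reserve.
def max_server_connections(pairs, default_pool_size=20, reserve_pool_size=5):
    return pairs * default_pool_size + reserve_pool_size
```

With 3 (user, database) pairs this needs at most 65 PostgreSQL connections while serving up to `max_client_conn = 1000` clients.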
### 6. Increase max_connections (with caution)
If connections are legitimately needed, increase the limit:
```sql
-- Check current setting
SHOW max_connections;

-- Check shared_buffers to ensure memory is sufficient
SHOW shared_buffers;

-- Update the setting (written to postgresql.auto.conf)
ALTER SYSTEM SET max_connections = 500;

-- Note: pg_reload_conf() does NOT apply max_connections;
-- it requires a full server restart.
```
Restart PostgreSQL:
```bash
# Systemd
sudo systemctl restart postgresql

# Or for a specific version
sudo systemctl restart postgresql-15
```
Memory considerations:
- Each connection consumes ~2-10 MB RAM
- max_connections * work_mem can exceed RAM
- Prefer PgBouncer over increasing max_connections
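The memory heuristic above can be turned into a rough estimate. This is a sketch: the 2-10 MB range is the per-backend figure quoted above, and `work_mem` can add to it per sort or hash operation, so treat the result as a lower bound:

```python
# Rough backend memory estimate for a given max_connections,
# using the ~2-10 MB per-connection range quoted above.
def backend_memory_mb(max_connections, per_conn_mb=(2, 10)):
    """Return a (low, high) estimate in MB, excluding work_mem."""
    low, high = per_conn_mb
    return max_connections * low, max_connections * high
```

For example, `max_connections = 500` implies roughly 1000-5000 MB for backends alone, before any query working memory.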
### 7. Implement application-side connection pool configuration
Configure connection pool in application code:
```python
# SQLAlchemy connection pool
from sqlalchemy import create_engine

engine = create_engine(
    'postgresql://user:pass@host:port/dbname',
    pool_size=20,        # Number of connections in pool
    max_overflow=10,     # Extra connections beyond pool_size
    pool_timeout=30,     # Seconds to wait for a connection
    pool_recycle=1800,   # Recycle connections after 30 minutes
    pool_pre_ping=True,  # Verify connection before use
)

# psycopg2 connection pool
from psycopg2 import pool

connection_pool = pool.ThreadedConnectionPool(
    minconn=5,
    maxconn=20,
    host='localhost',
    database='dbname',
    user='user',
    password='pass',
    connect_timeout=10,
)
```
```yaml
# application.yml (HikariCP connection pool, the Spring Boot default)
spring:
  datasource:
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      idle-timeout: 600000              # 10 minutes
      connection-timeout: 30000         # 30 seconds
      max-lifetime: 1800000             # 30 minutes
      leak-detection-threshold: 60000   # Log a leak if a connection is held > 1 minute
```
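A common source of the leaks seen in step 3 is a code path that checks out a connection and never returns it. A leak-proof checkout pattern for the psycopg2 pool above can be sketched as a context manager (`pooled_connection` is an illustrative name; `pool` is assumed to be the `ThreadedConnectionPool` from the earlier example):

```python
import contextlib

# Sketch: putconn() runs on every exit path (success, exception,
# early return), so the connection can never leak from this block.
@contextlib.contextmanager
def pooled_connection(pool):
    conn = pool.getconn()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        pool.putconn(conn)  # always returned to the pool
```

Used as `with pooled_connection(connection_pool) as conn: ...`, the connection is committed and returned on success, and rolled back and returned on error.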
### 8. Kill long-running queries consuming connections
Identify and terminate long-running queries:
```sql
-- Find long-running queries (> 5 minutes)
SELECT pid, usename, application_name,
       now() - query_start AS duration,
       state, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY duration DESC;

-- Terminate queries running longer than 30 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '30 minutes';
```
### 9. Check for connection storm after restart
Prevent thundering herd after application restart:
```python
# Add jitter to connection pool initialization
import random
import time

def initialize_connections():
    base_delay = 0.1
    jitter = random.uniform(0, 0.5)
    time.sleep(base_delay + jitter)
    # Create connection here
```
```yaml
# Kubernetes: add a startup probe with delay
startupProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 30
```
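The jitter idea generalizes to retries: exponential backoff with full jitter spreads reconnect attempts from restarted instances over time instead of letting them hammer the database in lockstep. A sketch (function and parameter names are illustrative):

```python
import random
import time

# Retry a connection attempt with exponential backoff plus full jitter.
# `connect` is any zero-argument callable that raises on failure.
def connect_with_backoff(connect, attempts=5, base=0.1, cap=5.0):
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, propagate the last error
            delay = min(cap, base * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # full jitter
```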
### 10. Monitor connection pool metrics
Set up monitoring for early detection:
```sql
-- Create monitoring view
CREATE VIEW connection_stats AS
SELECT datname, usename, application_name, state,
       count(*) AS connection_count
FROM pg_stat_activity
GROUP BY datname, usename, application_name, state;

-- Query for monitoring
SELECT * FROM connection_stats
ORDER BY connection_count DESC;
```
Prometheus metrics (via postgres_exporter):
```
# Key conditions to alert on
pg_stat_activity_count{state="idle in transaction"} > 10
pg_settings_max_connections - pg_stat_activity_count < 5
pg_stat_activity_max_tx_duration > 300   # 5 minutes
```
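The same three conditions can be mirrored in code for ad-hoc checks, e.g. in a health endpoint. A sketch with illustrative function and parameter names:

```python
# Evaluate the three alert conditions above against a metrics snapshot.
def evaluate_alerts(idle_in_tx, max_connections, total, max_tx_seconds):
    alerts = []
    if idle_in_tx > 10:
        alerts.append("idle-in-transaction buildup")
    if max_connections - total < 5:
        alerts.append("connection slots nearly exhausted")
    if max_tx_seconds > 300:
        alerts.append("long-running transaction")
    return alerts
```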
Prevention
- Deploy PgBouncer in front of PostgreSQL for connection pooling
- Set `idle_in_transaction_session_timeout` to auto-clean leaked sessions
- Configure the application connection pool with an appropriate `max_overflow`
- Enable `pool_pre_ping` to detect stale connections
- Monitor the `idle in transaction` count as a leading indicator
- Implement connection leak detection in application logs
- Use `statement_timeout` to prevent runaway queries
Related Errors
- **FATAL: too many clients**: Connection limit reached
- **FATAL: remaining connection slots reserved**: Only superuser connections available
- **connection timeout expired**: Application pool timeout waiting for connection