Introduction
A SQLAlchemy connection pool deadlock manifests when every connection in the pool is checked out and no new connections can be created. Unlike a true OS-level deadlock, this is a resource-exhaustion scenario: threads or coroutines hold connections indefinitely while waiting for other connections, creating a circular wait. Under production load, this error brings the entire application to a halt as every database operation queues up waiting for a connection that will never become available.
Symptoms
The application hangs and eventually throws:
```
sqlalchemy.exc.TimeoutError: QueuePool limit of size 10 overflow 0 reached, connection timed out, timeout 30.00
```

Or with pool logging enabled:

```
sqlalchemy.pool.impl.QueuePool INFO: Connection pool exhausted, waiting for available connection (3 threads waiting)
sqlalchemy.pool.impl.QueuePool INFO: Pool size: 10, Overflow: 0, Checked in: 0, Checked out: 10
```

Application monitoring shows database query latency spiking to the pool timeout value (30 seconds by default), with requests queuing behind the database layer.
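Exhaustion is easy to reproduce and confirm directly against the pool API. A minimal sketch, using an in-memory SQLite database as a stand-in for the real server (the pool mechanics are identical):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# In-memory SQLite stands in for the real database; pool behavior is the same.
engine = create_engine("sqlite://", poolclass=QueuePool,
                       pool_size=5, max_overflow=0, pool_timeout=1)

conns = [engine.connect() for _ in range(5)]  # check out every slot
print(engine.pool.checkedout())               # -> 5: the pool is exhausted
print(engine.pool.status())                   # human-readable pool summary

# A sixth engine.connect() here would block for pool_timeout seconds,
# then raise sqlalchemy.exc.TimeoutError -- the error shown above.

for c in conns:
    c.close()                                 # check-in returns each slot
print(engine.pool.checkedout())               # -> 0
```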
Common Causes
- Connections not returned to the pool: forgetting to call `session.close()`, or not using context managers for sessions
- Long-running transactions holding connections: a single transaction spanning multiple external API calls holds its connection the entire time
- Pool size too small for the concurrency level: the default `pool_size=5` with 20 worker threads means 15 threads block immediately under load
- Connection leaks in error paths: an exception raised after `engine.connect()` but before `conn.close()` leaves the connection checked out
- Deadlocked database transactions: two transactions waiting on row locks in opposite order hold their connections while the database resolves the deadlock
- Using `NullPool` accidentally: `NullPool` opens a new connection for every checkout and closes it on return, which can exhaust the database server's `max_connections`
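The error-path leak in particular is worth seeing concretely. A minimal sketch, again with in-memory SQLite as a stand-in; `missing_table` deliberately does not exist, and the `held` list simulates a long-lived reference (e.g. a connection stashed on a request object):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

engine = create_engine("sqlite://", poolclass=QueuePool,
                       pool_size=2, max_overflow=0)
held = []  # long-lived reference keeps the leaked connection alive

def leaky_query():
    conn = engine.connect()
    held.append(conn)  # reference survives the exception below
    conn.execute(text("SELECT * FROM missing_table"))  # raises OperationalError
    conn.close()  # never reached: the connection stays checked out

def safe_query():
    with engine.connect() as conn:  # checked in even when execute() raises
        conn.execute(text("SELECT * FROM missing_table"))
```

After one failed `leaky_query()` call the pool permanently loses a slot; `safe_query()` fails with the same exception but returns its connection on the way out.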
Step-by-Step Fix
Step 1: Configure pool sizing correctly for your workload
```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/mydb",
    poolclass=QueuePool,   # Default pool class, shown explicitly
    pool_size=20,          # Persistent connections in pool
    max_overflow=10,       # Extra connections allowed beyond pool_size
    pool_timeout=30,       # Seconds to wait before raising TimeoutError
    pool_recycle=1800,     # Recycle connections after 30 minutes
    pool_pre_ping=True,    # Verify connection before each use
)
```
The rule of thumb: pool_size should match the number of concurrent database operations, not the number of threads. For web applications, start with pool_size = (worker_count * 2) + 1.
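As a quick worked example of that rule (the worker count here is hypothetical):

```python
def recommended_pool_size(worker_count):
    # Rule of thumb from above: (worker_count * 2) + 1
    return worker_count * 2 + 1

print(recommended_pool_size(8))   # -> 17 for 8 workers per process
print(recommended_pool_size(20))  # -> 41 for 20 workers per process
```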
Step 2: Always use session context managers
```python
from contextlib import contextmanager
from datetime import datetime

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)

@contextmanager
def get_session():
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()  # Always returns the connection to the pool

# Usage - the connection is always returned to the pool,
# and get_session() commits on successful exit
with get_session() as session:
    user = session.query(User).filter_by(id=1).first()
    user.last_login = datetime.utcnow()
```
Step 3: Diagnose connection leaks with pool status logging
```python
import logging

from sqlalchemy import event

logging.basicConfig()
logging.getLogger("sqlalchemy.pool").setLevel(logging.INFO)

@event.listens_for(engine, "checkout")
def on_checkout(dbapi_conn, connection_rec, connection_proxy):
    logging.info(
        "Connection checked out. Pool status: size=%d, overflow=%d, checked_out=%d",
        engine.pool.size(),
        engine.pool.overflow(),
        engine.pool.checkedout(),
    )

@event.listens_for(engine, "checkin")
def on_checkin(dbapi_conn, connection_rec):
    logging.info(
        "Connection checked in. Pool status: size=%d, overflow=%d, checked_out=%d",
        engine.pool.size(),
        engine.pool.overflow(),
        engine.pool.checkedout(),
    )
```
This reveals which code paths are checking out connections without checking them back in.
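To go a step further and pinpoint the leaking call sites, the same two events can record a stack trace at checkout and drop it at checkin; whatever remains in the map after traffic quiesces points at the code paths that never check back in. A sketch, with in-memory SQLite as a stand-in:

```python
import traceback

from sqlalchemy import create_engine, event
from sqlalchemy.pool import QueuePool

engine = create_engine("sqlite://", poolclass=QueuePool, pool_size=5)

checkout_origins = {}  # id(connection record) -> stack that checked it out

@event.listens_for(engine, "checkout")
def remember_origin(dbapi_conn, connection_rec, connection_proxy):
    checkout_origins[id(connection_rec)] = traceback.format_stack(limit=10)

@event.listens_for(engine, "checkin")
def forget_origin(dbapi_conn, connection_rec):
    checkout_origins.pop(id(connection_rec), None)

conn = engine.connect()          # one outstanding checkout...
print(len(checkout_origins))     # -> 1: its origin stack is recorded
conn.close()
print(len(checkout_origins))     # -> 0: checked in, entry removed
```

Periodically dumping `checkout_origins` in production (or on a pool-timeout error) shows exactly where each outstanding connection was acquired.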
Step 4: Break long transactions into smaller units
```python
# BAD - holds a connection for the entire duration
with get_session() as session:
    user = session.query(User).get(user_id)
    external_data = call_slow_external_api(user.email)  # Connection held here
    user.profile = external_data  # get_session() commits on exit

# GOOD - release the connection during the external call
with get_session() as session:
    user = session.query(User).get(user_id)
    user_data = {"email": user.email, "name": user.name}

external_data = call_slow_external_api(user_data["email"])

with get_session() as session:
    user = session.query(User).get(user_id)
    user.profile = external_data  # get_session() commits on exit
```
Prevention
- Set `pool_pre_ping=True` to detect stale connections before use
- Monitor the `pool.checkedout()` metric in production alerting
- Use `pool_timeout=10` (not the default 30) to fail fast rather than hanging for 30 seconds
- Enable SQLAlchemy's pool echo (`echo_pool=True`) in staging to trace connection checkout patterns
- Run `SHOW max_connections` on PostgreSQL to ensure `pool_size + max_overflow` does not exceed the database limit
- Set `pool_recycle` to less than your database server's idle timeout (`wait_timeout` on MySQL) to avoid stale connections
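The `max_connections` check above can be automated as a startup sanity check. A sketch: the arithmetic runs anywhere, while the commented lines show how the live value could be fetched (the connection string, process count, and headroom are hypothetical and should match your deployment):

```python
POOL_SIZE, MAX_OVERFLOW, APP_PROCESSES = 20, 10, 4  # hypothetical deployment

def pool_demand(pool_size, max_overflow, processes):
    # Each process keeps its own pool, so peak demand multiplies per process.
    return (pool_size + max_overflow) * processes

def fits(max_connections, headroom=10):
    # Leave headroom for superuser_reserved_connections and other clients.
    demand = pool_demand(POOL_SIZE, MAX_OVERFLOW, APP_PROCESSES)
    return demand + headroom <= max_connections

print(pool_demand(POOL_SIZE, MAX_OVERFLOW, APP_PROCESSES))  # -> 120

# Against a live PostgreSQL server, the limit could be fetched like this:
# from sqlalchemy import create_engine, text
# engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")
# with engine.connect() as conn:
#     max_conns = int(conn.execute(text("SHOW max_connections")).scalar())
#     assert fits(max_conns), "pool demand too close to max_connections"
```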