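Introduction

Schema migrations that take an ACCESS EXCLUSIVE lock in PostgreSQL or an exclusive metadata lock (MDL) in MySQL block all concurrent queries against the affected table. The blocking starts before the migration does any work: while the migration waits for its lock, every query that arrives later queues behind it. When such a migration runs during production hours or against a large table, the queued queries hold their connections and eventually time out, which can cascade into a wider failure.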
Symptoms

- `ALTER TABLE` hangs indefinitely with no output
- Application queries against the migrating table fail with `canceling statement due to lock timeout`
- PostgreSQL shows waiting sessions with `wait_event_type = 'Lock'` and `wait_event = 'relation'` in `pg_stat_activity`, and ungranted `locktype = 'relation'` entries in `pg_locks`
- MySQL shows queries in the `Waiting for table metadata lock` state
- The connection pool is exhausted as queries pile up waiting for the migration to complete
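To confirm the PostgreSQL symptom, the ungranted lock requests can be listed directly. The following is a minimal sketch, assuming PostgreSQL 9.6 or later (for the `wait_event` columns) and that it is run in the database that owns the table; the column aliases are illustrative.

```sql
-- List sessions waiting for a table-level lock, oldest first.
SELECT l.pid,
       l.mode,                              -- lock mode being requested
       l.relation::regclass AS table_name,  -- resolves the OID in the current database
       a.wait_event_type,
       a.wait_event,
       now() - a.query_start AS waiting_for,
       a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted
  AND l.locktype = 'relation'
  AND l.database = (SELECT oid FROM pg_database
                    WHERE datname = current_database())
ORDER BY a.query_start;
```

A row requesting `AccessExclusiveLock` at the top of the result, followed by many `AccessShareLock` or `RowExclusiveLock` requests on the same table, is the typical signature of a migration stuck at the head of the lock queue.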
Common Causes

- Running `ALTER TABLE` on a large table during business hours
- Long-running transactions holding locks on the table before the migration starts
- Migration tools not setting `lock_timeout` or `statement_timeout`
- Adding columns with default values on PostgreSQL < 11 (rewrites the entire table)
- Creating indexes without the `CONCURRENTLY` keyword
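Several of these causes can be checked before a migration starts. The pre-flight queries below are a sketch, assuming PostgreSQL and the `users` table used elsewhere in this guide; the one-minute threshold is an arbitrary illustration, not a recommendation.

```sql
-- PostgreSQL 11+ can add a column with a non-volatile default
-- without rewriting the table.
SELECT current_setting('server_version_num')::int >= 110000
       AS fast_default_supported;

-- Total on-disk size of the table (heap + indexes + TOAST).
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size;

-- Transactions that have been open for more than a minute and could
-- hold locks that the migration would have to wait for.
SELECT pid, state, now() - xact_start AS xact_age,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND xact_start < now() - interval '1 minute'
  AND pid <> pg_backend_pid()
ORDER BY xact_start;
```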
Step-by-Step Fix

1. **Identify the blocking migration and affected queries**:

```sql
-- PostgreSQL: find lock waits
SELECT blocked.pid AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid AS blocking_pid,
       blocking.query AS blocking_query,
       blocked.wait_event_type,
       blocked.state
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
```
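The query above covers PostgreSQL only. A rough MySQL counterpart is sketched below, assuming MySQL 5.7 or later with the `sys` schema installed. The second query depends on the `wait/lock/metadata/sql/mdl` Performance Schema instrument, which is enabled by default in MySQL 8.0 but may need to be switched on in 5.7.

```sql
-- MySQL: sessions currently waiting on a metadata lock
SELECT id, user, db, time AS seconds_waiting, info AS query
FROM information_schema.processlist
WHERE state = 'Waiting for table metadata lock'
ORDER BY time DESC;

-- MySQL: which session blocks which, with a ready-made KILL statement
SELECT object_schema,
       object_name,
       waiting_pid,
       waiting_query,
       blocking_pid,
       sql_kill_blocking_connection
FROM sys.schema_table_lock_waits;
```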
2. **Terminate blocking idle transactions**:

```sql
-- Find idle transactions holding locks
SELECT pid, now() - xact_start AS duration, query, state
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- Terminate if safe (replace 12345 with a pid returned above)
SELECT pg_terminate_backend(12345);
```
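Terminating sessions by hand treats the symptom. PostgreSQL 9.6 and later can also end idle transactions automatically. The sketch below assumes a hypothetical application role named `app_user`; the 60-second and 10-minute thresholds are illustrative only.

```sql
-- Abort any session of this role that stays idle inside a transaction
-- for more than 60 seconds. Applies to sessions started after the change.
-- NOTE: app_user is a hypothetical role name.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '60s';

-- One-off cleanup: terminate every session that has been idle in a
-- transaction for more than 10 minutes, excluding the current session.
SELECT pid, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND xact_start < now() - interval '10 minutes'
  AND pid <> pg_backend_pid();
```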
3. **Cancel the stuck migration safely**:

```sql
-- Cancel the specific migration query
SELECT pg_cancel_backend(<migration_pid>);

-- If that doesn't work, terminate the whole backend
SELECT pg_terminate_backend(<migration_pid>);
```
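If the cancelled migration was a `CREATE INDEX CONCURRENTLY`, PostgreSQL leaves an invalid index behind that still adds write overhead and must be dropped before retrying. A short check, using the `idx_users_email` name from this guide as the example:

```sql
-- List indexes left invalid by an interrupted concurrent build.
SELECT indexrelid::regclass AS invalid_index,
       indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop the leftover index without blocking writes, then re-run the build.
DROP INDEX CONCURRENTLY IF EXISTS idx_users_email;
```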
4. **Re-run the migration using zero-downtime techniques**:

```sql
-- PostgreSQL: Add column without rewriting table (PG 11+)
ALTER TABLE users ADD COLUMN last_login_at TIMESTAMPTZ DEFAULT NOW();

-- PostgreSQL: Create index without blocking reads/writes
-- (cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

-- Add column in phases for older PostgreSQL versions:
-- Phase 1: Add nullable column
ALTER TABLE users ADD COLUMN last_login_at TIMESTAMPTZ;
-- Phase 2: Deploy app code that writes to the new column
-- Phase 3: Backfill existing rows in batches (repeat for each id range)
UPDATE users SET last_login_at = NOW()
WHERE last_login_at IS NULL AND id BETWEEN 1 AND 100000;
-- Phase 4: Add NOT NULL constraint (scans the table to verify no NULLs remain)
ALTER TABLE users ALTER COLUMN last_login_at SET NOT NULL;
```
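Phases 3 and 4 can be sketched in more detail. The block below is one possible approach, not the only one. It assumes PostgreSQL 11 or later for `COMMIT` inside a `DO` block (which also requires that the block is not run inside an outer transaction) and PostgreSQL 12 or later for the `NOT NULL` shortcut. The batch size and the constraint name `users_last_login_at_not_null` are illustrative choices.

```sql
-- Phase 3: backfill in batches, committing after each one so that row
-- locks are released and no single transaction runs for long.
DO $$
DECLARE
    batch_size CONSTANT bigint := 10000;
    lo     bigint;
    max_id bigint;
BEGIN
    SELECT min(id), max(id) INTO lo, max_id FROM users;
    WHILE lo <= max_id LOOP
        UPDATE users
        SET last_login_at = now()
        WHERE last_login_at IS NULL
          AND id >= lo AND id < lo + batch_size;
        COMMIT;
        lo := lo + batch_size;
    END LOOP;
END
$$;

-- Phase 4 (PG 12+): prove the column has no NULLs with a CHECK constraint
-- validated under a weaker lock, so SET NOT NULL can skip its table scan.
ALTER TABLE users
    ADD CONSTRAINT users_last_login_at_not_null
    CHECK (last_login_at IS NOT NULL) NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT users_last_login_at_not_null;
ALTER TABLE users ALTER COLUMN last_login_at SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_last_login_at_not_null;
```

On versions older than PostgreSQL 11, the same batching can be driven from a client script that issues one `UPDATE` per id range in its own transaction.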
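5. **Set timeouts to prevent indefinite blocking**:

```sql
-- Abort any statement that runs longer than 30 seconds
SET statement_timeout = '30s';
-- Give up if the lock cannot be acquired within 10 seconds
SET lock_timeout = '10s';
ALTER TABLE users ADD COLUMN status VARCHAR(20);
```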
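A plain `SET` lasts for the whole session, which can surprise later statements on a pooled connection. As a hedged variation, the timeouts can be scoped to a single transaction with `SET LOCAL`. The MySQL statements show the closest equivalent setting, `lock_wait_timeout`, which is expressed in seconds. The timeout values are illustrative.

```sql
-- PostgreSQL: timeouts apply only until COMMIT or ROLLBACK
BEGIN;
SET LOCAL lock_timeout = '5s';
SET LOCAL statement_timeout = '30s';
ALTER TABLE users ADD COLUMN status VARCHAR(20);
COMMIT;

-- MySQL: bound the wait for a metadata lock in this session
SET SESSION lock_wait_timeout = 10;
ALTER TABLE users ADD COLUMN status VARCHAR(20);
```

If the PostgreSQL `ALTER TABLE` fails with a lock timeout, the transaction is aborted. Issue `ROLLBACK`, wait briefly, and retry rather than raising the timeout, so that application queries are never queued behind the migration for long.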