Introduction

Many web applications run database migrations at startup or deployment time. If a migration acquires a lock (to prevent concurrent migrations) and then hangs or fails without releasing the lock, all subsequent application instances will wait indefinitely for the lock, refusing to serve traffic. This results in complete site downtime where every request returns a 503 or hangs until timeout.

Symptoms

  • All application requests return 503 or time out
  • Application logs show:
  • `
  • Waiting for database migration lock...
  • Migration lock held by pid 12345 for 3600 seconds
  • `
  • Or (Rails):
  • `
  • ActiveRecord::ConcurrentMigrationError:
  • Unable to acquire advisory lock on the schema_migrations table
  • `
  • Or (Laravel):
  • `
  • [PDOException] SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded
  • `
  • Database process list shows a long-running migration query

Common Causes

  • Migration running a long ALTER TABLE on a large table
  • Migration deadlocked and stuck waiting
  • Deployment process killed mid-migration leaving lock in place
  • Multiple deployment pods racing to run migrations simultaneously
  • Network interruption during migration causing orphaned lock

Step-by-Step Fix

  1. 1.Identify the stuck migration and lock holder:

For PostgreSQL: ```sql -- Find the migration lock SELECT pid, usename, state, query, wait_event_type, wait_event, now() - query_start as duration FROM pg_stat_activity WHERE query LIKE '%schema_migrations%' OR query LIKE '%ar_internal_metadata%' OR state = 'idle in transaction';

-- Find long-running queries SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes' ORDER BY duration DESC; ```

For MySQL: ``sql -- Find blocking queries SHOW FULL PROCESSLIST; SELECT * FROM information_schema.INNODB_TRX; SELECT * FROM information_schema.INNODB_LOCKS; SELECT * FROM information_schema.INNODB_LOCK_WAITS;

  1. 1.Kill the stuck migration process:

PostgreSQL: ```sql -- Cancel the query gracefully first SELECT pg_cancel_backend(12345);

-- If it does not respond, forcefully terminate SELECT pg_terminate_backend(12345);

-- Verify it is gone SELECT pid, query FROM pg_stat_activity WHERE pid = 12345; ```

MySQL: ``sql KILL 12345;

  1. 1.Clear the migration lock:

For Rails (schema_migrations / ar_internal_metadata): ```sql -- Rails advisory lock cleanup -- Check if advisory lock is still held SELECT * FROM pg_locks WHERE locktype = 'advisory';

-- Clear the lock by restarting the connection -- (Locks are released when the holding session terminates) ```

For Laravel (migrations table): ```sql -- Laravel stores batch info, check current state SELECT * FROM migrations ORDER BY batch DESC LIMIT 10;

-- If using a lock mechanism, clear it DELETE FROM migrations WHERE migration = 'the_stuck_migration_name'; ```

For Django (django_migrations): ```sql -- Check applied migrations SELECT * FROM django_migrations ORDER BY applied DESC LIMIT 10;

-- Remove the stuck migration record DELETE FROM django_migrations WHERE app = 'myapp' AND name = '0015_stuck_migration'; ```

  1. 1.Run the migration manually:
  2. 2.```bash
  3. 3.# Rails
  4. 4.RAILS_ENV=production bundle exec rails db:migrate

# Laravel php artisan migrate --force

# Django python manage.py migrate --noinput

# Run with verbose output to see where it hangs RAILS_ENV=production bundle exec rails db:migrate --verbose ```

  1. 1.If a specific migration is too slow, optimize it:
  2. 2.```sql
  3. 3.-- For PostgreSQL, use CONCURRENTLY for index creation
  4. 4.CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- For MySQL, use ALGORITHM=INPLACE ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active', ALGORITHM=INPLACE, LOCK=NONE;

-- For large table changes, use pt-online-schema-change pt-online-schema-change \ --alter "ADD COLUMN status VARCHAR(20) DEFAULT 'active'" \ D=mydb,t=users --execute ```

  1. 1.Restart application instances:
  2. 2.```bash
  3. 3.# Once migration is complete, restart app servers
  4. 4.systemctl restart myapp
  5. 5.# Or for Kubernetes:
  6. 6.kubectl rollout restart deployment/myapp
  7. 7.`

Prevention

  • Run migrations as a separate step before deploying application code
  • Use a deployment strategy that separates migration from startup
  • Implement migration timeout: fail fast instead of waiting forever
  • Use advisory locks with a TTL (lease-based locking)
  • For large migrations, use online schema change tools (pt-osc, gh-ost)
  • Monitor migration duration and alert if it exceeds expected time
  • Test migrations on production-size data copies before deploying