Introduction
After a database failover (automatic or manual), applications may keep trying to connect to the old primary instance, which is now down or in read-only mode. Requests then fail with connection-refused errors, and the site can be completely unavailable until the application is pointed at the new primary. This is especially common with connection pooling libraries that cache connection details and do not automatically reconnect to the new primary.
Symptoms
- Application returns `500 Internal Server Error` to all requests
- Application logs show `Connection refused` or `could not connect to server`
- Database failover completed successfully, but the application is unaware of the new primary
- Connection pool shows all connections as `broken` or `stale`
- Direct connection to the new primary database works, but the application fails
Common Causes
- Application connection string points to the old primary IP/hostname
- DNS record for database endpoint not updated after failover
- Connection pool holding stale connections to the old primary
- Read replica promoted to primary but application still writing to old endpoint
- Connection retry logic not configured or retry count too low
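
The DNS-related causes above can be checked by comparing what the database hostname currently resolves to with the address of the new primary. The following is a minimal sketch using only the Python standard library. The function names `resolve_addresses` and `points_to` are hypothetical, and the result reflects resolution as seen from the machine where the script runs, including any local caches.

```python
import socket


def resolve_addresses(hostname: str, port: int = 5432) -> set[str]:
    """Return the set of IP addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def points_to(hostname: str, expected_ip: str, port: int = 5432) -> bool:
    """True if the hostname resolves to the expected (new primary) address."""
    return expected_ip in resolve_addresses(hostname, port)
```

Run on an application server, a call such as `points_to("db.example.internal", "10.0.0.12")` (both values are placeholders) returning `False` would suggest that the DNS record or a local cache still refers to the old primary.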
Step-by-Step Fix
1. Verify the new primary database is accepting connections:

```bash
# PostgreSQL
psql -h new-primary-host -U appuser -d appdb -c "SELECT 1"
# MySQL
mysql -h new-primary-host -u appuser -p appdb -e "SELECT 1"
```

2. Update the application connection string:

```bash
# Update environment variable or config file
export DATABASE_URL="postgresql://appuser:password@new-primary-host:5432/appdb"
# Or update the config file
sudo nano /etc/myapp/config.yml
```

3. Restart the application to clear the connection pool:

```bash
sudo systemctl restart myapp
# Or for containerized apps:
docker restart myapp-container
```

4. If using a connection pooler (PgBouncer, ProxySQL), restart it:

```bash
sudo systemctl restart pgbouncer
# Verify it is connecting to the new primary
psql -h localhost -p 6432 -U appuser -d appdb -c "SELECT inet_server_addr()"
```

5. Update DNS if the application connects via hostname:

```bash
# Update the database hostname to point to the new primary IP
# Then flush DNS caches on application servers
sudo systemd-resolve --flush-caches
# On newer systemd releases, use: sudo resolvectl flush-caches
```

6. Implement automatic reconnection in the application:

```python
# Python SQLAlchemy: detect and replace stale pooled connections
import os

from sqlalchemy import create_engine

DATABASE_URL = os.environ["DATABASE_URL"]
# pool_pre_ping tests each connection on checkout and reconnects if it is dead
engine = create_engine(DATABASE_URL, pool_pre_ping=True)
```
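
The first two steps can be sanity-checked from the application host with a small script. The sketch below uses only the Python standard library: it extracts the host and port from a `DATABASE_URL`-style connection string and tests whether that endpoint accepts TCP connections. The helper names `parse_endpoint` and `accepts_connections` are hypothetical, and the default port of 5432 assumes PostgreSQL. A successful TCP connect only shows that something is listening there, not that the server is a writable primary.

```python
import socket
from urllib.parse import urlsplit


def parse_endpoint(database_url: str, default_port: int = 5432) -> tuple[str, int]:
    """Extract (host, port) from a URL such as postgresql://user:pw@host:5432/db."""
    parts = urlsplit(database_url)
    if not parts.hostname:
        raise ValueError("connection string has no host component")
    return parts.hostname, parts.port or default_port


def accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If `accepts_connections(*parse_endpoint(url))` returns `False` for the URL the application is configured with but `True` for the new primary, the connection string (or the DNS record behind it) is the likely culprit.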
Prevention
- Route connections through a pooler or proxy (PgBouncer, ProxySQL) so that a failover means repointing one component instead of every application instance
- Configure `pool_pre_ping` or equivalent to detect stale connections
- Use database hostnames (not IPs) in connection strings for easier failover
- Implement automatic connection retry with exponential backoff
- Test database failover procedures regularly in staging environments
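
The retry-with-backoff recommendation can be sketched as a small wrapper around whatever function opens a connection or runs a query. This is a minimal illustration rather than a production implementation: `retry_with_backoff` and its parameters are hypothetical names, and the exception types worth retrying depend on the driver in use (with SQLAlchemy, `OperationalError` is the usual candidate).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

With SQLAlchemy this might be called as `retry_with_backoff(lambda: engine.connect(), retry_on=(OperationalError,))`. The `sleep` parameter exists only so that the delay can be replaced in tests.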