Introduction

After a database failover (automatic or manual), applications may keep trying to connect to the old primary instance, which is now down or read-only. Connection attempts fail with errors such as Connection refused, the application passes those failures on to users, and the result can be complete site downtime. This is especially common with connection pooling libraries that cache connection details and do not automatically reconnect to the new primary.

Symptoms

  • Application returns 500 Internal Server Error to all requests
  • Application logs show Connection refused or could not connect to server
  • Database failover completed successfully but the application is still targeting the old primary
  • Connection pool shows all connections as broken or stale
  • Direct connection to the new primary database works but the application fails

Common Causes

  • Application connection string points to the old primary IP/hostname
  • DNS record for database endpoint not updated after failover
  • Connection pool holding stale connections to the old primary
  • Read replica promoted to primary but application still writing to old endpoint
  • Connection retry logic not configured or retry count too low
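
The first two causes can be told apart with a quick check of where the database hostname currently resolves and whether that address accepts TCP connections. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the default port 5432 assumes PostgreSQL (MySQL typically uses 3306).

```python
import socket


def resolve_ips(hostname, port=5432):
    """Return the set of IP addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def points_at(hostname, expected_ip, port=5432):
    """True if the hostname resolves to the expected (new primary) IP."""
    return expected_ip in resolve_ips(hostname, port)


def can_connect(host, port=5432, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If `points_at` returns False for the new primary's IP, the DNS record (or a cached copy of it) is the likely culprit. If it returns True and `can_connect` succeeds but the application still fails, stale pooled connections are the more probable cause. A successful TCP connection only shows that the port is open, not that the instance is writable.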

Step-by-Step Fix

  1. Verify the new primary database is accepting connections:

     ```bash
     # PostgreSQL
     psql -h new-primary-host -U appuser -d appdb -c "SELECT 1"
     # MySQL
     mysql -h new-primary-host -u appuser -p appdb -e "SELECT 1"
     ```

  2. Update the application connection string:

     ```bash
     # Update environment variable or config file
     export DATABASE_URL="postgresql://appuser:password@new-primary-host:5432/appdb"
     # Or update the config file
     sudo nano /etc/myapp/config.yml
     ```

  3. Restart the application to clear the connection pool:

     ```bash
     sudo systemctl restart myapp
     # Or for containerized apps:
     docker restart myapp-container
     ```

  4. If using a connection pooler (PgBouncer, ProxySQL), make sure it points at the new primary, then restart it:

     ```bash
     # For PgBouncer, first update the host in the [databases] section of pgbouncer.ini
     sudo systemctl restart pgbouncer
     # Verify it is connecting to the new primary
     psql -h localhost -p 6432 -U appuser -d appdb -c "SELECT inet_server_addr()"
     ```

  5. Update DNS if the application connects via hostname:

     ```bash
     # Update the database hostname to point to the new primary IP
     # Then flush DNS caches on application servers
     # (on older systemd releases: sudo systemd-resolve --flush-caches)
     sudo resolvectl flush-caches
     ```

  6. Implement automatic reconnection in the application:

     ```python
     # Python SQLAlchemy: detect and replace stale pooled connections
     import os

     from sqlalchemy import create_engine

     DATABASE_URL = os.environ["DATABASE_URL"]  # set in step 2

     # pool_pre_ping sends a lightweight test query on each checkout and
     # replaces any connection that fails it
     engine = create_engine(DATABASE_URL, pool_pre_ping=True)
     ```
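
To illustrate what a check-on-checkout option such as pool_pre_ping does conceptually, here is a toy pool that probes each idle connection before handing it out. This is a sketch and not SQLAlchemy's implementation: the `PrePingPool` class is hypothetical, and `sqlite3` stands in for a real database driver so the example runs without a server.

```python
import sqlite3
from collections import deque


class PrePingPool:
    """Toy pool that validates each idle connection before handing it out."""

    def __init__(self, connect, size=2):
        # connect: zero-argument factory returning a sqlite3 connection
        self._connect = connect
        self._idle = deque(connect() for _ in range(size))

    def _is_alive(self, conn):
        try:
            conn.execute("SELECT 1")  # cheap liveness probe
            return True
        except sqlite3.Error:
            return False

    def checkout(self):
        # Discard stale connections until a live one is found.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_alive(conn):
                return conn
        # Every idle connection was stale (or the pool was empty): open a new one.
        return self._connect()

    def checkin(self, conn):
        self._idle.append(conn)
```

After a failover, every pooled connection to the old primary fails the probe and is replaced by a fresh one, so the application recovers without a restart, provided the connection factory now reaches the new primary.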

Prevention

  • Route connections through a pooler or proxy (PgBouncer, ProxySQL) so that a failover means repointing one endpoint rather than every application
  • Configure pool_pre_ping or equivalent to detect stale connections
  • Use database hostnames (not IPs) in connection strings for easier failover
  • Implement automatic connection retry with exponential backoff
  • Test database failover procedures regularly in staging environments
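
The retry-with-exponential-backoff recommendation can be sketched as follows. This is a minimal illustration rather than a production implementation: `connect_with_backoff` and its parameters are hypothetical, and it catches the built-in `ConnectionError`, whereas real drivers raise their own exception types, so the `except` clause would need adjusting for the driver in use.

```python
import random
import time


def connect_with_backoff(connect, attempts=5, base_delay=0.5, max_delay=8.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect: zero-argument callable that returns a connection or raises
             ConnectionError (assumption; adapt to your driver's exceptions).
    sleep:   injectable for testing; defaults to time.sleep.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries so many clients do not reconnect at once.
            sleep(delay + random.uniform(0, delay / 2))
```

With the defaults, the waits before jitter are 0.5 s, 1 s, 2 s, and 4 s. Capping the delay and the number of attempts keeps a prolonged outage from turning into unbounded request latency.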