Introduction

Read replicas are essential for scaling read-heavy workloads, but replication lag can cause users to see stale data immediately after their own writes. This is one of the most common consistency issues in production systems that use a primary-replica architecture.

Symptoms

- Users report that recently saved data is missing from dashboard views
- Order confirmation pages show outdated totals
- `SHOW REPLICA STATUS` shows `Seconds_Behind_Source` (`Seconds_Behind_Master` in the older `SHOW SLAVE STATUS` output) growing beyond acceptable thresholds
- Application reads return no row (or NULL) for records that were just inserted on the primary

Common Causes

- Heavy write load on the primary overwhelms a single-threaded SQL applier on the replica (the MySQL default before 8.0.27)
- Long-running transactions on the replica blocking the replication applier
- Network latency between primary and replica, for example across availability zones
- Resource contention on the replica when analytical queries compete with the replication thread
- Large bulk INSERT or UPDATE operations that generate large volumes of binary log events

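Step-by-Step Fix

1. **Check replication lag on MySQL**:
   ```sql
   SHOW REPLICA STATUS\G
   -- Look for: Seconds_Behind_Source, Relay_Log_Space, Replica_IO_Running, Replica_SQL_Running
   -- (before 8.0.22, SHOW SLAVE STATUS reports Seconds_Behind_Master, Slave_IO_Running, Slave_SQL_Running)
   ```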

2. **Check replication lag on PostgreSQL** (run on the primary):
   ```sql
   -- replay_lag (PostgreSQL 10+) is an interval; it shows NULL once an idle standby has fully caught up
   SELECT
       client_addr,
       state,
       sent_lsn,
       write_lsn,
       flush_lsn,
       replay_lsn,
       EXTRACT(EPOCH FROM replay_lag)::INT AS lag_seconds
   FROM pg_stat_replication;
   ```
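
3. **Enable multi-threaded replica workers on MySQL 8.0.26+** (on earlier releases the variables are `slave_parallel_workers` and `slave_parallel_type`):
   ```sql
   STOP REPLICA;
   SET GLOBAL replica_parallel_workers = 8;
   SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';  -- already the default from 8.0.27
   START REPLICA;
   -- SET GLOBAL is lost on restart; use SET PERSIST or my.cnf to keep the change.
   ```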
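
4. **Implement read-your-writes consistency in application code**:
   ```python
   def get_user_profile(user_id):
       # `request.was_write_recent` is an application-defined flag (for example,
       # set when this session wrote within the last few seconds), not a framework API.
       if request.was_write_recent:
           # Read from the primary so the user sees their own write.
           return primary_db.query(User).get(user_id)
       # Otherwise slightly stale data from a replica is acceptable.
       return replica_db.query(User).get(user_id)
   ```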
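
5. **Add lag-based routing logic**:
   ```python
   import logging

   logger = logging.getLogger(__name__)

   def select_read_target(max_lag_seconds=5):
       # check_replica_lag() is application code returning lag in seconds,
       # or None when lag is unknown (e.g. Seconds_Behind_Source is NULL).
       lag = check_replica_lag()
       if lag is None or lag > max_lag_seconds:
           logger.warning("Replica lag %s (limit %s s), routing to primary", lag, max_lag_seconds)
           return primary_db
       return replica_db
   ```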

6. **Reduce replica load from analytical queries by creating dedicated analytics replicas**:
   ```sql
   -- Run on the primary; the account replicates to every replica. Queries reach the
   -- dedicated analytics replica because analytics clients connect to its hostname.
   CREATE USER 'analytics_user'@'%' IDENTIFIED BY 'secure_password';  -- placeholder password
   GRANT SELECT ON analytics_db.* TO 'analytics_user'@'%';
   ```
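Steps 1 and 5 both depend on reducing MySQL replica status to a single lag number. The following is a minimal sketch. It assumes the `SHOW REPLICA STATUS` row has already been fetched as a dict keyed by column name, for example through a dictionary cursor. `replica_lag_seconds` is a hypothetical helper and not part of any MySQL driver. It accepts both the 8.0.22+ column names and the legacy ones. It returns `None` instead of a number whenever either replication thread is not running, because `Seconds_Behind_Source` only measures how far the SQL thread trails what the I/O thread has already fetched.

```python
from typing import Mapping, Optional


def replica_lag_seconds(status: Mapping[str, object]) -> Optional[int]:
    """Return replica lag in seconds from a SHOW REPLICA STATUS row, or None if unknown.

    Hypothetical helper: accepts both the MySQL 8.0.22+ column names
    (Replica_*, *_Source) and the older SHOW SLAVE STATUS names (Slave_*, *_Master).
    """

    def pick(new_name: str, old_name: str) -> object:
        # Prefer the new column name and fall back to the legacy one.
        return status.get(new_name, status.get(old_name))

    io_running = pick("Replica_IO_Running", "Slave_IO_Running")
    sql_running = pick("Replica_SQL_Running", "Slave_SQL_Running")
    if io_running != "Yes" or sql_running != "Yes":
        # A stopped or still-connecting thread means the reported lag cannot be trusted.
        return None

    lag = pick("Seconds_Behind_Source", "Seconds_Behind_Master")
    if lag is None:
        return None  # MySQL reports NULL when lag is undefined
    return int(lag)
```

A `check_replica_lag()` like the one assumed in step 5 could run the status query and pass the row to this function. Returning `None` lets a caller treat unknown lag as unhealthy instead of trusting a misleading zero.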
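Step 4 relies on `request.was_write_recent` but does not say how that flag is computed. One common approach is sketched here under stated assumptions. The idea is to record when each session last wrote and pin that session's reads to the primary for a fixed window that is longer than normal replica lag. `WriteTracker`, `pin_seconds`, and the in-memory dictionary are all hypothetical. A multi-server deployment would keep the timestamp in the user's session or a cookie instead.

```python
import time
from typing import Callable, Dict


class WriteTracker:
    """Pin a session's reads to the primary for a window after its last write.

    Hypothetical helper: timestamps live in process memory here. time.monotonic
    suits this in-process sketch; timestamps shared across servers would need
    wall-clock time.
    """

    def __init__(self, pin_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.pin_seconds = pin_seconds
        self.clock = clock  # injectable so tests can control time
        self._last_write: Dict[str, float] = {}

    def record_write(self, session_id: str) -> None:
        # Call this after a successful commit on the primary.
        self._last_write[session_id] = self.clock()

    def was_write_recent(self, session_id: str) -> bool:
        last = self._last_write.get(session_id)
        return last is not None and self.clock() - last < self.pin_seconds

    def choose(self, session_id: str) -> str:
        # Return a target name; the caller maps it to a real connection.
        return "primary" if self.was_write_recent(session_id) else "replica"
```

After committing a write, call `tracker.record_write(session_id)`. Before each read, `tracker.choose(session_id)` returns `"primary"` or `"replica"`. The window is a heuristic. If lag can exceed it, combine it with a lag check such as the one in step 5.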

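Prevention

- Monitor replication lag with alerting thresholds (warn at 5s, critical at 30s)
- Implement read-your-writes consistency using session tokens or write timestamps
- For data that must never be read stale, use MySQL Group Replication with `group_replication_consistency` set to `BEFORE` or `AFTER`, or PostgreSQL synchronous replication with `synchronous_commit = remote_apply`; with default settings neither guarantees that a replica read sees the latest write
- Set `innodb_flush_log_at_trx_commit = 1` on the primary for durability (this protects committed transactions on crash; it does not reduce lag)
- Use ProxySQL or MaxScale to route reads automatically based on replica health and lag
- Avoid running long analytical queries on the same replicas that serve application reads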
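The alerting thresholds in the first Prevention bullet can be encoded as a small classifier that a monitoring job runs against each replica's lag reading. This is a minimal sketch. The function name is hypothetical, and treating unknown lag as critical is an assumption, on the reasoning that a replica whose lag cannot be measured is usually not replicating.

```python
from typing import Optional

WARN_SECONDS = 5       # warn threshold from the Prevention guidance
CRITICAL_SECONDS = 30  # critical threshold from the Prevention guidance


def classify_lag(lag_seconds: Optional[float]) -> str:
    """Map a replica lag reading (seconds, or None if unknown) to an alert level."""
    if lag_seconds is None:
        # Unknown lag (e.g. a stopped SQL thread) is treated as critical.
        return "critical"
    if lag_seconds >= CRITICAL_SECONDS:
        return "critical"
    if lag_seconds >= WARN_SECONDS:
        return "warning"
    return "ok"
```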
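On PostgreSQL, the "session tokens" approach can be made precise. After a write, the application records the primary's WAL position, for example from `SELECT pg_current_wal_lsn()`. Before reading from a replica, it compares that token with the replica's `SELECT pg_last_wal_replay_lsn()`. If the replica has replayed at or past the token, it already contains the write. The sketch below covers only the comparison. Both SQL functions exist in PostgreSQL 10+, while `lsn_to_int` and `replica_has_caught_up` are hypothetical helper names. LSNs are printed as two hexadecimal 32-bit halves (`X/Y`), so they must be compared numerically rather than as strings.

```python
from typing import Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a comparable integer."""
    high, low = lsn.split("/")
    # Each half is a hex-encoded 32-bit value; the LSN is high * 2**32 + low.
    return (int(high, 16) << 32) | int(low, 16)


def replica_has_caught_up(write_lsn: str, replay_lsn: Optional[str]) -> bool:
    """True if the replica has replayed WAL at or past the session's write token."""
    if replay_lsn is None:
        # pg_last_wal_replay_lsn() returns NULL on a server started without recovery.
        return False
    return lsn_to_int(replay_lsn) >= lsn_to_int(write_lsn)
```

Plain string comparison would get this wrong. As text, `'0/A'` sorts after `'0/10'`, even though 0xA is less than 0x10.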