What's Actually Happening
PostgreSQL streaming replication has fallen behind: replicas lag significantly behind the primary, which affects read consistency and failover readiness.
The Error You'll See
Replication lag:
```sql
postgres=# SELECT client_addr, state, sent_lsn, replay_lsn,
                  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
           FROM pg_stat_replication;

 client_addr |   state   | sent_lsn  | replay_lsn | lag_bytes
-------------+-----------+-----------+------------+-----------
 10.0.0.2    | streaming | 0/5000000 | 0/1000000  |  67108864
```
Replica behind:
```sql
-- On replica:
postgres=# SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(),
                  pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn());

 pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_wal_lsn_diff
-------------------------+------------------------+-----------------
 0/5000000               | 0/1000000              |        67108864
```
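The lag values above come from LSN arithmetic: an LSN like `0/5000000` is a hexadecimal `high/low` pair, and the absolute byte position is `high * 2^32 + low`. A minimal bash sketch of the same calculation `pg_wal_lsn_diff()` performs, using the values from the output above:

```shell
#!/bin/bash
# Convert an LSN like "0/5000000" to an absolute byte position.
# The part before the slash is the high 32 bits, the part after
# is the low 32 bits; both are hexadecimal.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

# Lag between the sent and replayed positions shown above:
echo $(( $(lsn_to_bytes 0/5000000) - $(lsn_to_bytes 0/1000000) ))   # 67108864 bytes (64 MB)
```

This is why an apparently small-looking LSN gap can be tens of megabytes of WAL: the offsets are hex, not decimal.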
Query returns stale data:
```sql
-- Read from replica shows old data
SELECT * FROM users WHERE id = 1;  -- Returns the old version
```
Why This Happens
1. Network bandwidth - Insufficient bandwidth for the WAL stream
2. High write rate - The primary generates WAL faster than replication can ship it
3. Replica underpowered - Insufficient CPU/memory for replay
4. Disk I/O - Slow disk on the replica delays replay
5. Long transactions - Large transactions create a WAL backlog
6. Checkpoint spikes - Heavy checkpoint activity
Step 1: Measure Replication Lag
```sql
-- On primary, check replication status:
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 AS lag_mb
FROM pg_stat_replication;

-- Check replication slot lag:
SELECT slot_name, active, restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots;

-- On replica, check catch-up status:
SELECT pg_last_wal_receive_lsn() AS received,
       pg_last_wal_replay_lsn() AS replayed,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes,
       pg_last_xact_replay_timestamp() AS last_replay_time;

-- Convert lag to time:
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
```
Step 2: Check Network Bandwidth
```bash
# Check network bandwidth between primary and replica:
iperf3 -c replica-host

# Check the WAL transfer rate (on the primary, sample repeatedly over time):
# SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM pg_stat_replication;

# Check network interface throughput:
sar -n DEV 1 10

# Check for packet loss:
mtr replica-host

# Check firewall packet/byte counters for the PostgreSQL port:
iptables -L -n -v | grep 5432

# For high-traffic replication, size the link with headroom:
# minimum bandwidth = peak write rate * 2
# Example: 100 MB/s writes -> plan for 200 MB/s of bandwidth

# Use a separate replication network if possible.
# In postgresql.conf:
# listen_addresses = '10.0.1.1'   # Replication network
```
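The sizing rule above is easy to get wrong when the link is quoted in megabits rather than megabytes. A small sketch of the conversion (the 2x headroom factor is this article's heuristic, not a hard limit):

```shell
#!/bin/bash
# Given a peak WAL write rate in MB/s, print the minimum link size
# in MB/s and Mbit/s (1 byte = 8 bits), applying the 2x headroom rule.
required_bandwidth() {
  local peak_mb_s=$1
  local need_mb_s=$(( peak_mb_s * 2 ))
  echo "need ${need_mb_s} MB/s (~$(( need_mb_s * 8 )) Mbit/s)"
}

required_bandwidth 100   # need 200 MB/s (~1600 Mbit/s)
```

A "1 Gbit" link (~125 MB/s) therefore cannot keep up with 100 MB/s of sustained WAL once headroom is accounted for.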
Step 3: Check WAL Generation Rate
```sql
-- Check total WAL generated since cluster initialization:
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / 1024 / 1024 AS total_wal_mb;

-- Measure the generation rate: run the query above, wait 60 seconds,
-- run it again, and subtract the two values.

-- Check WAL directory size:
SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();

-- Check the number of WAL files:
SELECT count(*) FROM pg_ls_waldir();

-- Check wal_keep_size:
SHOW wal_keep_size;

-- WAL sender processes are reported in pg_stat_replication:
SELECT * FROM pg_stat_replication;
```
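To turn the two 60-second samples into a rate, subtract the sampled LSNs and divide by the interval. A hedged sketch in pure bash (feed it the two `pg_current_wal_lsn()` values you captured; the sample LSNs below are illustrative):

```shell
#!/bin/bash
# wal_rate_mb_s <lsn_at_t0> <lsn_at_t1> <seconds_between_samples>
# An LSN is a hex pair high/low; byte position = high * 2^32 + low.
lsn_to_bytes() {
  echo $(( 16#${1%%/*} * 4294967296 + 16#${1##*/} ))
}

wal_rate_mb_s() {
  local t0 t1
  t0=$(lsn_to_bytes "$1")
  t1=$(lsn_to_bytes "$2")
  awk -v d=$(( t1 - t0 )) -v s="$3" 'BEGIN { printf "%.2f\n", d / s / 1048576 }'
}

wal_rate_mb_s 0/1000000 0/5000000 60   # ~1.07 MB/s of WAL generated
```

Compare this rate against the available replication bandwidth from Step 2: if generation exceeds transfer capacity, lag can only grow.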
Step 4: Check Replica Performance
```bash
# Check replica CPU usage:
top -p $(pgrep -d, postgres)

# Check replica memory:
free -h

# Check disk I/O on the replica (watch the await and %util columns):
iostat -x 1 10

# Check PostgreSQL recovery processes on the replica:
ps aux | grep -E "startup|walreceiver"

# Check for long-running queries on the replica (they can delay replay):
psql -h replica -c "SELECT * FROM pg_stat_activity WHERE state != 'idle';"

# Check vacuum progress on the primary (its WAL can be heavy to replay):
psql -h primary -c "SELECT * FROM pg_stat_progress_vacuum;"
```
Step 5: Optimize Primary Configuration
```bash
# In postgresql.conf on the primary:

# WAL configuration:
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB

# Increase wal_buffers:
wal_buffers = 64MB

# Checkpoint tuning:
checkpoint_timeout = 10min
max_wal_size = 4GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9

# WAL compression (reduces shipped volume at some CPU cost):
wal_compression = on

# Asynchronous commit (if acceptable):
# synchronous_commit = off   # Faster commits, but recent transactions may be lost on a crash

# Increase max_replication_slots:
max_replication_slots = 10

# Apply changes:
pg_ctl reload
```
Step 6: Optimize Replica Configuration
```bash
# In postgresql.conf on the replica:

# Hot standby:
hot_standby = on
hot_standby_feedback = on   # Avoids query-conflict cancellations; can cause bloat on the primary

# Connection to the primary (postgresql.auto.conf):
primary_conninfo = 'host=primary port=5432 user=replicator password=xxx'
restore_command = ''                  # If not using archive recovery
recovery_target_timeline = 'latest'

# How often the WAL receiver reports status to the primary:
wal_receiver_status_interval = 10s

# How long to wait before retrying a failed WAL fetch:
wal_retrieve_retry_interval = 5s

# Restart the replica after changes:
pg_ctl restart

# Check the replica is connected (run on the replica):
psql -c "SELECT * FROM pg_stat_wal_receiver;"
```
Step 7: Configure Replication Slots
```sql
-- Create a physical replication slot:
SELECT pg_create_physical_replication_slot('replica_slot');

-- Check slots:
SELECT * FROM pg_replication_slots;

-- Configure the replica to use the slot (postgresql.conf on the replica):
-- primary_slot_name = 'replica_slot'

-- Drop unused slots (they retain WAL indefinitely):
SELECT pg_drop_replication_slot('unused_slot');

-- Check slot lag:
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots;

-- Warning: slots prevent WAL removal!
-- Monitor slot lag to avoid filling the disk.
```
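On PostgreSQL 13 and later, you can cap how much WAL a slot may retain, so a dead replica cannot fill the primary's disk; a slot that exceeds the cap is invalidated instead. A hedged postgresql.conf fragment (the 10GB value is an example; size it to cover your worst expected replica outage):

```bash
# In postgresql.conf on the primary (PostgreSQL 13+):
max_slot_wal_keep_size = 10GB   # Slots needing more WAL than this are invalidated
```

On these versions, the `wal_status` and `safe_wal_size` columns of `pg_replication_slots` show how close each slot is to the limit.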
Step 8: Use Logical Replication for Specific Tables
```sql
-- If only some tables need replication:

-- On the primary:
CREATE PUBLICATION my_publication FOR TABLE important_table1, important_table2;

-- On the subscriber (the connection string must include a dbname):
CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=primary port=5432 user=replicator dbname=mydb'
    PUBLICATION my_publication;

-- Check logical replication:
SELECT * FROM pg_stat_subscription;

-- Monitor logical replication lag:
SELECT subname, received_lsn, latest_end_lsn,
       pg_wal_lsn_diff(received_lsn, latest_end_lsn) AS lag
FROM pg_stat_subscription;
```
Step 9: Handle Long Transactions
```sql
-- Check for long-running queries:
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE now() - query_start > interval '5 minutes';

-- Terminate a long-running backend (substitute the pid from the query above):
SELECT pg_terminate_backend(pid);

-- Check replication impact:
SELECT * FROM pg_stat_replication;

-- Configure a statement timeout for the session:
SET statement_timeout = '5min';

-- Or in postgresql.conf:
-- statement_timeout = 300000   # 5 minutes, in milliseconds
```
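Note that `statement_timeout` does not catch sessions sitting idle inside an open transaction, which also hold back cleanup and keep WAL pinned. PostgreSQL has a separate timeout for those; a minimal sketch (the 5-minute value is an example):

```bash
# In postgresql.conf:
idle_in_transaction_session_timeout = '5min'   # End sessions idle inside an open transaction
```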
Step 10: Monitor Replication Health
```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-pg-replication.sh
#!/bin/bash

echo "=== Primary Replication Status ==="
psql -c "SELECT client_addr, state, pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 AS lag_mb FROM pg_stat_replication;"

echo ""
echo "=== Replication Slots ==="
psql -c "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) / 1024 / 1024 AS lag_mb FROM pg_replication_slots;"

echo ""
echo "=== WAL Generation ==="
psql -c "SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();"

echo ""
echo "=== Replica Status ==="
psql -h replica -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), now() - pg_last_xact_replay_timestamp() AS delay;"
EOF

chmod +x /usr/local/bin/monitor-pg-replication.sh

# Prometheus postgres_exporter metrics:
# pg_replication_lag_seconds
# pg_replication_is_replica
# pg_stat_replication_bytes_diff
```

Alert rules for replication lag:

```yaml
- alert: PostgreSQLReplicationLag
  expr: pg_replication_lag_seconds > 60
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PostgreSQL replication lag > 60 seconds"

- alert: PostgreSQLReplicationDown
  expr: pg_replication_is_replica == 1 and pg_stat_replication_bytes_diff > 100000000
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "PostgreSQL replica significantly behind"
```
PostgreSQL Replication Lag Checklist
| Check | Command | Expected |
|---|---|---|
| Lag bytes | pg_stat_replication | < 100MB |
| Network | iperf3 | Sufficient bandwidth |
| WAL rate | pg_ls_waldir | Within limits |
| Replica CPU | top | < 80% |
| Disk I/O | iostat | Low latency |
| Slot lag | pg_replication_slots | < wal_keep_size |
Verify the Fix
```sql
-- After optimizing replication

-- 1. Check lag is reduced
SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 AS lag_mb
FROM pg_stat_replication;  -- Expect < 10MB

-- 2. Verify the replica is caught up
SELECT now() - pg_last_xact_replay_timestamp() AS delay;  -- Expect < 1 second

-- 3. Test failover readiness (on the replica)
SELECT pg_is_in_recovery();  -- true (ready to promote)

-- 4. Check WAL directory size
SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();  -- Reasonable size

-- 5. Monitor under load: run a write workload, then
SELECT * FROM pg_stat_replication;  -- Lag should remain low

-- 6. Check the network is stable (from the shell):
-- iperf3 -c replica-host   -- Full bandwidth available
```
Related Issues
- [Fix PostgreSQL WAL Archive Stuck](/articles/fix-postgresql-wal-archive-stuck)
- [Fix Patroni Cluster No Leader](/articles/fix-patroni-cluster-no-leader)
- [Fix MongoDB Replica Set Primary Down](/articles/fix-mongodb-replica-set-primary-down)