What's Actually Happening

PostgreSQL streaming replication falls behind, with replicas significantly lagging behind the primary. This affects read consistency and failover readiness.

The Error You'll See

Replication lag:

```sql postgres=# SELECT client_addr, state, sent_lsn, replay_lsn, pg_wal_lsn_diff(sent_lsn, replay_lsn) as lag_bytes FROM pg_stat_replication;

client_addr | state | sent_lsn | replay_lsn | lag_bytes -------------+-----------+-----------+------------+----------- 10.0.0.2 | streaming | 0/5000000 | 0/1000000 | 4000000 ```

Replica behind:

```sql -- On replica: postgres=# SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn());

pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_wal_lsn_diff -------------------------+------------------------+----------------- 0/5000000 | 0/1000000 | 4000000 ```

Query returns stale data:

sql
-- Read from replica shows old data
SELECT * FROM users WHERE id = 1;  -- Returns old version

Why This Happens

  1. 1.Network bandwidth - Insufficient bandwidth for WAL stream
  2. 2.High write rate - Primary generates WAL faster than replication
  3. 3.Replica underpowered - Insufficient CPU/memory for replay
  4. 4.Disk I/O - Slow disk on replica causes replay delay
  5. 5.Long transactions - Large transactions create WAL backlog
  6. 6.Checkpoint spikes - Heavy checkpoint activity

Step 1: Measure Replication Lag

```sql -- On primary, check replication status: SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, pg_wal_lsn_diff(sent_lsn, replay_lsn) as lag_bytes, pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 as lag_mb FROM pg_stat_replication;

-- Check replication slot lag: SELECT slot_name, active, restart_lsn, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) as lag_bytes FROM pg_replication_slots;

-- On replica, check catch-up status: SELECT pg_last_wal_receive_lsn() as received, pg_last_wal_replay_lsn() as replayed, pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) as lag_bytes, pg_last_xact_replay_timestamp() as last_replay_time;

-- Convert lag to time: SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay; ```

Step 2: Check Network Bandwidth

```bash # Check network bandwidth between primary and replica: iperf3 -c replica-host

# Check WAL transfer rate: # On primary: SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM pg_stat_replication; # Monitor over time

# Check network interface: sar -n DEV 1 10

# Check for packet loss: mtr replica-host

# Check firewall throughput: iptables -L -n -v | grep 5432

# For high-traffic replication: # Ensure minimum bandwidth = peak write rate * 2 # Example: 100 MB/s writes = 200 MB/s bandwidth needed

# Use separate replication network if possible: # In postgresql.conf: listen_addresses = '10.0.1.1' # Replication network ```

Step 3: Check WAL Generation Rate

```sql -- Check WAL generation rate: SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / 1024 / 1024 as total_wal_mb;

-- Monitor over 60 seconds: SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / 1024 / 1024 as wal_mb; -- Wait 60 seconds SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / 1024 / 1024 as wal_mb;

-- Check WAL directory size: SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();

-- Check WAL files: SELECT count(*) FROM pg_ls_waldir();

-- Check wal_keep_size: SHOW wal_keep_size;

-- Monitor WAL sender: SELECT * FROM pg_stat_wal_sender; ```

Step 4: Check Replica Performance

```bash # Check replica CPU usage: top -p $(pgrep postgres)

# Check replica memory: free -h

# Check disk I/O on replica: iostat -x 1 10

# Check disk latency: iostat -x 1 | grep -E "await|svctm"

# Check PostgreSQL processes on replica: ps aux | grep "startup|wal_receiver"

# Check for blocking queries on replica: psql -h replica -c "SELECT * FROM pg_stat_activity WHERE state != 'idle';"

# Check vacuum on replica (can slow replay): psql -h replica -c "SELECT * FROM pg_stat_progress_vacuum;" ```

Step 5: Optimize Primary Configuration

```bash # In postgresql.conf on primary:

# WAL configuration: wal_level = replica max_wal_senders = 10 wal_keep_size = 1GB

# Increase wal_buffers: wal_buffers = 64MB

# Checkpoint tuning: checkpoint_timeout = 10min max_wal_size = 4GB min_wal_size = 1GB checkpoint_completion_target = 0.9

# WAL compression: wal_compression = on

# Synchronous commit (if acceptable): # synchronous_commit = off # Can reduce WAL, but risk data loss

# Increase max_replication_slots: max_replication_slots = 10

# Apply changes: pg_ctl reload ```

Step 6: Optimize Replica Configuration

```bash # In postgresql.conf on replica:

# Hot standby: hot_standby = on hot_standby_feedback = on

# Recovery configuration (postgresql.auto.conf): primary_conninfo = 'host=primary port=5432 user=replicator password=xxx' restore_command = '' # If not using archive recovery recovery_target_timeline = 'latest'

# Max wal receive: max_wal_receive_receivers = 10

# WAL receiver status: wal_receiver_status_interval = 10s

# Fetch interval for WAL: wal_receiver_fetch_interval = 2s

# Restart replica after changes: pg_ctl restart

# Check replica is connected: SELECT * FROM pg_stat_wal_receiver; ```

Step 7: Configure Replication Slots

```sql -- Create replication slot: SELECT pg_create_physical_replication_slot('replica_slot');

-- Check slots: SELECT * FROM pg_replication_slots;

-- Configure replica to use slot: -- In primary_conninfo: primary_conninfo = 'host=primary port=5432 user=replicator slot_name=replica_slot'

-- Drop unused slots: SELECT pg_drop_replication_slot('unused_slot');

-- Check slot lag: SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) as lag_bytes FROM pg_replication_slots;

-- Warning: Slots prevent WAL removal! -- Monitor slot lag to avoid disk full ```

Step 8: Use Logical Replication for Specific Tables

```sql -- If only some tables need replication:

-- On primary: CREATE PUBLICATION my_publication FOR TABLE important_table1, important_table2;

-- On replica: CREATE SUBSCRIPTION my_subscription CONNECTION 'host=primary port=5432 user=replicator' PUBLICATION my_publication;

-- Check logical replication: SELECT * FROM pg_stat_subscription;

-- Monitor logical replication lag: SELECT subname, received_lsn, latest_end_lsn, pg_wal_lsn_diff(received_lsn, latest_end_lsn) as lag FROM pg_stat_subscription; ```

Step 9: Handle Long Transactions

```sql -- Check for long transactions: SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes';

-- Terminate long transaction: SELECT pg_terminate_backend(pid);

-- Check replication impact: SELECT * FROM pg_stat_replication;

-- Configure statement timeout: SET statement_timeout = '5min';

-- Or in postgresql.conf: statement_timeout = 300000 # 5 minutes in ms ```

Step 10: Monitor Replication Health

```bash # Create monitoring script: cat << 'EOF' > /usr/local/bin/monitor-pg-replication.sh #!/bin/bash

echo "=== Primary Replication Status ===" psql -c "SELECT client_addr, state, pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 as lag_mb FROM pg_stat_replication;"

echo "" echo "=== Replication Slots ===" psql -c "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) / 1024 / 1024 as lag_mb FROM pg_replication_slots;"

echo "" echo "=== WAL Generation ===" psql -c "SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();"

echo "" echo "=== Replica Status ===" psql -h replica -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), now() - pg_last_xact_replay_timestamp() as delay;" EOF

chmod +x /usr/local/bin/monitor-pg-replication.sh

# Prometheus postgres_exporter metrics: # pg_replication_lag_seconds # pg_replication_is_replica # pg_stat_replication_bytes_diff

# Alert for replication lag: - alert: PostgreSQLReplicationLag expr: pg_replication_lag_seconds > 60 for: 5m labels: severity: warning annotations: summary: "PostgreSQL replication lag > 60 seconds"

  • alert: PostgreSQLReplicationDown
  • expr: pg_replication_is_replica == 1 and pg_stat_replication_bytes_diff > 100000000
  • for: 5m
  • labels:
  • severity: critical
  • annotations:
  • summary: "PostgreSQL replica significantly behind"
  • `

PostgreSQL Replication Lag Checklist

CheckCommandExpected
Lag bytespg_stat_replication< 100MB
Networkiperf3Sufficient bandwidth
WAL ratepg_ls_waldirWithin limits
Replica CPUtop< 80%
Disk I/OiostatLow latency
Slot lagpg_replication_slots< wal_keep_size

Verify the Fix

```sql -- After optimizing replication

-- 1. Check lag reduced SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 as lag_mb FROM pg_stat_replication; // < 10MB

-- 2. Verify replica catching up SELECT now() - pg_last_xact_replay_timestamp() as delay; // < 1 second

-- 3. Test failover readiness -- On replica: SELECT pg_is_in_recovery(); // true (ready to promote)

-- 4. Check WAL files SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir(); // Reasonable size

-- 5. Monitor under load -- Run write workload SELECT * FROM pg_stat_replication; // Lag remains low

-- 6. Check network stable iperf3 -c replica-host // Full bandwidth available ```

  • [Fix PostgreSQL WAL Archive Stuck](/articles/fix-postgresql-wal-archive-stuck)
  • [Fix Patroni Cluster No Leader](/articles/fix-patroni-cluster-no-leader)
  • [Fix MongoDB Replica Set Primary Down](/articles/fix-mongodb-replica-set-primary-down)