## Introduction

PostgreSQL logical replication slots retain WAL files until the subscriber confirms that it has received and flushed the changes. If the subscriber disconnects or falls behind, the slot holds WAL indefinitely by default, which can eventually consume all disk space on the publisher. Unlike physical replication slots, logical slots also retain catalog information (tracked as `catalog_xmin`), which keeps vacuum from removing old system catalog rows and makes them more resource-intensive.
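
To make the catalog retention concrete, the following is a minimal sketch that lists each logical slot together with the age of its `catalog_xmin`. It assumes it is run on the publisher; the column alias `catalog_xmin_age` is only illustrative.

```sql
-- Sketch: how far each logical slot holds back the catalog xmin horizon.
-- A large, growing age suggests the slot is not advancing.
SELECT
  slot_name,
  active,
  catalog_xmin,
  age(catalog_xmin) AS catalog_xmin_age  -- measured in transactions
FROM pg_replication_slots
WHERE slot_type = 'logical';
```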
## Symptoms

- `pg_replication_slots` shows `restart_lsn` and `confirmed_flush_lsn` far behind the current WAL position (`pg_current_wal_lsn()`)
- Disk usage on the WAL partition grows continuously
- `pg_stat_replication` shows no active logical replication connections
- Subscriber logs show `could not receive data from WAL stream` errors
- The primary's disk reaches 100% due to retained WAL files
## Common Causes

- Subscriber node offline for extended maintenance while the slot keeps accumulating WAL
- Network issue breaking the replication connection
- Subscriber unable to apply changes due to constraint violations or conflicts
- `max_slot_wal_keep_size` not set (the default is unlimited), allowing unbounded WAL retention
- Large bulk changes on the publisher causing massive replay on the subscriber (logical replication does not replicate DDL itself)
## Step-by-Step Fix

1. **Check replication slot status:**
   ```sql
   SELECT
     slot_name,
     plugin,
     slot_type,
     active,
     pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
     restart_lsn,
     confirmed_flush_lsn
   FROM pg_replication_slots;
   ```
2. **Check the WAL disk usage:**
   ```sql
   SELECT
     pg_size_pretty(sum(size)) AS total_wal_size,
     count(*) AS wal_file_count
   FROM pg_ls_waldir();
   ```

3. **If the subscriber is permanently gone, drop the slot:**
   ```sql
   -- The slot must be inactive (active = false) before it can be dropped
   SELECT pg_drop_replication_slot('my_logical_slot');
   -- This allows WAL cleanup, but the subscriber can no longer resume from this slot
   -- You will need to re-create the subscription from scratch
   ```
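
   Dropping a slot fails while a walsender is still attached to it. The following is a hedged sketch of one way to handle that, reusing the example slot name `my_logical_slot`; it assumes the subscriber will not reconnect straight away (for example, because its subscription is disabled or the node is gone).

   ```sql
   -- See whether a backend is still attached to the slot.
   SELECT slot_name, active, active_pid
   FROM pg_replication_slots
   WHERE slot_name = 'my_logical_slot';

   -- If it is, terminate that walsender (returns no rows when the slot is idle).
   SELECT pg_terminate_backend(active_pid)
   FROM pg_replication_slots
   WHERE slot_name = 'my_logical_slot' AND active;

   -- Drop the slot only when it is inactive, so the call cannot fail on an attached backend.
   SELECT pg_drop_replication_slot(slot_name)
   FROM pg_replication_slots
   WHERE slot_name = 'my_logical_slot' AND NOT active;
   ```

   If the last statement returns no rows, the slot was still active (or does not exist) and nothing was dropped.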
4. **Set a maximum WAL retention for replication slots:**
   ```sql
   -- Requires PostgreSQL 13 or later
   ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
   SELECT pg_reload_conf();
   -- When a slot exceeds this limit, PostgreSQL invalidates it,
   -- preventing unbounded disk growth; the subscriber must then be resynchronized
   ```
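
   Once a limit is in place, it helps to see how close each slot is to being invalidated. A minimal sketch, assuming PostgreSQL 13 or later, where `pg_replication_slots` exposes `wal_status` and `safe_wal_size`:

   ```sql
   -- Confirm the setting is active after the reload.
   SHOW max_slot_wal_keep_size;

   -- wal_status is one of: reserved, extended, unreserved, lost.
   -- 'lost' means the slot has been invalidated and can no longer be used.
   -- safe_wal_size is the amount of WAL that can still be written before
   -- the slot is at risk of being invalidated (NULL if no limit applies).
   SELECT
     slot_name,
     active,
     wal_status,
     pg_size_pretty(safe_wal_size) AS safe_wal_size
   FROM pg_replication_slots;
   ```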
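5. **Recreate the subscription after fixing the lag:**
   ```sql
   -- On the subscriber
   -- If the slot was already dropped on the publisher, detach it first so that
   -- DROP SUBSCRIPTION does not try to drop the missing remote slot:
   --   ALTER SUBSCRIPTION my_sub DISABLE;
   --   ALTER SUBSCRIPTION my_sub SET (slot_name = NONE);
   DROP SUBSCRIPTION my_sub;

   -- On the publisher (if slot was dropped)
   -- Re-create the publication if needed

   -- On the subscriber
   -- copy_data = true re-copies existing rows, so truncate the target tables
   -- first if they still hold old data
   CREATE SUBSCRIPTION my_sub
     CONNECTION 'host=publisher dbname=mydb user=repl_user password=secret'
     PUBLICATION my_publication
     WITH (copy_data = true);
   ```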
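
   After the subscription is re-created, the initial table copy can take a while. The queries below are a sketch of how progress might be checked on the subscriber, reusing the example subscription name `my_sub`.

   ```sql
   -- On the subscriber: is the apply worker running and receiving data?
   SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt_time
   FROM pg_stat_subscription
   WHERE subname = 'my_sub';

   -- Per-table sync state:
   --   i = initialize, d = data is being copied, f = finished table copy,
   --   s = synchronized, r = ready (normal replication)
   SELECT srrelid::regclass AS table_name, srsubstate
   FROM pg_subscription_rel
   ORDER BY srrelid::regclass::text;
   ```

   When every table reports `r`, the initial copy has finished and changes are streaming normally.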
6. **Monitor slot lag proactively:**
   ```sql
   -- Create a monitoring query
   SELECT
     slot_name,
     active,
     pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes,
     pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size
   FROM pg_replication_slots
   WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 1073741824; -- > 1GB
   ```
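
   For recurring checks, the monitoring query can be wrapped in a view that a monitoring tool polls. This is only a sketch: the view name `replication_slot_lag` is hypothetical, and the 1 GB threshold is carried over from the query above as an example.

   ```sql
   -- Hypothetical view name; adjust the schema and thresholds to your environment.
   CREATE OR REPLACE VIEW replication_slot_lag AS
   SELECT
     slot_name,
     active,
     -- WAL the slot forces the server to keep
     pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes,
     -- WAL the subscriber has not yet confirmed as flushed
     pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS unconfirmed_bytes
   FROM pg_replication_slots
   WHERE slot_type = 'logical';

   -- Alert on slots that are disconnected or retaining more than 1 GB.
   SELECT *
   FROM replication_slot_lag
   WHERE NOT active OR retained_bytes > 1073741824;
   ```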