## Introduction

PostgreSQL logical replication slots retain WAL files until the subscriber confirms that it has received and flushed the changes. If the subscriber disconnects or falls behind, the slot holds WAL indefinitely by default and can eventually consume all disk space. Unlike physical replication slots, logical slots also hold back `catalog_xmin`, which prevents VACUUM from removing old system catalog rows and makes them more resource-intensive.

## Symptoms

- `pg_replication_slots` shows `restart_lsn` far behind the current WAL position (`pg_current_wal_lsn()`)
- Disk usage on the WAL partition grows continuously
- `pg_stat_replication` shows no active logical replication connections
- Subscriber logs show `could not receive data from WAL stream` errors
- The primary's disk reaches 100% because of retained WAL files
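The gap between two LSNs in the symptoms above is plain arithmetic. An LSN such as `16/B374D848` is a 64-bit byte position written as two hexadecimal halves, so the amount of retained WAL is the difference of two integers. The sketch below mirrors the subtraction that `pg_wal_lsn_diff()` performs; the helper names `lsn_to_int` and `wal_lsn_diff` and the sample LSN values are illustrative assumptions, not part of PostgreSQL or any client library.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to its 64-bit byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def wal_lsn_diff(newer: str, older: str) -> int:
    """Bytes of WAL between two LSNs (positive when `newer` is ahead of `older`)."""
    return lsn_to_int(newer) - lsn_to_int(older)


if __name__ == "__main__":
    # Hypothetical values: the current WAL position versus a slot's restart_lsn.
    current_lsn = "2/10000000"
    restart_lsn = "1/F0000000"
    print(wal_lsn_diff(current_lsn, restart_lsn))  # 536870912 bytes, i.e. 512 MiB
```

Doing the subtraction client-side can be handy when LSNs are collected from logs or from several servers and compared later, without a database connection.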

## Common Causes

- Subscriber node offline for maintenance longer than WAL retention
- Network issue breaking the replication connection
- Subscriber unable to apply changes due to constraint violations or conflicts
- `max_slot_wal_keep_size` not set, allowing unlimited WAL retention
- Large DDL changes on the publisher causing massive replay on the subscriber

## Step-by-Step Fix

1. **Check replication slot status**:
   ```sql
   SELECT slot_name, plugin, slot_type, active,
          pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
          restart_lsn, confirmed_flush_lsn
   FROM pg_replication_slots;
   ```

2. **Check the WAL disk usage**:
   ```sql
   SELECT
     pg_size_pretty(sum(size)) AS total_wal_size,
     count(*) AS wal_file_count
   FROM pg_ls_waldir();
   ```
3. **If the subscriber is permanently gone, drop the slot**:
   ```sql
   -- Fails while the slot is active; stop the subscriber's connection first
   SELECT pg_drop_replication_slot('my_logical_slot');

   -- This allows WAL cleanup, but the subscriber can no longer resume from this slot
   -- You will need to re-create the subscription from scratch
   ```

4. **Set a maximum WAL retention for replication slots**:
   ```sql
   -- Available in PostgreSQL 13 and later
   ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
   SELECT pg_reload_conf();

   -- When a slot exceeds this limit, PostgreSQL invalidates it,
   -- preventing unbounded disk growth
   ```

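5. **Recreate the subscription after fixing the lag**:
   ```sql
   -- On the subscriber
   -- If the slot no longer exists on the publisher, detach it first,
   -- otherwise DROP SUBSCRIPTION fails while trying to drop the remote slot:
   --   ALTER SUBSCRIPTION my_sub DISABLE;
   --   ALTER SUBSCRIPTION my_sub SET (slot_name = NONE);
   DROP SUBSCRIPTION my_sub;

   -- On the publisher (if the slot was dropped)
   -- Re-create the publication if needed

   -- On the subscriber
   CREATE SUBSCRIPTION my_sub
     CONNECTION 'host=publisher dbname=mydb user=repl_user password=secret'
     PUBLICATION my_publication
     WITH (copy_data = true);
   ```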

6. **Monitor slot lag proactively**:
   ```sql
   -- List slots that retain more than 1 GB of WAL
   SELECT
     slot_name,
     active,
     pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes,
     pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size
   FROM pg_replication_slots
   WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 1073741824; -- > 1GB
   ```
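Rows returned by a monitoring query like the one in the last step can be turned into alert messages with a small classifier. The following is a minimal sketch, assuming the rows have already been fetched into plain Python values; the `SlotLag` type, the `classify` and `alerts` functions, and the 5 GiB and 10 GiB defaults (taken from the alerting thresholds this guide suggests under Prevention) are illustrative choices, not an established tool.

```python
from typing import NamedTuple

GIB = 1024 ** 3  # bytes in one gibibyte


class SlotLag(NamedTuple):
    """One row of the monitoring query: slot_name, active, lag_bytes."""
    slot_name: str
    active: bool
    lag_bytes: int


def classify(slot: SlotLag, warn_bytes: int = 5 * GIB, crit_bytes: int = 10 * GIB) -> str:
    """Return 'ok', 'warning', or 'critical' for one slot's retained WAL."""
    if slot.lag_bytes >= crit_bytes:
        return "critical"
    if slot.lag_bytes >= warn_bytes:
        return "warning"
    return "ok"


def alerts(slots: list[SlotLag]) -> list[str]:
    """Build one message per slot that is not 'ok', largest lag first."""
    flagged = [s for s in slots if classify(s) != "ok"]
    flagged.sort(key=lambda s: s.lag_bytes, reverse=True)
    return [
        f"{classify(s)}: slot {s.slot_name} retains {s.lag_bytes / GIB:.1f} GiB"
        f" ({'active' if s.active else 'inactive'})"
        for s in flagged
    ]


if __name__ == "__main__":
    # Hypothetical rows, as if fetched from pg_replication_slots.
    rows = [
        SlotLag("reporting_slot", True, 1 * GIB),
        SlotLag("analytics_slot", False, 12 * GIB),
    ]
    for message in alerts(rows):
        print(message)
```

Keeping the thresholds as parameters lets the same function serve clusters with different WAL partition sizes.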

## Prevention

- Always set `max_slot_wal_keep_size` to prevent unbounded WAL growth
- Monitor replication slot lag with alerting at 5GB and 10GB thresholds
- Test subscriber failover procedures regularly
- Use `pg_replication_slots` in monitoring dashboards
- Set up alerting on WAL directory disk usage
- For critical subscriptions, deploy the subscriber in the same availability zone
- Implement automatic slot cleanup for inactive subscribers
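Automatic cleanup for inactive subscribers needs an explicit policy, because dropping a slot forces the subscriber to be re-synchronized from scratch. One possible policy, sketched below under stated assumptions, selects only slots that are inactive, have been inactive longer than a grace period, and retain more WAL than a limit. The `SlotState` type, the `inactive_seconds` field (assumed to be measured by your own monitoring), and the default limits are all hypothetical; the function only returns slot names and leaves the actual `pg_drop_replication_slot()` call to a reviewed step.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one gibibyte


@dataclass(frozen=True)
class SlotState:
    slot_name: str
    active: bool
    retained_bytes: int
    inactive_seconds: float  # hypothetical: tracked by your own monitoring


def cleanup_candidates(
    slots: list[SlotState],
    max_retained_bytes: int = 10 * GIB,
    max_inactive_seconds: float = 24 * 3600.0,
) -> list[str]:
    """Return names of slots that this (assumed) policy would allow dropping.

    A slot qualifies only if it is inactive, has been inactive longer than
    the grace period, and retains more WAL than the limit.
    """
    return sorted(
        s.slot_name
        for s in slots
        if not s.active
        and s.inactive_seconds > max_inactive_seconds
        and s.retained_bytes > max_retained_bytes
    )


if __name__ == "__main__":
    # Hypothetical slot states for illustration.
    states = [
        SlotState("live_slot", True, 20 * GIB, 0.0),
        SlotState("stale_slot", False, 15 * GIB, 3 * 24 * 3600.0),
        SlotState("brief_outage_slot", False, 15 * GIB, 600.0),
    ]
    print(cleanup_candidates(states))
```

Requiring all three conditions keeps a busy but connected subscriber, or one in a short maintenance window, from being selected.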