Introduction
PostgreSQL keeps old WAL segments for as long as a replication slot might still need them. That protects a standby or CDC consumer that falls behind, but it becomes dangerous when the consumer is gone and the slot remains. Disk usage in pg_wal keeps climbing even though regular checkpoints are running, because checkpoints cannot remove or recycle segments the slot still holds. Eventually the primary runs out of space, and if the disk holding WAL fills up completely, the server can panic and shut down.
Symptoms
- The `pg_wal` directory grows continuously and does not shrink after checkpoints
- Disk alerts fire on the primary even when normal query load looks steady
- `pg_replication_slots` shows one or more inactive slots with very old positions
- A standby, Debezium connector, or logical subscriber was recently removed or broken
Common Causes
- A CDC connector stopped, but its logical replication slot remained
- A physical standby was decommissioned without removing its slot
- The subscriber cannot reconnect after credentials, DNS, or network changes
- Teams created a slot for testing and never cleaned it up
Step-by-Step Fix
1. Measure which slot is retaining WAL
Check slot state and compare it with current WAL growth before changing anything.
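```sql
-- Per-slot state and how much WAL each slot holds back (run on the primary)
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY slot_name;
-- Total size of pg_wal (requires superuser or the pg_monitor role)
SELECT pg_size_pretty(SUM(size)) AS wal_size FROM pg_ls_waldir();
```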
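To interpret the positions returned above, note that an LSN is a 64-bit WAL position printed as two hexadecimal 32-bit halves, so the WAL a slot holds back is roughly the current position minus its `restart_lsn`. PostgreSQL computes this with `pg_wal_lsn_diff()`; the sketch below only reproduces the arithmetic in POSIX shell, for example on positions copied out of psql output. The helper names `lsn_to_bytes` and `retained_bytes` are hypothetical, and the sketch assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they assume 64-bit shell arithmetic
# (bash, dash, and most modern sh implementations provide it).

# Convert an LSN such as "16/B374D848" to an absolute byte position.
# The part before "/" is the high 32 bits and the part after it the low 32 bits, both hex.
lsn_to_bytes() {
  echo $(( (0x${1%%/*} << 32) + 0x${1##*/} ))
}

# Approximate WAL held back by a slot: current WAL position ($1) minus restart_lsn ($2).
retained_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For example, `retained_bytes 1/0 0/FF000000` prints `16777216`, exactly one WAL segment at the default 16 MB segment size.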
2. Confirm whether a real consumer still needs the slot
Match each slot to a running subscriber or connector instead of dropping it based on an assumption.
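```sql
-- Map each slot to the WAL sender (standby or logical consumer) attached to it;
-- NULL consumer columns mean no WAL sender is using that slot right now.
SELECT s.slot_name, s.active, r.application_name, r.client_addr, r.state, r.sync_state
FROM pg_replication_slots AS s
LEFT JOIN pg_stat_replication AS r ON r.pid = s.active_pid
ORDER BY s.slot_name;
```
3. Bring the consumer back if the slot is still valid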
If the slot belongs to a real standby or CDC process, restore the consumer first and watch the retained position move forward.
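```shell
# The unit name is deployment-specific; restart whichever service runs the consumer
systemctl restart debezium-connect
# restart_lsn and confirmed_flush_lsn should advance once the consumer reconnects
psql -d postgres -c "SELECT slot_name, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;"
```
4. Drop the slot only after you confirm it is permanently unused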
Dropping the stale slot lets PostgreSQL remove or recycle the retained WAL segments at subsequent checkpoints, but do it only when the downstream system is truly retired.
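```sql
-- For a logical slot, run this while connected to the database the slot was created in
SELECT pg_drop_replication_slot('old_reporting_slot');
```
Prevention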
- Track which service owns each replication slot and document it
- Alert on inactive slots and WAL growth together instead of treating them separately
- Remove slots during standby or CDC decommissioning work, not later
- Review slot inventory after failovers, migrations, and connector replacements
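Alerting on inactive slots and WAL growth together can be sketched as a small check that evaluates both in the same pass. This is a minimal sketch, not a packaged tool: `check_slots` and the `MAX_RETAINED_BYTES` threshold (10 GiB by default here) are hypothetical, and the function assumes `slot_name|active|retained_bytes` rows such as `psql -At -F'|'` produces from the query shown in the final comment.

```shell
#!/bin/sh
# Hypothetical threshold: flag inactive slots holding back more than 10 GiB of WAL.
MAX_RETAINED_BYTES=${MAX_RETAINED_BYTES:-10737418240}

# Reads "slot_name|active|retained_bytes" rows on stdin and prints one alert per
# inactive slot over the threshold. Returns 1 if any alert fired, 0 otherwise.
check_slots() {
  alert=0
  while IFS='|' read -r name active retained; do
    [ -n "$name" ] || continue
    # psql prints booleans as t/f; an empty retained field means restart_lsn is NULL.
    if [ "$active" = "f" ] && [ -n "$retained" ] && [ "$retained" -gt "$MAX_RETAINED_BYTES" ]; then
      echo "ALERT: inactive slot $name retains $retained bytes of WAL"
      alert=1
    fi
  done
  return "$alert"
}

# Example wiring (run against the primary, e.g. from cron or a monitoring agent):
# psql -At -F'|' -d postgres -c "SELECT slot_name, active,
#   pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)::bigint FROM pg_replication_slots" | check_slots
```

Keying the alert on both conditions means a busy but healthy standby does not page anyone, while a forgotten slot does, well before the WAL disk fills.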