Introduction

PostgreSQL keeps old WAL segments as long as a replication slot might still need them. That is useful when a standby or CDC consumer falls behind, but it becomes dangerous when the consumer is gone and the slot remains. Disk usage in pg_wal keeps climbing even though regular checkpoints are running, and eventually the primary runs short on space.

Symptoms

  • The pg_wal directory grows continuously and does not shrink after checkpoints
  • Disk alerts fire on the primary even when normal query load looks steady
  • pg_replication_slots shows one or more inactive slots (active = false) whose restart_lsn is far behind the current WAL position
  • A standby, Debezium connector, or logical subscriber was recently removed or broken
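
The inactive-slot symptom can be checked directly. The query below is a minimal sketch, assuming PostgreSQL 10 or later (for the pg_current_wal_lsn and pg_wal_lsn_diff function names) and a session on the primary. It estimates how far each slot's restart_lsn trails the current write position. The retained_wal alias is only illustrative, and the figure is approximate: WAL is removed in whole segments, and other settings can also hold segments back.

```sql
-- Approximate WAL retained on behalf of each slot, largest first.
-- Run on the primary: pg_current_wal_lsn() raises an error during recovery.
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL   -- slots that have never reserved WAL are skipped
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

An inactive slot near the top of this list is a strong candidate for the growth, though it still needs to be matched to its owner before anything is dropped.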

Common Causes

  • A CDC connector stopped, but its logical replication slot remained
  • A physical standby was decommissioned without removing its slot
  • The subscriber cannot reconnect after credentials, DNS, or network changes
  • Teams created a slot for testing and never cleaned it up

Step-by-Step Fix

  1. Measure which slot is retaining WAL
     Check slot state and compare it with current WAL growth before changing anything.

```sql
-- Slot state and positions. The wal_status column requires PostgreSQL 13 or later.
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn, wal_status
FROM pg_replication_slots
ORDER BY slot_name;

-- Total size of the files currently in pg_wal.
SELECT pg_size_pretty(SUM(size)) AS wal_size
FROM pg_ls_waldir();
```

  2. Confirm whether a real consumer still needs the slot
     Match the slot to a running subscriber or connector instead of dropping it on assumption.

```sql
SELECT application_name, client_addr, state, sync_state
FROM pg_stat_replication
ORDER BY application_name;
```

  3. Bring the consumer back if the slot is still valid
     If the slot belongs to a real standby or CDC process, restore the consumer first and watch the retained position move forward.

```bash
systemctl restart debezium-connect
psql -d postgres -c "SELECT slot_name, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;"
```

  4. Drop the slot only after you confirm it is permanently unused
     Removing the stale slot lets PostgreSQL recycle WAL segments again, but do it only when the downstream system is truly retired.

```sql
SELECT pg_drop_replication_slot('old_reporting_slot');
```
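
As a hedged variant of the final step, the drop can be guarded so that it does nothing when the slot is missing or still in use. This is a sketch rather than a required procedure: it reuses the example name old_reporting_slot from the step above, and the follow-up check assumes a role that is allowed to run CHECKPOINT. For a logical slot, the session must be connected to the database in which the slot was created.

```sql
-- Drop the slot only if it exists and no consumer is attached.
-- Returns zero rows, and drops nothing, when the slot is active or absent.
SELECT slot_name, pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = 'old_reporting_slot'
  AND NOT active;

-- Old segments are removed or recycled at checkpoint time, so force one
-- and then re-measure pg_wal.
CHECKPOINT;

SELECT pg_size_pretty(SUM(size)) AS wal_size
FROM pg_ls_waldir();
```

If wal_size does not fall after the checkpoint, another slot or another retention setting may still be holding segments, so it is worth re-running the slot query from the first step.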

Prevention

  • Track which service owns each replication slot and document it
  • Alert on inactive slots and WAL growth together instead of treating them separately
  • Remove slots during standby or CDC decommissioning work, not later
  • Review slot inventory after failovers, migrations, and connector replacements
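
The advice to alert on inactive slots and WAL growth together could be implemented as a single check. The query below is a minimal sketch, assuming PostgreSQL 13 or later (for the wal_status column) and a monitoring tool that treats any returned row as an alert. The 10 GB threshold is an arbitrary placeholder, not a recommendation.

```sql
-- One row per slot that is BOTH inactive AND retaining more WAL than the threshold.
-- An empty result means there is nothing to alert on.
SELECT slot_name,
       slot_type,
       wal_status,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE NOT active
  AND pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
      > pg_size_bytes('10 GB');   -- placeholder threshold; tune to available disk
```

Combining both conditions keeps the alert quiet for a busy but healthy consumer that is briefly behind, while still catching an abandoned slot before the disk fills.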