# PostgreSQL Replication Slots Error - Troubleshooting Guide

Replication slots ensure PostgreSQL retains WAL files until all subscribers have received them. While this prevents replication breakage, mismanaged slots can cause disk space exhaustion and server issues. Let's diagnose and fix replication slot problems.

Understanding Replication Slots

Replication slots come in two types:

  • Physical slots: For streaming replication
  • Logical slots: For logical decoding and replication
sql
-- View all replication slots
SELECT 
    slot_name,
    slot_type,
    plugin,
    database,
    active,
    restart_lsn,
    confirmed_flush_lsn,
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size
FROM pg_replication_slots;

Common Slot Errors

Error: "all replication slots are in use"

No slots available for new standby or logical replication:

```sql -- Check slot count SELECT count(*) FROM pg_replication_slots;

-- Check max_replication_slots SHOW max_replication_slots;

-- View active vs inactive slots SELECT slot_name, slot_type, active, active_pid FROM pg_replication_slots; ```

Solutions:

```sql -- Option 1: Increase max_replication_slots ALTER SYSTEM SET max_replication_slots = 20; SELECT pg_reload_conf(); -- Requires restart for some versions -- sudo systemctl restart postgresql

-- Option 2: Drop unused slots SELECT slot_name, active FROM pg_replication_slots WHERE NOT active; SELECT pg_drop_replication_slot('unused_slot_name'); ```

Error: "replication slot is active"

Cannot drop or modify a slot that's in use:

```sql -- Check what's using the slot SELECT s.slot_name, s.active, s.active_pid, a.usename, a.client_addr, a.application_name, a.state, a.query FROM pg_replication_slots s LEFT JOIN pg_stat_activity a ON s.active_pid = a.pid WHERE s.slot_name = 'your_slot_name';

-- If process is stuck, terminate it SELECT pg_terminate_backend(active_pid) FROM pg_replication_slots WHERE slot_name = 'your_slot_name';

-- Now drop the slot SELECT pg_drop_replication_slot('your_slot_name'); ```

Error: "slot already exists"

Attempting to create a slot that already exists:

```sql -- Check if slot exists SELECT slot_name, slot_type FROM pg_replication_slots WHERE slot_name = 'my_slot';

-- If physical slot exists but is inactive, drop and recreate SELECT pg_drop_replication_slot('my_slot'); SELECT pg_create_physical_replication_slot('my_slot');

-- For logical slot with plugin SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput'); ```

Orphaned Slots and WAL Accumulation

Orphaned (inactive) slots cause WAL to accumulate until disk fills:

Diagnosing WAL Retention

```sql -- Find slots causing WAL retention SELECT slot_name, slot_type, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal_size, EXTRACT(EPOCH FROM (now() - (SELECT pg_stat_file('pg_wal/' || pg_walfile_name(restart_lsn)).modification FROM pg_replication_slots WHERE slot_name = s.slot_name))) / 3600 AS hours_retained FROM pg_replication_slots s ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

-- Check total WAL size SELECT pg_size_pretty(sum(size)) AS total_wal_size FROM pg_ls_waldir(); ```

Identifying Orphaned Slots

```sql -- Find slots with no active connection SELECT slot_name, slot_type, active, active_pid, restart_lsn, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size FROM pg_replication_slots WHERE NOT active;

-- Check if corresponding replication connection exists SELECT r.slot_name, s.pid, s.client_addr, s.state, CASE WHEN s.pid IS NULL THEN 'ORPHANED' ELSE 'ACTIVE' END AS status FROM pg_replication_slots r LEFT JOIN pg_stat_replication s ON r.slot_name = s.slot_name OR r.active_pid = s.pid WHERE r.slot_type = 'physical'; ```

Cleaning Up Orphaned Slots

```sql -- Drop specific orphaned slot SELECT pg_drop_replication_slot('orphaned_slot_name');

-- Drop all inactive slots (be careful!) DO $$ DECLARE slot RECORD; BEGIN FOR slot IN SELECT slot_name FROM pg_replication_slots WHERE NOT active LOOP EXECUTE 'SELECT pg_drop_replication_slot(' || quote_literal(slot.slot_name) || ')'; RAISE NOTICE 'Dropped slot: %', slot.slot_name; END LOOP; END $$; ```

Slot Creation Errors

Error: "could not create replication slot"

```sql -- Check prerequisites SHOW max_replication_slots; SHOW wal_level;

-- wal_level must be 'replica' or 'logical' ALTER SYSTEM SET wal_level = logical; SELECT pg_reload_conf(); -- Requires restart -- sudo systemctl restart postgresql

-- Check disk space in pg_wal SELECT pg_size_pretty(sum(size)) AS wal_size FROM pg_ls_waldir(); ```

Error: "logical decoding requires wal_level >= logical"

```bash # Edit postgresql.conf sudo nano /etc/postgresql/16/main/postgresql.conf

# Set: wal_level = logical

# Restart PostgreSQL sudo systemctl restart postgresql

# Verify psql -U postgres -c "SHOW wal_level;" ```

Physical Slot Management

Creating Physical Slots

```sql -- Create physical replication slot SELECT pg_create_physical_replication_slot('standby1_slot');

-- Verify creation SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'standby1_slot'; ```

Configuring Standby to Use Slot

```bash # In standby's postgresql.auto.conf primary_slot_name = 'standby1_slot'

# Or during pg_basebackup pg_basebackup -h primary -U replicator -D /var/lib/postgresql/16/main \ -Fp -Xs -P -R -S standby1_slot ```

Dropping Physical Slots

```sql -- Check slot is not active SELECT active, active_pid FROM pg_replication_slots WHERE slot_name = 'standby1_slot';

-- If active, stop the standby first or terminate the connection -- Then drop SELECT pg_drop_replication_slot('standby1_slot'); ```

Logical Slot Management

Creating Logical Slots

```sql -- Create logical slot with pgoutput plugin SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput');

-- Create with test_decoding plugin SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');

-- List available plugins SELECT * FROM pg_available_extensions WHERE name LIKE '%decoding%'; ```

Peeking at Logical Slot Changes

```sql -- Peek at changes without consuming them SELECT * FROM pg_logical_slot_peek_changes('logical_slot', NULL, NULL);

-- Consume changes SELECT * FROM pg_logical_slot_get_changes('logical_slot', NULL, NULL);

-- Peek with specific LSN SELECT * FROM pg_logical_slot_peek_changes('logical_slot', '0/1500000', NULL); ```

Logical Slot with Replication

```sql -- Create publication first CREATE PUBLICATION my_pub FOR TABLE my_table;

-- Create logical slot (usually done automatically by subscription) SELECT pg_create_logical_replication_slot('sub_slot', 'pgoutput');

-- Create subscription using the slot CREATE SUBSCRIPTION my_sub CONNECTION 'host=primary port=5432 user=replicator password=secret dbname=mydb' PUBLICATION my_pub WITH (create_slot = false, slot_name = 'sub_slot'); ```

Slot Monitoring and Alerts

Create Monitoring Function

```sql CREATE OR REPLACE FUNCTION check_replication_slots() RETURNS TABLE ( slot_name text, slot_type text, active boolean, lag_size text, status text, recommendation text ) LANGUAGE plpgsql AS $$ BEGIN RETURN QUERY SELECT s.slot_name::text, s.slot_type::text, s.active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), s.restart_lsn))::text AS lag_size, CASE WHEN s.active THEN 'ACTIVE' WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), s.restart_lsn) > 10737418240 THEN 'INACTIVE_HIGH_LAG' ELSE 'INACTIVE' END::text AS status, CASE WHEN s.active THEN 'No action needed' WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), s.restart_lsn) > 10737418240 THEN 'DROP IMMEDIATELY - WAL accumulating' ELSE 'Consider dropping if not needed' END::text AS recommendation FROM pg_replication_slots s; END; $$;

-- Run check SELECT * FROM check_replication_slots(); ```

Alert Query for Monitoring Systems

sql
-- Returns rows if action needed
SELECT 
    slot_name,
    slot_type,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size,
    active
FROM pg_replication_slots
WHERE (NOT active AND pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 1073741824)  -- 1GB
   OR active IS NULL;

Slot Recovery Scenarios

Scenario: Standby Crashed, Slot Still Exists

```sql -- Check slot status SELECT slot_name, active, active_pid FROM pg_replication_slots WHERE slot_name = 'standby_slot';

-- If standby will not be rebuilt, drop the slot SELECT pg_drop_replication_slot('standby_slot');

-- If standby will be rebuilt, keep the slot -- Rebuild standby using pg_basebackup with the slot name ```

Scenario: Logical Replication Subscription Stuck

```sql -- Check subscription and slot SELECT subname, subslotname, subenabled FROM pg_subscription;

-- Check slot lag SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag FROM pg_replication_slots WHERE slot_name = (SELECT subslotname FROM pg_subscription WHERE subname = 'my_sub');

-- If subscription is broken, rebuild ALTER SUBSCRIPTION my_sub DISABLE; DROP SUBSCRIPTION my_sub; SELECT pg_drop_replication_slot('old_slot'); -- Recreate subscription ```

Scenario: WAL Disk Full Due to Slot

```bash # Emergency: WAL directory is full du -sh /var/lib/postgresql/16/main/pg_wal/

# Quick fix: Drop the inactive slot (if acceptable data loss) psql -U postgres -c "SELECT pg_drop_replication_slot('blocking_slot');"

# Force checkpoint to clean up psql -U postgres -c "CHECKPOINT;"

# If still full, and you have a recent backup: # Move old WAL files (risky without proper backup) # sudo mv /var/lib/postgresql/16/main/pg_wal/0000000* /tmp/wal_backup/ ```

Best Practices

  1. 1.Name slots meaningfully: Use descriptive names like standby_dc1_slot or reporting_sub_slot
  2. 2.Monitor slots actively: Set up alerts for inactive slots with lag > 1GB
  3. 3.Use slots only when needed: For logical replication, let subscriptions manage slots automatically
  4. 4.Document slot purpose: Maintain documentation of what each slot is for
  5. 5.Clean up after outages: After standby failures, verify slots are either reactivated or dropped
  6. 6.Size wal_keep_size appropriately: As backup for short disconnections
sql
-- Recommended configuration
ALTER SYSTEM SET max_replication_slots = 10;  -- More than needed
ALTER SYSTEM SET wal_keep_size = '2GB';       -- Buffer for reconnections
SELECT pg_reload_conf();

Replication slots are powerful but require active management. An orphaned slot can fill your disk with WAL files, so monitoring is essential for any production PostgreSQL deployment.