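## Introduction

When PostgreSQL write-ahead log (WAL) archiving or Oracle redo log archiving falls behind, log files accumulate on disk until the filesystem reaches 100% capacity. At that point the database stops accepting writes, and in severe cases even reads fail. PostgreSQL can crash when it cannot write WAL, and Oracle stalls every session that generates redo once no online redo log group can be reused. Treat this as a critical production incident.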
## Symptoms

- PostgreSQL reports `ERROR: could not write to file "pg_wal/xlog": No space left on device`
- Oracle reports `ORA-00257: archiver error. Connect internal only, until freed`
- The database becomes read-only or completely unresponsive
- Monitoring alerts show disk usage at 100% on the data partition, or on the partition holding `pg_wal` if it is separate
- PostgreSQL logs show repeated `archive_command` failures, and `failed_count` in `pg_stat_archiver` keeps increasing
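Most of these symptoms only appear once the disk is already full. An early-warning check on the filesystem that holds `pg_wal` buys time to act first. This is a minimal sketch, not a standard tool: the function names `disk_usage_pct` and `check_wal_disk` are hypothetical, and the 85% default threshold is an arbitrary assumption.

```bash
# Early-warning check for the filesystem holding pg_wal (sketch; names are hypothetical).
# The default threshold of 85% is an assumption; pick one that leaves time to react.

# Print the used-capacity percentage of the filesystem holding PATH, without the % sign.
disk_usage_pct() {
  # -P selects the POSIX format: one line per filesystem, field 5 is "Capacity" (e.g. "42%").
  df -P -- "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Return 0 below the threshold, 1 (with a warning on stderr) at or above it,
# and 2 if usage could not be read (for example, the path does not exist).
check_wal_disk() {
  local path=$1 threshold=${2:-85} pct
  pct=$(disk_usage_pct "$path")
  if [ -z "$pct" ]; then
    echo "check_wal_disk: could not read usage for $path" >&2
    return 2
  fi
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $path is ${pct}% full (threshold ${threshold}%)" >&2
    return 1
  fi
  return 0
}
```

A cron job or monitoring agent could, for example, run `check_wal_disk /var/lib/postgresql/16/main/pg_wal 80` and alert on a non-zero exit status. Using `df -P` keeps each filesystem on one line, so the column positions are stable enough to parse.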
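## Common Causes

- The archive destination (S3, NFS, backup server) is unreachable or full; PostgreSQL does not remove a segment until it has been archived successfully
- `archive_timeout` is set too low: every forced segment switch produces a full-size WAL file (16 MB by default), even if only a few bytes were written
- A network outage prevents WAL shipping to a standby or the archive location
- An inactive replication slot, whose standby or consumer is gone, retains all WAL from its `restart_lsn` onward
- Archived WAL is kept on the same disk as the database with no retention or cleanup policy
- A stalled backup holding a replication slot, or a `wal_keep_size` larger than the disk can accommodate, prevents checkpoints from removing old segments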
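To see why a low `archive_timeout` hurts, assume the server is forced to switch segments at every interval and that each forced switch archives a full segment. The daily volume is then (86400 / `archive_timeout`) × segment size. The helper below, with the hypothetical name `wal_mib_per_day`, does that arithmetic as a worst-case estimate:

```bash
# Worst-case WAL archived per day, in MiB, when archive_timeout forces a segment
# switch at every interval (PostgreSQL only switches if there was some activity).
# A segment closed early by a forced switch is still archived at full size.
# Second argument: segment size in MiB (default 16, PostgreSQL's default wal_segment_size).
wal_mib_per_day() {
  local timeout_s=$1 segment_mib=${2:-16}
  if [ "$timeout_s" -le 0 ]; then
    echo "wal_mib_per_day: archive_timeout must be > 0 (0 disables forced switches)" >&2
    return 1
  fi
  # 86400 seconds per day / timeout = forced switches per day (integer division)
  echo $(( 86400 / timeout_s * segment_mib ))
}
```

With `archive_timeout = 60` and 16 MiB segments, `wal_mib_per_day 60` gives 1440 segments, or 23040 MiB (about 22.5 GiB) per day, regardless of how little data actually changed.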
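## Step-by-Step Fix

1. **Immediately identify disk usage to find the largest consumers**:

   ```bash
   # Largest entries in each cluster's data directory (Debian/Ubuntu layout; -H follows a symlinked pg_wal)
   du -shH /var/lib/postgresql/*/main/* 2>/dev/null | sort -rh | head -20
   # Free space on the filesystem that holds the data directory
   df -h /var/lib/postgresql
   ```

   If `pg_wal` is a symlink to a separate filesystem, run `df -h` on its target too.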
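2. **Check how much WAL is pending and what is retaining it**:

   ```sql
   -- Current WAL file, and WAL written since the last checkpoint's redo point
   SELECT pg_walfile_name(pg_current_wal_lsn()) AS current_wal_file,
          redo_wal_file,
          pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), redo_lsn)) AS wal_since_redo
   FROM pg_control_checkpoint();

   -- Oldest WAL still required by each replication slot
   SELECT slot_name, active, restart_lsn, pg_walfile_name(restart_lsn) AS oldest_required_wal
   FROM pg_replication_slots
   ORDER BY restart_lsn;
   ```

   These queries only work while the server still accepts connections, and must run on the primary.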
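3. **Free space without deleting individual WAL segments**: never move or delete individual files inside `pg_wal` by hand. Removing a segment that is not yet archived, or that crash recovery still needs, can leave the cluster unrecoverable, and `pg_wal/archive_status` holds only small `.ready`/`.done` marker files, so moving it frees almost nothing. Instead, free unrelated files on the same filesystem, grow the volume, or stop the server and relocate the whole `pg_wal` directory to a larger disk behind a symlink:

   ```bash
   # Debian/Ubuntu paths and systemd unit for PostgreSQL 16; adjust for your installation
   sudo systemctl stop postgresql@16-main
   sudo mkdir -p /mnt/backup
   sudo mv /var/lib/postgresql/16/main/pg_wal /mnt/backup/pg_wal
   sudo ln -s /mnt/backup/pg_wal /var/lib/postgresql/16/main/pg_wal
   sudo systemctl start postgresql@16-main
   ```

   If archived WAL is stored in a local directory on the same disk, prune it with `pg_archivecleanup`, passing the oldest segment your most recent base backup still needs.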
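4. **Drop inactive replication slots that are retaining WAL**: a slot whose consumer (standby, logical subscriber, or backup job) is gone keeps every segment from its `restart_lsn` onward. Dropping a slot cannot be undone, and its consumer will have to be rebuilt or resynchronized, so confirm the slot is truly orphaned first:

   ```sql
   -- How much WAL each slot is holding back
   SELECT slot_name, slot_type, active,
          pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
   FROM pg_replication_slots;
   -- If a slot is inactive and its consumer is gone for good, drop it:
   SELECT pg_drop_replication_slot('orphaned_standby_slot');
   ```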
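5. **Fix the archive command**: once `archive_command` succeeds again, the archiver works through the backlog on its own, and later checkpoints remove or recycle the archived segments:

   ```sql
   -- Check archive status: a rising failed_count means archiving is still failing
   SELECT archived_count, last_archived_wal, failed_count, last_failed_wal, last_failed_time
   FROM pg_stat_archiver;

   -- Verify archive_command is correct
   SHOW archive_command;

   -- Fix and reload; archive_command needs only a reload, not a restart
   -- (this example assumes WAL-G is installed and configured for your archive target)
   ALTER SYSTEM SET archive_command = 'wal-g wal-push %p';
   SELECT pg_reload_conf();

   -- Optionally force a checkpoint so already-archived segments are cleaned up sooner
   CHECKPOINT;
   ```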
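6. **For Oracle, free the archive destination and let the archiver resume**: delete only archived redo logs that are already backed up or otherwise no longer needed for recovery, and never delete online redo log files.

   ```sql
   -- Check archive destination status
   SELECT dest_id, status, error FROM v$archive_dest;

   -- If the destination is the fast recovery area, see what is using it
   SELECT * FROM v$recovery_area_usage;

   -- In RMAN (rman target /): sync the repository with disk, then delete archived logs older than 2 days
   -- CROSSCHECK ARCHIVELOG ALL;
   -- DELETE ARCHIVELOG UNTIL TIME 'SYSDATE-2';

   -- Once space is available, archive the current online redo log
   ALTER SYSTEM ARCHIVE LOG CURRENT;
   ```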
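When deciding what to free in PostgreSQL, it helps to know whether WAL is piling up in `pg_wal` because archiving is failing or because something else retains it. Each segment still waiting to be archived has a `.ready` marker in `pg_wal/archive_status`. The read-only helper below (`wal_backlog` is a hypothetical name) counts both, assuming the standard 24-hex-digit segment file names:

```bash
# Read-only sketch: count WAL segment files in a pg_wal directory and how many
# are still waiting to be archived (a .ready marker in archive_status).
# It never modifies, moves, or deletes anything.
wal_backlog() {
  local waldir=$1 segments ready
  # Segment names are exactly 24 uppercase hex digits (timeline, log, segment);
  # .history, .backup, .partial and temporary files are skipped.
  segments=$(find "$waldir" -maxdepth 1 -type f | awk -F/ '
    length($NF) == 24 && $NF ~ /^[0-9A-F]+$/ { n++ }
    END { print n + 0 }')
  # Each .ready file marks one segment the archiver has not yet handled successfully.
  ready=$(find "$waldir/archive_status" -maxdepth 1 -type f -name '*.ready' 2>/dev/null |
    awk 'END { print NR }')
  echo "segments=$segments ready=$ready"
}
```

For example, `wal_backlog /var/lib/postgresql/16/main/pg_wal`. A large `ready` count points at the archive command or destination; if `ready` is near zero but segments keep accumulating, look at replication slots and `wal_keep_size` instead. The segment count also includes recycled segments that PostgreSQL pre-allocates for future use, so compare it with `max_wal_size` rather than expecting it to reach zero.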
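If the PostgreSQL server is already down, `pg_wal_lsn_diff()` is unavailable, but LSNs are easy to compare by hand: an LSN such as `16/B374D848` is the high and low 32 bits of a byte position, in hexadecimal. The sketch below (hypothetical helper names) computes the same difference as `pg_wal_lsn_diff()`, for example to compare a slot's `restart_lsn` recorded earlier with the `Latest checkpoint's REDO location` line printed by `pg_controldata`:

```bash
# Convert a PostgreSQL LSN such as "16/B374D848" (high and low 32 bits, in hex)
# into an absolute WAL byte position. Helper names are hypothetical.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( (16#$hi << 32) + 16#$lo ))
}

# Bytes of WAL from the second LSN up to the first, as pg_wal_lsn_diff(lsn1, lsn2) computes.
lsn_diff() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Whole MiB (rounded down) for quick reading.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}
```

For instance, `lsn_diff 0/3000060 0/1000000` prints `33554528`, which `bytes_to_mib` turns into `32`.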
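For a PostgreSQL archive directory that sits on the same disk, `pg_archivecleanup -n <archive_dir> <oldest_kept_segment>` prints what it would remove without deleting anything. Where that tool is not installed on the archive host, a read-only listing can be sketched as follows; `list_prunable_wal` is a hypothetical name, and it assumes uncompressed archive files named exactly like WAL segments:

```bash
# Read-only sketch: list archived WAL segments in ARCHIVE_DIR that are older than
# OLDEST_KEPT, the oldest segment your most recent base backup still needs.
# The 8-digit timeline prefix is ignored, so segments from a parent timeline are
# not mistaken for newer ones. Nothing is deleted; use pg_archivecleanup for that.
list_prunable_wal() {
  local archive_dir=$1 keep=$2 path name log seg
  local keep_log=$((16#${keep:8:8})) keep_seg=$((16#${keep:16:8}))
  for path in "$archive_dir"/*; do
    name=${path##*/}
    # Only plain 24-hex-digit segment names; skip .history, .backup, compressed files.
    [[ ${#name} -eq 24 && $name =~ ^[0-9A-F]+$ ]] || continue
    log=$((16#${name:8:8}))
    seg=$((16#${name:16:8}))
    # Older means a lower log number, or the same log number and a lower segment number.
    if (( log < keep_log || (log == keep_log && seg < keep_seg) )); then
      echo "$name"
    fi
  done
}
```

Take the oldest kept segment from the `.backup` history file of your most recent base backup, and do the actual deletion with `pg_archivecleanup` rather than this sketch.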
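Whatever PostgreSQL `archive_command` you configure, a zero exit status tells the server the segment is archived and may later be recycled, so the command must fail loudly and must never overwrite an existing archive file that has different contents. The sketch below illustrates those rules for a plain local directory. It is illustrative only: `archive_wal` and the script path in the comment are hypothetical, it does not fsync the copy, and for S3 or other remote targets a dedicated tool such as WAL-G is the better fit.

```bash
# Sketch of an archive step for a plain local directory (names are hypothetical).
# Wire it up through a wrapper script, e.g.
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f /mnt/server/archivedir'
# where the script defines this function and runs: archive_wal "$@"
# It does not fsync the copy, so it is not crash-safe on its own.
archive_wal() {
  local src=$1 name=$2 dest_dir=$3
  local dest="$dest_dir/$name" tmp="$dest_dir/.$name.tmp.$$"
  if [ -e "$dest" ]; then
    # An identical existing copy counts as success; different contents is an error.
    if cmp -s -- "$src" "$dest"; then
      return 0
    fi
    echo "archive_wal: $dest exists with different contents" >&2
    return 1
  fi
  # Copy under a temporary name so a partial file never appears under the final name.
  if ! cp -- "$src" "$tmp"; then
    rm -f -- "$tmp"
    return 1
  fi
  # ln refuses to replace an existing file, so a concurrently created archive is never overwritten.
  if ! ln -- "$tmp" "$dest"; then
    rm -f -- "$tmp"
    return 1
  fi
  rm -f -- "$tmp"
}
```

Returning success when the existing file is identical matters because PostgreSQL can try to archive a segment again that it had already archived, for example after a crash before the success was recorded.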