What's Actually Happening

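PostgreSQL WAL archiving stalls when the archive_command keeps failing. PostgreSQL does not remove or recycle a WAL segment until it has been archived, so segments accumulate in the pg_wal directory. If that disk fills up, the server shuts down and accepts no new transactions until space is freed. Point-in-time recovery is limited to the last segment that was archived successfully, and standbys that restore from the archive fall behind.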

The Error You'll See

WAL files accumulating:

```bash
$ ls -la /var/lib/postgresql/data/pg_wal/

total 52428800
-rw------- 1 postgres postgres 16777216 Apr 16 00:42 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Apr 16 00:42 000000010000000000000002
-rw------- 1 postgres postgres 16777216 Apr 16 00:42 000000010000000000000003
# ... many more files, not being archived
```

PostgreSQL logs:

```bash
$ tail -f /var/log/postgresql/postgresql.log

2026-04-16 00:42:00 LOG: archive command failed with exit code 1
2026-04-16 00:42:00 DETAIL: The failed archive command was: cp pg_wal/000000010000000000000001 /archive/000000010000000000000001
2026-04-16 00:42:00 WARNING: archiving write-ahead log file "000000010000000000000001" failed too many times, will try again later
```

Archive status files:

```bash
$ ls /var/lib/postgresql/data/pg_wal/archive_status/

000000010000000000000001.done   # Created after a successful archive
000000010000000000000002.ready  # Waiting to be archived
000000010000000000000003.ready  # Waiting to be archived
# Many .ready files = archiving stuck
```

Why This Happens

  1. Archive command failure - The command returns a non-zero exit code
  2. Archive destination full - No disk space left at the archive location
  3. Permission denied - The postgres user cannot write to the archive directory
  4. Network failure - The remote archive server is unreachable
  5. Command hangs or runs slowly - PostgreSQL waits for each archive command to finish, so one slow command holds up every later segment
  6. Archive command missing - archive_mode is on but archive_command is empty

Step 1: Check Current Archive Status

```bash
# Check WAL files waiting to be archived
ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready | wc -l

# List all WAL files
ls -la /var/lib/postgresql/data/pg_wal/

# Check archive_command setting
psql -c "SHOW archive_command"

# Check archive_mode
psql -c "SHOW archive_mode"

# Check current WAL position
psql -c "SELECT pg_current_wal_lsn()"

# Check pg_stat_archiver for failures
# Look for last_failed_time and last_failed_wal
psql -c "SELECT * FROM pg_stat_archiver"
```
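The backlog check above can be wrapped in two small helpers. This is a minimal sketch, not an official tool: the function names `count_ready` and `oldest_ready` are made up for illustration, the default path is the one used in this article, and picking the oldest segment by name assumes a single timeline.

```bash
#!/bin/sh
# Sketch: summarize the archive backlog in an archive_status directory.
# The default path is the one used in this article; adjust it for your install.

# Print the number of segments still waiting to be archived.
count_ready() (
    status_dir=${1:-/var/lib/postgresql/data/pg_wal/archive_status}
    # find avoids the "argument list too long" error that ls *.ready can hit
    find "$status_dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' '
)

# Print the name of the oldest waiting segment.
# Assumption: one timeline, so the lowest file name is the oldest segment.
oldest_ready() (
    status_dir=${1:-/var/lib/postgresql/data/pg_wal/archive_status}
    find "$status_dir" -maxdepth 1 -type f -name '*.ready' \
        | sed -e 's|.*/||' -e 's|\.ready$||' | sort | head -n 1
)
```

A count that keeps growing while `oldest_ready` keeps printing the same name suggests the archiver is stuck on that one segment, which should match last_failed_wal in pg_stat_archiver.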

Step 2: Check Archive Command

```bash
# View current archive_command
psql -c "SHOW archive_command"

# Example commands:
#   archive_command = 'cp %p /archive/%f'
#   archive_command = 'rsync %p archive-server:/archive/%f'
#   archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# Test the command manually
# %p = path of the WAL file to archive (relative to the data directory)
# %f = WAL file name only

# Test with an actual file, as the postgres user:
sudo -u postgres cp /var/lib/postgresql/data/pg_wal/000000010000000000000001 /archive/000000010000000000000001

# Check exit code
echo $?   # Should be 0

# If command fails, diagnose:
sudo -u postgres cp -v /var/lib/postgresql/data/pg_wal/000000010000000000000001 /archive/
```
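To test the exact command PostgreSQL would run, it can help to expand the placeholders mechanically instead of retyping the command. The helper below is a hypothetical sketch (the name `expand_archive_command` is not a PostgreSQL tool). It handles only %p and %f, and it assumes the path contains no `|` or `&` characters.

```bash
#!/bin/sh
# Expand %p and %f in an archive_command template (hypothetical helper).
# Limitations: %% is not handled, and the path must not contain '|' or '&'.
expand_archive_command() (
    template=$1
    wal_path=$2                        # replaces %p
    wal_file=$(basename "$wal_path")   # replaces %f
    printf '%s\n' "$template" \
        | sed -e "s|%p|$wal_path|g" -e "s|%f|$wal_file|g"
)
```

For example, `expand_archive_command 'cp %p /archive/%f' pg_wal/000000010000000000000001` prints `cp pg_wal/000000010000000000000001 /archive/000000010000000000000001`. Because %p is relative to the data directory, run the expanded command from that directory as the postgres user.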

Step 3: Check Archive Destination

```bash
# Check disk space at archive location
df -h /archive/

# For remote archive:
ssh archive-server "df -h /archive/"

# Check directory exists
ls -la /archive/

# For remote:
ssh archive-server "ls -la /archive/"

# Check write permissions
touch /archive/test.tmp && rm /archive/test.tmp

# Check PostgreSQL user can write (and clean up the test file)
sudo -u postgres touch /archive/test.tmp && sudo -u postgres rm /archive/test.tmp

# Check existing archived files
ls -la /archive/ | grep "00000001"
```
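The disk space check can be turned into a yes/no test for scripts or cron jobs. This is a sketch under assumptions: the function names and the 90 percent default limit are illustrative choices, and the parsing relies on the POSIX output format of `df -P`, where the fifth column is the used percentage.

```bash
#!/bin/sh
# Print the used-space percentage (0-100) of the filesystem holding a path.
fs_usage_pct() (
    # df -P prints one line per filesystem; column 5 is "Capacity", e.g. 42%
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
)

# Return 0 when usage is below the limit, non-zero otherwise.
# The default limit of 90 percent is an arbitrary example value.
archive_has_space() (
    pct=$(fs_usage_pct "$1")
    limit=${2:-90}
    [ -n "$pct" ] && [ "$pct" -lt "$limit" ]
)
```

A call such as `archive_has_space /archive 85 || echo "archive nearly full"` could run from cron so that a filling archive is noticed before archive_command starts failing.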

Step 4: Fix Permission Issues

```bash
# Check archive directory ownership
ls -la /archive/

# Should be writable by postgres user
chown postgres:postgres /archive/
chmod 750 /archive/

# For NFS mount, check export permissions
showmount -e archive-server

# Check NFS mount options
mount | grep archive

# Fix NFS permissions on server
# /etc/exports on archive-server (the postgres UID/GID must match on both hosts):
#   /archive postgres-server(rw,sync)

# Reload NFS exports (on archive-server)
exportfs -ra

# Remount if needed (on the PostgreSQL server)
mount -o remount /archive
```

Step 5: Fix Disk Space Issues

```bash
# If archive destination is full:

# Check what's consuming space
du -sh /archive/*
du -sh /archive/

# Remove old archived WAL files (only if a newer base backup exists!)
find /archive/ -name "0000000*" -mtime +30 -delete

# Or move to another location
mv /archive/0000000* /backup-archive/

# Compress old archives (restore_command must then decompress them)
gzip /archive/00000001000000000000000*

# Increase storage
# For LVM with an ext filesystem:
lvextend -L +10G /dev/vg0/archive
resize2fs /dev/vg0/archive

# Check WAL retention settings
psql -c "SHOW wal_keep_size"
```

Step 6: Fix Network Issues

```bash
# For remote archiving, check connectivity
# (run these as the postgres user, because that user runs archive_command)

# Test SSH connection
ssh archive-server "echo OK"

# Check network latency
ping -c 5 archive-server

# Check SSH key authentication
ssh -v archive-server

# Test rsync manually
rsync /var/lib/postgresql/data/pg_wal/000000010000000000000001 archive-server:/archive/

# Check firewall rules for SSH (-n prints IP addresses, not host names)
iptables -L -n -v | grep 'dpt:22'

# Allow SSH traffic (on archive-server, from the PostgreSQL host)
iptables -I INPUT -s postgres-server -p tcp --dport 22 -j ACCEPT

# Check SSH key permissions
ls -la ~/.ssh/
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
```

Step 7: Fix Archive Command Configuration

```conf
# In postgresql.conf (set exactly one archive_command; the last one wins):

# Simple local archive:
archive_command = 'cp %p /archive/%f'

# With existence check (prevents overwrite):
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# Using rsync for remote:
archive_command = 'rsync -a %p archive-server:/archive/%f'

# Using scp:
archive_command = 'scp %p archive-server:/archive/%f'

# With compression:
archive_command = 'gzip -c %p > /archive/%f.gz'

# Using WAL-E or pgBackRest (recommended for production).
# pgBackRest needs a stanza; "main" is only an example name:
archive_command = 'wal-e wal-push %p'
archive_command = 'pgbackrest --stanza=main archive-push %p'

# Reload configuration (enough for archive_command), from a shell:
#   psql -c "SELECT pg_reload_conf()"

# Or restart (required when archive_mode changes):
#   systemctl restart postgresql
```
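The existence-check variant above can be taken one step further with a small wrapper that copies under a temporary name and renames the file only once the copy is complete. The function below is a sketch, not a production archiver: `archive_wal` is a hypothetical name, the temporary-file suffix is an arbitrary choice, and it does not fsync the archived file, which tools such as pgBackRest take care of.

```bash
#!/bin/sh
# archive_wal SRC DEST_DIR   (hypothetical helper, not part of PostgreSQL)
# Copy SRC into DEST_DIR without overwriting a different file and without
# exposing a half-written file under its final name.
archive_wal() (
    src=$1
    dest_dir=$2
    dest=$dest_dir/$(basename "$src")
    tmp=$dest.tmp.$$

    if [ -e "$dest" ]; then
        # Already archived: succeed only if the contents are identical
        cmp -s "$src" "$dest"
        exit $?
    fi

    # Copy under a temporary name, then rename (atomic within one filesystem)
    cp "$src" "$tmp" && mv "$tmp" "$dest" && exit 0
    rm -f "$tmp"
    exit 1
)
```

Saved in a script that ends with `archive_wal "$1" "$2"`, it could be wired in as `archive_command = '/usr/local/bin/archive_wal.sh %p /archive'`, where the script path is an assumption. Returning success for an identical existing file lets PostgreSQL safely retry a segment whose earlier attempt was interrupted after the copy finished.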

Step 8: Clear Stuck Archives

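```bash
# If archiving is completely stuck:

# Check failed WAL files
psql -c "SELECT last_failed_wal, last_failed_time FROM pg_stat_archiver"

# Once archive_command works again, PostgreSQL retries on its own.
# Use the manual options below only if it cannot.

# Option 1: Manually archive the stuck file (as the postgres user)
cp /var/lib/postgresql/data/pg_wal/000000010000000000000001 /archive/

# Rename .ready to .done to tell PostgreSQL it's archived
cd /var/lib/postgresql/data/pg_wal/archive_status/
mv 000000010000000000000001.ready 000000010000000000000001.done

# PostgreSQL will move to next WAL file

# Option 2: If the WAL is already archived elsewhere, skip the copy and only
# rename .ready to .done as above. Do not just delete the .ready file:
# PostgreSQL can recreate it when neither status file exists.

# Option 3: Free space in the ARCHIVE with pg_archivecleanup
# (never run it against pg_wal: that can delete WAL needed for crash recovery)
pg_archivecleanup -n /archive 000000010000000000000005   # dry run, list only
pg_archivecleanup /archive 000000010000000000000005

# This removes archived WAL files older than the specified one
```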
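Marking a segment as archived by hand is risky if the archived copy is missing or incomplete, so a guard is worth having. The helper below is a hypothetical sketch (`mark_segment_done` is not a PostgreSQL command). It changes the status file from .ready to .done only when the archived copy matches the file in pg_wal byte for byte, and it assumes an uncompressed archive.

```bash
#!/bin/sh
# mark_segment_done SEGMENT WAL_DIR ARCHIVE_DIR   (hypothetical helper)
# Rename SEGMENT.ready to SEGMENT.done only after verifying the archived copy.
mark_segment_done() (
    seg=$1
    wal_dir=$2
    archive_dir=$3
    status=$wal_dir/archive_status

    [ -f "$status/$seg.ready" ] \
        || { echo "no .ready file for $seg" >&2; exit 1; }

    # Refuse unless the archived copy exists and is identical
    cmp -s "$wal_dir/$seg" "$archive_dir/$seg" \
        || { echo "archived copy of $seg is missing or differs" >&2; exit 1; }

    mv "$status/$seg.ready" "$status/$seg.done"
)
```

Run as the postgres user, a call would look like `mark_segment_done 000000010000000000000001 /var/lib/postgresql/data/pg_wal /archive`. Letting PostgreSQL retry by itself remains the safer route whenever archive_command can be repaired.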

Step 9: Monitor Archive Progress

```bash
# Check archive progress
psql -c "
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver
"

# Monitor WAL files
watch -n 5 'ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready | wc -l'

# Monitor disk usage
watch -n 10 'df -h /var/lib/postgresql/data/pg_wal && df -h /archive'

# Check PostgreSQL logs for archive messages
tail -f /var/log/postgresql/postgresql.log | grep -i archive

# Create monitoring script
cat << 'EOF' > /usr/local/bin/monitor_wal_archive.sh
#!/bin/bash
READY=$(ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready 2>/dev/null | wc -l)
DONE=$(ls /var/lib/postgresql/data/pg_wal/archive_status/*.done 2>/dev/null | wc -l)
WAL_SIZE=$(du -sh /var/lib/postgresql/data/pg_wal | cut -f1)
echo "Ready to archive: $READY"
echo "Archived: $DONE"
echo "WAL directory size: $WAL_SIZE"
EOF

chmod +x /usr/local/bin/monitor_wal_archive.sh
```

Step 10: Set Up WAL Compression and Retention

```conf
# In postgresql.conf:

# Compress full-page writes
wal_compression = on

# Minimum amount of WAL kept in pg_wal for standbys (PostgreSQL 13+)
wal_keep_size = 1GB

# Or on PostgreSQL 12 and older:
# wal_keep_segments = 64

# Set archive timeout (forces a segment switch even if the segment is not full)
archive_timeout = 300  # 5 minutes

# This ensures regular archiving even during low activity

# Soft limit on WAL between checkpoints (segments that are not yet
# archived are kept regardless of this setting)
max_wal_size = 2GB

# Reload settings from psql:
#   SELECT pg_reload_conf();
```
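A forced switch still archives a full-size segment, so a short archive_timeout has a storage cost. The arithmetic below is a rough sketch. It assumes the default 16 MiB segment size, which matches the 16777216-byte files shown earlier, and a server with some activity in every interval that never fills a segment by itself. The function name is illustrative.

```bash
#!/bin/sh
# Archive volume in MiB per day produced by forced segment switches alone.
# Usage: forced_switch_mib_per_day ARCHIVE_TIMEOUT_SECONDS [SEGMENT_MIB]
forced_switch_mib_per_day() (
    timeout_s=$1
    segment_mib=${2:-16}   # assumed default WAL segment size
    # switches per day (86400 s / timeout) times the size of one segment
    echo $(( 86400 / timeout_s * segment_mib ))
)

forced_switch_mib_per_day 300   # 288 switches x 16 MiB = 4608
```

With archive_timeout = 300 that is about 4.5 GiB per day before compression. Mostly empty segments compress well, which is one reason to compress in the archive command.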

PostgreSQL WAL Archive Checklist

| Check | Command | Expected |
|---|---|---|
| Archive command | SHOW archive_command | Valid command |
| Archive mode | SHOW archive_mode | on |
| Disk space | df -h /archive | Sufficient space |
| Permissions | ls -la /archive | postgres writable |
| Ready files | ls *.ready | Few or none |
| Stat archiver | pg_stat_archiver | Low failed_count |

Verify the Fix

```bash
# After fixing archive issues

# 1. Check ready files decreasing
ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready | wc -l
# Should be low (0-5)

# 2. Watch the log for archive failures
tail -f /var/log/postgresql/postgresql.log | grep -i "archiv"
# Should show no new failures (successful archiving is not logged by default)

# 3. Check pg_stat_archiver
psql -c "SELECT archived_count, last_archived_wal FROM pg_stat_archiver"
# archived_count should be increasing

# 4. Verify WAL files in archive
ls /archive/ | grep "0000000"
# Should show archived files

# 5. Check WAL directory size
du -sh /var/lib/postgresql/data/pg_wal
# Should be stable, not growing

# 6. Test point-in-time recovery capability
# Archived files must be available for recovery
```

  • [Fix PostgreSQL Replication Slot Retention](/articles/fix-postgresql-replication-slot-retention)
  • [Fix PostgreSQL WAL Disk Full](/articles/fix-postgresql-wal-disk-full)
  • [Fix PostgreSQL Backup Failed](/articles/fix-postgresql-backup-failed)