What's Actually Happening
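PostgreSQL WAL archiving gets stuck when the archive_command fails repeatedly. PostgreSQL does not remove or recycle a WAL segment until it has been archived, so files accumulate in the pg_wal directory. If that disk fills up, the server shuts down and accepts no new transactions until space is freed. Point-in-time recovery is limited to the last segment that was archived successfully, and any standby that restores from the archive stops receiving new WAL.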
The Error You'll See
WAL files accumulating:
```bash
$ ls -la /var/lib/postgresql/data/pg_wal/
total 52428800
-rw------- 1 postgres postgres 16777216 Apr 16 00:42 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Apr 16 00:42 000000010000000000000002
-rw------- 1 postgres postgres 16777216 Apr 16 00:42 000000010000000000000003
# ... many more files, not being archived
```
PostgreSQL logs:
```bash
$ tail -f /var/log/postgresql/postgresql.log
2026-04-16 00:42:00 LOG: archive command failed with exit code 1
2026-04-16 00:42:00 DETAIL: The failed archive command was: cp pg_wal/000000010000000000000001 /archive/000000010000000000000001
2026-04-16 00:42:00 LOG: archiving write-ahead log file "000000010000000000000001" failed too many times
```
Archive status files:
```bash
$ ls /var/lib/postgresql/data/pg_wal/archive_status/
000000010000000000000001.done    # Created after a successful archive
000000010000000000000002.ready   # Waiting to be archived
000000010000000000000003.ready   # Waiting to be archived
# Many .ready files = archiving stuck
```
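The marker files above can be summarized with two small helpers. This is a minimal sketch: the function names are hypothetical, and it assumes the archiver works through segments in file-name order, so the lowest-named `.ready` entry is usually the one it is stuck on (timeline history files, if present, sort first).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting pg_wal/archive_status.

# Print the lowest-named entry that still has a .ready marker.
# Usage: oldest_ready_segment /path/to/pg_wal/archive_status
oldest_ready_segment() {
    local status_dir=$1
    find "$status_dir" -maxdepth 1 -type f -name '*.ready' \
        -exec basename {} .ready \; | LC_ALL=C sort | head -n 1
}

# Count marker files with the given suffix ("ready" or "done").
# Prints 0 when there are none, instead of an ls error.
count_markers() {
    local status_dir=$1 suffix=$2
    find "$status_dir" -maxdepth 1 -type f -name "*.${suffix}" | wc -l | tr -d ' '
}
```

For example, `oldest_ready_segment /var/lib/postgresql/data/pg_wal/archive_status` prints the segment to test by hand in the steps that follow.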
Why This Happens
1. Archive command failure: the command returns a non-zero exit code
2. Archive destination full: no disk space at the archive location
3. Permission denied: the postgres user cannot write to the archive directory
4. Network failure: the remote archive server is unreachable
5. Slow or hanging command: PostgreSQL waits for the command to finish, so a stalled copy holds up every later segment
6. Archive command missing: archive_mode is on but archive_command is empty
Step 1: Check Current Archive Status
```bash
# Check WAL files waiting to be archived
ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready | wc -l

# List all WAL files
ls -la /var/lib/postgresql/data/pg_wal/

# Check archive_command setting
psql -c "SHOW archive_command"

# Check archive_mode
psql -c "SHOW archive_mode"

# Check current WAL position
psql -c "SELECT pg_current_wal_lsn()"

# Check pg_stat_archiver for failures
psql -c "SELECT * FROM pg_stat_archiver"

# Look for last_failed_time and last_failed_wal
```
Step 2: Check Archive Command
```bash
# View current archive_command
psql -c "SHOW archive_command"

# Example settings (postgresql.conf):
#   archive_command = 'cp %p /archive/%f'
#   archive_command = 'rsync %p archive-server:/archive/%f'
#   archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# Test the command manually, as the postgres user
# %p = path to the WAL file (relative to the data directory)
# %f = WAL file name only

# Test with actual file:
cp /var/lib/postgresql/data/pg_wal/000000010000000000000001 /archive/000000010000000000000001

# Check exit code
echo $?
# Should be 0

# If command fails, diagnose:
cp -v /var/lib/postgresql/data/pg_wal/000000010000000000000001 /archive/
```
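Substituting `%p` and `%f` by hand is error-prone, so the manual test can be scripted. The sketch below uses hypothetical function names and handles only the `%p` and `%f` placeholders (not `%%`). It runs the expanded string through `sh -c`, which approximates how PostgreSQL hands the command to a shell; run it as the postgres user from the data directory to match the real environment.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for dry-running an archive_command template.

# Print the template with %p and %f filled in for one WAL file.
expand_archive_command() {
    local template=$1 wal_path=$2
    local wal_name=${wal_path##*/}          # strip the directory part for %f
    local expanded=${template//\%p/"$wal_path"}
    expanded=${expanded//\%f/"$wal_name"}
    printf '%s\n' "$expanded"
}

# Expand the template and execute it; the exit status is the command's own.
run_archive_command() {
    local expanded
    expanded=$(expand_archive_command "$1" "$2")
    sh -c "$expanded"
}
```

Calling `expand_archive_command 'cp %p /archive/%f' pg_wal/000000010000000000000001` prints the exact command line that would be executed, which is often enough to spot a wrong path before running anything.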
Step 3: Check Archive Destination
```bash
# Check disk space at archive location
df -h /archive/

# For remote archive:
ssh archive-server "df -h /archive/"

# Check directory exists
ls -la /archive/

# For remote:
ssh archive-server "ls -la /archive/"

# Check write permissions
touch /archive/test.tmp && rm /archive/test.tmp

# Check PostgreSQL user can write (and clean up the test file)
sudo -u postgres touch /archive/test.tmp && sudo -u postgres rm /archive/test.tmp

# Check existing archived files
ls -la /archive/ | grep "00000001"
```
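The destination checks above can be combined into one preflight function. This is a sketch with a hypothetical name and an arbitrary default threshold of 1 GiB; it tests writability for whichever user runs it, so run it as postgres (for example via `sudo -u postgres bash`) for a meaningful answer.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check for a local or mounted archive directory.
# Usage: check_archive_destination DIR [MIN_FREE_KB]
# Prints one line per problem; returns 0 if none were found, 1 otherwise.
check_archive_destination() {
    local dir=$1 min_free_kb=${2:-1048576}   # assumed default: 1 GiB
    local problems=0 free_kb

    if [ ! -d "$dir" ]; then
        echo "missing: $dir is not a directory"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "permission: $dir is not writable by $(id -un)"
        problems=1
    fi
    # df -P guarantees one data line per filesystem; column 4 is free space in KB.
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$free_kb" -lt "$min_free_kb" ]; then
        echo "space: only ${free_kb} KB free on $dir (want at least ${min_free_kb})"
        problems=1
    fi
    return "$problems"
}
```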
Step 4: Fix Permission Issues
```bash
# Check archive directory ownership
ls -la /archive/

# Should be writable by postgres user
chown postgres:postgres /archive/
chmod 750 /archive/

# For NFS mount, check export permissions
showmount -e archive-server

# Check NFS mount options
mount | grep archive

# Fix NFS permissions on server
# /etc/exports on archive-server:
#   /archive postgres-server(rw,sync,no_root_squash)
# no_root_squash only matters for writes made as root; the archiver runs
# as postgres, so omit it unless root on the client really needs access.

# Reload NFS exports (on archive-server)
exportfs -ra

# Remount if needed (on the PostgreSQL server)
mount -o remount /archive
```
Step 5: Fix Disk Space Issues
```bash
# If archive destination is full:

# Check what's consuming space
du -sh /archive/*
du -sh /archive/

# Remove old archived WAL files (only if you have backups!)
# Segments newer than your oldest base backup are needed to restore it.
find /archive/ -name "0000000*" -mtime +30 -delete

# Or move to another location
mv /archive/0000000* /backup-archive/

# Compress old archives (restore_command must then decompress them)
gzip /archive/00000001000000000000000*

# Increase storage
# For LVM (resize2fs applies to ext2/3/4 filesystems):
lvextend -L +10G /dev/vg0/archive
resize2fs /dev/vg0/archive

# Check WAL retention settings (PostgreSQL 13+)
psql -c "SHOW wal_keep_size"
```
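Because `find ... -delete` is irreversible, it can help to list the candidates first. The function below is a sketch with a hypothetical name: it only prints archived segments (24 uppercase hex characters, optionally with a `.gz` suffix) whose modification time is more than the given number of days old, and deletes nothing. File age is a rough proxy; whether a segment is actually safe to remove depends on the oldest base backup you still want to restore.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run listing of old archived WAL segments.
# Usage: list_old_segments ARCHIVE_DIR DAYS
list_old_segments() {
    local archive_dir=$1 days=$2
    find "$archive_dir" -maxdepth 1 -type f -mtime +"$days" \
        | grep -E '/[0-9A-F]{24}(\.gz)?$' \
        | LC_ALL=C sort
}
```

Review the output of `list_old_segments /archive 30` before running any delete, and pipe it to `wc -l` or `xargs du -ch` to see how much space removal would free.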
Step 6: Fix Network Issues
```bash
# For remote archiving, check connectivity.
# Run these as the postgres user: the archiver uses that user's SSH keys.

# Test SSH connection
ssh archive-server "echo OK"

# Check network latency
ping -c 5 archive-server

# Check SSH key authentication
ssh -v archive-server

# Test rsync manually
rsync /var/lib/postgresql/data/pg_wal/000000010000000000000001 archive-server:/archive/

# Check firewall
iptables -L -n -v | grep archive-server

# Allow SSH traffic
iptables -I INPUT -s archive-server -p tcp --dport 22 -j ACCEPT

# Check SSH key permissions
ls -la ~/.ssh/
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
```
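A remote copy that hangs on a dead connection is harder to spot than one that fails, because no error is logged while the command is still running. One way to bound it is to wrap the copy in coreutils `timeout`. The wrapper below is a sketch; the function name and the limit you pass are assumptions, not PostgreSQL settings.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with an upper time limit.
# Usage: archive_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's own exit status, or 124 if it was killed for
# running too long (the convention used by GNU timeout).
archive_with_timeout() {
    local limit_seconds=$1
    shift
    local status=0
    timeout "$limit_seconds" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "archive command timed out after ${limit_seconds}s: $*" >&2
    fi
    return "$status"
}
```

The same effect is available inline, for example `archive_command = 'timeout 60 rsync -a %p archive-server:/archive/%f'`, where 60 seconds is an illustrative value. A timed-out command exits non-zero, so PostgreSQL retries the segment later.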
Step 7: Fix Archive Command Configuration
```bash
# In postgresql.conf, set exactly one archive_command.

# Simple local archive:
archive_command = 'cp %p /archive/%f'

# With existence check (prevents overwrite):
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# Using rsync for remote:
archive_command = 'rsync -a %p archive-server:/archive/%f'

# Using scp:
archive_command = 'scp %p archive-server:/archive/%f'

# With compression (restore_command must then decompress):
archive_command = 'gzip -c %p > /archive/%f.gz'

# Using WAL-E or pgBackRest (recommended for production).
# pgBackRest typically also needs --stanza=<name> in the command:
archive_command = 'wal-e wal-push %p'
archive_command = 'pgbackrest archive-push %p'

# Reload configuration (enough for a changed archive_command)
psql -c "SELECT pg_reload_conf()"

# Or restart (needed only when archive_mode itself changes)
systemctl restart postgresql
```
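A plain `cp` can leave a partial file in the archive if it is interrupted, and the existence check then blocks every retry. A common pattern is to copy to a temporary name and rename into place. The function below is a minimal sketch of that pattern with a hypothetical name. Treating an identical existing file as success is a design choice that makes retries after a crash harmless. It does not fsync the result, which a production tool such as pgBackRest handles for you.

```bash
#!/usr/bin/env bash
# Hypothetical archive helper: copy SRC into DEST_DIR without ever
# exposing a half-written file under the final name.
# Usage: archive_wal SRC DEST_DIR
archive_wal() {
    local src=$1 dest_dir=$2
    local name=${src##*/}
    local final="$dest_dir/$name"
    local tmp="$dest_dir/.$name.tmp.$$"

    if [ -e "$final" ]; then
        # Same bytes already archived: succeed. Different bytes: fail
        # rather than overwrite.
        cmp -s "$src" "$final"
        return $?
    fi

    # Copy under a temporary name, then rename. A rename within one
    # filesystem is atomic, so readers never see a partial segment.
    if cp "$src" "$tmp" && mv "$tmp" "$final"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}
```

Saved in a script that ends with `archive_wal "$1" "$2"`, it could be wired up as `archive_command = '/usr/local/bin/archive_wal.sh %p /archive'`, where the script path is an assumption for illustration.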
Step 8: Clear Stuck Archives
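```bash
# If archiving is completely stuck:

# Check failed WAL files
psql -c "SELECT last_failed_wal, last_failed_time FROM pg_stat_archiver"

# Preferred: fix the underlying cause (Steps 2-7). The archiver retries
# on its own and works through the backlog without manual intervention.

# Option 1: Manually archive the stuck file (as the postgres user)
cp /var/lib/postgresql/data/pg_wal/000000010000000000000001 /archive/

# Then rename .ready to .done to tell PostgreSQL it's archived.
# A bare "touch ...done" leaves the .ready marker in place, so the
# archiver would still retry the segment.
mv /var/lib/postgresql/data/pg_wal/archive_status/000000010000000000000001.ready \
   /var/lib/postgresql/data/pg_wal/archive_status/000000010000000000000001.done

# PostgreSQL will move to the next WAL file

# Option 2: If the WAL is already archived elsewhere, use the same rename.
# Do not simply delete the .ready file: PostgreSQL can re-create a
# missing marker at a later checkpoint.

# Option 3: Use pg_archivecleanup to remove old files from the ARCHIVE.
# It is not meant for a live pg_wal directory. -n lists without deleting.
pg_archivecleanup -n /archive 000000010000000000000005

# This targets WAL files older than the specified one
```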
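PostgreSQL itself records a successful archive by renaming the segment's `.ready` marker to `.done`, so a manual fix should do the same, and only after confirming the archived copy is intact. The function below is a sketch with a hypothetical name and argument order. It refuses to touch the marker unless the file in the archive is byte-identical to the one in pg_wal.

```bash
#!/usr/bin/env bash
# Hypothetical guard around manually marking a segment as archived.
# Usage: mark_segment_archived STATUS_DIR WAL_DIR ARCHIVE_DIR SEGMENT
mark_segment_archived() {
    local status_dir=$1 wal_dir=$2 archive_dir=$3 segment=$4

    if [ ! -f "$status_dir/$segment.ready" ]; then
        echo "no .ready marker for $segment" >&2
        return 1
    fi
    # cmp -s is silent and fails if either file is missing or they differ.
    if ! cmp -s "$wal_dir/$segment" "$archive_dir/$segment"; then
        echo "archive copy of $segment is missing or differs; not marking done" >&2
        return 1
    fi
    mv "$status_dir/$segment.ready" "$status_dir/$segment.done"
}
```

This assumes an uncompressed archive. If the archive_command compresses segments, compare against a decompressed copy instead.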
Step 9: Monitor Archive Progress
```bash
# Check archive progress
psql -c "
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver
"

# Monitor WAL files
watch -n 5 'ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready | wc -l'

# Monitor disk usage
watch -n 10 'df -h /var/lib/postgresql/data/pg_wal && df -h /archive'

# Check PostgreSQL logs for archive messages
tail -f /var/log/postgresql/postgresql.log | grep -i archive

# Create monitoring script
cat << 'EOF' > /usr/local/bin/monitor_wal_archive.sh
#!/bin/bash
READY=$(ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready 2>/dev/null | wc -l)
DONE=$(ls /var/lib/postgresql/data/pg_wal/archive_status/*.done 2>/dev/null | wc -l)
WAL_SIZE=$(du -sh /var/lib/postgresql/data/pg_wal | cut -f1)
echo "Ready to archive: $READY"
echo "Archived: $DONE"
echo "WAL directory size: $WAL_SIZE"
EOF

chmod +x /usr/local/bin/monitor_wal_archive.sh
```
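For unattended monitoring, a check that returns an exit code is easier to wire into cron or an alerting agent than one that only prints counts. The sketch below uses a hypothetical function name and arbitrary default thresholds (10 and 100 pending segments), which you would tune to your WAL volume. Exit codes follow the common 0/1/2 convention for ok, warning and critical.

```bash
#!/usr/bin/env bash
# Hypothetical backlog check for pg_wal/archive_status.
# Usage: check_ready_backlog STATUS_DIR [WARN] [CRIT]
# Prints one status line; returns 0 (ok), 1 (warning) or 2 (critical).
check_ready_backlog() {
    local status_dir=$1 warn=${2:-10} crit=${3:-100}
    local ready
    ready=$(find "$status_dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' ')

    if [ "$ready" -ge "$crit" ]; then
        echo "CRITICAL: $ready WAL segments waiting to be archived"
        return 2
    elif [ "$ready" -ge "$warn" ]; then
        echo "WARNING: $ready WAL segments waiting to be archived"
        return 1
    fi
    echo "OK: $ready WAL segments waiting to be archived"
}
```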
Step 10: Set Up WAL Compression and Retention
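```bash
# In postgresql.conf:

# Compress full-page writes
wal_compression = on

# Minimum WAL kept in pg_wal for standbys (PostgreSQL 13+).
# This is a floor, not a cap: it does not limit growth when archiving fails.
wal_keep_size = 1GB

# Or older versions (PostgreSQL 12 and earlier) instead of wal_keep_size:
# wal_keep_segments = 64

# Set archive timeout (forces a segment switch even if the segment is not full)
archive_timeout = 300 # 5 minutes

# This ensures regular archiving even during low activity

# Check max_wal_size (a soft limit between checkpoints; unarchived WAL can exceed it)
max_wal_size = 2GB

# Reload settings
psql -c "SELECT pg_reload_conf()"
```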
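A short archive_timeout has a storage cost: a segment closed early by a forced switch is still archived at full segment size. The arithmetic below gives a rough upper bound on archive growth from forced switches alone, assuming the default 16 MB segment size, no compression in the archive_command, and at least some write activity in every interval. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: worst-case archive volume caused by archive_timeout.

# Maximum number of forced segment switches per day.
# Usage: segments_per_day TIMEOUT_SECONDS
segments_per_day() {
    echo $(( 86400 / $1 ))
}

# Resulting archive volume per day in MiB.
# Usage: archive_mib_per_day TIMEOUT_SECONDS [SEGMENT_MIB]
archive_mib_per_day() {
    local timeout_seconds=$1 segment_mib=${2:-16}   # 16 MiB is the default segment size
    echo $(( 86400 / timeout_seconds * segment_mib ))
}
```

With `archive_timeout = 300`, `segments_per_day 300` gives 288 and `archive_mib_per_day 300` gives 4608, so about 4.5 GiB per day in the worst case. Size the archive volume and retention period with that in mind.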
PostgreSQL WAL Archive Checklist
| Check | Command | Expected |
|---|---|---|
| Archive command | SHOW archive_command | Valid command |
| Archive mode | SHOW archive_mode | on |
| Disk space | df -h /archive | Sufficient space |
| Permissions | ls -la /archive | postgres writable |
| Ready files | ls pg_wal/archive_status/*.ready | Few or none |
| Stat archiver | SELECT * FROM pg_stat_archiver | failed_count not increasing |
Verify the Fix
```bash
# After fixing archive issues

# 1. Check ready files decreasing
ls /var/lib/postgresql/data/pg_wal/archive_status/*.ready | wc -l
# Should be low (0-5)

# 2. Watch the log for new archiving failures
tail -f /var/log/postgresql/postgresql.log | grep -i "archiv"
# Should show no new "archive command failed" lines
# (successful archiving is only logged at DEBUG1)

# 3. Check pg_stat_archiver
psql -c "SELECT archived_count, last_archived_wal FROM pg_stat_archiver"
# archived_count should be increasing

# 4. Verify WAL files in archive
ls /archive/ | grep "0000000"
# Should show archived files

# 5. Check WAL directory size
du -sh /var/lib/postgresql/data/pg_wal
# Should be stable, not growing

# 6. Test point-in-time recovery capability
# Archived files available for recovery
```
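Point-in-time recovery needs an unbroken run of segments, so after an archiving incident it is worth checking the archive for holes. The function below is a sketch with a hypothetical name. It assumes uncompressed segment files with standard 24-character names and the default 16 MB segment size, where the last 8 hex digits run from 00000000 to 000000FF before the middle 8 increase by one. It ignores history, backup label and compressed files, and only compares neighbours on the same timeline.

```bash
#!/usr/bin/env bash
# Hypothetical gap check for a WAL archive directory.
# Usage: find_wal_gaps ARCHIVE_DIR
# Prints one line per pair of neighbouring segments that are not consecutive.
find_wal_gaps() {
    local archive_dir=$1
    find "$archive_dir" -maxdepth 1 -type f \
        | sed 's|.*/||' \
        | grep -E '^[0-9A-F]{24}$' \
        | LC_ALL=C sort \
        | {
            local prev_name="" prev_tli="" prev_num=0 name tli num
            while IFS= read -r name; do
                tli=${name:0:8}
                # Sequence number = log number * 256 + segment within log.
                num=$(( 16#${name:8:8} * 256 + 16#${name:16:8} ))
                if [ "$tli" = "$prev_tli" ] && [ "$num" -ne $(( prev_num + 1 )) ]; then
                    echo "gap between $prev_name and $name"
                fi
                prev_name=$name
                prev_tli=$tli
                prev_num=$num
            done
        }
}
```

No output from `find_wal_gaps /archive` means the archived segments on each timeline are contiguous. It does not prove the files are valid, which only a test restore can show.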
Related Issues
- [Fix PostgreSQL Replication Slot Retention](/articles/fix-postgresql-replication-slot-retention)
- [Fix PostgreSQL WAL Disk Full](/articles/fix-postgresql-wal-disk-full)
- [Fix PostgreSQL Backup Failed](/articles/fix-postgresql-backup-failed)