Introduction PostgreSQL WAL archival ships completed WAL segments to an archive location for point-in-time recovery and standby replication. When the `archive_command` fails due to network issues, WAL segments accumulate on disk and eventually fill the `pg_wal` directory, causing the database to stop accepting writes.
Symptoms - `pg_stat_archiver` shows `failed_count` increasing and `last_failed_wal` growing - PostgreSQL logs show `archive_command failed with exit code 1` - `pg_wal` directory growing beyond normal size - Standby servers falling behind or disconnecting - Disk space alerts on the WAL partition
Common Causes - Network outage to archive destination (S3, NFS, backup server) - Archive server disk full, rejecting new WAL segments - DNS resolution failure for archive hostname - Archive tool (wal-g, barman, pgBackRest) process crashed - Firewall rules blocking archive destination port
Step-by-Step Fix 1. **Check archiver status": ```sql SELECT * FROM pg_stat_archiver; -- Key columns: -- archived_count, last_archived_wal, last_archived_time -- failed_count, last_failed_wal, last_failed_time ```
- 1.**Test the archive command manually":
- 2.```bash
- 3.# Check the configured archive_command
- 4.psql -c "SHOW archive_command;"
# Test it manually touch /tmp/test_wal_segment su - postgres -c "wal-g wal-push /tmp/test_wal_segment" # Or for NFS: su - postgres -c "cp /tmp/test_wal_segment /mnt/archive/test_wal_segment" ```
- 1.**Fix the archive destination and resume archiving":
- 2.```bash
- 3.# For S3-based archiving with wal-g
- 4.# Check connectivity
- 5.aws s3 ls s3://my-wal-archive/
# Check wal-g configuration cat /etc/wal-g/config.json
# Test wal-g wal-g wal-push /var/lib/postgresql/16/main/pg_wal/000000010000000100000001 ```
- 1.**Force archiving of accumulated WAL segments":
- 2.```sql
- 3.-- Force a WAL segment switch
- 4.SELECT pg_switch_wal();
-- Check if archiving resumes SELECT * FROM pg_stat_archiver; ```
- 1.**Implement retry logic in archive_command":
- 2.```ini
- 3.# Instead of a simple cp, use a retry wrapper
- 4.# archive_command = '/usr/local/bin/archive_wal.sh %f %p'
- 5.
` - 6.```bash
- 7.#!/bin/bash
- 8.# /usr/local/bin/archive_wal.sh
- 9.WAL_FILE=$1
- 10.WAL_PATH=$2
- 11.MAX_RETRIES=5
- 12.RETRY_DELAY=10
for i in $(seq 1 $MAX_RETRIES); do wal-g wal-push "$WAL_PATH" && exit 0 echo "Archive attempt $i failed for $WAL_FILE, retrying in ${RETRY_DELAY}s..." sleep $RETRY_DELAY RETRY_DELAY=$((RETRY_DELAY * 2)) done
exit 1 ```