Introduction

PostgreSQL checkpoints flush all dirty buffers to disk. When checkpoints happen too frequently, whether because `max_wal_size` is too low or the write rate is high, they cause periodic I/O spikes that degrade write latency for all concurrent operations. This is known as the "checkpoint I/O storm" problem.

Symptoms

- Write latency spikes every few minutes that correlate with checkpoint timing
- PostgreSQL logs show `checkpoints are occurring too frequently` warnings
- `pg_stat_bgwriter` shows rapidly growing `checkpoints_timed` or `checkpoints_req` counts; a high share of `checkpoints_req` points to WAL volume as the trigger
- `iostat` shows periodic I/O bursts with high `await` values
- Application write operations fail intermittently with timeout errors

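
The log warning also reports how far apart the checkpoints were, which helps judge how severe the problem is. The following is a minimal sketch that summarizes those gaps; the function name `summarize_checkpoint_warnings` is hypothetical, and it assumes the warning text has the form `checkpoints are occurring too frequently (N seconds apart)`.

```bash
# Read PostgreSQL log lines on stdin and summarize the
# "checkpoints are occurring too frequently (N seconds apart)" warnings:
# how many there were, the shortest gap, and the average gap.
summarize_checkpoint_warnings() {
  # Keep only the warning text, then only the number of seconds in it.
  grep -oE 'checkpoints are occurring too frequently \([0-9]+ seconds? apart\)' |
    grep -oE '[0-9]+' |
    awk '
      NR == 1 || $1 < min { min = $1 }
      { sum += $1 }
      END {
        if (NR == 0) { print "warnings=0"; exit }
        printf "warnings=%d min_gap_s=%d avg_gap_s=%.1f\n", NR, min, sum / NR
      }'
}

# Example (adjust the path to your log location):
#   cat /var/log/postgresql/postgresql-*.log | summarize_checkpoint_warnings
```

A small `min_gap_s` relative to `checkpoint_timeout` suggests that checkpoints are being forced by WAL volume rather than by the timer.
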
Common Causes

- `max_wal_size` set too low (default 1GB), triggering frequent checkpoints
- A high write rate that fills `max_wal_size` worth of WAL before `checkpoint_timeout` elapses
- `checkpoint_timeout` set too low
- `checkpoint_completion_target` set too low to spread checkpoint I/O over the interval
- `min_wal_size` set too low, so WAL segments are removed after each checkpoint instead of being recycled

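
The interaction between `max_wal_size`, the write rate, and `checkpoint_timeout` can be illustrated with some rough arithmetic. This is a sketch under a simplifying assumption: a checkpoint is requested by the time `max_wal_size` worth of WAL has been written since the previous one. PostgreSQL actually requests it somewhat earlier, so the real interval is shorter than this estimate. The function names and the sample write rates are hypothetical.

```bash
# Optimistic upper bound on the seconds between WAL-triggered checkpoints.
# Arguments: max_wal_size in MB, WAL write rate in MB per second.
estimate_checkpoint_interval_secs() {
  awk -v size="$1" -v rate="$2" \
    'BEGIN { if (rate <= 0) { print "inf"; exit }; printf "%d\n", size / rate }'
}

# Report whether checkpoints are likely requested by WAL volume or by the timer.
# Arguments: max_wal_size in MB, WAL MB per second, checkpoint_timeout in seconds.
checkpoint_trigger() {
  local interval
  interval=$(estimate_checkpoint_interval_secs "$1" "$2")
  if [ "$interval" != "inf" ] && [ "$interval" -lt "$3" ]; then
    echo "requested (WAL volume), about every ${interval}s"
  else
    echo "timed, every ${3}s"
  fi
}

# Default 1GB max_wal_size, assumed 8 MB/s of WAL, 5 minute timeout:
#   checkpoint_trigger 1024 8 300
```

With these assumed numbers, the default 1GB limit is reached long before the five-minute timer fires, which is the situation that produces the warning in the logs.
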
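Step-by-Step Fix

1. **Check current checkpoint frequency:**

   ```sql
   -- Checkpoint counters since the last statistics reset.
   -- On PostgreSQL 17 and later these counters live in pg_stat_checkpointer.
   SELECT
     checkpoints_timed,
     checkpoints_req,
     checkpoint_write_time,
     checkpoint_sync_time,
     buffers_checkpoint,
     round(extract(epoch FROM now() - stats_reset)) AS seconds_since_reset,
     -- Cast to numeric so round(value, 1) also works on versions
     -- where extract() returns double precision.
     round(
       (checkpoints_timed + checkpoints_req)::numeric
         / extract(epoch FROM now() - stats_reset)::numeric * 3600,
       1
     ) AS checkpoints_per_hour
   FROM pg_stat_bgwriter;
   ```
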
2. **Check for checkpoint warnings in the logs:**

   ```bash
   # Adjust the path to match your log location.
   grep "checkpoints are occurring too frequently" /var/log/postgresql/postgresql-*.log
   ```

3. **Increase `max_wal_size` and tune checkpoint parameters:**

   ```sql
   ALTER SYSTEM SET max_wal_size = '4GB';
   ALTER SYSTEM SET min_wal_size = '1GB';
   ALTER SYSTEM SET checkpoint_timeout = '30min';
   ALTER SYSTEM SET checkpoint_completion_target = 0.9;
   -- These settings take effect on reload; no restart is needed.
   SELECT pg_reload_conf();
   ```

4. **Monitor checkpoint timing after the changes:**

   ```sql
   -- After 1 hour, check the new checkpoint rate.
   -- Counters are cumulative, so compare them with the values from step 1.
   SELECT
     checkpoints_timed,
     checkpoints_req,
     buffers_checkpoint,
     -- Assumes the default 8 kB block size.
     round(buffers_checkpoint * 8192.0 / 1073741824, 2) AS gb_written
   FROM pg_stat_bgwriter;
   ```

5. **Check WAL generation rate to size `max_wal_size` appropriately:**

   ```sql
   -- WAL written since the oldest replication slot's restart_lsn.
   -- wal_since_restart_lsn is NULL when no replication slot exists.
   SELECT
     pg_walfile_name(pg_current_wal_lsn()) AS current_wal_file,
     pg_size_pretty(pg_wal_lsn_diff(
       pg_current_wal_lsn(),
       (SELECT restart_lsn FROM pg_replication_slots ORDER BY restart_lsn LIMIT 1)
     )) AS wal_since_restart_lsn;
   ```
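
The query in the last step reports a WAL volume relative to a replication slot, not a rate, and it returns nothing useful when no slot exists. One alternative is to sample `pg_current_wal_lsn()` twice and divide the difference by the elapsed time. The sketch below does that arithmetic in the shell; the function names are hypothetical, and the sizing rule (WAL rate times `checkpoint_timeout`) is a rough floor for `max_wal_size`, not an official formula, so add headroom on top of it.

```bash
# Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position.
# An LSN is two hexadecimal numbers: the high 32 bits and the low 32 bits.
lsn_to_bytes() {
  local hi lo
  hi="${1%%/*}"
  lo="${1##*/}"
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# Average WAL bytes per second between two LSN samples.
# Arguments: start LSN, end LSN, elapsed seconds between the samples.
wal_bytes_per_sec() {
  local start end
  start=$(lsn_to_bytes "$1")
  end=$(lsn_to_bytes "$2")
  echo $(( (end - start) / $3 ))
}

# Rough floor for max_wal_size in MB: the WAL produced during one
# checkpoint_timeout interval at the measured rate.
# Arguments: WAL bytes per second, checkpoint_timeout in seconds.
suggest_max_wal_size_mb() {
  echo $(( $1 * $2 / 1048576 ))
}

# Example sampling with psql, 60 seconds apart:
#   a=$(psql -Atc "SELECT pg_current_wal_lsn()"); sleep 60
#   b=$(psql -Atc "SELECT pg_current_wal_lsn()")
#   suggest_max_wal_size_mb "$(wal_bytes_per_sec "$a" "$b" 60)" 1800
```

Sampling during peak write load gives the most useful figure, since that is when WAL-triggered checkpoints are most likely.
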