## Introduction

PostgreSQL checkpoints flush all dirty buffers to disk. When checkpoints happen too frequently (due to a low `max_wal_size` or a high write rate), they cause periodic I/O spikes that degrade write latency for all concurrent operations. This is known as the "checkpoint I/O storm" problem.

## Symptoms

- Write latency spikes every few minutes, correlating with checkpoint timing
- PostgreSQL logs show `checkpoints are occurring too frequently` warnings
- `pg_stat_bgwriter` shows high `checkpoints_timed` or `checkpoints_req` counts
- `iostat` shows periodic I/O bursts with high `await` values
- Application write operations experience intermittent timeout errors

## Common Causes

- `max_wal_size` set too low (default 1GB), triggering frequent checkpoints
- High write rate filling WAL segments faster than the checkpoint interval
- `checkpoint_timeout` set too low
- `checkpoint_completion_target` not set to spread I/O over time
- `min_wal_size` set too low, causing the WAL to shrink after each checkpoint

## Step-by-Step Fix

1. **Check current checkpoint frequency:**

   ```sql
   SELECT
       checkpoints_timed,
       checkpoints_req,
       checkpoint_write_time,
       checkpoint_sync_time,
       buffers_checkpoint,
       round(extract(epoch FROM now() - stats_reset)) AS seconds_since_reset,
       round(
           (checkpoints_timed + checkpoints_req)::numeric
           / extract(epoch FROM now() - stats_reset) * 3600,
           1
       ) AS checkpoints_per_hour
   FROM pg_stat_bgwriter;
   ```

   (On PostgreSQL 17 and later, these checkpoint counters moved to the `pg_stat_checkpointer` view.)

2. **Check for checkpoint warnings in logs:**

   ```bash
   grep "checkpoints are occurring too frequently" /var/log/postgresql/postgresql-*.log
   ```

3. **Increase `max_wal_size` and tune checkpoint parameters:**

   ```sql
   ALTER SYSTEM SET max_wal_size = '4GB';
   ALTER SYSTEM SET min_wal_size = '1GB';
   ALTER SYSTEM SET checkpoint_timeout = '30min';
   ALTER SYSTEM SET checkpoint_completion_target = 0.9;
   SELECT pg_reload_conf();
   ```

4. **Monitor checkpoint timing after changes:**

   ```sql
   -- After 1 hour, check new checkpoint rate
   SELECT
       checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       round(buffers_checkpoint * 8192.0 / 1073741824, 2) AS gb_written
   FROM pg_stat_bgwriter;
   ```

5. **Check WAL generation rate to size `max_wal_size` appropriately:**

   ```sql
   SELECT
       pg_walfile_name(pg_current_wal_lsn()) AS current_wal_file,
       pg_size_pretty(pg_wal_lsn_diff(
           pg_current_wal_lsn(),
           -- assumes at least one replication slot exists; returns NULL otherwise
           (SELECT restart_lsn FROM pg_replication_slots LIMIT 1)
       )) AS wal_since_restart_lsn;
   ```
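The query in step 5 diffs the current LSN against a replication slot's `restart_lsn`. To measure the generation rate directly, you can instead sample `pg_current_wal_lsn()` twice some minutes apart and diff the two values yourself. The arithmetic that `pg_wal_lsn_diff` performs can be sketched in Python (the sample LSN values and function names below are hypothetical, for illustration only):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '3/4027A2E8' to a 64-bit byte position.

    The part before the slash is the high 32 bits (hex); the part
    after it is the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_rate_mb_per_min(lsn_start: str, lsn_end: str, elapsed_seconds: float) -> float:
    """WAL bytes generated between two LSN samples, expressed as MB per minute."""
    delta_bytes = lsn_to_int(lsn_end) - lsn_to_int(lsn_start)
    return delta_bytes / (1024 * 1024) / (elapsed_seconds / 60)


# Hypothetical samples taken 10 minutes apart: 4 GiB of WAL generated
rate = wal_rate_mb_per_min("0/40000000", "1/40000000", 600)
print(f"{rate:.1f} MB/min")  # 409.6 MB/min
```

Multiplying the resulting rate by your `checkpoint_timeout` gives the WAL volume per checkpoint interval, which is the input to the `max_wal_size` sizing rule below.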

## Prevention

- Set `max_wal_size` to 2-4x the amount of WAL generated per checkpoint interval
- Set `checkpoint_completion_target = 0.9` to spread checkpoint I/O over 90% of the interval
- Monitor the `checkpoints_req` ratio: if more than 10% of checkpoints are demand-based, increase `max_wal_size`
- Use `pg_stat_bgwriter` metrics in Grafana dashboards
- Set `checkpoint_timeout` to at least 15 minutes (30 minutes is typical for production)
- Size WAL storage with at least 3x `max_wal_size` for headroom
- For write-heavy workloads, consider placing the WAL directory (`pg_wal`) on dedicated fast storage
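The first prevention rule reduces to simple arithmetic: multiply the observed WAL rate by `checkpoint_timeout` and scale by a safety factor of 2-4. A minimal sketch (the function name and the default factor of 3 are illustrative choices, not a PostgreSQL API):

```python
def recommended_max_wal_size_gb(wal_mb_per_min: float,
                                checkpoint_timeout_min: float,
                                factor: float = 3.0) -> float:
    """Suggest max_wal_size as `factor` times the WAL generated per
    checkpoint interval (factor of 2-4 per the sizing rule above)."""
    wal_per_interval_mb = wal_mb_per_min * checkpoint_timeout_min
    return factor * wal_per_interval_mb / 1024


# e.g. 100 MB/min of WAL with checkpoint_timeout = '30min'
size = recommended_max_wal_size_gb(100, 30)
print(f"max_wal_size of roughly {size:.1f} GB")  # roughly 8.8 GB
```

Round the result up to a convenient value (here, `max_wal_size = '9GB'` or `'10GB'`), and remember to provision WAL storage with the 3x headroom noted above.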