Introduction

Linux load average includes processes in both running (R) and uninterruptible sleep (D) states. When the load average is high but CPU utilization is low, the bottleneck is typically I/O wait - processes blocked waiting for disk, network filesystem, or storage subsystem responses. This is a critical distinction because adding more CPU will not help; the fix requires addressing the storage layer.
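The D-state contribution can be checked directly: counting processes in uninterruptible sleep gives a quick sense of how much of the load average is I/O-bound rather than CPU-bound. A minimal sketch using standard `ps`:

```bash
# Processes in uninterruptible sleep (state D) count toward load
# average but consume no CPU; a persistently non-zero count points at I/O.
ps -eo state= | awk '$1 ~ /^D/ {n++} END {print n+0}'
```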

Symptoms

  • uptime shows load average of 15+ on an 8-core system
  • top shows %wa (I/O wait) above 30% while %us and %sy stay low
  • vmstat 1 shows wa column consistently high
  • iostat -x 1 shows %util near 100% for one or more disks
  • Applications are slow but CPU usage graphs look normal
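The first symptom can be checked mechanically: a 1-minute load average above the core count is the usual "high load" threshold. A small sketch reading `/proc/loadavg` and `nproc`:

```bash
# Compare the 1-minute load average against the number of available cores.
load=$(cut -d' ' -f1 /proc/loadavg)
cores=$(nproc)
awk -v l="$load" -v c="$cores" \
    'BEGIN { print (l+0 > c+0 ? "load exceeds core count" : "load within core count") }'
```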

Common Causes

  • Failing disk with retry operations causing I/O latency
  • NFS mount to slow or unreachable server
  • RAID rebuild in progress consuming all I/O bandwidth
  • Log rotation or backup job creating massive write load
  • Swap thrashing causing excessive page I/O
  • Database performing full table scans on spinning disk
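Of these causes, swap thrashing is the easiest to confirm from userspace: the cumulative swap counters in `/proc/vmstat` should be nearly static on a healthy box. A one-second sample (Linux-specific):

```bash
# pswpout is the cumulative count of pages swapped out since boot;
# a large delta over one second indicates active swap thrashing.
a=$(awk '/^pswpout /{print $2}' /proc/vmstat)
sleep 1
b=$(awk '/^pswpout /{print $2}' /proc/vmstat)
echo "pages swapped out in 1s: $((b - a))"
```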

Step-by-Step Fix

  1. Confirm I/O wait is the bottleneck:

```bash
vmstat 1 10
# Check the 'wa' column - consistently above 20% confirms an I/O bottleneck

mpstat -P ALL 1 5
# Check %iowait per CPU core
```

  2. Identify which processes are generating I/O:

```bash
iotop -oP
# Shows real-time I/O usage per process

pidstat -d 1 5
# Shows I/O statistics per process

# Inspect cumulative I/O counters for a specific application's processes
for pid in $(pgrep -f "myapp"); do
    echo "PID $pid:"
    cat /proc/$pid/io 2>/dev/null
done
```

  3. Identify which disk is the bottleneck:

```bash
iostat -x 1 5
# Look for high %util and high await (average wait time)
# await > 50ms indicates a problem
# Note: svctm is unreliable and deprecated in recent sysstat; rely on await and %util
```
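If `iostat` is unavailable, await can be approximated from `/proc/diskstats` directly (fields 4 and 8 are completed reads and writes, fields 7 and 11 the milliseconds spent on them). A sketch; the device filter and one-second window are arbitrary choices:

```bash
# Approximate average I/O completion latency per device over 1 second.
snap() { awk '$3 !~ /^(loop|ram)/ {print $3, $4 + $8, $7 + $11}' /proc/diskstats; }
snap > /tmp/ds.1
sleep 1
snap > /tmp/ds.2
awk 'NR == FNR { ios[$1] = $2; ms[$1] = $3; next }
     { dio = $2 - ios[$1]; dms = $3 - ms[$1]
       if (dio > 0) printf "%s: avg await %.1f ms\n", $1, dms / dio }' /tmp/ds.1 /tmp/ds.2
```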
  4. Check for failing disk hardware:

```bash
dmesg | grep -iE "error|retry|reset|timeout|I/O" | tail -20
sudo smartctl -a /dev/sda | grep -E "Reallocated|Pending|Uncorrectable|UDMA"
# Non-zero Reallocated_Sector_Ct or Current_Pending_Sector values point to a failing drive
```
  5. Check for NFS-related I/O waits:

```bash
mount | grep nfs
nfsiostat
nfsstat -c
# A high retransmission count indicates NFS server or network issues
```
  6. Reduce I/O pressure immediately:

```bash
# Stop non-essential I/O-heavy services
sudo systemctl stop backup-service
sudo systemctl stop logrotate.timer

# Reduce I/O scheduler queue depth for latency-sensitive workloads
echo 32 | sudo tee /sys/block/sda/queue/nr_requests

# Set I/O scheduler to deadline or mq-deadline for better latency
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
```
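When the offending process cannot simply be stopped, its I/O priority can be lowered instead. `ionice` (util-linux) moves a PID into the idle I/O class, so it only gets disk time when nothing else wants it. A sketch, demonstrated on the current shell's PID rather than a real workload:

```bash
# Put a process into the idle I/O scheduling class (class 3).
# $$ is the current shell; substitute the real offending PID.
ionice -c 3 -p $$
ionice -p $$   # report the class now in effect
```

Note that the idle class is only fully honored by schedulers that support I/O priorities (such as bfq); with mq-deadline the class is recorded but has limited effect.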

Prevention

  • Monitor I/O wait as a separate metric from CPU usage in your monitoring system
  • Use SSDs for I/O-intensive workloads; spinning disks should be limited to archival
  • Configure I/O scheduler appropriate for the workload (bfq for desktop, mq-deadline for servers)
  • Rate-limit backup and log rotation jobs using ionice or systemd IOWeight=
  • Use ioping for regular I/O latency benchmarking and alerting on degradation
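The `IOWeight=` approach above can be made permanent with a systemd drop-in. A sketch, assuming a unit named `backup-service` (hypothetical name) and cgroup v2, which `IOWeight=` requires:

```bash
# Drop-in limiting backup-service's share of disk bandwidth under contention.
sudo mkdir -p /etc/systemd/system/backup-service.service.d
sudo tee /etc/systemd/system/backup-service.service.d/io.conf <<'EOF'
[Service]
IOWeight=10
IOSchedulingClass=idle
EOF
sudo systemctl daemon-reload
```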