Introduction
A process in D state (uninterruptible sleep) is blocked inside the kernel, usually waiting for I/O to complete. Unlike processes in S state (interruptible sleep), a D-state process does not respond to `kill -9`: the signal stays pending until the process leaves the uninterruptible wait. Common culprits include failing disk drives, hung NFS mounts, stalled SCSI commands, and contended filesystem locks. The process remains in D state until the underlying I/O completes or the kernel aborts the operation.
Symptoms
- `ps aux` shows process state as `D` or `Dl`
- `kill -9 <pid>` has no effect; the process remains in the process table
- Load average is elevated due to processes in uninterruptible sleep
- `top` shows a high `%wa` (I/O wait) percentage
- `cat /proc/<pid>/wchan` shows the kernel function where the process is blocked
- System becomes progressively unresponsive as D-state processes accumulate
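Because the last symptom is about accumulation over time, it can help to sample the number of D-state processes at intervals rather than look once. The following is a minimal sketch, not a finished monitoring tool: the function names `dstate_filter` and `dstate_count` are hypothetical, and the input is assumed to be the output of `ps -eo pid,stat,wchan:30,comm`, including its header line.

```bash
#!/usr/bin/env bash
# Print every row whose STAT column starts with "D".
# Expects `ps -eo pid,stat,wchan:30,comm` output (with header) on stdin.
dstate_filter() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Count the D-state rows in the same input.
dstate_count() {
    dstate_filter | wc -l | tr -d ' '
}

# Hypothetical usage: log the count every 10 seconds.
# while sleep 10; do
#     printf '%s dstate=%s\n' "$(date +%T)" \
#         "$(ps -eo pid,stat,wchan:30,comm | dstate_count)"
# done
```

A count that keeps rising between samples suggests the blocked I/O is not completing on its own.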
Common Causes
- NFS server unreachable while client has pending I/O operations
- Disk drive failing with I/O errors causing SCSI command timeout
- Storage array (SAN) path failure with no multipath failover
- FUSE filesystem (sshfs, encfs) connection dropped
- Kernel bug in storage driver causing infinite wait
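The wait channel of a blocked process often hints at which of these causes applies. The sketch below groups wait-channel names into rough categories by substring. The function name `classify_wchan` and the patterns are illustrative assumptions, not a kernel interface; actual function names vary by kernel version and driver, so treat the result as a starting point only.

```bash
#!/usr/bin/env bash
# Map a kernel wait-channel name to a likely cause category.
# The substring patterns are heuristic assumptions for illustration.
classify_wchan() {
    case "$1" in
        *nfs*|*rpc*)                echo "nfs" ;;
        *fuse*)                     echo "fuse" ;;
        *scsi*|*blk*|io_schedule*)  echo "block-storage" ;;
        *)                          echo "unknown" ;;
    esac
}

# Hypothetical usage: label each D-state process.
# ps -eo pid,stat,wchan:30,comm | awk 'NR > 1 && $2 ~ /^D/ {print $1, $3}' |
# while read -r pid wchan; do
#     printf '%s %s %s\n' "$pid" "$wchan" "$(classify_wchan "$wchan")"
# done
```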
Step-by-Step Fix
1. Identify all D-state processes and what they are waiting on:

```bash
ps -eo pid,ppid,stat,wchan:30,comm | awk '$3 ~ /^D/ {print $0}'
# See the kernel wait channel
cat /proc/<pid>/wchan
# See the full kernel stack trace (requires root)
sudo cat /proc/<pid>/stack
```
2. Check for NFS-related hangs:

```bash
mount | grep nfs
nfsstat -c
# Check for retransmissions: the rpc line lists calls, retransmissions,
# and auth refreshes; the proc lines hold per-operation counters
grep -E "^(rpc|proc[234]) " /proc/net/rpc/nfs
```
3. Check storage and disk health:

```bash
dmesg | grep -iE "error|timeout|reset|I/O"
cat /sys/block/sda/device/state
lsscsi
sudo smartctl -a /dev/sda
```
4. Attempt to recover NFS mounts:

```bash
# Force unmount hung NFS (may take time)
sudo umount -f /mnt/nfs
# If that fails, lazy unmount
sudo umount -l /mnt/nfs
# Remount
sudo mount /mnt/nfs
```
5. Reset the SCSI device to abort pending commands:

```bash
# Remove the hung disk (sda in this example) from the SCSI subsystem
echo 1 | sudo tee /sys/block/sda/device/delete
# Rescan the SCSI bus
echo "- - -" | sudo tee /sys/class/scsi_host/host0/scan
```
6. If the process cannot be recovered, a reboot may be necessary:

D-state processes cannot be killed. If the kernel can neither complete nor abort the I/O, the only resolution is a system reboot. Use `echo b | sudo tee /proc/sysrq-trigger` for an immediate reboot if the system is fully unresponsive. This reboots without syncing or unmounting filesystems, so treat it as a last resort.
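The disk checks in the steps above hard-code `sda`. On a host with several disks, a loop over sysfs can show which device is in a bad state before anything is deleted or reset. This is a sketch under stated assumptions: `scan_block_states` is a hypothetical helper, and it treats `running` (SCSI) and `live` (NVMe) as the healthy values, flagging anything else, such as `offline` or `blocked`, for a closer look.

```bash
#!/usr/bin/env bash
# List every block device that exposes a device/state attribute and flag
# any state other than "running" (SCSI) or "live" (NVMe).
# $1 is the sysfs root; it defaults to /sys and exists mainly for testing.
scan_block_states() {
    local root="${1:-/sys}" f dev state
    for f in "$root"/block/*/device/state; do
        [ -r "$f" ] || continue          # skip unmatched glob or unreadable file
        dev="${f#"$root"/block/}"        # strip the leading path
        dev="${dev%%/*}"                 # keep only the device name
        state="$(cat "$f")"
        case "$state" in
            running|live) printf '%s %s\n' "$dev" "$state" ;;
            *)            printf '%s %s SUSPECT\n' "$dev" "$state" ;;
        esac
    done
}

# Hypothetical usage:
# scan_block_states | grep SUSPECT
```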
Prevention
- Use NFS mount options `soft,timeo=10,retrans=3` instead of `hard` for non-critical mounts (`timeo` is in tenths of a second; a soft mount returns an I/O error to the application once the retries are exhausted instead of blocking indefinitely)
- Configure multipath I/O for SAN storage to provide automatic failover
- Monitor disk SMART data and replace drives showing error trends
- Set SCSI command timeout: `echo 30 | sudo tee /sys/block/sda/device/timeout`
- Use `nofail` in fstab for network mounts to prevent boot hangs
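To check the `nofail` recommendation against an existing host, one option is to scan the fstab for NFS entries that lack it. The helper below is a minimal sketch: `nfs_missing_nofail` is a hypothetical name, and it assumes the standard whitespace-separated fstab layout with the filesystem type in the third field and the options in the fourth.

```bash
#!/usr/bin/env bash
# Print the mount point of every NFS entry in an fstab-format file whose
# options do not include nofail. $1 defaults to /etc/fstab.
nfs_missing_nofail() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }   # skip comments and short lines
        $3 ~ /^nfs4?$/ && $4 !~ /(^|,)nofail(,|$)/ { print $2 }
    ' "${1:-/etc/fstab}"
}

# Hypothetical usage:
# nfs_missing_nofail            # audit /etc/fstab
# nfs_missing_nofail ./fstab    # audit a copy
```

Any mount point this prints is a candidate for adding `nofail`.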