Introduction

A process in D state (uninterruptible sleep) is blocked inside the kernel, usually waiting for I/O to complete. Unlike a process in S state (interruptible sleep), it does not act on signals while it is blocked, so even kill -9 has no effect: the SIGKILL stays pending until the process leaves D state. Common culprits include failing disk drives, hung NFS mounts, stalled SCSI commands, and filesystem locks. The process remains in D state until the underlying I/O completes or the kernel aborts the operation.

Symptoms

  • ps aux shows process state as D or Dl
  • kill -9 <pid> has no effect; the process remains in the process table
  • Load average is elevated even when CPU usage is low, because Linux counts tasks in uninterruptible sleep toward the load average
  • top shows high %wa (I/O wait) percentage
  • cat /proc/<pid>/wchan shows the kernel function where the process is blocked
  • System becomes progressively unresponsive as D-state processes accumulate
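
To put numbers on the symptoms above, the following is a minimal sketch that counts tasks in D state and prints the count next to the 1-minute load average. The function names count_dstate and dstate_summary are made up for this example, and the sketch assumes a procps-style ps and a readable /proc/loadavg.

```bash
#!/usr/bin/env bash

# Count entries in uninterruptible sleep.
# Input: one process state per line, as printed by `ps -eo stat=`.
count_dstate() {
    awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Print the live D-state count next to the 1-minute load average.
# (Hypothetical helper; reads from ps and /proc/loadavg.)
dstate_summary() {
    local blocked load1
    blocked=$(ps -eo stat= | count_dstate)
    read -r load1 _ < /proc/loadavg
    printf 'D-state tasks: %s, 1-min load average: %s\n' "$blocked" "$load1"
}

# Usage: dstate_summary
```

Running dstate_summary a few times gives a rough trend: a count that stays above zero alongside a rising load average points to stuck I/O rather than CPU pressure. Because ps -e lists processes rather than threads, the count may be low on heavily threaded systems; ps -eLo stat= is one way to include threads.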

Common Causes

  • NFS server unreachable while client has pending I/O operations
  • Disk drive failing with I/O errors causing SCSI command timeout
  • Storage array (SAN) path failure with no multipath failover
  • FUSE filesystem (sshfs, encfs) connection dropped
  • Kernel bug in storage driver causing infinite wait
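
The causes above often leave recognizable function names in the kernel stack of the blocked process. As a rough, non-authoritative sketch, the hypothetical helper below reads a stack trace (in the format printed by /proc/<pid>/stack) on standard input and guesses the subsystem from substring matches. The patterns are illustrative assumptions, not a complete list, and multipath failures or driver bugs may not match any of them.

```bash
#!/usr/bin/env bash

# Guess which subsystem a blocked process is waiting in.
# Input: kernel stack trace text on stdin.
# The substring patterns are heuristics chosen for illustration only.
classify_stack() {
    local stack
    stack=$(cat)
    case "$stack" in
        *nfs*|*rpc_*)   echo "nfs" ;;
        *fuse_*)        echo "fuse" ;;
        *scsi_*|*blk_*) echo "block-storage" ;;
        *)              echo "unknown" ;;
    esac
}

# Usage (reading the stack usually requires root; PID is a placeholder):
#   sudo cat /proc/PID/stack | classify_stack
```

A result of unknown only means none of the sample patterns matched; the full stack trace is still the authoritative source.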

Step-by-Step Fix

  1. Identify all D-state processes and what they are waiting on:

     ```bash
     # List D-state processes (header line included) with their wait channel
     ps -eo pid,ppid,stat,wchan:30,comm | awk 'NR == 1 || $3 ~ /^D/'
     # See the kernel wait channel (placeholder PID; set it to a D-state process)
     pid=12345
     cat /proc/$pid/wchan
     # See the full kernel stack trace (usually requires root)
     sudo cat /proc/$pid/stack
     ```
  2. Check for NFS-related hangs:

     ```bash
     mount | grep nfs
     nfsstat -c
     # Check RPC call and retransmission counters, plus per-procedure counts
     grep -E '^(rpc|proc[0-9])' /proc/net/rpc/nfs
     ```
  3. Check storage and disk health (sda is an example device):

     ```bash
     sudo dmesg | grep -iE "error|timeout|reset|I/O"
     cat /sys/block/sda/device/state
     lsscsi
     sudo smartctl -a /dev/sda
     ```
  4. Attempt to recover NFS mounts:

     ```bash
     # Force unmount hung NFS (may take time)
     sudo umount -f /mnt/nfs
     # If that fails, lazy unmount
     sudo umount -l /mnt/nfs
     # Remount
     sudo mount /mnt/nfs
     ```
  5. Reset the SCSI device to abort pending commands:

     ```bash
     # Remove the hung disk from the SCSI layer (sda is an example;
     # do not do this to the disk that holds the root filesystem)
     echo 1 | sudo tee /sys/block/sda/device/delete
     # Rescan the SCSI bus (host0 is an example; use the host the disk is attached to)
     echo "- - -" | sudo tee /sys/class/scsi_host/host0/scan
     ```
  6. If the process cannot be recovered, a reboot may be necessary. D-state processes cannot be killed, so if the kernel can neither complete nor abort the I/O, the only resolution is a system reboot. If the system is fully unresponsive, echo b | sudo tee /proc/sysrq-trigger forces an immediate reboot; it does not sync or unmount filesystems, so use it only as a last resort.
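
Processes pass through D state briefly during normal I/O, so a single listing does not prove a hang. As a rough sketch of how to separate stuck processes from transient ones, the function below compares two ps snapshots and prints the PIDs that are in D state in both. The name persistent_dstate, the snapshot paths, and the 10-second interval are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash

# Print PIDs that are in D state in both snapshot files.
# Each snapshot holds `ps -eo pid=,stat=,comm=` output (pid, state, command).
persistent_dstate() {
    local first=$1 second=$2
    awk 'NR == FNR { if ($2 ~ /^D/) seen[$1] = 1; next }   # first file: remember D-state PIDs
         $2 ~ /^D/ && ($1 in seen) { print $1 }            # second file: still in D state
        ' "$first" "$second"
}

# Usage (paths and interval are placeholders):
#   ps -eo pid=,stat=,comm= > /tmp/snap1
#   sleep 10
#   ps -eo pid=,stat=,comm= > /tmp/snap2
#   persistent_dstate /tmp/snap1 /tmp/snap2
```

PIDs printed here are the ones worth inspecting with /proc/<pid>/wchan and /proc/<pid>/stack; a longer interval reduces false positives from processes that are merely busy with slow but healthy I/O.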

Prevention

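  • Use NFS mount options soft,timeo=10,retrans=3 instead of hard for non-critical mounts (timeo is in tenths of a second, so timeo=10 is 1 second); a soft mount returns an I/O error to the application once the retries are exhausted, so keep hard where data integrity matters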
  • Configure multipath I/O for SAN storage to provide automatic failover
  • Monitor disk SMART data and replace drives showing error trends
  • Set the SCSI command timeout in seconds: echo 30 | sudo tee /sys/block/sda/device/timeout (the value is reset at reboot, so reapply it at boot if needed)
  • Use nofail in fstab for network mounts to prevent boot hangs
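
One way to check the first recommendation against a running system is to list the NFS mounts that are still mounted hard. The sketch below parses input in /proc/mounts format; the function name nfs_hard_mounts is hypothetical, and it assumes the kernel reports hard or soft explicitly in the option field, which is worth confirming on your system.

```bash
#!/usr/bin/env bash

# Print the mount points of NFS mounts that carry the `hard` option.
# Input: /proc/mounts format (device, mount point, fstype, options, dump, pass).
nfs_hard_mounts() {
    awk '$3 ~ /^nfs4?$/ {
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++)
                 if (opts[i] == "hard") { print $2; break }
         }'
}

# Usage: nfs_hard_mounts < /proc/mounts
```

Any non-critical mount point in the output is a candidate for the soft options listed above.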