Introduction
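Zombie processes are terminated child processes whose exit status has not yet been collected by their parent via `wait()` (or `waitpid()`). A zombie uses no CPU and only a small amount of kernel memory. It holds no open file descriptors, because the kernel closes them when the process exits, but it keeps its process table entry and its PID until it is reaped. When thousands of zombies accumulate, the system can run out of PIDs and refuse to create new processes. File descriptor exhaustion can appear alongside this, because a parent that fails to reap its children may also fail to close the pipes and sockets it opened to talk to them.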
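To see the mechanism in action, the Linux-only bash sketch below creates one zombie on purpose. The temporary PID file is just a convenience for the demo. A subshell starts a short-lived child and then `exec`s into `sleep 30`, a program that never calls `wait()`, so the exited child stays in state `Z` until its parent goes away.

```bash
#!/usr/bin/env bash
# Sketch (Linux only): create one zombie on purpose and inspect it.
pidfile=$(mktemp)

# The subshell starts a 1-second child, records its PID, then exec()s into
# "sleep 30", which never calls wait() on that child.
( sleep 1 & echo $! > "$pidfile"; exec sleep 30 ) &
parent=$!

sleep 2   # let the 1-second child exit

zombie=$(cat "$pidfile")
# The exited child keeps its /proc entry; its State line reads "Z (zombie)".
zombie_state=$(awk '/^State:/ {print $2}' "/proc/$zombie/status")
echo "child $zombie of parent $parent is in state $zombie_state"

# Killing the parent lets init (or a subreaper) adopt and reap the zombie.
kill "$parent"
wait "$parent" 2>/dev/null || true   # reap our own child so we leave no zombie behind
rm -f "$pidfile"
```

Reading the `State:` line from `/proc/<pid>/status` is used here instead of parsing `ps` output so the check does not depend on column layout.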
Symptoms
- `ps aux | grep defunct` shows many zombie (`Z` state) processes
- `fork: retry: Resource temporarily unavailable` errors
- In-use PIDs approaching the `/proc/sys/kernel/pid_max` limit
- `lsof` shows a high file descriptor count for the parent process
- New processes fail to spawn with `Cannot allocate memory` (despite free RAM)
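Several of these symptoms can be checked in one pass. The bash sketch below assumes Linux `/proc` and procps-ng `ps`. It reads the in-use PID count from the fourth field of `/proc/loadavg` ("running/total"), where the total counts every process and thread; threads draw their IDs from the same pool capped by `pid_max`.

```bash
#!/usr/bin/env bash
# Sketch: check the zombie, PID and file-descriptor symptoms in one pass.
# Assumes Linux /proc and procps-ng "ps".

# Zombies: processes whose STAT column starts with Z ("stat=" drops the header).
zombies=$(ps -eo stat= | grep -c '^Z')

# PIDs in use: the part after "/" in field 4 of /proc/loadavg is the number
# of processes and threads that currently exist, each holding a PID.
tasks=$(awk '{split($4, a, "/"); print a[2]}' /proc/loadavg)
pid_max=$(cat /proc/sys/kernel/pid_max)

# System-wide file handles: allocated, free, max.
read -r fd_alloc _ fd_max < /proc/sys/fs/file-nr

echo "zombies:      $zombies"
echo "PIDs in use:  $tasks / $pid_max ($(( 100 * tasks / pid_max ))%)"
echo "file handles: $fd_alloc / $fd_max"
```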
Common Causes
- Parent process does not handle `SIGCHLD` to reap its children
- Application fork-bomb pattern that spawns children without a wait loop
- Long-running daemon with buggy child process management
- Container PID namespace where the init process (PID 1) does not reap orphans
- Python `subprocess.Popen` without `wait()` or `communicate()` calls
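To match a problem to one of these causes, it helps to know which program owns the zombies. The hypothetical `zombie_parents` function below counts zombies per parent, adds the parent's command name from `/proc/<ppid>/comm`, and flags zombies that linger under PID 1. It assumes Linux and procps-ng `ps`; the PID 1 flag is a hint, not a diagnosis.

```bash
#!/usr/bin/env bash
# Sketch: name the parent of each group of zombies.
# Assumes Linux /proc and procps-ng "ps"; zombie_parents is a hypothetical helper.
zombie_parents() {
  ps -eo ppid=,stat= | awk '$2 ~ /^Z/ {print $1}' | sort | uniq -c | sort -rn |
  while read -r count ppid; do
    comm=$(cat "/proc/$ppid/comm" 2>/dev/null || echo '?')
    note=""
    # Orphans re-parented to a reaping PID 1 disappear quickly, so zombies
    # that linger under PID 1 suggest an init that does not reap (for
    # example, an application running as PID 1 inside a container).
    [ "$ppid" -eq 1 ] && note="  (PID 1 not reaping)"
    printf '%6d zombies  parent %-7s %s%s\n' "$count" "$ppid" "$comm" "$note"
  done
}

zombie_parents
```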
Step-by-Step Fix
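1. Count zombie processes and identify their parents:

   ```bash
   # Show up to 20 zombies (STAT starts with Z)
   ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print $0}' | head -20
   # Count zombies per parent PID, largest first
   ps -eo ppid,stat | awk '$2 ~ /^Z/ {print $1}' | sort | uniq -c | sort -rn | head -10
   ```

2. Check system-wide file descriptor usage:

   ```bash
   cat /proc/sys/fs/file-nr
   # Output: allocated free max ("free" is always 0 on Linux 2.6 and later)
   cat /proc/sys/fs/file-max
   ```

3. Check the per-process file descriptor count. Replace `1234` with the parent PID; reading another user's `/proc/<pid>/fd` requires root:

   ```bash
   sudo ls /proc/1234/fd/ | wc -l
   # Find the "myapp" processes with the most open FDs
   for pid in $(pgrep -f "myapp"); do
     echo "$pid: $(sudo ls /proc/$pid/fd/ 2>/dev/null | wc -l) fds"
   done | sort -t: -k2 -n -r | head -10
   ```

4. Send `SIGCHLD` to the parent to prompt it to reap its zombies. This only helps if the parent has a `SIGCHLD` handler that calls `wait()` or `waitpid()`; a parent without one ignores the signal by default:

   ```bash
   sudo kill -SIGCHLD <parent-pid>
   ```

5. If the parent is unresponsive or still does not reap them, kill it. Zombies cannot be killed directly because they have already exited; once the parent is gone, they are re-parented to init (PID 1) or the nearest subreaper, which reaps them:

   ```bash
   sudo kill -TERM <parent-pid>
   # If it does not exit within a few seconds:
   sudo kill -9 <parent-pid>
   ```

6. As temporary relief, raise the system-wide file descriptor limit. The setting is lost on reboot unless it is added to a file under `/etc/sysctl.d/`, and it does not raise a process's own `ulimit -n` limit:

   ```bash
   # Equivalent to: echo 1048576 | sudo tee /proc/sys/fs/file-max
   sudo sysctl -w fs.file-max=1048576
   ```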
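The signal-then-kill sequence (SIGCHLD, then SIGTERM, then SIGKILL) can be combined into a single escalation routine. The bash sketch below is illustrative only: `escalate_parent`, its helpers, and the default 5-second grace period are hypothetical names and values, it assumes procps-ng `ps` for `--ppid`, and it must run as root to signal processes owned by other users. It treats a process whose own state is `Z` as gone, because `kill -0` still succeeds on a zombie.

```bash
#!/usr/bin/env bash
# Sketch: send SIGCHLD, give the parent a grace period to reap, then
# escalate to SIGTERM and finally SIGKILL. Names and defaults are hypothetical.

# True while process $1 exists and is not itself a zombie.
is_running() {
  local state
  state=$(awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null)
  [ -n "$state" ] && [ "$state" != "Z" ]
}

# Number of zombie children of process $1 (procps-ng "ps" assumed).
zombie_children() {
  ps -o stat= --ppid "$1" | grep -c '^Z'
}

# Poll once per second, up to $2 seconds, until process $1 stops running.
wait_gone() {
  local i
  for ((i = 0; i < $2; i++)); do
    is_running "$1" || return 0
    sleep 1
  done
  ! is_running "$1"
}

escalate_parent() {
  local pid=$1 grace=${2:-5}
  is_running "$pid" || return 0        # parent already gone
  kill -CHLD "$pid" || return 1        # e.g. permission denied
  sleep "$grace"
  if [ "$(zombie_children "$pid")" -eq 0 ]; then
    return 0                           # the parent reaped its zombies
  fi
  kill -TERM "$pid" 2>/dev/null
  wait_gone "$pid" "$grace" && return 0
  kill -KILL "$pid" 2>/dev/null
  wait_gone "$pid" "$grace"
}
```

For example, `escalate_parent <parent-pid> 10` waits 10 seconds at each stage before escalating.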
Prevention
- Ensure all code that spawns child processes reaps them with `wait()`/`waitpid()` or with a `SIGCHLD` handler that calls them
- Use `prctl(PR_SET_CHILD_SUBREAPER, 1)` in long-running daemons to adopt orphaned descendants; the daemon must then reap them itself
- Monitor the zombie count: `watch 'ps -eo stat | grep -c Z'`
- Set the per-process open-file limit (the `ulimit -n` equivalent) in systemd unit files with `LimitNOFILE=`
- Use a process supervisor such as systemd, supervisord, or runit that manages the child lifecycle properly
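In shell scripts, the equivalent of a wait loop is the `wait` builtin. The minimal bash sketch below uses a hypothetical `run_job` function as a stand-in for real work (job 3 fails on purpose). Each background child's PID is recorded, and `wait <pid>` both reaps the child and returns its exit status, so no zombie is left behind and failures are still noticed.

```bash
#!/usr/bin/env bash
# Sketch: launch background children and reap every one of them.
# run_job is a hypothetical stand-in for real work; job 3 fails on purpose.
run_job() {
  sleep 1
  [ "$1" -ne 3 ]
}

pids=()
for i in 1 2 3 4; do
  run_job "$i" &
  pids+=("$!")
done

reaped=0
failures=0
for pid in "${pids[@]}"; do
  # wait <pid> blocks until that child exits, reaps it, and returns its status.
  wait "$pid" || failures=$((failures + 1))
  reaped=$((reaped + 1))
done
echo "reaped $reaped children, $failures failed"
```

Recording PIDs rather than calling a bare `wait` keeps each child's exit status, which a bare `wait` discards.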