What's Actually Happening

Processes appear in the zombie state (Z) in the process list. A zombie has finished executing, but its parent process has not yet collected its exit status with wait(). It uses no CPU or memory; only its process table entry and PID remain, so a large number of zombies can exhaust the available PIDs.
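
To see this state first-hand, the following is a minimal sketch that creates a short-lived zombie on purpose and reads its state letter from /proc. The function name `zombie_demo` is illustrative, and the sketch assumes Linux with a mounted /proc and bash.

```bash
#!/usr/bin/env bash
# zombie_demo: create a zombie on purpose and print its state letter.
zombie_demo() {
    local pidfile parent child state
    pidfile=$(mktemp)
    # The inner shell forks `sleep 0.2`, records that child's PID, then
    # replaces itself with `sleep 5`. `sleep` never calls wait(), so the
    # finished child stays in the process table as a zombie.
    bash -c 'sleep 0.2 & echo "$!" > "$1"; exec sleep 5' _ "$pidfile" >/dev/null 2>&1 &
    parent=$!
    sleep 1                                   # give the child time to exit
    child=$(cat "$pidfile")
    # Field 3 of /proc/<pid>/stat is the state letter.
    state=$(awk '{print $3}' "/proc/$child/stat" 2>/dev/null)
    echo "$state"
    # Ending the parent re-parents the zombie to init, which normally reaps it.
    kill "$parent" 2>/dev/null
    wait "$parent" 2>/dev/null || true
    rm -f "$pidfile"
}
```

Calling `zombie_demo` takes about one second; the zombie disappears as soon as its parent is terminated.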

The Error You'll See

```bash
$ ps aux | grep Z

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1234 0.0 0.0 0 0 ? Z 10:00 0:00 [process] <defunct>
root 5678 0.0 0.0 0 0 ? Z 10:05 0:00 [another] <defunct>
```

The task summary in top counts them:

```bash
$ top

Tasks: 150 total, 10 running, 130 sleeping, 10 zombie, 0 stopped
```

A zombie cannot be killed, because it has already exited:

```bash
$ kill -9 1234

# Process still shows Z status
```

Why This Happens

  1. Parent not reading exit status - The parent never calls wait() or waitpid()
  2. Parent process crashed - The parent died without collecting its children (init normally adopts and reaps them)
  3. Parent process hung - The parent is blocked and not handling signals
  4. Bug in parent code - A programming error in SIGCHLD handling
  5. Large zombie accumulation - Many zombies consume process table slots and PIDs

Step 1: Identify Zombie Processes

```bash
# List all zombies (this pattern also matches the VSZ header and any
# command containing "Z"):
ps aux | grep Z

# Or match only the STAT column with awk:
ps aux | awk '$8 ~ /^Z/ {print}'

# List with PPID (parent PID):
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'

# Count zombies:
ps aux | awk '$8 ~ /^Z/' | wc -l

# Use top to see zombie count:
top -b -n 1 | head -5

# Check the maximum PID value (upper bound on process table entries):
cat /proc/sys/kernel/pid_max

# Current process count:
ps -e --no-headers | wc -l

# Check zombie process details:
grep State /proc/1234/status
# State: Z (zombie)
```

Step 2: Find Parent Process

```bash
# Find parent PID:
ps -o pid,ppid,stat,cmd -p 1234

# Output:
#  PID  PPID STAT CMD
# 1234  1000 Z    [process] <defunct>

# PPID 1000 is the parent

# Get parent details:
ps -p 1000 -o pid,cmd

# Or read both IDs from /proc:
grep -E '^(Pid|PPid):' /proc/1234/status

# List all zombies with parents:
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/ {print}'

# Find parent process tree:
pstree -p 1000

# Check parent status:
ps -p 1000 -o pid,stat,cmd

# If the parent is sleeping or running, it is able to reap its children
```

Step 3: Check Parent Process Status

```bash
# Check if parent is running:
ps -p 1000 -o pid,stat,cmd

# Status codes:
# S - Sleeping (can handle wait)
# R - Running (should handle wait)
# D - Disk sleep (blocked, cannot handle)
# T - Stopped (cannot handle)
# Z - Zombie (parent also zombie!)

# Check parent threads:
grep Threads /proc/1000/status

# Check if parent blocked (needs root or ptrace permission):
strace -p 1000

# See what parent is doing:
cat /proc/1000/wchan

# Output shows kernel function parent is waiting on

# Check parent's signal handling:
grep Sig /proc/1000/status

# Check if SIGCHLD is blocked, ignored, or caught. Each value is a hex
# bitmask; SIGCHLD is signal 17 on x86 and ARM, so look for bit 0x10000:
grep -E '^Sig(Blk|Ign|Cgt):' /proc/1000/status
```

Step 4: Send SIGCHLD to Parent

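```bash
# Signal parent to collect zombie:
kill -SIGCHLD 1000

# Or by number (17 on x86 and ARM):
kill -17 1000

# Check if zombie collected:
ps -o pid,stat,cmd -p 1234

# If still zombie, parent is not handling SIGCHLD
# (a parent without a SIGCHLD handler simply discards the signal)

# Try SIGUSR1 only if the app documents a handler for it;
# the default action of SIGUSR1 terminates the process:
kill -SIGUSR1 1000

# Or interrupt (this usually terminates the parent):
kill -SIGINT 1000

# Check process handling:
strace -e trace=signal -p 1000
```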

Step 5: Restart Parent Process

```bash
# If parent won't collect zombies, restart it:

# Find parent binary:
ps -p 1000 -o cmd

# Check parent service:
systemctl status parent-service

# Restart parent:
systemctl restart parent-service

# Or kill and restart:
kill 1000
systemctl start parent-service

# Once the old parent exits, its zombies are adopted by init (PID 1)
# init will collect them automatically

# Check zombies gone:
ps aux | awk '$8 ~ /^Z/'
```

Step 6: Handle Orphan Zombies

```bash
# If parent died, zombies become orphans
# Orphans are adopted by init (PID 1)

# Find orphan zombies (PPID 1):
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/ && $2 == 1 {print}'

# Init should collect these automatically

# If init is not collecting, check the init process:
ps -p 1 -o cmd

# Modern systems use systemd as init. daemon-reload only re-reads unit
# files; daemon-reexec re-executes the manager itself:
systemctl daemon-reexec

# Trigger init to reap zombies:
kill -SIGCHLD 1

# Check PID 1's messages in the systemd journal:
journalctl _PID=1

# For older init systems:
telinit q
```

Step 7: Kill Parent to Reap Zombies

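```bash
# If parent is buggy and won't reap, kill it:

# Identify parent (PPID is a read-only variable in bash, so use another name):
parent_pid=$(ps -o ppid= -p 1234 | tr -d ' ')
echo "$parent_pid"

# Kill parent:
kill "$parent_pid"

# If it ignores SIGTERM, force it:
kill -9 "$parent_pid"

# Zombies will be adopted by init and reaped

# Check zombies removed:
ps -o pid,stat,cmd -p 1234

# Note: the parent's other running children are not killed; they are
# re-parented to init and keep running without their supervisor

# For parent that is a service:
systemctl stop parent-service
systemctl start parent-service
```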

Step 8: Use Process Reaper

```bash
# A process reaper deals with zombies on behalf of a parent that will not.

# Using a subreaper:
# There is no standard command-line tool for this. A process marks itself
# as a subreaper by calling prctl(PR_SET_CHILD_SUBREAPER, 1) (Linux 3.4+);
# orphaned descendants are then re-parented to it instead of init.

# Create custom reaper script (it cannot reap other processes' children
# itself; it only prompts each parent with SIGCHLD):
cat << 'EOF' > /usr/local/bin/zombie_reaper.sh
#!/bin/bash

while true; do
    # Find zombies with parent that won't reap
    zombies=$(ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/ {print $1}')

    for zpid in $zombies; do
        ppid=$(ps -o ppid= -p "$zpid" | tr -d ' ')
        [ -n "$ppid" ] || continue

        # Signal the parent so it gets a chance to reap
        kill -SIGCHLD "$ppid" 2>/dev/null
    done

    sleep 60
done
EOF

chmod +x /usr/local/bin/zombie_reaper.sh

# Run in background:
nohup /usr/local/bin/zombie_reaper.sh &

# Or create systemd service:
cat << 'EOF' > /etc/systemd/system/zombie-reaper.service
[Unit]
Description=Zombie Process Reaper

[Service]
ExecStart=/usr/local/bin/zombie_reaper.sh
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable zombie-reaper
systemctl start zombie-reaper
```

Step 9: Fix Parent Code (Development)

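```bash
# If you control parent process code, fix signal handling:

# In C:
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>

void handle_sigchld(int sig) {
    (void)sig;
    int saved_errno = errno;                 /* waitpid() may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;                                    /* reap every finished child */
    errno = saved_errno;
}

int main(void) {
    /* sigaction is preferred over signal(SIGCHLD, handle_sigchld) */
    struct sigaction sa;
    sa.sa_handler = handle_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);
    /* ... fork children and do the real work here ... */
    return 0;
}

# Double fork to avoid zombies:
# Child forks grandchild, child exits immediately
# Grandchild adopted by init

# In Python:
import os
import signal

signal.signal(signal.SIGCHLD, signal.SIG_IGN)
# Ignoring SIGCHLD prevents zombies (on Linux, children are then reaped automatically)

# Or handle properly:
def handle_sigchld(signum, frame):
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break          # no children left
        if pid == 0:
            break          # children exist, but none has exited yet

signal.signal(signal.SIGCHLD, handle_sigchld)

# In shell scripts:
# Trap SIGCHLD (bash 4.3+ for wait -n; note that wait -n blocks while
# background jobs are still running)
trap 'while wait -n; do :; done' SIGCHLD
```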

Step 10: Monitor Zombie Processes

```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor_zombies.sh
#!/bin/bash

LOG=/var/log/zombie_monitor.log

echo "$(date): Checking zombies..." >> "$LOG"

zombies=$(ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/ {print}')
count=$(echo "$zombies" | grep -c Z)

if [ "$count" -gt 0 ]; then
    echo "$(date): Found $count zombies" >> "$LOG"
    echo "$zombies" >> "$LOG"

    # Alert if many zombies
    if [ "$count" -gt 10 ]; then
        echo "$(date): WARNING: $count zombie processes!" >> "$LOG"
        # Send alert: mail, slack, etc.
    fi
fi

echo "$(date): Zombie count: $count" >> "$LOG"
EOF

chmod +x /usr/local/bin/monitor_zombies.sh

# Add to cron: run `crontab -e` and add this line:
# */5 * * * * /usr/local/bin/monitor_zombies.sh

# Or systemd timer:
cat << 'EOF' > /etc/systemd/system/zombie-monitor.service
[Unit]
Description=Monitor Zombie Processes

[Service]
Type=oneshot
ExecStart=/usr/local/bin/monitor_zombies.sh

[Install]
WantedBy=multi-user.target
EOF

cat << 'EOF' > /etc/systemd/system/zombie-monitor.timer
[Unit]
Description=Run zombie monitor every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable zombie-monitor.timer
systemctl start zombie-monitor.timer
```

Zombie Process Checklist

| Check | Command | Expected |
|-------|---------|----------|
| Zombie count | `ps aux \| grep Z` | Low count |
| Parent PID | `ps -eo pid,ppid` | Parent identified |
| Parent status | `ps -p PPID` | Running/sleeping |
| SIGCHLD sent | `kill -SIGCHLD` | Zombies reaped |
| Service restart | `systemctl restart` | Zombies gone |
| Init adoption | PPID=1 | Init collects |

Verify the Fix

```bash
# After fixing zombies

# 1. Check zombie count
ps aux | awk '$8 ~ /^Z/' | wc -l
# 0 or very few

# 2. Check top
top -b -n 1 | grep zombie
# Tasks: zombie count = 0

# 3. Check process table
ps -e | wc -l
# Reasonable count

# 4. Verify parent handling (replace parent_pid with the real PID)
strace -e trace=process -p parent_pid
# Shows wait/waitpid calls

# 5. Monitor over time
/usr/local/bin/monitor_zombies.sh
# No accumulating zombies

# 6. Test new processes
# Start new child processes
# Verify they don't become zombies
```

  • [Fix Linux Process High CPU](/articles/fix-linux-process-high-cpu)
  • [Fix Linux Process Memory Leak](/articles/fix-linux-process-memory-leak)
  • [Fix Linux System Service Failed](/articles/fix-linux-system-service-failed)