Your application keeps crashing with no error message. Services restart unexpectedly. A long-running job terminates mid-execution. When processes die without clear errors, tracking down the cause requires systematic investigation.

## Understanding the Problem

Processes can be killed for many reasons: the OOM killer, segmentation faults, unhandled signals, resource limits, watchdogs, or explicit termination. Each leaves different traces.

### Typical Symptoms

```bash
Process 12345 terminated unexpectedly
Job 1 'python script.py' terminated by signal 9 (Killed)
Segmentation fault (core dumped)
Trace/breakpoint trap (core dumped)
Killed
```

You might notice:

- Services restarting frequently
- Long-running jobs dying mid-execution
- No useful error messages in application logs
- Process disappears from `ps` output without explanation

## Diagnosing the Issue

### Step 1: Check Kernel Logs for OOM Killer

The most common cause of unexpected kills is the OOM killer:

```bash
# Check for OOM killer activity
dmesg | grep -i "killed process"
dmesg | grep -i "out of memory"
dmesg | grep -i "oom"

# Check recent kernel messages
journalctl -k --since "1 hour ago" | grep -i -E "(killed|oom|memory)"

# Alternative log locations
grep -i "killed process" /var/log/syslog
grep -i "out of memory" /var/log/messages
```

OOM killer output looks like:

```
Out of memory: Killed process 1842 (java) total-vm:8388608kB, anon-rss:4194304kB
```

### Step 2: Check for Segmentation Faults

```bash
# Look for segfaults in kernel logs
dmesg | grep -i segfault

# Example output:
# python[12345]: segfault at 0 ip 00007f8c4a2b3f91 sp 00007ffc3a8e9a80 error 4 in libpython3.8.so

# Check core dump settings
cat /proc/sys/kernel/core_pattern

# Find core dumps
find /var/crash -type f -mtime -1 2>/dev/null
ls -la /var/lib/systemd/coredump/
```

### Step 3: Check Process Exit Codes and Signals

```bash
# If running in a shell, check the last exit code
echo $?

# An exit code above 128 means the process died from a signal:
# exit code = 128 + signal number (e.g. 137 = killed by SIGKILL)
# Common signal numbers:
# 1  - SIGHUP  (hangup, terminal closed)
# 2  - SIGINT  (interrupt, Ctrl+C)
# 9  - SIGKILL (forced kill, cannot be caught)
# 11 - SIGSEGV (segmentation fault)
# 15 - SIGTERM (normal termination request)

# Check systemd service exit codes
systemctl status service-name

# Show service logs
journalctl -u service-name -n 100
```
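The 128 + N convention is easy to verify directly in bash. This sketch kills a disposable `sleep` with SIGKILL and reads back the status:

```bash
# Start a disposable background process, kill it with SIGKILL,
# then inspect the exit status the shell reports
sleep 60 &
pid=$!
kill -9 "$pid"
wait "$pid"
status=$?
echo "exit status: $status"   # 137 = 128 + 9 (SIGKILL)

# bash's kill -l maps an exit status back to the signal name
kill -l "$status"             # prints KILL
```

The same trick works on any status a service reports: `kill -l 139` names SIGSEGV, `kill -l 143` names SIGTERM.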

### Step 4: Check Resource Limits

```bash
# Check limits for a running process
cat /proc/$(pidof process-name)/limits

# Check shell limits
ulimit -a

# Key limits to check:
ulimit -c   # Core file size
ulimit -d   # Data segment size
ulimit -f   # File size
ulimit -n   # Open files
ulimit -s   # Stack size
ulimit -t   # CPU time
ulimit -v   # Virtual memory
```

### Step 5: Check for Watchdogs and Supervisors

```bash
# Check if a systemd watchdog is configured
systemctl show service-name | grep Watchdog

# Check for process supervisors
ps aux | grep -E "(supervisord|monit|god|runit|s6)"

# Check cron for process monitoring
crontab -l
grep -r "process" /etc/cron.*
```

## Solutions by Cause

### Solution 1: OOM Killer Prevention

If the OOM killer is terminating your process:

```bash
# Check the OOM score of your process
cat /proc/$(pidof your-app)/oom_score
cat /proc/$(pidof your-app)/oom_score_adj

# Protect the process (lower score = less likely to be killed)
# Range: -1000 (never kill) to 1000 (likely kill)
echo -500 > /proc/$(pidof your-app)/oom_score_adj

# For systemd services, add to the service file:
# [Service]
# OOMScoreAdjust=-500

# Or exempt the process from the OOM killer entirely:
echo -1000 > /proc/$(pidof your-app)/oom_score_adj
```

Adjust system memory settings (note: `vm.overcommit_memory=2` disables overcommit entirely, so allocations beyond the commit limit fail immediately instead of triggering the OOM killer later; test under realistic load before enabling it in production):

```bash
# Reduce swappiness
sysctl -w vm.swappiness=10

# Configure overcommit
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80

# Make persistent
cat >> /etc/sysctl.conf << EOF
vm.swappiness = 10
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
EOF
```

### Solution 2: Fix Segmentation Faults

Segfaults usually indicate bugs in the application:

```bash
# Enable core dumps for debugging
ulimit -c unlimited

# Install debug symbols for the application
apt-get install package-dbgsym    # Debian/Ubuntu
debuginfo-install package         # RHEL/CentOS

# Run under gdb to catch the crash
gdb --args ./your-application
# Then type 'run' in gdb, and 'bt' after the crash

# Analyze an existing core dump
gdb /path/to/binary /path/to/core
# Then: bt full
```

Common segfault causes:

- Null pointer dereference
- Buffer overflow
- Stack overflow
- Use after free
- Accessing invalid memory

### Solution 3: Adjust Resource Limits

```bash
# For the current shell session
ulimit -n 65535       # Increase open files
ulimit -u 4096        # Increase user processes
ulimit -v unlimited   # Remove virtual memory limit

# For systemd services, edit the service file:
# [Service]
# LimitNOFILE=65535
# LimitNPROC=4096
# LimitSIGPENDING=4096

# Create an override for an existing service
systemctl edit service-name

# Add:
# [Service]
# LimitNOFILE=65535
# LimitNPROC=4096

# Apply
systemctl daemon-reload
systemctl restart service-name
```

For PAM limits (all users):

```bash
# Edit /etc/security/limits.conf
echo '* soft nofile 65535' >> /etc/security/limits.conf
echo '* hard nofile 65535' >> /etc/security/limits.conf
echo '* soft nproc 4096' >> /etc/security/limits.conf
echo '* hard nproc 4096' >> /etc/security/limits.conf

# Users must log out and back in for changes to take effect
```

### Solution 4: Handle Signals Properly

If your application doesn't handle signals gracefully:

```bash
# Test whether the application handles SIGTERM
kill -TERM $(pidof your-app)

# Common signals and their meanings:
# SIGHUP  (1)  - Reload config, reopen log files
# SIGINT  (2)  - Interrupt (Ctrl+C)
# SIGTERM (15) - Normal termination
# SIGKILL (9)  - Force kill (cannot be caught)

# For applications that exit on SIGHUP when the terminal closes
nohup ./your-app &

# Or use screen/tmux
tmux new-session -d -s app './your-app'
tmux attach -t app
```
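As a sketch of what graceful handling looks like, a shell wrapper can trap SIGTERM itself, run teardown, and exit cleanly. The `cleanup` function here is an illustrative placeholder for real teardown logic (flushing logs, closing connections, removing pidfiles):

```bash
# Minimal sketch of graceful SIGTERM handling in a shell wrapper
cat > /tmp/graceful-app.sh << 'EOF'
#!/bin/bash
cleanup() {
    echo "caught SIGTERM, cleaning up"
    exit 143            # conventional status for SIGTERM: 128 + 15
}
trap cleanup TERM

while true; do
    sleep 1 &
    wait $!             # wait on a child so the trap fires promptly
done
EOF
chmod +x /tmp/graceful-app.sh

# Exercise it: start, send SIGTERM, confirm a clean exit
/tmp/graceful-app.sh &
pid=$!
sleep 1
kill -TERM "$pid"
wait "$pid"
status=$?
echo "wrapper exited with status $status"
```

Waiting on a child inside the loop matters: bash defers traps while a foreground external command runs, but `wait` is interruptible, so the handler runs as soon as the signal arrives.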

### Solution 5: Fix Systemd Watchdog

If the systemd watchdog is killing your process:

```bash
# Check watchdog settings
systemctl show your-service | grep Watchdog

# Disable or extend the watchdog timeout
systemctl edit your-service

# Add:
# [Service]
# WatchdogSec=0      # Disable the watchdog
# Or extend the timeout:
# WatchdogSec=300    # 5 minutes

# Restart the service
systemctl daemon-reload
systemctl restart your-service
```
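If the watchdog should stay enabled, the service itself has to ping systemd within `WatchdogSec`. A sketch of a shell wrapper doing so with `systemd-notify`, assuming the unit sets `Type=notify` and `NotifyAccess=main`; `do_one_unit_of_work` is a hypothetical placeholder for the application's work step:

```bash
#!/bin/bash
# Requires in the unit file: Type=notify, NotifyAccess=main, WatchdogSec=30
systemd-notify --ready            # signal that startup is complete
while true; do
    do_one_unit_of_work           # hypothetical application work step
    systemd-notify WATCHDOG=1     # ping well within WatchdogSec
    sleep 10
done
```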

### Solution 6: Fix Timeout Issues

For services killed because a start or stop operation exceeds the systemd timeout:

```bash
# Check service timeouts
systemctl show your-service -p TimeoutStartUSec
systemctl show your-service -p TimeoutStopUSec

# Extend the timeouts
systemctl edit your-service

# Add:
# [Service]
# TimeoutStartSec=300   # 5 minutes to start
# TimeoutStopSec=60     # 1 minute to stop

# Apply
systemctl daemon-reload
systemctl restart your-service
```

## Monitoring and Alerting

Set up monitoring to catch issues early:

```bash
# Monitor OOM killer events
cat > /usr/local/bin/oom-monitor.sh << 'EOF'
#!/bin/bash
while true; do
    if dmesg | grep -qi "killed process"; then
        logger "OOM Killer Alert: $(dmesg | grep -i 'killed process' | tail -1)"
        # Send alert (adjust for your notification system)
        echo "OOM event detected" | mail -s "OOM Alert" admin@example.com
    fi
    sleep 60
done
EOF
chmod +x /usr/local/bin/oom-monitor.sh

# Run as a systemd service or in the background
```

Note the `-i` flag: the kernel logs "Out of memory: Killed process" with a capital K, so a case-sensitive grep would silently miss every event.
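Rather than backgrounding the script, systemd can supervise it. A minimal unit file sketch (the path matches the script above; the unit name is illustrative):

```ini
# /etc/systemd/system/oom-monitor.service (sketch)
[Unit]
Description=Alert on OOM killer events

[Service]
ExecStart=/usr/local/bin/oom-monitor.sh
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl daemon-reload && systemctl enable --now oom-monitor`.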

## Verification

After implementing fixes, verify process stability:

```bash
# Monitor process memory
watch -n 1 "ps aux | grep your-app"

# Track the process over time
pidstat -p $(pidof your-app) 1 10

# Monitor with detailed stats
pidstat -p $(pidof your-app) -r -u -d 1

# Check for recent kills
dmesg | grep -i killed | tail -10

# Verify limits
cat /proc/$(pidof your-app)/limits

# Check the OOM score
cat /proc/$(pidof your-app)/oom_score_adj
```

## Prevention Best Practices

- Monitor memory usage trends with tools like Prometheus, Grafana, or Nagios
- Set up swap space to provide a buffer before OOM
- Use cgroups or containers to limit and isolate resource usage
- Implement proper signal handling in applications
- Configure appropriate timeouts for long-running operations
- Use process supervisors (systemd, supervisord) for automatic restarts
- Enable core dumps for debugging unexpected crashes
- Test applications under memory pressure to identify failure modes
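Several of these practices (cgroup isolation, supervisor restarts, core dumps) map directly onto systemd unit settings. A drop-in sketch with illustrative values:

```ini
# systemctl edit your-service, then add:
[Service]
MemoryMax=2G          # hard cgroup v2 memory cap for this service
MemoryHigh=1536M      # throttle before the hard cap is reached
Restart=on-failure    # restart automatically after crashes
RestartSec=5
LimitCORE=infinity    # permit core dumps for post-mortem debugging
```

With `MemoryMax` set, an over-consuming service is OOM-killed within its own cgroup instead of destabilizing the whole host.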