Introduction

A systemd "failed to start" error occurs when a service cannot initialize because of configuration problems, unmet dependencies, resource constraints, or a broken runtime environment. Systemd is the init system on most modern Linux distributions; it manages system initialization, service lifecycle, and process supervision. When a service fails to start, functionality such as web serving, databases, networking, or monitoring may be unavailable.

Common causes include unit file syntax errors or invalid directives, unmet or failed dependencies, an incorrect or missing executable path, syntax errors in the service's own configuration file, a port already in use by another process, exceeded resource limits (memory, file descriptors, CPU), SELinux or AppArmor blocking execution, startup timeouts that are too short, cgroup resource constraints, and missing environment variables.

Fixing these failures requires familiarity with unit file structure, service dependencies, journalctl logging, cgroup resource management, and debugging tools. This guide covers troubleshooting for systemd service failures on RHEL/CentOS 7/8/9, Ubuntu 18.04/20.04/22.04, Debian 9/10/11, and SUSE Linux Enterprise.

Symptoms

  • systemctl start service returns "Failed to start Service Name"
  • Service status shows "Active: failed (Result: exit-code)"
  • Service starts then immediately stops
  • systemctl status shows dependency failure
  • Service timeout during startup
  • Service works manually but fails via systemd
  • Intermittent startup failures
  • Multiple services failing together
  • Service stuck in "activating" state
  • Journal logs show permission denied or missing files

Common Causes

  • Unit file ExecStart path incorrect
  • Required service dependency not running
  • Configuration file syntax error
  • Port conflict (address already in use)
  • Resource limits (MemoryLimit, NOFILE) exceeded
  • SELinux context or policy blocking
  • WorkingDirectory does not exist
  • User/Group in unit file lacks permissions
  • Environment variables not set
  • TimeoutStartSec too short for service

Step-by-Step Fix

### 1. Diagnose service failure

Check service status:

```bash
# Check service status
systemctl status service-name

# Output shows:
# - Active state (failed, inactive, activating)
# - Main PID (if running)
# - Control group (cgroup) path
# - Recent log entries

# Example failed service:
# ● nginx.service - A high performance web server
#    Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled)
#    Active: failed (Result: exit-code) since Mon 2024-01-15 10:30:00 UTC
#   Process: 1234 ExecStart=/usr/sbin/nginx (code=exited, status=1/FAILURE)
#  Main PID: 1234 (code=exited, status=1/FAILURE)
```

Check recent failures:

```bash
# List failed units
systemctl --failed

# Show all failed services with details
systemctl list-units --state=failed

# Check specific unit state
systemctl is-failed service-name

# Get detailed unit properties
systemctl show service-name

# Key properties:
# - ExecMainStatus: Exit code from service
# - ExecMainStartTimestamp: When service started
# - ExecMainExitTimestamp: When service stopped
# - Result: Failure reason (exit-code, timeout, signal)
```
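The `Result` and `ExecMainStatus` properties listed above can be turned into a quick triage hint. The following is a minimal sketch, not part of systemd: the function name `explain_failure` and the hint wording are illustrative assumptions. It reads `KEY=VALUE` lines, as printed by `systemctl show`, on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map `systemctl show` properties to a triage hint.
# Reads KEY=VALUE lines on stdin; only Result and ExecMainStatus are used.
explain_failure() {
    local key value result="" status=""
    while IFS='=' read -r key value; do
        case "$key" in
            Result)         result=$value ;;
            ExecMainStatus) status=$value ;;
        esac
    done
    case "$result" in
        success)   echo "no failure recorded" ;;
        exit-code) echo "process exited with status ${status:-unknown}; check the service's own logs" ;;
        timeout)   echo "start or stop timed out; review TimeoutStartSec/TimeoutStopSec" ;;
        signal)    echo "process was killed by a signal; check for OOM kills" ;;
        "")        echo "no Result property found in input" ;;
        *)         echo "Result=$result; see journalctl -u <unit> for details" ;;
    esac
}
```

Typical use would be `systemctl show service-name -p Result -p ExecMainStatus | explain_failure`.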

Analyze service logs:

```bash
# View service journal
journalctl -u service-name -n 50 --no-pager

# Follow logs in real-time
journalctl -u service-name -f

# Show only messages at priority "err" or more severe
journalctl -u service-name -p err -n 20

# Limit output to the current boot
journalctl -u service-name -b

# Check specific time range
journalctl -u service-name --since "10 minutes ago"
```

### 2. Fix unit file configuration

Validate unit file syntax:

```bash
# Check unit file location
systemctl show service-name | grep FragmentPath

# Common locations:
# - /usr/lib/systemd/system/ (package installed)
# - /etc/systemd/system/ (local overrides)
# - /run/systemd/system/ (runtime)

# Validate unit file
systemd-analyze verify /usr/lib/systemd/system/service-name.service

# Check for syntax errors
# Common issues:
# - Missing = in directives
# - Invalid section headers
# - Unknown directives

# View the unit file together with any drop-in overrides
systemctl cat service-name
```

Fix common unit file errors:

```ini
# CORRECT unit file structure
[Unit]
Description=My Application Service
After=network.target postgresql.service
# Requires= already implies Wants=, so a separate Wants= line is unnecessary
Requires=postgresql.service

[Service]
Type=simple
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
Environment="APP_ENV=production"
EnvironmentFile=/etc/myapp/environment
ExecStartPre=/usr/bin/test -f /opt/myapp/config.yml
ExecStart=/opt/myapp/bin/myapp --config /opt/myapp/config.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
TimeoutStartSec=90
TimeoutStopSec=30

# Resource limits
# (MemoryLimit= is the legacy cgroup v1 name; on cgroup v2 use MemoryMax=)
MemoryLimit=512M
CPUQuota=50%
LimitNOFILE=65536
LimitNPROC=4096

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/myapp /opt/myapp/data

[Install]
WantedBy=multi-user.target
```

Reload systemd configuration:

```bash
# Reload after editing unit files
systemctl daemon-reload

# Verify unit is recognized
systemctl list-unit-files | grep service-name

# Check unit file state
systemctl show service-name | grep LoadState

# LoadState should be: loaded
# If not-found: unit file missing
# If bad-setting: syntax error in unit file
```

### 3. Fix service dependencies

Check dependency tree:

```bash
# Show service dependencies
systemctl list-dependencies service-name

# Show ordering: units that start before this one (--after follows After=)
# and units that start after it (--before follows Before=)
systemctl list-dependencies service-name --after
systemctl list-dependencies service-name --before

# Check if dependencies are met
systemctl is-active network.target
systemctl is-active postgresql.service

# View dependency failures
systemctl status service-name | grep -i "dependency"
```

Fix dependency configuration:

```ini
# In unit file [Unit] section:

# Soft dependency (start if available)
Wants=postgresql.service

# Hard dependency (fail if not available)
Requires=postgresql.service

# Order after dependency
After=network.target postgresql.service

# Order before dependent services
Before=nginx.service

# Mutually exclusive units (starting one stops the other)
Conflicts=apache2.service

# Stronger coupling: BindsTo= stops this unit when postgresql stops;
# PartOf= propagates stop and restart of postgresql to this unit
BindsTo=postgresql.service
After=postgresql.service
PartOf=postgresql.service
```
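One pitfall with the directives above is that `Requires=` and `Wants=` do not imply ordering; without a matching `After=`, systemd may start both units in parallel. The following is a minimal sketch of a check for that. The helper name `requires_without_after` is an illustrative assumption; it reads the `Requires=` and `After=` lines printed by `systemctl show` on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list units named in Requires= that are missing from After=.
# Reads KEY=VALUE lines (as printed by `systemctl show`) on stdin.
requires_without_after() {
    local key value requires="" after="" unit
    while IFS='=' read -r key value; do
        case "$key" in
            Requires) requires=$value ;;
            After)    after=$value ;;
        esac
    done
    # Unit names are space-separated, so word splitting is intentional here
    for unit in $requires; do
        case " $after " in
            *" $unit "*) ;;              # ordered after this dependency
            *) echo "$unit" ;;           # required but not ordered
        esac
    done
}
```

Typical use would be `systemctl show service-name -p Requires -p After | requires_without_after`; any unit it prints is a candidate for an added `After=` entry.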

Check circular dependencies:

```bash
# Detect circular dependencies (systemd reports them as an "ordering cycle")
systemd-analyze verify service-name.service 2>&1 | grep -i cycle

# Show critical chain (startup order)
systemd-analyze critical-chain service-name

# Output shows dependency chain with timing:
# service-name.service +1.2s
# └─postgresql.service +800ms
#   └─network.target +500ms
#     └─systemd-journald.socket
```

### 4. Fix executable and path issues

Verify executable exists:

```bash
# Get ExecStart path
systemctl show service-name | grep ExecStart

# Check if executable exists
# Example: ExecStart=/opt/myapp/bin/myapp --config /etc/myapp/config.yml
test -x /opt/myapp/bin/myapp && echo "Executable found" || echo "Missing!"

# Check permissions
ls -la /opt/myapp/bin/myapp

# Should be executable:
# -rwxr-xr-x 1 myapp myapp 12345678 Jan 15 10:00 /opt/myapp/bin/myapp

# Find all referenced paths in unit file
grep -E "Exec(Start|Stop|Reload|StartPre|StartPost)" /etc/systemd/system/service-name.service
```

Fix WorkingDirectory issues:

```bash
# Check WorkingDirectory exists
# Example: WorkingDirectory=/opt/myapp
test -d /opt/myapp && echo "Directory exists" || echo "Missing!"

# Create if missing
mkdir -p /opt/myapp
chown myapp:myapp /opt/myapp
chmod 755 /opt/myapp

# Or update unit file with correct path
```

Verify EnvironmentFile:

```bash
# Check EnvironmentFile exists
# Example: EnvironmentFile=/etc/myapp/environment
test -f /etc/myapp/environment && echo "File exists" || echo "Missing!"

# Validate environment file syntax
# Should be KEY=VALUE format, one per line
cat /etc/myapp/environment

# Correct format:
# APP_ENV=production
# LOG_LEVEL=info
# DATABASE_URL=postgresql://localhost/mydb

# Problematic:
# export APP_ENV=production   # 'export' is shell syntax, not valid here
# APP_ENV production          # Missing =
# Quoted values such as APP_ENV="production" are accepted by systemd, but
# other tools reading the same file may treat the quotes literally
```
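The format rules above can be checked mechanically before restarting the service. The following is a minimal sketch of such a check; the function name `lint_env_file` and its messages are illustrative assumptions, and it only covers the two mistakes named above (a leading `export` and a missing `=`).

```bash
#!/usr/bin/env bash
# Hypothetical linter for systemd EnvironmentFile= files.
# Prints "LINE_NUMBER: problem" for each suspicious line; returns 1 if any found.
lint_env_file() {
    local file=$1 line n=0 bad=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        case "$line" in
            ''|'#'*|';'*) continue ;;                          # blank line or comment
            export\ *)    echo "$n: remove leading 'export'"; bad=1 ;;
            *=*)          ;;                                    # looks like KEY=VALUE
            *)            echo "$n: missing '='"; bad=1 ;;
        esac
    done < "$file"
    return "$bad"
}
```

For example, `lint_env_file /etc/myapp/environment` prints nothing and returns 0 for a clean file.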

### 5. Fix port conflicts

Check if port is in use:

```bash
# Get port from service configuration
# Common: ExecStart includes --port 8080, or a .socket unit sets ListenStream=8080

# Check if port is bound
ss -tlnp | grep :8080
netstat -tlnp | grep :8080   # legacy tool; requires net-tools

# Or with lsof
lsof -i :8080

# If port in use, check which service:
# LISTEN 0 128 *:8080 *:* users:(("other-app",pid=5678,fd=3))
```
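A plain `grep :8080` also matches ports such as 18080 or peer addresses. The following is a minimal sketch of a stricter check, assuming the standard `ss -tln` column layout (local address in the fourth field); the function name `port_is_listening` is an illustrative assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given TCP port is listening.
# Reads `ss -tln` output on stdin and compares the end of the
# Local Address:Port column (field 4) against ":PORT".
port_is_listening() {
    awk -v suffix=":$1" '
        $1 == "LISTEN" {
            addr = $4
            if (substr(addr, length(addr) - length(suffix) + 1) == suffix) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

Typical use would be `ss -tln | port_is_listening 8080 && echo "port 8080 is taken"`.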

Resolve port conflict:

```bash
# Option 1: Stop conflicting service
systemctl stop other-service
systemctl disable other-service

# Option 2: Change service port
# Edit the service configuration file, or update ExecStart in the unit file:
#   ExecStart=/opt/myapp/bin/myapp --port 8081

# Option 3: Configure socket activation (systemd feature)
# Create the .socket unit:
cat > /etc/systemd/system/service-name.socket <<'EOF'
[Unit]
Description=Socket for service-name

[Socket]
ListenStream=8080
Accept=no

[Install]
WantedBy=sockets.target
EOF

# Modify the service unit to use the socket
# (the application itself must support socket activation).
# In /etc/systemd/system/service-name.service:
#   [Service]
#   ExecStart=/opt/myapp/bin/myapp --socket-activated
#   StandardInput=socket

# Enable socket
systemctl daemon-reload
systemctl enable service-name.socket
systemctl start service-name.socket
```

### 6. Fix resource limits

Check current limits:

```bash
# Show service resource limits
systemctl show service-name | grep -E "(Memory|CPU|Limit)"

# Check cgroup status
systemd-cgtop

# View cgroup tree
systemd-cgls

# Check if service hit limits
journalctl -u service-name | grep -i "limit\|exceeded\|killed"
```

Configure resource limits:

```ini
# In unit file [Service] section:

# Memory limits
# MemoryLimit= is the legacy cgroup v1 setting; MemoryHigh= and MemoryMax=
# are the cgroup v2 settings. Use one family, not both.
MemoryLimit=512M
MemoryHigh=400M
MemoryMax=512M

# CPU limits (CPUWeight= requires cgroup v2)
CPUQuota=50%
CPUWeight=100

# File descriptor limits
LimitNOFILE=65536
LimitNPROC=4096

# Core dump limits
LimitCORE=infinity

# Process priority
Nice=-5
IOSchedulingClass=realtime

# OOMScoreAdjust (lower = less likely to be killed)
OOMScoreAdjust=-500

# Example for database service:
[Service]
MemoryLimit=2G
CPUQuota=80%
LimitNOFILE=1048576
LimitNPROC=16384
OOMScoreAdjust=-800
```

Check systemd resource controller:

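```bash
# Set manager-wide defaults in /etc/systemd/system.conf, [Manager] section.
# system.conf has no global default for memory or CPU quotas
# (DefaultMemoryLimit= and DefaultCPUQuota= are not valid settings);
# set MemoryMax=/CPUQuota= per unit or per slice instead.
#   DefaultMemoryAccounting=yes
#   DefaultCPUAccounting=yes
#   DefaultLimitNOFILE=65536

# Re-execute the manager after changes
systemctl daemon-reexec
```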

### 7. Fix SELinux/AppArmor issues

Check SELinux status:

```bash
# Check SELinux mode
getenforce

# Check SELinux status
sestatus

# If Enforcing, check for denials
ausearch -m avc -ts recent | grep service-name

# Or view audit log
grep service-name /var/log/audit/audit.log

# Common denial:
# type=AVC msg=avc: denied { execute } for
#   pid=1234 comm="systemd" name="myapp"
#   dev="sda1" ino=123456
#   scontext=system_u:system_r:init_t:s0
#   tcontext=system_u:object_r:usr_t:s0
```

Fix SELinux context:

```bash
# Restore default context
restorecon -v /opt/myapp/bin/myapp

# Set context temporarily (chcon changes are lost on a filesystem relabel)
chcon -t bin_t /opt/myapp/bin/myapp
chcon -t etc_t /opt/myapp/config.yml
chcon -t var_log_t /var/log/myapp

# Make permanent
semanage fcontext -a -t bin_t "/opt/myapp/bin/myapp"
semanage fcontext -a -t etc_t "/opt/myapp/config.yml"
restorecon -v /opt/myapp/bin/myapp /opt/myapp/config.yml

# Enable an SELinux boolean only if a denial points to it (example boolean)
setsebool -P domain_can_mmap_files 1

# Or create custom policy
ausearch -c 'myapp' --raw | audit2allow -M myapp-custom
semodule -i myapp-custom.pp

# Temporarily set permissive (debugging only)
setenforce 0
# Test service
# Re-enable enforcing
setenforce 1
```
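When the audit log contains many denials, it helps to reduce each AVC line to the fields that matter for choosing a fix. The following is a minimal sketch that assumes the usual AVC record layout (`denied { perms }`, `comm="..."`, `tcontext=...`); the function name `summarize_avc` is an illustrative assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize AVC denial lines read from stdin as
# "<permissions> <comm> <target context>", one line per denial.
# Lines that do not look like AVC denials are dropped.
summarize_avc() {
    sed -nE 's/.*denied +\{ ([^}]*) \}.* comm="([^"]*)".* tcontext=([^ ]+).*/\1 \2 \3/p'
}
```

Typical use would be `ausearch -m avc -ts recent | summarize_avc | sort | uniq -c`, which groups repeated denials so the target context that needs relabeling stands out.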

Check AppArmor (Ubuntu/Debian):

```bash
# Check AppArmor status
systemctl status apparmor

# View AppArmor profiles
aa-status

# Check for denials
grep service-name /var/log/kern.log | grep DENIED
dmesg | grep DENIED | grep service-name

# Create profile
aa-genprof /opt/myapp/bin/myapp

# Set profile to complain mode (log only)
aa-complain /opt/myapp/bin/myapp

# Or disable profile
aa-disable /opt/myapp/bin/myapp
```

### 8. Fix timeout issues

Check timeout settings:

```bash
# Show timeout configuration
systemctl show service-name | grep -i timeout

# TimeoutStartSec: Time allowed for service to start
# TimeoutStopSec: Time allowed for service to stop
# TimeoutSec: Both start and stop

# Check if timeout occurred
journalctl -u service-name | grep -i "timeout"

# Or check result
systemctl show service-name | grep Result
# Result=timeout indicates startup timeout
```

Configure timeouts:

```ini
# In unit file [Service] section:

# Increase startup timeout (default: 90s)
TimeoutStartSec=300

# Increase stop timeout (default: 90s)
TimeoutStopSec=120

# Set both
TimeoutSec=300

# For slow-starting services (databases, Java apps)
TimeoutStartSec=600

# Notify systemd when ready (Type=notify services)
# Application must call sd_notify() when ready
Type=notify
TimeoutStartSec=300

# Or use ExecStartPre to wait for dependencies
ExecStartPre=/usr/bin/sleep 30
```
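For a packaged unit under /usr/lib/systemd/system, timeout changes are usually better placed in a drop-in override than in the packaged file, so that package updates do not overwrite them. The following is a minimal sketch; the function name `write_timeout_dropin` and the file name `timeout.conf` are illustrative assumptions, while the `<unit>.d/*.conf` drop-in directory layout is standard systemd behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in override that sets TimeoutStartSec.
# Arguments: unit name, timeout in seconds, and optionally the base directory
# (defaults to /etc/systemd/system). The file name "timeout.conf" is arbitrary.
write_timeout_dropin() {
    local unit=$1 seconds=$2 base=${3:-/etc/systemd/system}
    local dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" > "$dir/timeout.conf"
}
```

Run as root, `write_timeout_dropin service-name.service 300 && systemctl daemon-reload` would apply the override; `systemctl edit service-name` achieves the same result interactively.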

Watchdog configuration:

```ini
# Enable watchdog (systemd monitors service health)
WatchdogSec=30

# Service must call sd_notify("WATCHDOG=1") every 15 seconds
# (half of WatchdogSec)

# Restart on watchdog timeout
Restart=on-watchdog

# Combined example:
[Service]
Type=notify
WatchdogSec=30
Restart=on-watchdog
RestartSec=5
```

### 9. Debug with strace and systemd debug

Trace service startup:

```bash
# Get service cgroup
systemctl show service-name | grep ControlGroup

# Run service with strace (debug system calls)
# Stop service first
systemctl stop service-name

# Run manually with strace
strace -f -o /tmp/service-strace.log /opt/myapp/bin/myapp --config /etc/myapp/config.yml

# Or attach to running process (if it starts then fails)
PID=$(systemctl show service-name --value -p MainPID)
strace -f -p "$PID" -o /tmp/service-strace.log

# Analyze strace output
tail -100 /tmp/service-strace.log

# Look for:
# - ENOENT: File not found
# - EACCES: Permission denied
# - EADDRINUSE: Address already in use
```
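Reading the tail of a long trace by eye is slow, so it can help to count failed calls per error code first. The following is a minimal sketch that assumes strace's usual `= -1 ERRNO (description)` suffix on failed calls; the function name `summarize_strace_errors` is an illustrative assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count failed system calls per errno in an strace log.
# Prints "<count> <errno>" lines, most frequent first.
summarize_strace_errors() {
    awk 'match($0, /= -1 E[A-Z0-9]+/) {
             # Skip the 5-character "= -1 " prefix to keep only the errno name
             errno = substr($0, RSTART + 5, RLENGTH - 5)
             count[errno]++
         }
         END { for (e in count) print count[e], e }' "$1" | sort -rn
}
```

Typical use would be `summarize_strace_errors /tmp/service-strace.log`, followed by `grep ENOENT /tmp/service-strace.log` (or whichever code dominates) to see the affected paths. Many `ENOENT` results during library lookup are normal, so the counts are a starting point rather than a verdict.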

Enable systemd debug logging:

```bash
# Enable debug logging for the systemd manager itself
# (on systemd 244 and later, "systemctl log-level debug" is equivalent)
systemd-analyze set-log-level debug

# Or for specific boot
# Add to kernel boot parameters: systemd.log_level=debug

# View debug logs
journalctl -p debug -n 100

# Reset log level
systemd-analyze set-log-level info
```

Run service in foreground:

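```bash
# Show the ExecStart line(s) from the unit file.
# (systemctl show -p ExecStart prints a structured "{ path=...; argv[]=... }"
# value that cannot be executed directly.)
systemctl cat service-name | grep '^ExecStart='

# Run that command manually as the service user to see its output
# (user and paths are the examples used in this guide)
sudo -u myapp /opt/myapp/bin/myapp --config /opt/myapp/config.yml

# Or use systemd-run for temporary service
systemd-run --unit=test-service --wait --same-dir /opt/myapp/bin/myapp

# View test service output (from a second terminal while --wait blocks)
journalctl -u test-service -f
```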

### 10. Monitor service health

Create service monitor:

```bash
#!/bin/bash
# /usr/local/bin/service-monitor.sh

SERVICE="service-name"
ALERT_EMAIL="admin@example.com"

# Check service status
if ! systemctl is-active --quiet "$SERVICE"; then
    echo "Service $SERVICE is not running!"
    systemctl status "$SERVICE"

    # Attempt restart
    systemctl restart "$SERVICE"

    if ! systemctl is-active --quiet "$SERVICE"; then
        echo "Service $SERVICE failed to restart!"
        # Send alert (configure mail)
        # echo "Service $SERVICE failed" | mail -s "Alert: $SERVICE down" "$ALERT_EMAIL"
    fi
fi
```

Configure systemd service monitoring:

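```ini
# In unit file [Service] section:

# Restart on failure
Restart=on-failure
RestartSec=5

# Or restart always (except manual stop)
Restart=always
RestartSec=10

# Restart after specific exit codes
RestartForceExitStatus=1 2 3 255

# Don't restart on these codes (indicates intentional stop)
RestartPreventExitStatus=0 100

# In unit file [Unit] section on current systemd versions
# (old releases such as the one in RHEL/CentOS 7 expect
# StartLimitInterval= in [Service] instead):

# Start rate limit (prevents restart loops): at most 3 starts per 60 seconds
StartLimitIntervalSec=60
StartLimitBurst=3

# Action when limit hit (reboot-force reboots the whole machine; use with care)
StartLimitAction=reboot-force
```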

Set up the system watchdog daemon (this monitors the whole machine and is separate from the per-service `WatchdogSec=` setting):

```bash
# Install watchdog
apt install watchdog   # Debian/Ubuntu
yum install watchdog   # RHEL/CentOS

# Configure
# (on RHEL/CentOS the monitored log file is typically /var/log/messages)
cat > /etc/watchdog.conf <<EOF
watchdog-device = /dev/watchdog
watchdog-timeout = 60
max-load-1 = 24
file = /var/log/syslog
change = 1800
EOF

# Enable and start
systemctl enable watchdog
systemctl start watchdog
```

Prevention

  • Document service dependencies and startup order requirements
  • Test unit file changes in staging before production
  • Configure appropriate timeouts based on service startup time
  • Set resource limits based on observed usage patterns
  • Implement health checks and automatic restart policies
  • Monitor service status with alerting for failures
  • Use configuration management (Ansible, Puppet) for consistency
  • Regular audit of SELinux/AppArmor policies
  • Log service startup times to identify slow-starting services

Common Failure Results

  • **Result: exit-code**: Service process exited with non-zero status
  • **Result: timeout**: Service failed to start within TimeoutStartSec
  • **Result: signal**: Service was killed by signal (often OOM killer)
  • **Dependency failed**: Required service did not start
  • **Unit not found**: Service unit file missing or not loaded