What's Actually Happening
Rundeck jobs fail to start or stop partway through. The execution is marked as failed and the log shows an error instead of a successful completion.
The Error You'll See
```
# In Rundeck UI:
Execution failed: Node step failed against: node1
```

Node connectivity error:

```
Error: Failed to connect to node: Connection refused
```

Authentication error:

```
Error: Authentication failed for user: rundeck
```

Script error:

```
Error: Script failed with exit code: 1
```

Permission error:

```
Error: Access denied: Insufficient permissions
```

Why This Happens
1. Node unreachable - The target node is not reachable from the Rundeck server
2. SSH key issues - SSH credentials are invalid or missing
3. Job misconfigured - Wrong options or node filters
4. Script errors - The job script fails during execution
5. Permission denied - The user lacks permission to run the job
6. Resource exhaustion - The node is out of disk, memory, or CPU
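The error messages in the previous section map fairly directly onto these causes. As a rough first-pass triage aid, here is a hypothetical `classify_error` shell function (not part of Rundeck) that matches an error line against the message fragments shown above. Real Rundeck messages can be worded differently, so treat an `unknown` result as a prompt to read the full log.

```bash
# classify_error: hypothetical triage helper, not part of Rundeck.
# Maps one error line to a likely cause, using the message fragments
# shown in "The Error You'll See". Real messages may vary.
classify_error() {
  local msg=$1
  case "$msg" in
    *"Connection refused"*|*"Failed to connect to node"*)
      echo "node-unreachable" ;;
    *"Authentication failed"*)
      echo "ssh-credentials" ;;
    *"exit code"*)
      echo "script-error" ;;
    *"Access denied"*|*"Insufficient permissions"*)
      echo "permission-denied" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, `classify_error "Error: Script failed with exit code: 1"` prints `script-error`.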
Step 1: Check Job Execution Log
```bash
# View execution in Rundeck UI:
# Activity -> Executions -> Click on failed execution

# Via API:
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/execution/<execution-id>" | jq .

# Get execution output:
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/execution/<execution-id>/output" | jq .

# Check execution state:
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/execution/<execution-id>/state" | jq .

# View execution log:
# In UI: Activity -> Executions -> Execution -> Log Output

# Check for error messages:
# Look for "Error:", "Failed:", "Exception:" in logs

# Download execution log as plain text:
curl -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/execution/<execution-id>/output.txt" -o execution.log
```
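Searching a downloaded log for the markers above is easy to script. A minimal sketch, assuming the log has been saved to a local file (for example with the `output.txt` request); `find_log_errors` is a hypothetical helper name, and the match is case-insensitive so lines such as `Execution failed:` are caught too.

```bash
# find_log_errors: hypothetical helper that prints lines containing
# "Error:", "Failed:" or "Exception:" (any case), with line numbers.
# Exit status follows grep: 0 = matches found, 1 = clean log,
# and 2 is returned if the file cannot be read.
find_log_errors() {
  local logfile=$1
  if [ ! -r "$logfile" ]; then
    echo "cannot read $logfile" >&2
    return 2
  fi
  # -n: line numbers, -i: ignore case, -E: extended regex for the alternation
  grep -niE 'Error:|Failed:|Exception:' "$logfile"
}
```

Because the exit status distinguishes "errors found" from "clean log", it can gate other checks, for example `find_log_errors execution.log || echo "no error markers found"`.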
Step 2: Check Node Connectivity
```bash
# List configured nodes:
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/project/myproject/resources" | jq .

# Check node status in UI:
# Project -> Nodes -> View node status

# Test node connectivity from Rundeck itself:
# Commands -> filter to node1 -> run a harmless command such as "hostname"

# Manual SSH test:
ssh rundeck@node1

# Check SSH configuration:
grep -i ssh /etc/rundeck/framework.properties

# SSH key location (confirm it exists; don't print the private key):
ls -l /var/lib/rundeck/.ssh/id_rsa
ssh-keygen -l -f /var/lib/rundeck/.ssh/id_rsa.pub

# Verify SSH key permissions:
ls -la /var/lib/rundeck/.ssh/
# id_rsa should be 600

# Test SSH as rundeck user:
sudo -u rundeck ssh node1

# Check SSH config:
cat /var/lib/rundeck/.ssh/config

# Node definition in resources.yaml:
# node1:
#   nodename: node1
#   hostname: node1.example.com
#   username: rundeck
#   ssh-keypath: /var/lib/rundeck/.ssh/id_rsa
```
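`Connection refused` is reported before any SSH authentication happens, so checking whether the SSH port answers from the Rundeck server separates network problems from key problems. A sketch using bash's `/dev/tcp` redirection and coreutils `timeout`; the helper name is hypothetical and port 22 is only a default, so pass the port your node actually uses.

```bash
# check_tcp_port: succeed if a TCP connection to HOST:PORT opens within
# SECS seconds. Uses bash's /dev/tcp pseudo-device (a bash-only feature)
# and coreutils `timeout`, so no extra tools are required.
check_tcp_port() {
  local host=$1 port=${2:-22} secs=${3:-5}
  # host and port are passed as positional arguments to the inner bash
  # instead of being pasted into the command string
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK: $host:$port is accepting connections"
  else
    echo "FAIL: cannot open $host:$port (refused, filtered, or timed out)"
    return 1
  fi
}
```

For example, `check_tcp_port node1.example.com 22` on the Rundeck server. An `OK` here followed by an SSH failure points at credentials rather than the network.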
Step 3: Verify Job Configuration
```bash
# View job definition:
curl -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/job/<job-id>" | jq .

# Check job options:
# In UI: Jobs -> Job -> Edit

# Job node filter:
# The node filter determines which nodes run the job
# Examples:
#   name: node1
#   tags: production
#   osFamily: unix

# Job steps:
# Each step executes in order; verify each step's configuration

# Workflow strategy:
#   node-first: Run all steps on one node, then move to the next node
#   step-first: Run each step on all nodes before starting the next step

# Check job options validation:
# Required options must be provided

# Job timeout:
# Verify the execution timeout is long enough for the job

# Retry configuration:
# Check retry count and delay
```
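The two workflow strategies differ only in loop order. The simulation below is plain bash, not Rundeck code: the node and step names are made up, and it only prints the order in which each node/step pair would run under each strategy.

```bash
# simulate_strategy: print "node:step" pairs in the order a workflow
# strategy would run them. Illustration only; Rundeck does this internally.
#   $1     strategy: node-first or step-first
#   $2     space-separated node names
#   $3...  step names
simulate_strategy() {
  local strategy=$1 nodes=$2; shift 2
  local node step
  case "$strategy" in
    node-first)
      # every step on one node before moving to the next node
      for node in $nodes; do
        for step in "$@"; do echo "$node:$step"; done
      done ;;
    step-first)
      # one step on every node before moving to the next step
      for step in "$@"; do
        for node in $nodes; do echo "$node:$step"; done
      done ;;
    *)
      echo "unknown strategy: $strategy" >&2
      return 1 ;;
  esac
}
```

For example, `simulate_strategy step-first "node1 node2" stop deploy` prints `node1:stop`, `node2:stop`, `node1:deploy`, `node2:deploy`, one per line.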
Step 4: Fix SSH Authentication
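```bash
# Check SSH key:
ls -la /var/lib/rundeck/.ssh/id_rsa

# Generate SSH key:
sudo -u rundeck ssh-keygen -t rsa -b 4096

# Copy key to target nodes:
sudo -u rundeck ssh-copy-id rundeck@node1

# Or manually (creates ~/.ssh on the node if needed):
cat /var/lib/rundeck/.ssh/id_rsa.pub | ssh rundeck@node1 \
  'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

# Fix ownership and permissions:
chown -R rundeck:rundeck /var/lib/rundeck/.ssh
chmod 700 /var/lib/rundeck/.ssh
chmod 600 /var/lib/rundeck/.ssh/id_rsa
chmod 644 /var/lib/rundeck/.ssh/id_rsa.pub

# Test passwordless SSH:
sudo -u rundeck ssh node1 'whoami'

# Use password auth instead (node attributes):
#   ssh-authentication: password
#   ssh-password-option: option.sshPassword
# This expects a secure job option named sshPassword

# SSH config for Rundeck:
# Note: this disables host key checking, which hides man-in-the-middle
# attacks. Where possible, record the node host keys instead:
#   sudo -u rundeck sh -c 'ssh-keyscan node1 >> /var/lib/rundeck/.ssh/known_hosts'
cat > /var/lib/rundeck/.ssh/config << EOF
Host *
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    LogLevel ERROR
EOF

chown rundeck:rundeck /var/lib/rundeck/.ssh/config
chmod 600 /var/lib/rundeck/.ssh/config
```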
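The permission rules above can be checked in one go. A sketch, assuming GNU `stat` (Linux); `check_ssh_perms` is a hypothetical name. It accepts 400 as well as 600 for the private key, since OpenSSH only rejects private keys that are accessible by group or others.

```bash
# check_ssh_perms: verify the .ssh directory is 700 and the private key
# is owner-only (600 or 400).
#   $1 = .ssh directory (default: /var/lib/rundeck/.ssh)
#   $2 = private key file name (default: id_rsa)
# Uses GNU stat -c '%a'; on BSD/macOS the equivalent is stat -f '%Lp'.
check_ssh_perms() {
  local dir=${1:-/var/lib/rundeck/.ssh} key=${2:-id_rsa}
  local status=0 mode
  mode=$(stat -c '%a' "$dir") || return 2
  if [ "$mode" != "700" ]; then
    echo "WARN: $dir is $mode, expected 700"
    status=1
  fi
  mode=$(stat -c '%a' "$dir/$key") || return 2
  case "$mode" in
    600|400) ;;  # owner-only access is what OpenSSH requires
    *) echo "WARN: $dir/$key is $mode, expected 600"; status=1 ;;
  esac
  if [ "$status" -eq 0 ]; then
    echo "OK: permissions look correct"
  fi
  return "$status"
}
```

Run it as root or as the rundeck user, for example `check_ssh_perms /var/lib/rundeck/.ssh id_rsa`.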
Step 5: Check Script Errors
```bash
# View script error in execution log:
# Activity -> Executions -> Log Output

# Common script errors:

# 1. Command not found:
#    Ensure the command is in PATH or use its full path

# 2. Permission denied:
#    Check execute permission: chmod +x script.sh
#    Check that the user has permission to run it

# 3. Environment variable missing:
#    Set it in job options or in the script

# 4. Wrong working directory:
#    Use absolute paths or cd to the correct directory

# Test script manually (-t gives sudo a terminal if it prompts):
ssh -t node1 'sudo -u rundeck /path/to/script.sh'

# Check script exit code:
/path/to/script.sh; echo $?

# Add error handling at the top of the script (not in your interactive shell):
#   #!/bin/bash
#   set -e   # Exit on the first failing command
#   set -x   # Print each command before running it

# Debug script execution:
bash -x /path/to/script.sh

# Check script output redirection:
# Ensure stdout/stderr are captured
```
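Since Rundeck reports a step as failed when the script exits with a non-zero code, it helps if the script itself says which command failed. A small wrapper you could paste into a job script; `run_logged` is a hypothetical name, and the `cmd || rc=$?` form keeps it working whether or not `set -e` is enabled.

```bash
# run_logged: announce a command, run it, and on failure print the
# command's exit code to stderr before returning that same code.
run_logged() {
  local desc=$1; shift
  local rc=0
  echo "==> $desc: $*"
  # "|| rc=$?" captures the failure without tripping set -e
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "!! $desc failed with exit code $rc" >&2
  fi
  return "$rc"
}
```

In a job script this might look like `run_logged "restart app" systemctl restart myapp` (the service name is only an example); the `!!` line then appears in the Rundeck log right before the step is marked failed.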
Step 6: Check User Permissions
```bash
# Check user ACL:
cat /etc/rundeck/admin.aclpolicy

# Manage ACL policies in UI:
# System (gear icon) -> Access Control

# Required permissions:
# - job:run
# - node:run
# - execution:read

# Project ACL:
cat /etc/rundeck/project.aclpolicy

# Check the user's Rundeck roles (default file-based login):
# ACL "group" entries match Rundeck roles from the login backend,
# not Unix groups from /etc/group
grep '^<username>:' /etc/rundeck/realm.properties

# Add a role: append it to that user's line, e.g.
#   <username>: <password>,user,admin

# Test permission via API:
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/user/info"

# Create ACL policy (project rules plus application-level project read access):
cat > /etc/rundeck/myproject.aclpolicy << 'EOF'
description: Project permissions for the admin group
context:
  project: myproject
for:
  resource:
    - equals:
        kind: job
      allow: [create]
  job:
    - allow: [read, update, delete, run]
  node:
    - allow: [read, run]
by:
  group: admin
---
description: Application-level access to myproject
context:
  application: rundeck
for:
  project:
    - equals:
        name: myproject
      allow: [read]
by:
  group: admin
EOF
```
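When Rundeck uses its default file-based login, the roles that ACL `by: group:` rules match are listed in `realm.properties` (`/etc/rundeck/realm.properties` on package installs), one `username: password,role1,role2` line per user. A sketch of a hypothetical helper that prints one user's roles; it assumes that colon-separated layout and does not apply to LDAP, PAM, or SSO logins.

```bash
# realm_roles: print the roles for USER from a realm.properties-style file
# ("user: password[,role,...]"), one role per line.
# Only the first ':' separates the user name, so hashed passwords such as
# "MD5:..." or "OBF:..." do not confuse the parsing.
realm_roles() {
  local user=$1 file=${2:-/etc/rundeck/realm.properties}
  awk -v u="$user" '
    /^[ \t]*#/ { next }                       # skip comment lines
    {
      i = index($0, ":"); if (i == 0) next
      name = substr($0, 1, i - 1)
      gsub(/[ \t]/, "", name)
      if (name != u) next
      n = split(substr($0, i + 1), parts, ",")
      for (j = 2; j <= n; j++) {              # parts[1] is the password
        gsub(/^[ \t]+|[ \t]+$/, "", parts[j])
        if (parts[j] != "") print parts[j]
      }
    }' "$file"
}
```

For example, `realm_roles admin` should list `admin` among the roles if the admin policy is meant to apply to that user.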
Step 7: Check Node Resources
```bash
# Check node resources during execution:
ssh node1 'free -m'
ssh node1 'df -h'
ssh node1 'uptime'

# Disk space:
df -h /tmp
df -h /var

# Memory:
free -m
cat /proc/meminfo

# CPU load:
uptime
top -bn1 | head -20

# Check for zombie processes (STAT column starts with Z):
ps -eo pid,ppid,stat,cmd | awk 'NR == 1 || $3 ~ /^Z/'

# Check process limits for the SSH session Rundeck uses:
ssh rundeck@node1 'ulimit -a'

# Check logs on node:
journalctl -xe
tail -f /var/log/syslog   # /var/log/messages on RHEL-family systems

# Check Rundeck temporary files:
ls -la /tmp/rundeck/

# Clean up old executions:
# In Rundeck UI: Project Settings -> Execution History -> Clean Up
```
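To turn the disk check into a pass/fail test, for example before a job that writes large files, a sketch like this compares usage against a threshold. It relies on POSIX `df -P` output; the function name and the 90% default are arbitrary choices.

```bash
# check_disk: warn if the filesystem holding TARGET is above MAX percent used.
# df -P guarantees one output line per filesystem; column 5 is "Use%".
check_disk() {
  local target=${1:-/tmp} max=${2:-90} used
  used=$(df -P "$target" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  case "$used" in
    ''|*[!0-9]*)
      echo "could not read usage for $target" >&2
      return 2 ;;
  esac
  if [ "$used" -gt "$max" ]; then
    echo "WARN: $target is ${used}% full (limit ${max}%)"
    return 1
  fi
  echo "OK: $target is ${used}% full"
}
```

To run it on a node without copying a file there, you could send the function definition over SSH: `ssh node1 "$(declare -f check_disk); check_disk /tmp 90"` (this assumes the remote login shell is bash).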
Step 8: Fix Environment Issues
```bash
# Check environment variables in job:
# In script:
env | sort

# Set environment in the job script:
export PATH=$PATH:/custom/bin
export JAVA_HOME=/usr/lib/jvm/java-11

# Or pass values as job options:
# An option named "myvar" is available as ${option.myvar} in commands
# and as $RD_OPTION_MYVAR in script steps

# Check Rundeck environment:
sudo -u rundeck env

# Server identity in /etc/rundeck/framework.properties:
#   framework.server.name = rundeck
#   framework.server.hostname = rundeck.example.com

# Node-specific values:
# Node attributes (e.g. a custom java_home attribute) are available as
# ${node.java_home} in commands and as $RD_NODE_JAVA_HOME in script steps
# Remote nodes may need "AcceptEnv RD_*" in sshd_config to receive RD_ variables

# Debug environment in execution:
echo "PATH=$PATH"
echo "HOME=$HOME"
```
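A missing variable often surfaces as a confusing error several commands later, so a fail-fast guard at the top of a script step makes the cause obvious. Rundeck exposes job options to script steps as `RD_OPTION_<NAME>` variables (an option named `version` becomes `RD_OPTION_VERSION`); the option name and the `require_env` helper here are hypothetical.

```bash
# require_env: fail fast if any named environment variable is unset or empty.
# Uses bash indirect expansion ${!name} to read the variable by name.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "ERROR: required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

At the top of a script step: `require_env RD_OPTION_VERSION JAVA_HOME || exit 1`. The job then fails immediately with a message naming the missing variable.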
Step 9: Handle Job Scheduling Issues
```bash
# Check job schedule flags:
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/job/<job-id>/info" | jq '{scheduled, scheduleEnabled, enabled}'

# Verify schedule in UI:
# Jobs -> Job -> Schedule tab

# Quartz schedule syntax:
# Seconds Minutes Hours Day-of-Month Month Day-of-Week [Year]
# 0 0 12 * * ?   # Every day at noon

# Test schedule:
# Use "Run Job Now" to test manually

# Check the server's execution mode (scheduled jobs don't run in passive mode):
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/system/info" | jq .system.executions

# List jobs scheduled on this server:
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/scheduler/jobs" | jq '.[] | {id, name}'

# Failed scheduled executions:
# Check execution history for patterns

# Schedule disabled:
# Job -> Edit -> Schedule -> Enabled checkbox

# Concurrent executions:
# Check if multiple executions conflict
# Set max concurrent executions in job config
```
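A common scheduling mistake is pasting a five-field Unix crontab line where Rundeck expects a Quartz expression. This sketch checks only the field count and Quartz's rule that exactly one of day-of-month and day-of-week must be `?`; it is not a full Quartz parser, and `quartz_check` is a hypothetical name.

```bash
# quartz_check: rough structural check of a Quartz cron expression.
# Fields: Seconds Minutes Hours Day-of-Month Month Day-of-Week [Year]
quartz_check() {
  local -a f
  # read -a splits on whitespace without glob-expanding * or ?
  read -r -a f <<< "$1"
  if [ "${#f[@]}" -lt 6 ] || [ "${#f[@]}" -gt 7 ]; then
    echo "INVALID: expected 6 or 7 fields, got ${#f[@]}"
    return 1
  fi
  # Day-of-Month is f[3], Day-of-Week is f[5]; exactly one must be '?'
  if [ "${f[3]}" = "?" ] && [ "${f[5]}" = "?" ]; then
    echo "INVALID: '?' may appear in only one of day-of-month and day-of-week"
    return 1
  fi
  if [ "${f[3]}" != "?" ] && [ "${f[5]}" != "?" ]; then
    echo "INVALID: use '?' in either day-of-month or day-of-week"
    return 1
  fi
  echo "OK"
}
```

For example, `quartz_check '0 0 12 * * ?'` prints `OK`, while the Unix-style `quartz_check '0 12 * * *'` reports that only 5 fields were given. Quote the expression so the shell does not expand `*`.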
Step 10: Rundeck Verification Script
```bash
# Create verification script:
cat << 'EOF' > /usr/local/bin/check-rundeck-job.sh
#!/bin/bash
# Usage: check-rundeck-job.sh [job-id] [rundeck-url] [token] [project]
JOB_ID=${1:-""}
RD_URL=${2:-"http://localhost:4440"}
RD_TOKEN=${3:-""}
PROJECT=${4:-"myproject"}

if [ -z "$RD_TOKEN" ]; then
  echo "An API token is required as the third argument" >&2
  exit 1
fi

# Authenticated GET against API v41, requesting JSON so jq can parse it
rd_get() {
  curl -s -H "X-Rundeck-Auth-Token: $RD_TOKEN" \
    -H "Accept: application/json" \
    "$RD_URL/api/41/$1"
}

echo "=== Rundeck Status ==="
rd_get "system/info" | jq '.system'

echo ""
echo "=== Project List ==="
rd_get "projects" | jq -r '.[].name'

echo ""
echo "=== Nodes ==="
rd_get "project/$PROJECT/resources" | jq 'keys'

echo ""
echo "=== Recent Executions ==="
rd_get "project/$PROJECT/executions?max=5" |
  jq '.executions[] | {id, status, started: .["date-started"].date}'

if [ -n "$JOB_ID" ]; then
  echo ""
  echo "=== Job Info ==="
  rd_get "job/$JOB_ID/info" | jq '.'

  echo ""
  echo "=== Job Executions ==="
  rd_get "job/$JOB_ID/executions?max=5" |
    jq '.executions[] | {id, status, failedNodes}'
fi

echo ""
echo "=== SSH Key Check ==="
ls -la /var/lib/rundeck/.ssh/

echo ""
echo "=== Recommendations ==="
echo "1. Verify node connectivity with SSH test"
echo "2. Check SSH key permissions (600)"
echo "3. Review job configuration and node filters"
echo "4. Test script manually on target node"
echo "5. Verify user has execution permissions"
echo "6. Check node resources (disk, memory)"
echo "7. Review execution log for errors"
EOF

chmod +x /usr/local/bin/check-rundeck-job.sh

# Usage:
/usr/local/bin/check-rundeck-job.sh <job-id> http://rundeck:4440 <token> myproject
```
Rundeck Job Checklist
| Check | Expected |
|---|---|
| Node connectivity | SSH works |
| SSH key | Valid and correct permissions |
| Job config | Correct options and nodes |
| Script | Executes successfully |
| Permissions | User can run job |
| Node resources | Adequate disk/memory |
| Environment | Variables set correctly |
Verify the Fix
```bash
# After fixing Rundeck job issues

# 1. Test node connectivity
ssh rundeck@node1
# Expected: connects successfully

# 2. Run job manually
# UI: Jobs -> Job -> Run Job Now
# Expected: job completes

# 3. Check execution log
# Activity -> Executions -> Log Output
# Expected: no errors

# 4. Verify schedule
# Jobs -> Job -> Schedule
# Expected: schedule active

# 5. Test SSH key
sudo -u rundeck ssh node1 'hostname'
# Expected: prints the node's hostname

# 6. Check recent executions
curl -H "Accept: application/json" \
  -H "X-Rundeck-Auth-Token: <token>" \
  "http://rundeck:4440/api/41/project/myproject/executions?max=5"
# Expected: recent runs show status "succeeded"
```
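When re-running a job from the command line, polling the execution until it finishes is more reliable than watching the UI. A sketch of a polling loop that takes the status command as arguments so it can be tested offline; it treats only `running` as in progress and `succeeded` as success, which are status values Rundeck's execution API reports. The function name is hypothetical.

```bash
# wait_for_status: poll a status command until it stops printing "running".
#   $1 = max attempts, $2 = seconds between attempts, $3... = status command
# Prints the final status. Returns 0 for "succeeded", 1 for any other final
# status, and 2 if still running after the last attempt.
wait_for_status() {
  local tries=$1 delay=$2; shift 2
  local i status
  for ((i = 1; i <= tries; i++)); do
    # a failing status command is reported as "error" instead of aborting
    status=$("$@") || status="error"
    if [ "$status" != "running" ]; then
      echo "$status"
      if [ "$status" = "succeeded" ]; then
        return 0
      fi
      return 1
    fi
    sleep "$delay"
  done
  echo "still running after $tries checks"
  return 2
}
```

For example, with a hypothetical `rd_status` function that runs `curl` against `/api/41/execution/<execution-id>` and pipes the result to `jq -r '.status'`, you could call `wait_for_status 60 5 rd_status <execution-id>`.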
Related Issues
- [Fix Jenkins Build Stuck](/articles/fix-jenkins-build-stuck)
- [Fix Ansible Host Unreachable](/articles/fix-ansible-host-unreachable)
- [Fix SSH Connection Refused](/articles/fix-ssh-connection-refused)