What's Actually Happening

A Pacemaker cluster resource fails to start on any node. The resource stays in the Stopped or FAILED state even after manual intervention.

The Error You'll See

```bash
$ pcs status resources

* Resource Set: webserver
  * Stopped: [ node1 node2 ]
```

Failed state:

```bash
* webserver (ocf::heartbeat:apache): Stopped FAILED
```

Error message:

```bash
Error: Resource webserver is not running on any node
```

Constraint violation:

```bash
Resource webserver cannot run: colocation constraint not satisfied
```

Why This Happens

  1. Resource agent error - OCF script returns an error
  2. Missing dependencies - Required resource not running
  3. Constraint conflicts - Location or colocation rules prevent start
  4. Configuration error - Invalid resource parameters
  5. Permission denied - Resource agent lacks permissions
  6. Network issues - Cannot reach required services
  7. Quorum lost - Cluster has no quorum

Step 1: Check Cluster Status

```bash
# Check overall cluster status:
pcs status

# Check cluster nodes:
pcs status nodes

# Check corosync status:
pcs status corosync

# Check quorum:
pcs quorum status

# Check resources:
pcs status resources

# Check where a specific resource is running:
crm_resource --resource webserver --locate

# Check failed actions (reported in the status output) and fail counts:
pcs status --full
pcs resource failcount show webserver

# Check cluster configuration:
pcs config

# Check resource configuration (pcs 0.10+; older releases use "pcs resource show webserver"):
pcs resource config webserver

# Follow the logs (Ctrl+C to stop):
journalctl -u pacemaker -u corosync -f
```
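
Before running the checks above, it can help to confirm that the command-line tools are actually installed on the node. The following is a minimal sketch; the function name `missing_cmds` is hypothetical, and the list of tools in the usage comment is an assumption about what a typical node provides.

```bash
# missing_cmds CMD...: print each command that is not found in PATH.
# Returns 0 only when every command is available.
missing_cmds() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example (tool list is an assumption):
# missing_cmds pcs crm_resource crm_simulate journalctl
```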

Step 2: Check Resource Configuration

```bash
# Show resource configuration (pcs 0.10+; older releases use "pcs resource show webserver"):
pcs resource config webserver

# Example output:
# Resource: webserver (class=ocf provider=heartbeat type=apache)
#   Attributes: configfile=/etc/httpd/conf/httpd.conf
#   Operations: start interval=0s timeout=40s (webserver-start-interval-0s)
#               stop interval=0s timeout=60s (webserver-stop-interval-0s)
#               monitor interval=10s timeout=20s (webserver-monitor-interval-10s)

# Inspect the raw resource definition, including all parameters:
crm_resource --resource webserver --query-xml

# List the parameters the OCF agent accepts:
pcs resource describe ocf:heartbeat:apache

# Check that the agent script exists and is executable:
ls -la /usr/lib/ocf/resource.d/heartbeat/apache

# Read the start of the agent script:
head -n 50 /usr/lib/ocf/resource.d/heartbeat/apache

# Test the resource agent manually (this starts Apache outside cluster control):
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_configfile=/etc/httpd/conf/httpd.conf
/usr/lib/ocf/resource.d/heartbeat/apache start
echo $?   # Should be 0

# Check for syntax errors in the agent:
bash -n /usr/lib/ocf/resource.d/heartbeat/apache

# Update resource parameters:
pcs resource update webserver configfile=/etc/httpd/conf/httpd.conf
```
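
The file-level checks above (agent present, executable, free of syntax errors) can be bundled into one helper. This is a sketch under assumptions: `check_agent` is a hypothetical name, the return values only mirror the OCF codes for "not installed" (5) and "insufficient privileges" (4), and it uses `sh -n`, so an agent written for bash would need `bash -n` instead.

```bash
# check_agent PATH: basic sanity checks on a resource agent script.
# Prints the first problem found and returns non-zero, or prints "ok: PATH".
check_agent() {
    agent=$1
    if [ ! -f "$agent" ]; then
        echo "not found: $agent"
        return 5    # mirrors OCF "not installed"
    fi
    if [ ! -x "$agent" ]; then
        echo "not executable: $agent"
        return 4    # mirrors OCF "insufficient privileges"
    fi
    if ! sh -n "$agent" 2>/dev/null; then
        echo "syntax error: $agent"
        return 1    # generic error
    fi
    echo "ok: $agent"
}

# Example:
# check_agent /usr/lib/ocf/resource.d/heartbeat/apache
```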

Step 3: Debug Resource Failure

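```bash
# Check failed actions and fail counts:
pcs status --full
pcs resource failcount show webserver

# Clear failed state:
pcs resource cleanup webserver

# Check errors in logs (newer releases log to /var/log/pacemaker/pacemaker.log):
grep -iE "webserver|error|fail" /var/log/pacemaker.log | tail -30

# Enable Pacemaker debug logging, then restart the cluster services on that node:
# In /etc/sysconfig/pacemaker (/etc/default/pacemaker on Debian/Ubuntu):
#   PCMK_debug=yes

# Enable corosync debug logging:
# In /etc/corosync/corosync.conf:
#   logging {
#       debug: on
#   }

# On a test cluster only, rule out fencing and quorum as the blocker
# (unsafe in production; these settings do not enable logging):
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore

# Check operation history:
crm_resource --list-all-operations --resource webserver

# Show the scheduler's placement scores for the live cluster:
crm_simulate --show-scores --live-check

# Run the start action in the foreground with verbose output:
pcs resource debug-start webserver --full

# OCF return codes:
#   0 - Success
#   1 - Generic error
#   2 - Invalid arguments
#   3 - Unimplemented action
#   4 - Insufficient privileges
#   5 - Not installed
#   6 - Not configured
#   7 - Not running
#   8 - Running in promoted role
#   9 - Failed in promoted role
```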
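
When testing an agent by hand, translating the numeric exit status into its symbolic OCF name makes logs easier to read. A small sketch follows; `ocf_rc_name` is a hypothetical helper, and older resource-agents releases call codes 8 and 9 `OCF_RUNNING_MASTER` and `OCF_FAILED_MASTER`.

```bash
# ocf_rc_name CODE: print the symbolic OCF name for an agent exit status.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_PROMOTED" ;;   # formerly OCF_RUNNING_MASTER
        9) echo "OCF_FAILED_PROMOTED" ;;    # formerly OCF_FAILED_MASTER
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Example: run the agent's monitor action, then decode its status.
# /usr/lib/ocf/resource.d/heartbeat/apache monitor; ocf_rc_name "$?"
```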

Step 4: Check Constraints

```bash
# List all constraints with their IDs:
pcs constraint --full

# Check location constraints:
pcs constraint location

# Check colocation constraints:
pcs constraint colocation

# Check order constraints:
pcs constraint order

# Check ticket constraints:
pcs constraint ticket

# Find constraints that reference a resource:
pcs constraint ref webserver

# Remove a constraint by ID (replace CONSTRAINT_ID with an ID from --full):
pcs constraint remove CONSTRAINT_ID

# Location constraint example:
# pcs constraint location webserver prefers node1=100

# Colocation constraint example:
# pcs constraint colocation add webserver with ipaddress

# Order constraint example:
# pcs constraint order ipaddress then webserver

# Check for conflicting constraints:
pcs constraint location | grep webserver
pcs constraint colocation | grep webserver

# Remove a location constraint (takes a constraint ID, not a resource name):
pcs constraint location remove LOCATION_CONSTRAINT_ID
```
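
Conflicting constraints are easier to reason about once the score arithmetic is clear. Pacemaker documents INFINITY as 1000000, with `-INFINITY` winning over everything, including `+INFINITY`. The sketch below illustrates those rules only; `to_num` and `add_scores` are hypothetical helpers, not part of any Pacemaker tool.

```bash
# Pacemaker defines INFINITY as 1000000.
INF=1000000

# to_num SCORE: convert INFINITY, +INFINITY, -INFINITY, or an integer to a number.
to_num() {
    case "$1" in
        INFINITY|+INFINITY) echo "$INF" ;;
        -INFINITY)          echo "-$INF" ;;
        *)                  echo "$1" ;;
    esac
}

# add_scores A B: combine two scores following the documented rules.
add_scores() {
    a=$(to_num "$1")
    b=$(to_num "$2")
    if [ "$a" -le "-$INF" ] || [ "$b" -le "-$INF" ]; then
        echo "-INFINITY"            # -INFINITY beats any other value
    elif [ "$a" -ge "$INF" ] || [ "$b" -ge "$INF" ]; then
        echo "INFINITY"
    else
        sum=$((a + b))
        if [ "$sum" -ge "$INF" ]; then
            echo "INFINITY"         # sums are capped at INFINITY
        elif [ "$sum" -le "-$INF" ]; then
            echo "-INFINITY"
        else
            echo "$sum"
        fi
    fi
}
```

For example, `add_scores 100 -INFINITY` prints `-INFINITY`: a `prefers node1=100` rule cannot override a constraint that forbids node1.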

Step 5: Check Resource Dependencies

```bash
# List all resources, including groups, clones, and promotable (master/slave) clones:
crm_resource --list

# Check group members (webservers is the group ID):
pcs resource config webservers

# Check order dependencies:
pcs constraint order | grep webserver

# If the resource depends on another, check that resource:
crm_resource --resource ipaddress --locate

# Start the dependency first:
pcs resource enable ipaddress

# Simulate the scheduler's next actions on the live cluster to spot
# ordering problems such as circular dependencies:
crm_simulate --simulate --live-check

# Group ordering: resources in a group start left to right and stop in reverse

# Check the "requires" meta attribute, if set:
pcs resource config webserver | grep -i require

# Check for resource sets in constraints:
cibadmin --query --xpath "//resource_set"
```
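
Because group members start left to right, a member can only start once every member listed before it is active. The sketch below works that out from a member list; the helper names and the `filesystem` resource are hypothetical and exist only for illustration.

```bash
# group_prereqs TARGET MEMBER...: print the members that must be active
# before TARGET can start (everything listed before it in the group).
group_prereqs() {
    target=$1
    shift
    prereqs=""
    for member in "$@"; do
        if [ "$member" = "$target" ]; then
            break
        fi
        prereqs="${prereqs:+$prereqs }$member"
    done
    echo "$prereqs"
}

# group_stop_order MEMBER...: print members in stop order (reverse of start order).
group_stop_order() {
    reversed=""
    for member in "$@"; do
        reversed="$member${reversed:+ $reversed}"
    done
    echo "$reversed"
}
```

With a group defined as `ipaddress filesystem webserver`, `group_prereqs webserver ipaddress filesystem webserver` prints `ipaddress filesystem`, so a failure of either one keeps `webserver` stopped.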

Step 6: Check Node Status

```bash
# Check node status:
pcs status nodes

# Expected output:
# Pacemaker Nodes:
#  Online: node1 node2
#  Standby:
#  Maintenance:
#  Offline: node3

# Standby and maintenance nodes are listed in the output above.
# The commands below change node state; they do not query it.

# Put a node into standby (its resources move elsewhere):
pcs node standby node1

# Bring the node back online:
pcs node unstandby node1

# Check node attributes:
pcs node attribute

# Check node utilization:
pcs node utilization

# Put a node into maintenance mode (its resources become unmanaged):
pcs node maintenance node1

# Remove the node from maintenance mode:
pcs node unmaintenance node1

# Check which nodes are online:
pcs status nodes | grep -i online

# Ensure the resource can run on the node by checking location constraints:
pcs constraint location | grep -i node1
```
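
To script against the node listing, the state lines can be parsed with `awk`. This is a sketch that assumes the output layout shown in this step; `nodes_in_state` is a hypothetical helper, and it reads only the first matching line, so it reports cluster nodes rather than remote nodes.

```bash
# nodes_in_state STATE: read "pcs status nodes" output on stdin and print
# the node names listed after "STATE:" (first match only).
nodes_in_state() {
    awk -v want="$1:" '
        $1 == want {
            out = ""
            for (i = 2; i <= NF; i++) {
                out = out (i > 2 ? " " : "") $i
            }
            print out
            exit
        }
    '
}

# Example:
# pcs status nodes | nodes_in_state Offline
```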

Step 7: Check Quorum Status

```bash
# Check quorum:
pcs quorum status

# Expected output:
# Quorum information
# ------------------
# Date: Mon Jan 1 12:00:00 2024
# Quorum provider: corosync_votequorum
# Nodes configured: 3
# Nodes expected: 3
# Quorate: Yes

# If not quorate:
# A 2-node cluster needs a quorum device or the two_node option

# Ignore quorum loss (for testing only):
pcs property set no-quorum-policy=ignore

# Check quorum configuration:
pcs quorum config

# Set expected votes on the live cluster:
pcs quorum expected-votes 2

# For a 2-node cluster, set two_node: 1 in /etc/corosync/corosync.conf on every node:
# quorum {
#     provider: corosync_votequorum
#     two_node: 1
# }

# Reload corosync (some quorum options may only take effect after a cluster restart):
pcs cluster reload corosync
```
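
The reason a 2-node cluster loses quorum when one node fails is simple majority arithmetic: the cluster needs floor(n/2) + 1 votes. A sketch of that calculation, assuming one vote per node and no quorum device; the function names are hypothetical.

```bash
# votes_needed EXPECTED_VOTES: smallest majority, floor(n/2) + 1.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# is_quorate EXPECTED_VOTES ONLINE_VOTES: succeeds when the online votes
# reach the majority.
is_quorate() {
    [ "$2" -ge "$(votes_needed "$1")" ]
}

# Example: a 3-node cluster survives one failure, a plain 2-node cluster does not.
# is_quorate 3 2 && echo "3-node cluster with 2 online: quorate"
# is_quorate 2 1 || echo "2-node cluster with 1 online: not quorate"
```

With two expected votes the majority is also two, which is why the `two_node` option exists for that topology.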

Step 8: Check Stonith Configuration

```bash
# Check whether fencing is enabled (older pcs: "pcs property show stonith-enabled"):
pcs property config stonith-enabled

# List configured fence devices and their state:
pcs stonith status

# Show one fence device's configuration:
pcs stonith config fence_ipmilan

# Show all fence device configuration:
pcs stonith config

# Disable stonith (for testing only):
pcs property set stonith-enabled=false

# Create a fence device (older fence-agents releases name these options
# ipaddr, login, and passwd):
pcs stonith create fence_ipmilan fence_ipmilan \
    ip=192.168.1.100 username=admin password=password \
    pcmk_host_list=node1,node2

# Verify the fence agent can reach the device:
fence_ipmilan -a 192.168.1.100 -l admin -p password -o status

# Test fencing. WARNING: this really reboots node1:
pcs stonith fence node1

# Power the node off instead of rebooting it:
pcs stonith fence node1 --off

# Check fencing levels:
pcs stonith level
```
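
Note that `pcs stonith fence` really fences the target node, so it is worth guarding when typed interactively. The following is one possible guard, not a pcs feature; `confirm_target` is a hypothetical helper that asks the operator to retype the node name.

```bash
# confirm_target NAME: read one line from stdin and succeed only if it
# matches NAME exactly. The prompt goes to stderr.
confirm_target() {
    printf 'Type "%s" to confirm: ' "$1" >&2
    IFS= read -r reply || return 1
    [ "$reply" = "$1" ]
}

# Example (fences node1 only after confirmation):
# confirm_target node1 && pcs stonith fence node1
```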

Step 9: Fix Common Issues

```bash
# Resource won't start:

# 1. Clean up failed state:
pcs resource cleanup webserver

# 2. Check constraints:
pcs constraint ref webserver

# 3. Check configuration and dependencies:
pcs resource config webserver

# 4. Test the agent manually:
OCF_ROOT=/usr/lib/ocf OCF_RESKEY_configfile=/etc/httpd/conf/httpd.conf \
    /usr/lib/ocf/resource.d/heartbeat/apache start

# 5. Check logs:
journalctl -u pacemaker -n 50 | grep webserver

# Resource keeps failing:

# 1. Check failure timeout:
pcs resource config webserver | grep failure-timeout

# 2. Set migration threshold (a meta attribute):
pcs resource update webserver meta migration-threshold=3

# 3. Set failure timeout (a meta attribute):
pcs resource update webserver meta failure-timeout=60s

# Resource stuck in stopping:

# 1. Ask the cluster to stop it:
pcs resource disable webserver

# 2. If that hangs, run the stop action directly on the node:
pcs resource debug-stop webserver

# 3. Last resort, delete the resource definition from the CIB:
cibadmin --delete --scope resources --xml-text '<primitive id="webserver"/>'

# Resource not moving:

# 1. Check stickiness:
pcs resource config webserver | grep stickiness

# 2. Set stickiness (a meta attribute):
pcs resource update webserver meta resource-stickiness=100

# Make sure the resource is allowed to start:
pcs resource update webserver meta target-role=Started
```

Step 10: Pacemaker Verification Script

```bash
# Create verification script:
cat << 'EOF' > /usr/local/bin/check-pacemaker-resource.sh
#!/bin/bash
# Collects cluster, resource, and constraint status for troubleshooting.

RESOURCE=${1:-""}

echo "=== Cluster Status ==="
pcs status 2>/dev/null || echo "pcs command not available"

echo ""
echo "=== Nodes ==="
pcs status nodes 2>/dev/null || echo "Cannot get node status"

echo ""
echo "=== Quorum ==="
pcs quorum status 2>/dev/null || echo "Cannot get quorum status"

echo ""
echo "=== Resources ==="
pcs status resources 2>/dev/null || echo "Cannot get resource status"

if [ -n "$RESOURCE" ]; then
    echo ""
    echo "=== Resource: $RESOURCE ==="
    pcs resource config "$RESOURCE" 2>/dev/null || echo "Resource not found"

    echo ""
    echo "=== Constraints for $RESOURCE ==="
    pcs constraint ref "$RESOURCE" 2>/dev/null || echo "No constraints"
fi

echo ""
echo "=== Fail Counts ==="
pcs resource failcount show 2>/dev/null || echo "Cannot get fail counts"

echo ""
echo "=== Stonith Configuration ==="
pcs property config stonith-enabled 2>/dev/null || echo "Cannot check stonith"

echo ""
echo "=== Cluster Properties ==="
pcs property 2>/dev/null | head -20 || echo "Cannot list properties"

echo ""
echo "=== Constraints ==="
pcs constraint --full 2>/dev/null | head -20 || echo "Cannot list constraints"

echo ""
echo "=== Corosync Status ==="
pcs status corosync 2>/dev/null || echo "Corosync not configured"

echo ""
echo "=== Recent Logs ==="
journalctl -u pacemaker --no-pager -n 10 2>/dev/null || echo "No pacemaker logs"

echo ""
echo "=== Recommendations ==="
echo "1. Ensure cluster has quorum"
echo "2. Check resource agent is valid"
echo "3. Verify constraints allow resource to run"
echo "4. Check dependencies are met"
echo "5. Review resource parameters"
echo "6. Clear failed state with pcs resource cleanup"
echo "7. Check logs for specific error messages"
EOF

chmod +x /usr/local/bin/check-pacemaker-resource.sh

# Usage:
/usr/local/bin/check-pacemaker-resource.sh webserver
```

Pacemaker Resource Checklist

| Check | Expected |
|---|---|
| Cluster quorum | Quorate |
| Node online | Target node available |
| Resource agent | Valid OCF script |
| Constraints | Allow resource placement |
| Dependencies | Required resources running |
| Permissions | Agent can execute |
| Configuration | Valid parameters |

Verify the Fix

```bash
# After fixing the Pacemaker resource:

# 1. Check cluster status
pcs status                                   # Cluster online, quorate

# 2. Check resource
crm_resource --resource webserver --locate   # Resource running on a node

# 3. Check no failures
pcs resource failcount show webserver        # No fail counts

# 4. Make sure the resource is enabled
pcs resource enable webserver                # Resource started

# 5. Monitor logs
journalctl -u pacemaker -f                   # No errors

# 6. Check constraints
pcs constraint ref webserver                 # Constraints satisfied
```
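
A resource can take a few seconds to reach Started after a cleanup, so a single status check may report a false negative. The sketch below polls until a check passes or a timeout expires. `wait_until` and `resource_running` are hypothetical helpers, and the `grep` pattern assumes that `crm_resource --locate` prints "is running on" for an active resource.

```bash
# wait_until TIMEOUT INTERVAL COMMAND [ARG...]
# Re-run COMMAND every INTERVAL seconds until it succeeds or TIMEOUT
# seconds have passed. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# resource_running NAME: succeeds when the cluster reports NAME as running
# (output format is an assumption).
resource_running() {
    crm_resource --resource "$1" --locate 2>/dev/null | grep -q "is running on"
}

# Example: wait up to 60 seconds, checking every 2 seconds.
# wait_until 60 2 resource_running webserver && echo "webserver is up"
```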

  • [Fix Corosync Cluster Not Forming](/articles/fix-corosync-cluster-not-forming)
  • [Fix Keepalived VIP Not Failover](/articles/fix-keepalived-vip-not-failover)
  • [Fix HAProxy Backend Down](/articles/fix-haproxy-backend-down)