What's Actually Happening
The Corosync cluster fails to form: nodes cannot exchange totem traffic with each other, so they never establish cluster membership or reach quorum.
The Error You'll See
```bash
$ corosync-quorumtool -s
Quorum information
------------------
Date:             Mon Jan 1 12:00:00 2024
Quorum provider:  corosync_votequorum
Nodes configured: 3
Nodes expected:   3
Quorate:          No
```

No membership:

```text
Membership information
----------------------
Nodeid Rings Address
No members
```

Connection error:

```text
[TOTEM ] Token has not been received in 1180 ms
```

Authentication error:

```text
[QB ] error: libqb authentication failed
```

Why This Happens
1. Network connectivity - Nodes cannot reach each other
2. Configuration mismatch - Different corosync.conf on nodes
3. Port blocking - Firewall blocks UDP 5405/5406
4. Authentication failure - Wrong cluster key
5. Multicast issues - Multicast not working
6. Node ID conflict - Duplicate node IDs
7. Interface binding - Wrong interface configured
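
As a rough triage aid, the two log messages quoted earlier can be mapped back to the causes in this list. The sketch below is only illustrative: `likely_cause` is a hypothetical helper name, and the patterns cover just the messages shown in this article, not every message Corosync can emit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map one Corosync log line to the likely cause(s) listed above.
# Only the messages quoted in this article are recognised.
likely_cause() {
  local line="$1"
  case "$line" in
    *"Token has not been received"*)
      echo "network connectivity, port blocking, or multicast" ;;
    *"authentication failed"*)
      echo "authentication failure (authkey mismatch)" ;;
    *)
      echo "unknown: compare corosync.conf and node IDs across nodes" ;;
  esac
}
```

For example, `likely_cause "$(journalctl -u corosync --no-pager -n 1)"` classifies the most recent log line.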
Step 1: Check Corosync Status
```bash
# Check corosync service:
systemctl status corosync

# Check process:
pgrep -a corosync

# Show quorum status:
corosync-quorumtool -s

# List member nodes:
corosync-quorumtool -l

# Monitor quorum status continuously (Ctrl-C to stop):
corosync-quorumtool -m

# Check corosync configuration:
cat /etc/corosync/corosync.conf

# Follow logs:
journalctl -u corosync -f

# Corosync log (path depends on the logging section of corosync.conf):
tail -f /var/log/corosync/corosync.log

# Check version:
corosync -v

# Check ring/link status on this node:
corosync-cfgtool -s

# Reset ring state after a fault has been repaired
# (corosync 2.x; this changes state, it is not a status check):
corosync-cfgtool -r
```
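
When these checks are scripted, the `Quorate:` line of `corosync-quorumtool -s` is usually the field of interest. A minimal sketch, assuming the output format shown in the error example above; `is_quorate` is a hypothetical name, and it reads the tool's output from stdin so it can be tried on saved output as well.

```bash
#!/usr/bin/env bash
# Reads `corosync-quorumtool -s` output on stdin.
# Succeeds only if a "Quorate:" line is present and its value starts with "Yes".
is_quorate() {
  awk -F': *' '$1 == "Quorate" { found = ($2 ~ /^Yes/) } END { exit !found }'
}
```

Usage: `corosync-quorumtool -s | is_quorate && echo "cluster is quorate"`.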
Step 2: Check Network Connectivity
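```bash
# Test connectivity between nodes:
ping -c 3 node2
ping -c 3 node3

# Corosync ports:
# UDP 5405 - default totem port (mcastport), used for multicast and unicast
# UDP 5406 - only needed if your configuration assigns it
# With transport: udp (multicast), corosync also uses mcastport - 1 (5404 by default)

# Test ports (UDP is connectionless, so "succeeded" only means
# no ICMP rejection came back, not that corosync answered):
nc -zuv node2 5405
nc -zuv node2 5406

# Check firewall:
iptables -L -n | grep 5405
ufw status | grep 5405

# Allow corosync ports:
iptables -I INPUT -p udp --dport 5405 -j ACCEPT
iptables -I INPUT -p udp --dport 5406 -j ACCEPT

# Using ufw:
ufw allow 5405/udp
ufw allow 5406/udp

# Using firewalld:
firewall-cmd --add-port=5405-5406/udp --permanent
firewall-cmd --reload

# Check network interface:
ip addr show

# Check if interface exists:
ip link show eth1

# Check IP address:
ip addr show eth1 | grep inet

# Test multicast (run while corosync is stopped, or use another free port).
# On node2, start the receiver first; it joins the group on eth1:
socat UDP4-RECV:5405,ip-add-membership=239.0.0.1:eth1 -

# On node1, send to the group (type a line and press Enter):
socat - UDP4-DATAGRAM:239.0.0.1:5405
```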
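
Grepping a rule listing for `5405` also matches DROP rules, so a slightly stricter check can help. The sketch below assumes rules in `iptables -S` format and a hypothetical helper name, `udp_port_accepted`. It is deliberately simple: it only recognises single-port `--dport` rules and ignores rule order, port ranges, and chain policy.

```bash
#!/usr/bin/env bash
# Reads `iptables -S INPUT` output on stdin.
# Succeeds if at least one rule ACCEPTs UDP traffic to exactly the given port.
# Simplification: rule order, multiport/range matches and chain policy are ignored.
udp_port_accepted() {
  local port="$1"
  grep -Eq -- "-p udp( .*)? --dport ${port} (.* )?-j ACCEPT"
}
```

Usage: `iptables -S INPUT | udp_port_accepted 5405 || echo "no ACCEPT rule for UDP 5405"`.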
Step 3: Check Configuration
```bash
# View configuration:
cat /etc/corosync/corosync.conf

# Compare on all nodes:
for node in node1 node2 node3; do
  echo "=== $node ==="
  ssh "$node" "cat /etc/corosync/corosync.conf"
done

# Key sections to check:
# totem    - Transport configuration
# nodelist - Node definitions
# quorum   - Quorum settings
# logging  - Log configuration

# Verify ring/link status for the totem section:
corosync-cfgtool -s

# Verify nodelist:
corosync-quorumtool -l

# Check bind address matches interface IP:
grep bindnetaddr /etc/corosync/corosync.conf

# Common issues:
# 1. Different cluster names
# 2. Duplicate or inconsistent node IDs
# 3. Wrong bind address
# 4. Mismatched transport mode

# Push the local corosync.conf to all nodes:
pcs cluster sync

# Or manually copy:
scp /etc/corosync/corosync.conf node2:/etc/corosync/
```
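
Comparing full files by eye across three nodes is error-prone; comparing checksums is quicker. A small sketch, assuming you collect one `md5sum`-style line per node; `checksums_match` is a hypothetical helper, not part of Corosync or pcs.

```bash
#!/usr/bin/env bash
# Reads "checksum  label" lines on stdin (for example md5sum output gathered per node).
# Succeeds when every line carries the same checksum.
# Otherwise prints the collected lines so the odd node out is visible, and fails.
checksums_match() {
  local lines
  lines="$(cat)"
  [ -n "$lines" ] || return 1   # no input at all is treated as a failure
  if [ "$(printf '%s\n' "$lines" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]; then
    return 0
  fi
  printf '%s\n' "$lines"
  return 1
}
```

Usage: `for n in node1 node2 node3; do ssh "$n" md5sum /etc/corosync/corosync.conf; done | checksums_match`. The same helper works for `/etc/corosync/authkey`.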
Step 4: Fix Totem Configuration
```bash
# Check ring/link status:
corosync-cfgtool -s

# Totem configuration (in /etc/corosync/corosync.conf):
totem {
    version: 2
    cluster_name: mycluster
    transport: udpu
    crypto_cipher: aes256
    crypto_hash: sha256
}

# For multicast:
totem {
    transport: udp
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.0.0.1
        mcastport: 5405
    }
}

# For unicast (udpu), peers are taken from the nodelist section:
totem {
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastport: 5405
    }
}

# The timing options below go inside the totem section; values are in ms.

# Token timeout; increase if the network is slow:
token: 10000    # 10 seconds

# Consensus timeout:
consensus: 12000    # Should be at least 1.2 * token

# Join timeout:
join: 60

# Token retransmits before the token is declared lost:
token_retransmits_before_loss_const: 10

# Reload corosync (some changes, such as transport, need a full restart):
pcs cluster reload corosync
# Or:
corosync-cfgtool -R
```
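
The 1.2 ratio between `consensus` and `token` is easy to get wrong when the token value is changed later. As a small illustration, the arithmetic can be done in integer shell math; `consensus_for_token` is a hypothetical helper name, and the result is rounded up so it never falls below 1.2 times the token.

```bash
#!/usr/bin/env bash
# Prints the smallest whole-millisecond consensus value that is >= 1.2 * token.
# Integer arithmetic: multiply by 12, add 9 so the division by 10 rounds up.
consensus_for_token() {
  local token="$1"
  echo $(( (token * 12 + 9) / 10 ))
}
```

With the values used above, `consensus_for_token 10000` prints `12000`.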
Step 5: Check Nodelist Configuration
```bash
# Check nodelist:
corosync-quorumtool -l

# Nodelist configuration:
nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
    node {
        ring0_addr: node3
        nodeid: 3
    }
}

# Verify node IDs are unique:
grep nodeid /etc/corosync/corosync.conf

# Check node addresses resolve and respond
# (the address is the second field of each ring0_addr line):
for addr in $(awk '$1 == "ring0_addr:" {print $2}' /etc/corosync/corosync.conf); do
  echo "Testing $addr:"
  ping -c 2 "$addr"
done

# Check DNS resolution:
nslookup node1
dig node1

# Or use IP addresses:
ring0_addr: 192.168.1.10

# Check hostname:
hostname

# Check /etc/hosts:
grep -E "node1|node2|node3" /etc/hosts

# Add missing entries:
echo "192.168.1.10 node1" >> /etc/hosts
echo "192.168.1.11 node2" >> /etc/hosts
```
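
Listing `nodeid` lines with `grep` still leaves the comparison to the reader. The sketch below automates it, assuming the usual `nodeid: N` layout with one setting per line; `duplicate_nodeids` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Prints every nodeid that appears more than once in the given corosync.conf
# and fails if any duplicate exists. Assumes one "nodeid: N" setting per line.
duplicate_nodeids() {
  local conf="${1:-/etc/corosync/corosync.conf}" dups
  dups="$(awk '$1 == "nodeid:" { print $2 }' "$conf" | sort | uniq -d)"
  [ -z "$dups" ] && return 0
  printf '%s\n' "$dups"
  return 1
}
```

Usage: `duplicate_nodeids || echo "fix the node IDs listed above"`.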
Step 6: Check Authentication
```bash
# Check authkey:
ls -la /etc/corosync/authkey

# Authkey should be the same on all nodes:
for node in node1 node2 node3; do
  echo "=== $node ==="
  ssh "$node" "md5sum /etc/corosync/authkey"
done

# Generate new authkey:
corosync-keygen

# Copy to other nodes:
scp /etc/corosync/authkey node2:/etc/corosync/
scp /etc/corosync/authkey node3:/etc/corosync/

# Set permissions (on every node):
chmod 400 /etc/corosync/authkey
chown root:root /etc/corosync/authkey

# Or, with pcs, authenticate the nodes to each other
# (pcs 0.9 syntax; pcs 0.10 and later use `pcs host auth`):
pcs cluster auth node1 node2 node3 -u hacluster -p password

# Check crypto configuration:
grep -E "crypto_cipher|crypto_hash" /etc/corosync/corosync.conf

# To disable encryption (testing only), set both options to none in the totem section:
# crypto_cipher: none
# crypto_hash: none
```
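
The permission check can be scripted as well. This is a minimal sketch that only verifies the file mode, not the owner; `authkey_mode_ok` is a hypothetical helper, and it relies on GNU `stat -c`, which is an assumption about the node's userland.

```bash
#!/usr/bin/env bash
# Succeeds if the key file exists and its mode is exactly 400 (owner read-only).
# Uses GNU stat (-c), as found on typical Linux cluster nodes; ownership is not checked.
authkey_mode_ok() {
  local key="${1:-/etc/corosync/authkey}"
  [ -f "$key" ] && [ "$(stat -c '%a' "$key")" = "400" ]
}
```

Usage: `authkey_mode_ok || chmod 400 /etc/corosync/authkey`.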
Step 7: Check Quorum Configuration
```bash
# Check quorum settings:
corosync-quorumtool -s

# Quorum configuration:
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 0
    last_man_standing: 1
    auto_tie_breaker: 0
}

# For 2-node cluster:
quorum {
    provider: corosync_votequorum
    two_node: 1
    expected_votes: 2
}

# For 3-node cluster:
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}

# Check quorum configuration:
pcs quorum config

# Set expected votes in the live cluster (not written to corosync.conf):
pcs quorum expected-votes 2

# two_node is changed by editing the quorum section of corosync.conf on all nodes;
# `pcs quorum update` does not accept it.

# Check last_man_standing:
grep last_man_standing /etc/corosync/corosync.conf

# Reload:
pcs cluster reload corosync
```
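
To see why `two_node: 1` matters, it helps to work out how many votes a cluster needs to be quorate. The sketch below assumes one vote per node and the usual majority rule; `votes_needed` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Votes needed for quorum under the majority rule: floor(expected / 2) + 1.
# With two_node: 1, a two-node cluster stays quorate with a single vote.
# Arguments: expected votes, then two_node (0 or 1, default 0).
votes_needed() {
  local expected="$1" two_node="${2:-0}"
  if [ "$two_node" -eq 1 ] && [ "$expected" -eq 2 ]; then
    echo 1
  else
    echo $(( expected / 2 + 1 ))
  fi
}
```

Without `two_node`, `votes_needed 2` prints `2`, so a two-node cluster loses quorum as soon as one node fails; `votes_needed 2 1` prints `1`.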
Step 8: Debug Cluster Formation
```bash
# Enable debug logging:
# In corosync.conf:
logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    debug: on
    timestamp: on
}

# Restart:
systemctl restart corosync

# Watch logs:
tail -f /var/log/corosync/corosync.log

# Check for specific errors (-E is required for the | alternation):
grep -iE "error|fail|token" /var/log/corosync/corosync.log

# Common log messages:
# 1. "Token has not been received" - Network issue
# 2. "Member joined" - Normal
# 3. "Member left" - Node crash/network
# 4. "Not endorsing" - Quorum issue

# Check ring status:
corosync-cfgtool -s

# Expected output:
RING ID 0
    id     = 192.168.1.10
    status = ring 0 active with no faults

# Reload corosync.conf on all nodes:
corosync-cfgtool -R

# Monitor quorum and membership changes (Ctrl-C to stop):
corosync-quorumtool -m

# List nodes:
corosync-quorumtool -l
```
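
The expected ring output shown in this step can be checked mechanically. This sketch assumes that ring-style format, where each ring has a `status = ...` line; newer Corosync releases print link status differently, so treat the pattern as an assumption. `faulty_rings` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Reads ring status output on stdin, in the "status = ..." format shown above.
# Prints every status line that does not say "active with no faults"
# and fails if at least one such line was found.
faulty_rings() {
  awk '$1 == "status" && !/active with no faults/ { n++; print } END { exit (n > 0) }'
}
```

Usage: `corosync-cfgtool -s | faulty_rings || echo "at least one ring reports a fault"`.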
Step 9: Fix Common Issues
```bash
# Cluster not forming:

# 1. Check all nodes have same config:
md5sum /etc/corosync/corosync.conf
# Should match on all nodes

# 2. Check authkey matches:
md5sum /etc/corosync/authkey
# Should match on all nodes

# 3. Check network:
ping -c 3 node2
nc -zuv node2 5405

# 4. Check firewall allows ports (ufw uses a colon for port ranges):
ufw allow 5405:5406/udp

# 5. Restart corosync on all nodes:
systemctl restart corosync

# Token timeout errors:

# 1. Increase token timeout:
# In corosync.conf:
token: 20000

# 2. Check network latency:
ping -c 10 node2 | grep rtt

# 3. Check for packet loss:
ping -c 100 node2 | grep loss

# Multicast not working:

# 1. Switch to unicast:
# Change transport: udp to transport: udpu

# 2. Add unicast addresses (one node entry per cluster member):
nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
}

# Node cannot join:

# 1. Check nodeid unique
# 2. Check ring0_addr correct
# 3. Check authkey matches
# 4. Check cluster_name matches
```
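
For the packet-loss check, the percentage can be pulled out of ping's summary line so a script can compare it against a threshold. A small sketch, assuming the iputils summary format (`N% packet loss`); `ping_loss_percent` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Reads ping output on stdin and prints the packet-loss percentage
# from the summary line (iputils format: "..., 2% packet loss, ...").
# Prints nothing if no such line is present.
ping_loss_percent() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Usage: `ping -c 100 node2 | ping_loss_percent`. Any sustained loss on the cluster network is worth investigating before raising the token timeout.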
Step 10: Corosync Verification Script
```bash
# Create verification script:
cat << 'EOF' > /usr/local/bin/check-corosync.sh
#!/bin/bash
CONF=/etc/corosync/corosync.conf

echo "=== Corosync Service ==="
# Test the service state directly; piping status through head would hide a failure
if systemctl is-active --quiet corosync; then
  systemctl status corosync --no-pager 2>/dev/null | head -5
else
  echo "Service not running"
fi

echo ""
echo "=== Process ==="
pgrep -a corosync || echo "No corosync process"

echo ""
echo "=== Quorum Status ==="
corosync-quorumtool -s 2>/dev/null || echo "Cannot get quorum status"

echo ""
echo "=== Membership ==="
# -l lists members once; -m would monitor forever and block the script
corosync-quorumtool -l 2>/dev/null || echo "Cannot get membership"

echo ""
echo "=== Ring Status ==="
corosync-cfgtool -s 2>/dev/null || echo "Cannot get ring status"

echo ""
echo "=== Configuration ==="
head -30 "$CONF" 2>/dev/null || echo "No config file"

echo ""
echo "=== Nodelist ==="
grep -A 5 "nodelist" "$CONF" 2>/dev/null || echo "No nodelist"

echo ""
echo "=== Authkey ==="
ls -la /etc/corosync/authkey 2>/dev/null || echo "No authkey"

echo ""
echo "=== Network Connectivity ==="
# The address is the second field of each ring0_addr line
for addr in $(awk '$1 == "ring0_addr:" {print $2}' "$CONF" 2>/dev/null); do
  echo "Node: $addr"
  ping -c 2 -W 2 "$addr" 2>&1 | tail -2
  nc -zuv "$addr" 5405 2>&1 || true
done

echo ""
echo "=== Firewall ==="
iptables -L -n 2>/dev/null | grep 5405 || ufw status 2>/dev/null | grep 5405 || echo "Check firewall manually"

echo ""
echo "=== Cluster Nodes ==="
pcs status nodes 2>/dev/null || echo "pcs not available"

echo ""
echo "=== Recent Logs ==="
journalctl -u corosync --no-pager -n 10 2>/dev/null || tail -n 10 /var/log/corosync/corosync.log 2>/dev/null || echo "No logs"

echo ""
echo "=== Recommendations ==="
echo "1. Ensure all nodes can reach each other on port 5405/5406"
echo "2. Verify corosync.conf is identical on all nodes"
echo "3. Check authkey matches on all nodes"
echo "4. Allow UDP ports 5405-5406 in firewall"
echo "5. Use unicast (udpu) if multicast issues"
echo "6. Increase token timeout for slow networks"
echo "7. Check node IDs are unique"
EOF

chmod +x /usr/local/bin/check-corosync.sh

# Usage:
/usr/local/bin/check-corosync.sh
```
Corosync Cluster Formation Checklist
| Check | Expected |
|---|---|
| Service running | corosync active on all nodes |
| Network reachable | All nodes ping each other |
| Ports open | UDP 5405, 5406 allowed |
| Config match | Same corosync.conf on all nodes |
| Authkey match | Same authkey on all nodes |
| Node IDs | Unique per node |
| Quorum | Quorate after formation |
Verify the Fix
```bash
# After fixing Corosync cluster

# 1. Check service
systemctl status corosync
# Expected: active (running)

# 2. Check quorum
corosync-quorumtool -s
# Expected: Quorate: Yes

# 3. Check membership
corosync-quorumtool -l
# Expected: all nodes listed

# 4. Check ring
corosync-cfgtool -s
# Expected: ring active with no faults

# 5. Check logs
journalctl -u corosync -f
# Expected: no errors

# 6. Check cluster
pcs status
# Expected: all nodes online
```
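
After a restart, quorum can take a few seconds to appear, so a single check may report a false failure. The sketch below polls until the cluster is quorate or a timeout expires. `wait_for_quorate` and the `QUORUM_CMD` override are hypothetical names introduced here for illustration; the override exists only so the loop can be tried without a live cluster.

```bash
#!/usr/bin/env bash
# Polls once per second until the status command reports "Quorate: Yes"
# or the timeout (in seconds, default 30) expires.
# QUORUM_CMD is a hypothetical override for testing; it defaults to the real tool.
wait_for_quorate() {
  local timeout="${1:-30}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Intentionally unquoted so the command and its arguments are split into words
    if ${QUORUM_CMD:-corosync-quorumtool -s} 2>/dev/null \
        | grep -Eq '^Quorate:[[:space:]]+Yes'; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

Usage: `systemctl restart corosync && wait_for_quorate 60 || echo "cluster did not become quorate"`.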
Related Issues
- [Fix Pacemaker Resource Not Starting](/articles/fix-pacemaker-resource-not-starting)
- [Fix Keepalived VIP Not Failover](/articles/fix-keepalived-vip-not-failover)
- [Fix etcd Cluster Unhealthy](/articles/fix-etcd-cluster-unhealthy)