Fix RabbitMQ Quorum Queue Split Brain Leader Election

Introduction

RabbitMQ quorum queues use the Raft consensus algorithm to replicate messages across cluster nodes. A leader can only be elected by a majority of the queue's members, so if a network partition divides the cluster such that no side holds a majority, the quorum queue cannot elect a leader. Although this is commonly called a split-brain scenario, the actual result is a loss of quorum: the queue is unavailable for both publishing and consuming until the partition heals and a majority is re-established.
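
The majority rule can be made concrete with a little arithmetic. The following is a minimal sketch, not part of RabbitMQ itself; the function names `quorum_size` and `tolerated_failures` are hypothetical helpers introduced here for illustration.

```bash
#!/usr/bin/env bash
# Smallest number of members that forms a Raft majority in a group of N.
quorum_size() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

# Number of members that can be unreachable while a majority remains.
tolerated_failures() {
  local members=$1
  echo $(( (members - 1) / 2 ))
}

for n in 3 4 5; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 3, 4, and 5 members this reports quorums of 2, 3, and 3, which is why a fourth member adds no failure tolerance over three.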

Symptoms

The management UI reports no leader for the quorum queue
Producers receive NO_ROUTE or connection errors when publishing to quorum queues
Consumer connections hang while waiting for a leader to become available
RabbitMQ logs show Raft election timeouts with no leader elected
Error message: Quorum queue my-queue has no leader, operations are blocked

Common Causes

Network partition splitting a 3-node cluster into 1+1+1, leaving no side with a majority (in a 2+1 split, only the 2-node side keeps the queue available)
Cloud provider availability zone outage taking down a majority of quorum queue members
Asymmetric network partition where different node pairs have different connectivity
Quorum queue members distributed unevenly across failure domains
Node crashes during active leader election, reducing the available quorum
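
Whether a given partition leaves the queue available comes down to whether any one side still holds a majority of the members. A small sketch of that check follows; `majority_side` is a hypothetical helper, and the side sizes are supplied by hand rather than detected from a live cluster.

```bash
#!/usr/bin/env bash
# Usage: majority_side TOTAL_MEMBERS SIDE_SIZE [SIDE_SIZE...]
# Prints the size of the side that holds a majority, or "none".
majority_side() {
  local total=$1
  shift
  local needed=$(( total / 2 + 1 ))
  local size
  for size in "$@"; do
    if (( size >= needed )); then
      echo "$size"
      return 0
    fi
  done
  echo "none"
  return 1
}

majority_side 3 2 1 || true     # 2+1 split: the 2-node side keeps quorum
majority_side 3 1 1 1 || true   # 1+1+1 split: no side has a majority
majority_side 4 2 2 || true     # even split of 4 members: no majority
```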

Step-by-Step Fix

1. Check quorum queue status and leader state. Identify the affected queues.
```bash
rabbitmqctl list_queues name type state leader
```
2. Diagnose the network partition. Verify connectivity between nodes.
```bash
rabbitmqctl cluster_status
# Check which nodes respond (replace node1..node3 with your hostnames)
for node in node1 node2 node3; do
  rabbitmqctl ping -n "rabbit@$node"
done
```
3. After the partition heals, wait for automatic leader election. Raft recovers on its own once a majority of members can communicate again.
```bash
# Monitor election progress
rabbitmqctl list_queues name state leader --formatter table
# Wait for state to change from 'no_leader' to 'running' and for a leader to be listed
```
4. Force quorum queue recovery if automatic election fails. Use the Raft safety override only as a last resort, because forcing a leader without a majority can lose messages held by the unreachable members.
```bash
# The module and function available for this depend on the RabbitMQ version;
# verify this call against the documentation for your release before running it
rabbitmqctl eval 'rabbit_raft_registry:force_vote(rabbit@node1, <<"my-queue">>).'
```
5. Verify queue consistency after recovery. Check that message counts match expectations.
```bash
rabbitmqctl list_queues name messages
```
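
The wait in step 3 can be scripted instead of watched by hand. The sketch below polls until a queue reports a leader. `queue_leader` and `wait_for_leader` are hypothetical helpers, and the script assumes the default vhost, tab-separated output from `rabbitmqctl list_queues`, and that a leaderless queue shows an empty value or `''` in the leader column.

```bash
#!/usr/bin/env bash
# Print the leader reported for one queue (empty if the queue is not listed).
# Header lines are skipped because only the row whose name matches is printed.
queue_leader() {
  local queue=$1
  rabbitmqctl list_queues name leader --quiet \
    | awk -F'\t' -v q="$queue" '$1 == q { print $2 }'
}

# Usage: wait_for_leader QUEUE [ATTEMPTS] [DELAY_SECONDS]
# Prints the leader and returns 0 once one is reported; returns 1 on timeout.
wait_for_leader() {
  local queue=$1 attempts=${2:-30} delay=${3:-2}
  local leader i
  for (( i = 1; i <= attempts; i++ )); do
    leader=$(queue_leader "$queue")
    # Assumption: no leader is reported as an empty value or ''.
    if [[ -n $leader && $leader != "''" ]]; then
      echo "$leader"
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example: wait_for_leader my-queue 30 2 || echo "still no leader" >&2
```

Bounding the number of attempts keeps the script from hanging forever when the partition has not actually healed, which is the point at which step 4 becomes relevant.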

Prevention

Deploy quorum queue members across at least 3 failure domains (nodes, zones, racks)
Use odd-numbered cluster sizes (3, 5, 7) so that any two-way partition leaves one side with a majority
Configure quorum_commands_soft_timeout and quorum_commands_hard_timeout appropriately, confirming the exact setting names against the documentation for your RabbitMQ version
Monitor quorum queue leader status and alert on no_leader state
Test network partition scenarios in staging to verify quorum queue behavior
Avoid placing all quorum queue members on nodes that share a common network dependency
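
The placement advice above can be checked mechanically: a queue survives the loss of any one failure domain only if no single domain holds more members than the queue can afford to lose. A minimal sketch, assuming bash 4 or later for associative arrays; `survives_zone_loss` is a hypothetical helper and the zone names are supplied by hand, one per queue member.

```bash
#!/usr/bin/env bash
# Usage: survives_zone_loss ZONE_OF_MEMBER_1 ZONE_OF_MEMBER_2 ...
# Returns 0 if losing any one zone still leaves a majority of members.
survives_zone_loss() {
  local total=$#
  local tolerated=$(( (total - 1) / 2 ))
  local -A per_zone=()
  local zone count
  # Count how many members live in each zone.
  for zone in "$@"; do
    per_zone[$zone]=$(( ${per_zone[$zone]:-0} + 1 ))
  done
  # A zone holding more members than can be lost is a single point of failure.
  for zone in "${!per_zone[@]}"; do
    count=${per_zone[$zone]}
    if (( count > tolerated )); then
      echo "at risk: losing $zone removes $count of $total members"
      return 1
    fi
  done
  echo "ok: any single zone can fail"
}

survives_zone_loss zone-a zone-b zone-c || true   # one member per zone
survives_zone_loss zone-a zone-a zone-b || true   # two of three in zone-a
```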
