Introduction
RabbitMQ classic mirrored queues replicate queue contents across multiple cluster nodes for high availability. When a node becomes unreachable, mirror synchronization to that node fails and the affected queues can be left in an inconsistent state. Depending on the failure mode, messages may be lost or duplicated across the remaining nodes.
Symptoms
- RabbitMQ management UI shows queues with missing or stale mirrors
- Logs contain `mirror sync failed` or `node not reachable` errors
- Queue depth differs across nodes for the same mirrored queue
- Consumers connected to the unreachable node stop receiving messages
- Error message: `Mnesia error: {aborted, {no_exists, [queue_mirror_sync]}}`
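To check for the log symptoms listed above, a small helper like the following can scan a log file. This is a minimal sketch: the function name `scan_mirror_errors` is hypothetical, the search strings are taken from the symptom list as written (the exact wording in your broker's logs may differ), and the log location is an assumption (package installs commonly write under `/var/log/rabbitmq/`).

```bash
#!/usr/bin/env bash
# Print log lines that mention either symptom string, prefixed with line numbers.
# The patterns are matched case-insensitively; adjust them to your broker's
# actual log wording.
# Usage: scan_mirror_errors /path/to/rabbit.log
scan_mirror_errors() {
  grep -n -i -E 'mirror sync failed|node not reachable' "$1"
}
```

The function exits non-zero when nothing matches, so it can also be used as a condition in a monitoring script.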
Common Causes
- Network partition isolating a RabbitMQ node from the rest of the cluster
- Node crash during active mirror synchronization, leaving partial state
- Disk failure on mirror node preventing message store writes
- epmd port (4369) or Erlang distribution port (25672 by default) blocked by firewall rules
- Mnesia database corruption on the unreachable node
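To rule out the firewall cause, you can test from one cluster node whether the clustering ports on a peer accept TCP connections. The sketch below assumes bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command; the function names and the host name in the usage comment are hypothetical, and 25672 is only the default distribution port, so adjust it if your nodes are configured differently.

```bash
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Report the state of the epmd port (4369) and the default Erlang
# distribution port (25672) on the given host.
# Usage: check_cluster_ports rabbit-node-2   (hypothetical host name)
check_cluster_ports() {
  local host="$1" port
  for port in 4369 25672; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} BLOCKED or closed"
    fi
  done
}
```

A successful connection only shows that the port is open; it does not prove that the nodes can authenticate to each other.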
Step-by-Step Fix
1. Check cluster status to identify unreachable nodes: verify which nodes have dropped out.

   ```bash
   rabbitmqctl cluster_status
   ```

2. Identify affected mirrored queues: list queues with missing or unsynchronised mirrors.

   ```bash
   rabbitmqctl list_queues name policy slave_pids synchronised_slave_pids --formatter json
   ```

3. Force sync the affected queues from the master: trigger a mirror resync from the master node.

   ```bash
   rabbitmqctl sync_queue my-queue-name
   ```

4. If the node is permanently lost, remove it from the cluster: clean up the dead node.

   ```bash
   rabbitmqctl forget_cluster_node rabbit@dead-node
   ```

5. Cancel and redeclare mirrors if sync fails repeatedly: rebuild the mirror from scratch.

   ```bash
   rabbitmqctl cancel_sync_queue my-queue-name
   # Then update the HA policy to trigger a fresh sync
   rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}' --apply-to queues
   ```
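Steps 2 and 3 can be combined into a small script when many queues are affected. The following is a minimal sketch, not a definitive implementation: the function names are hypothetical, and it assumes the default tab-separated table output of `rabbitmqctl list_queues` in which each pid is rendered with a leading `<`. Verify that assumption against your RabbitMQ version before relying on it.

```bash
#!/usr/bin/env bash
# Print the names of queues that have more mirrors than synchronised mirrors.
# Reads tab-separated rows on stdin:
#   name <TAB> slave_pids <TAB> synchronised_slave_pids
# Assumption: every pid starts with "<", so counting "<" characters in a
# column gives the number of pids it lists.
unsynced_queues() {
  awk -F'\t' '
    {
      all = $2; ok = $3
      mirrors = gsub(/</, "<", all)   # gsub returns the number of matches
      synced  = gsub(/</, "<", ok)
      if (mirrors > synced) print $1
    }'
}

# List queues in the default vhost and trigger a sync for each unsynced one.
# Nothing runs until this function is called explicitly.
sync_unsynced_queues() {
  rabbitmqctl -q list_queues name slave_pids synchronised_slave_pids --no-table-headers \
    | unsynced_queues \
    | while IFS= read -r queue; do
        rabbitmqctl sync_queue "$queue"
      done
}
```

Synchronisation blocks the queue while it runs, so on large queues it may be preferable to print the list with `unsynced_queues` first and sync queues one at a time during a quiet period.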
Prevention
- Use quorum queues instead of classic mirrored queues for better consistency guarantees (RabbitMQ 3.8+)
- Deploy cluster nodes in the same low-latency network to reduce sync failures
- Monitor mirror synchronization status and alert on queues with unsynced mirrors
- Configure `ha-sync-mode: manual` to control when syncs occur during node recovery
- Set up a partition handling strategy with `cluster_partition_handling = pause_minority`
- Regularly test node failure and recovery scenarios in staging environments
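The two configuration items above might be applied as follows. This is a sketch under stated assumptions: the function name is hypothetical, the policy name `ha-all` and the `".*"` pattern are reused from the fix steps, and the configuration file path is the usual location for package installs. The command is wrapped in a function so that the snippet can be sourced without changing anything.

```bash
#!/usr/bin/env bash
# Set a mirroring policy with manual synchronisation on all queues in the
# default vhost. With "manual", a new or recovering mirror receives only new
# messages until an operator runs `rabbitmqctl sync_queue`.
apply_manual_sync_policy() {
  rabbitmqctl set_policy ha-all ".*" \
    '{"ha-mode":"all","ha-sync-mode":"manual"}' --apply-to queues
}

# Partition handling is set in the broker configuration file, not by policy.
# Add this line to rabbitmq.conf (commonly /etc/rabbitmq/rabbitmq.conf) on
# every node, then restart the node:
#
#   cluster_partition_handling = pause_minority
```

`pause_minority` is most useful in clusters with an odd number of nodes, because a two-node cluster has no majority side after a partition.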