Introduction

RabbitMQ mirrored queues replicate queue contents across multiple cluster nodes for high availability. When a node becomes unreachable, the mirror synchronization process fails, leaving queues in an inconsistent state. Depending on the failure mode, messages may be lost or duplicated across the remaining nodes.

Symptoms

  • RabbitMQ management UI shows queues with missing or stale mirrors
  • Logs contain "mirror sync failed" or "node not reachable" errors
  • Queue depth differs across nodes for the same mirrored queue
  • Consumers connected to the unreachable node stop receiving messages
  • Error message: Mnesia error: {aborted, {no_exists, [queue_mirror_sync]}}

Common Causes

  • Network partition isolating a RabbitMQ node from the rest of the cluster
  • Node crash during active mirror synchronization, leaving partial state
  • Disk failure on mirror node preventing message store writes
  • Inter-node ports blocked by firewall rules: 4369 (epmd peer discovery) or 25672 (default Erlang distribution port)
  • Mnesia database corruption on the unreachable node
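
The firewall cause above can be ruled in or out from a healthy node by probing the two inter-node ports on the suspect peer. The following is a minimal sketch, not an official tool: the function names are illustrative, it assumes bash and the coreutils `timeout` command are available, and it assumes the node uses the default distribution port 25672.

```bash
#!/usr/bin/env bash
# Sketch: probe the inter-node ports of a peer. All function names are hypothetical.

# Extract the host part of an Erlang node name such as rabbit@node-2.
node_host() {
  printf '%s\n' "${1#*@}"
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and on `timeout` (both assumed present).
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report the state of epmd (4369) and the default distribution port (25672).
check_node_ports() {
  local host port
  host="$(node_host "$1")"
  for port in 4369 25672; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port blocked or unreachable"
    fi
  done
}

# Example (node name taken from the cluster_status output):
#   check_node_ports rabbit@dead-node
```

A port reported as blocked only shows that the TCP connection failed from the machine running the check; the node being down produces the same result as a firewall rule.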

Step-by-Step Fix

  1. Check cluster status to identify unreachable nodes: Verify which nodes have dropped out.

     ```bash
     rabbitmqctl cluster_status
     ```

  2. Identify affected mirrored queues: List queues with missing mirrors. A mirror that appears in slave_pids but not in synchronised_slave_pids is unsynchronised.

     ```bash
     rabbitmqctl list_queues name policy slave_pids synchronised_slave_pids --formatter json
     ```

  3. Force sync the affected queues from the master: Trigger a mirror resync from the master node. The queue blocks publishers and consumers while synchronization runs.

     ```bash
     rabbitmqctl sync_queue my-queue-name
     ```

  4. If the node is permanently lost, remove it from the cluster: Clean up the dead node. Run this on a surviving node while the dead node is offline.

     ```bash
     rabbitmqctl forget_cluster_node rabbit@dead-node
     ```

  5. Cancel and redeclare mirrors if sync fails repeatedly: Rebuild the mirror from scratch.

     ```bash
     rabbitmqctl cancel_sync_queue my-queue-name
     # Then update the HA policy to trigger a fresh sync
     rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}' --apply-to queues
     ```
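
The sync and cancel steps above can be combined into a small retry wrapper. This is a sketch under stated assumptions rather than a recommended procedure: the function name `sync_queue_with_retry`, the retry count, and the delay are all illustrative, and the `RABBITMQCTL` variable exists only so the command can be swapped for a stub during a dry run.

```bash
#!/usr/bin/env bash
# Sketch: retry `rabbitmqctl sync_queue`, then cancel the sync if every attempt fails.
# RABBITMQCTL may be overridden with a stub command for dry runs (hypothetical hook).
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# Usage: sync_queue_with_retry QUEUE [MAX_ATTEMPTS] [DELAY_SECONDS]
sync_queue_with_retry() {
  local queue="$1" max_attempts="${2:-3}" delay="${3:-5}" attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # sync_queue blocks until synchronization finishes or fails.
    if "$RABBITMQCTL" sync_queue "$queue"; then
      return 0
    fi
    echo "sync of $queue failed (attempt $attempt of $max_attempts)" >&2
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done

  # All attempts failed: cancel any in-flight sync so the mirror can be rebuilt.
  "$RABBITMQCTL" cancel_sync_queue "$queue"
  return 1
}

# Example:
#   sync_queue_with_retry my-queue-name 3 10 || echo "re-apply the HA policy next"
```

The wrapper stops at cancelling the sync and leaves the policy change to the operator, since re-applying a policy with the ".*" pattern affects every queue it matches.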

Prevention

  • Use quorum queues instead of classic mirrored queues for better consistency guarantees (RabbitMQ 3.8+); classic queue mirroring is deprecated and was removed in RabbitMQ 4.0
  • Deploy cluster nodes in the same low-latency network to reduce sync failures
  • Monitor mirror synchronization status and alert on queues with unsynced mirrors
  • Keep ha-sync-mode set to manual (the default) to control when syncs occur during node recovery
  • Set a partition handling strategy with cluster_partition_handling = pause_minority
  • Regularly test node failure and recovery scenarios in staging environments
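
The monitoring recommendation above could be approximated with a filter over `rabbitmqctl list_queues` output. The sketch below is built on an assumption that should be verified against your RabbitMQ version: it expects tab-separated rows of name, slave_pids, and synchronised_slave_pids in which each pid is wrapped in angle brackets, for example `<rabbit@node2.1.234.0>`. The function name `unsynced_queues` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list queues that have more mirrors than synchronised mirrors.
# ASSUMPTION: input rows are "name<TAB>slave_pids<TAB>synchronised_slave_pids"
# and every pid in a list is written as <...>. Check this on your version.

# Reads rows on stdin; prints "queue<TAB>number_of_unsynced_mirrors".
unsynced_queues() {
  awk -F'\t' '{
    m = $2; s = $3
    mirrors = gsub(/</, "", m)   # count pids in the mirror list
    synced  = gsub(/</, "", s)   # count pids in the synchronised list
    if (mirrors > synced) print $1 "\t" (mirrors - synced)
  }'
}

# Example (a header row, if printed, contains no pids and is ignored):
#   rabbitmqctl -q list_queues name slave_pids synchronised_slave_pids | unsynced_queues
```

Any non-empty output can be fed to an alerting tool. Counting angle brackets keeps the filter free of extra dependencies, at the cost of relying on the assumed pid format.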