Introduction
RabbitMQ nodes use the Erlang distribution protocol for inter-node communication, which requires all nodes in a cluster to share the same Erlang cookie. This cookie acts as a shared secret for authentication. If a new node has a different cookie, often because automated provisioning generated a random one, it cannot join the cluster and remains an isolated node.
Symptoms
- New node logs show `Cookie file contains different cookie` or `Access denied`
- `rabbitmqctl join_cluster` fails with `{badrpc, timeout}` or `nodedown`
- Node appears in `rabbitmqctl cluster_status` as a separate, unclustered node
- Epmd (Erlang Port Mapper Daemon) shows the node is running but not connected
- Error message: `Mnesia could not connect to any running node`
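A quick way to confirm these symptoms is to search the node's log for the messages listed above. The sketch below assumes a hypothetical helper name, `find_cookie_symptoms`, and takes the log file path as an argument, since log locations vary by installation. The patterns are copied from the symptom list and may not match the exact wording of every RabbitMQ version.

```bash
# Hypothetical helper: print log lines matching the cookie-related symptoms.
# The patterns come from the symptom list in this article, not from an
# authoritative catalogue of RabbitMQ log messages.
find_cookie_symptoms() {
    grep -E 'Cookie file contains different cookie|Access denied|badrpc|nodedown|Mnesia could not connect to any running node' "$1"
}
```

Call it with the path to the new node's log file. No output suggests the join failure has a different cause.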
Common Causes
- Automated provisioning (Terraform, Ansible) generates a new random cookie for each node
- Cookie file at `/var/lib/rabbitmq/.erlang.cookie` has incorrect permissions (must be 400)
- Cookie changed on existing nodes without updating all other nodes
- Docker container starts with a fresh cookie on every start because the volume is not persisted
- Different RabbitMQ package versions writing the cookie to different locations
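The permissions cause is easy to rule out with a small check. This is a minimal sketch, assuming GNU coreutils `stat` (the `-c` flag; BSD and macOS use `-f` instead). The function name `check_cookie_mode` is hypothetical.

```bash
# Hypothetical helper: report whether a cookie file has mode 400.
# Assumes GNU coreutils stat; adjust the stat call on BSD/macOS.
check_cookie_mode() {
    cookie_file="$1"
    if [ ! -f "$cookie_file" ]; then
        echo "missing"
        return 2
    fi
    mode=$(stat -c '%a' "$cookie_file")   # octal permissions, e.g. 400
    if [ "$mode" = "400" ]; then
        echo "ok"
        return 0
    fi
    echo "bad mode: $mode"
    return 1
}
```

For example, `check_cookie_mode /var/lib/rabbitmq/.erlang.cookie` prints `ok` when the mode is correct. Ownership by the rabbitmq user still has to be checked separately.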
Step-by-Step Fix
1. Check the cookie on the existing cluster nodes: Read the cookie from a healthy node.
   ```bash
   cat /var/lib/rabbitmq/.erlang.cookie
   ```
2. Stop the RabbitMQ application on the new node: Prepare to update the cookie.
   ```bash
   rabbitmqctl stop_app
   ```
3. Copy the correct cookie to the new node: Ensure the cookie matches the cluster. The Erlang runtime reads the cookie only when the node starts, so restart the RabbitMQ service on this node after writing the file.
   ```bash
   # Replace CLUSTER_SECRET_COOKIE with the value read in step 1
   echo -n "CLUSTER_SECRET_COOKIE" > /var/lib/rabbitmq/.erlang.cookie
   chmod 400 /var/lib/rabbitmq/.erlang.cookie
   chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
   ```
4. Reset the node and rejoin the cluster: Clean the node state and join.
   ```bash
   rabbitmqctl stop_app   # needed again if the node was restarted after step 3
   rabbitmqctl reset      # wipes this node's local state
   rabbitmqctl join_cluster rabbit@existing-node-1
   rabbitmqctl start_app
   ```
5. Verify cluster membership: Confirm all nodes are clustered.
   ```bash
   rabbitmqctl cluster_status
   ```
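For provisioning scripts, the cookie-writing step can be wrapped in a reusable function. The following is a sketch under stated assumptions: `install_cookie` is a hypothetical name, and the function deliberately leaves `chown rabbitmq:rabbitmq` to the caller, since that requires root and an existing rabbitmq user. It uses `printf '%s'` rather than `echo -n` to avoid a trailing newline in any POSIX shell.

```bash
# Hypothetical helper: write a cookie value to a file with mode 400.
# Ownership is not changed here; run chown separately as root.
install_cookie() {
    cookie_value="$1"
    cookie_file="$2"
    (
        umask 077                 # keep the file private from the moment it is created
        rm -f "$cookie_file"      # an existing mode-400 file cannot be overwritten in place by a non-root owner
        printf '%s' "$cookie_value" > "$cookie_file"
    )
    chmod 400 "$cookie_file"
}
```

A call such as `install_cookie "$COOKIE" /var/lib/rabbitmq/.erlang.cookie` would run before RabbitMQ starts, where `COOKIE` is an illustrative variable holding the cluster's shared value.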
Prevention
- Provision the Erlang cookie as part of the infrastructure setup, before RabbitMQ starts
- Store the cookie in a secrets manager (Vault, AWS Secrets Manager) and retrieve it during provisioning
- Ensure cookie file permissions are set to 400 and owned by the rabbitmq user
- Use Docker volumes or Kubernetes secrets to persist the cookie across container restarts
- Verify cookie consistency across all nodes during automated cluster setup
- Include cookie validation in health checks that run after node provisioning
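Cookie consistency can be validated without printing the secret by comparing fingerprints. The sketch below assumes `sha256sum` from GNU coreutils; the function names are hypothetical, and how the expected fingerprint reaches each node (for example, from the secrets manager) is left open.

```bash
# Hypothetical helper: print a SHA-256 fingerprint of a cookie file,
# so values can be compared across nodes without exposing the cookie.
cookie_fingerprint() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

# Succeeds (exit 0) when the file's fingerprint equals the expected one.
cookie_matches() {
    [ "$(cookie_fingerprint "$1")" = "$2" ]
}
```

A post-provisioning health check could run `cookie_matches /var/lib/rabbitmq/.erlang.cookie "$EXPECTED"` on each node and fail the deployment on a non-zero exit status, where `EXPECTED` is an illustrative variable holding the fingerprint taken from a healthy node.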