# Redis Cluster Node Failing
## Error Messages

    CLUSTERDOWN The cluster is down

Or:

    MOVED 1234 10.0.0.1:6379

Or:

    ASK 1234 10.0.0.2:6379

Or:

    Node 10.0.0.1:6379 is not empty

## Root Causes
1. Node network partition - Node isolated from cluster
2. Master failure without replica - No available replica to promote
3. Slot coverage incomplete - Not all slots covered by nodes
4. Configuration mismatch - Nodes have conflicting cluster config
5. Too many failed nodes - Cluster cannot achieve majority
6. Manual resharding errors - Improper slot migration
## Diagnosis Steps

### Step 1: Check Cluster Status

```bash
# Check cluster info
redis-cli -c -h <any_node> -p 6379 CLUSTER INFO

# Key fields:
# cluster_state:ok/fail
# cluster_slots_assigned:16384
# cluster_slots_ok:16384
# cluster_known_nodes:6
# cluster_size:3
```
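
The key fields above can be reduced to a single health line. The following is a minimal sketch, assuming the `CLUSTER INFO` output is piped in on stdin; `summarize_cluster_info` is a hypothetical helper name, not part of redis-cli.

```bash
# Reads CLUSTER INFO output on stdin and prints a one-line summary.
# Reports HEALTHY only if the state is ok and all 16384 slots are
# both assigned and ok; anything else is reported as DEGRADED.
summarize_cluster_info() {
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { assigned = $2 }
    $1 == "cluster_slots_ok"       { ok = $2 }
    END {
      status = (state == "ok" && assigned == 16384 && ok == 16384) ? "HEALTHY" : "DEGRADED"
      printf "%s state=%s assigned=%s ok=%s\n", status, state, assigned, ok
    }'
}
```

Typical use would be `redis-cli -h <any_node> -p 6379 CLUSTER INFO | summarize_cluster_info`.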
### Step 2: Check Node Status

```bash
# List all nodes
redis-cli -c CLUSTER NODES

# Output format:
# <node_id> <ip:port> <flags> <master_id> <ping_sent> <pong_recv> <config_epoch> <link_state> <slots>
```
Flags to watch:
- master - Node is a master
- slave - Node is a replica
- fail? - Node is suspected down (PFAIL): this node cannot reach it, but the failure is not yet confirmed by a majority of masters
- fail - Node is confirmed down
- handshake - New node joining cluster
- noaddr - Node address unknown
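
To pick out only the nodes carrying one of the failure flags, the flags column can be split on commas and matched exactly. This is a sketch under the assumption that `CLUSTER NODES` output arrives on stdin in the format shown in Step 2; `list_failing_nodes` is a hypothetical name.

```bash
# Reads CLUSTER NODES output on stdin and prints "<address> <flags>"
# for every node whose flag list contains fail or fail? (PFAIL).
list_failing_nodes() {
  tr -d '\r' | awk '{
    n = split($3, flags, ",")
    for (i = 1; i <= n; i++) {
      if (flags[i] == "fail" || flags[i] == "fail?") {
        print $2, $3
        break
      }
    }
  }'
}
```

Matching whole flags rather than grepping for `fail` avoids accidental hits elsewhere in the line, such as a hostname that happens to contain the word.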
### Step 3: Check Slot Coverage

```bash
# Check which slots each node handles
redis-cli -c CLUSTER NODES | grep -E "connected|slots"

# Or use cluster slots command
redis-cli -c CLUSTER SLOTS

# Verify all 16384 slots are covered
redis-cli -c CLUSTER INFO | grep cluster_slots_assigned
```
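
`cluster_slots_assigned` only says how many slots are assigned, not which ones are missing. A minimal sketch for listing the gaps, assuming `CLUSTER NODES` output on stdin with slot ranges starting at the ninth field; `find_uncovered_slots` is a hypothetical helper.

```bash
# Reads CLUSTER NODES output on stdin and prints every slot (or slot
# range) in 0..16383 that no node claims. Prints nothing when all
# slots are assigned.
find_uncovered_slots() {
  tr -d '\r' | awk '
    {
      for (f = 9; f <= NF; f++) {
        if ($f ~ /^\[/) continue        # skip migrating/importing markers
        n = split($f, r, "-")
        lo = r[1] + 0
        hi = (n == 2) ? r[2] + 0 : lo   # single slot or lo-hi range
        for (s = lo; s <= hi; s++) covered[s] = 1
      }
    }
    END {
      start = -1
      # Walk one past the last slot so a trailing gap is flushed.
      for (s = 0; s <= 16384; s++) {
        if (s < 16384 && !(s in covered)) {
          if (start < 0) start = s
        } else if (start >= 0) {
          range = (start == s - 1) ? start : (start "-" (s - 1))
          print range
          start = -1
        }
      }
    }'
}
```

For example, `redis-cli -c CLUSTER NODES | find_uncovered_slots` could feed the missing ranges into whatever repair step is chosen later.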
### Step 4: Test Node Connectivity

    # Ping each node individually
    for node in node1:6379 node2:6379 node3:6379; do
      echo "Testing $node"
      redis-cli -h "${node%:*}" -p "${node#*:}" ping
    done

### Step 5: Check Cluster Meet Status
```bash
# Verify nodes know each other.
# -w matches whole words, so "disconnected" is not counted as "connected".
redis-cli -c CLUSTER NODES | grep -cw "connected"

# Should equal total expected nodes
```
## Solutions

### Solution 1: Fix Network Partition

```bash
# Check network connectivity between nodes
ping -c 3 <failed_node_ip>

# If node is reachable but marked as fail, force forget.
# Send FORGET to every remaining node within 60 seconds, or gossip re-adds the node.
redis-cli -c CLUSTER FORGET <failed_node_id>

# Re-add the node
redis-cli -c CLUSTER MEET <node_ip> <node_port>

# Wait for cluster to sync
sleep 5
redis-cli -c CLUSTER NODES
```
### Solution 2: Replace Failed Master with Replica

```bash
# Identify failed master
redis-cli -c CLUSTER NODES | grep "fail" | grep "master"

# Find replica of failed master
redis-cli -c CLUSTER NODES | grep <failed_master_id>

# On the replica node, promote it.
# FORCE skips the handshake with the unreachable master but still needs
# a majority of masters to authorize the failover.
redis-cli -h <replica_ip> -p <replica_port> CLUSTER FAILOVER FORCE

# Or take over immediately, without waiting for cluster agreement
redis-cli -h <replica_ip> -p <replica_port> CLUSTER FAILOVER TAKEOVER
```
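
Finding the replica by grepping for the master ID also returns the master's own line. A sketch that pairs each failed master with its replicas directly, assuming `CLUSTER NODES` output on stdin; `replicas_of_failed_masters` is a hypothetical name.

```bash
# Reads CLUSTER NODES output on stdin and prints
# "<failed_master_id> <replica_address>" for each replica whose master
# carries the confirmed fail flag (fail? is deliberately not matched).
replicas_of_failed_masters() {
  tr -d '\r' | awk '
    { line[NR] = $0 }
    $3 ~ /(^|,)master(,|$)/ && $3 ~ /(^|,)fail(,|$)/ { failed[$1] = 1 }
    END {
      for (i = 1; i <= NR; i++) {
        split(line[i], f, " ")
        # Replicas are reported with the "slave" flag; field 4 is the master ID.
        if (f[3] ~ /(^|,)slave(,|$)/ && (f[4] in failed)) print f[4], f[2]
      }
    }'
}
```

The address in the second column is the one to target with `CLUSTER FAILOVER`, after stripping the `@cport` suffix if present.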
### Solution 3: Add New Node to Cluster

```bash
# First, ensure new node is empty
redis-cli -h <new_node_ip> -p <new_node_port> FLUSHALL
redis-cli -h <new_node_ip> -p <new_node_port> CLUSTER RESET HARD

# Meet the cluster (run against an existing cluster node)
redis-cli -c CLUSTER MEET <new_node_ip> <new_node_port>

# Add as replica (CLUSTER REPLICATE must be sent to the new node itself)
redis-cli -h <new_node_ip> -p <new_node_port> CLUSTER REPLICATE <master_node_id>
```
### Solution 4: Fix Incomplete Slot Coverage

```bash
# Find uncovered slots
redis-cli -c CLUSTER INFO | grep "cluster_slots_assigned"

# If less than 16384, find which slots are missing
redis-cli -c CLUSTER SLOTS

# Assign the uncovered slots (reshard only moves slots that already have an owner)
redis-cli --cluster fix <any_node>:6379

# Then reshard if the distribution needs adjusting
redis-cli --cluster reshard <any_node>:6379

# Example: reshard 1000 slots to a node
redis-cli --cluster reshard <node>:6379 \
  --cluster-from all \
  --cluster-to <target_node_id> \
  --cluster-slots 1000 \
  --cluster-yes
```
### Solution 5: Rebalance Cluster

```bash
# Rebalance slots evenly across masters
redis-cli --cluster rebalance <any_node>:6379

# With specific options
redis-cli --cluster rebalance <node>:6379 \
  --cluster-weight <node1_id>=1 <node2_id>=1 <node3_id>=1 \
  --cluster-use-empty-masters \
  --cluster-yes
```
### Solution 6: Fix Stalled Resharding

If resharding is interrupted:

```bash
# Check for migrating/importing slots, shown as [slot->-node_id] or [slot-<-node_id]
redis-cli -c CLUSTER NODES | grep -E '\[[0-9]+-[<>]-'

# Cancel stalled import/export
redis-cli -c CLUSTER SETSLOT <slot> STABLE

# Or reset the node and re-reshard
redis-cli -h <node_ip> -p <node_port> CLUSTER RESET SOFT
```
### Solution 7: Handle Majority Loss

When a majority of masters are down:

```bash
# Check how many masters are down
redis-cli -c CLUSTER NODES | grep "fail" | grep "master" | wc -l

# If majority lost and cannot recover:
# Reset cluster (WARNING: loses all data). Run on every node.
# A master that still holds keys refuses the reset until FLUSHALL is run.
redis-cli -h <node_ip> -p <node_port> CLUSTER RESET HARD

# Recreate cluster
redis-cli --cluster create <node1>:6379 <node2>:6379 <node3>:6379 \
  <node4>:6379 <node5>:6379 <node6>:6379 \
  --cluster-replicas 1
```
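
Before resorting to a hard reset, it helps to confirm that the majority is really gone. The sketch below assumes that only masters serving at least one slot count toward the majority and that `CLUSTER NODES` output is supplied on stdin; `check_master_quorum` is a hypothetical helper, and the view of a single node may differ from that of its peers.

```bash
# Reads CLUSTER NODES output on stdin. Counts masters that own slots,
# subtracts those flagged fail, and compares the remainder with the
# majority threshold floor(total / 2) + 1.
check_master_quorum() {
  tr -d '\r' | awk '
    $3 ~ /(^|,)master(,|$)/ && NF >= 9 {
      total++
      if ($3 ~ /(^|,)fail(,|$)/) down++
    }
    END {
      quorum = int(total / 2) + 1
      up = total - down
      verdict = (up >= quorum) ? "QUORUM_OK" : "QUORUM_LOST"
      printf "%s masters=%d up=%d quorum=%d\n", verdict, total, up, quorum
    }'
}
```

If the result is `QUORUM_OK`, a replica failover (Solution 2) is usually worth trying before any destructive reset.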
### Solution 8: Fix Configuration Epoch Issues

```bash
# Check epoch values
redis-cli -c CLUSTER NODES

# If epochs are inconsistent, force update
redis-cli -c CLUSTER BUMPEPOCH

# Or on specific node
redis-cli -h <node_ip> -p <node_port> CLUSTER BUMPEPOCH
```
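
One way to read "inconsistent" is two masters reporting the same config epoch. The following sketch looks for that case only, assuming `CLUSTER NODES` output on stdin with the config epoch in the seventh field; `find_epoch_collisions` is a hypothetical name, and other kinds of epoch disagreement would need a comparison across the views of several nodes.

```bash
# Reads CLUSTER NODES output on stdin and reports masters that share a
# config epoch. Replicas are skipped because they report the epoch of
# their master.
find_epoch_collisions() {
  tr -d '\r' | awk '
    $3 ~ /(^|,)master(,|$)/ {
      if ($7 in seen) print "epoch " $7 ": " seen[$7] " and " $1
      else seen[$7] = $1
    }'
}
```

An empty result means no two masters in this node's view share an epoch, in which case `CLUSTER BUMPEPOCH` is probably not the fix being looked for.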
## Common Scenarios

### Scenario: Node Marked as FAIL but is Reachable

```bash
# Node is up but marked as fail (network partition resolved)
# Wait for cluster to auto-recover
sleep 30
redis-cli -c CLUSTER NODES

# If still marked as fail, manually forget and re-meet
redis-cli -c CLUSTER FORGET <node_id>
redis-cli -c CLUSTER MEET <node_ip> <node_port>
```
### Scenario: Slots Migration Stuck

```bash
# Check slot migration status
redis-cli -c CLUSTER NODES

# Look for slots with migration state: [1234->-node_id]
# Or import state: [1234-<-node_id]

# Complete the migration manually
redis-cli -c CLUSTER SETSLOT <slot> NODE <target_node_id>

# On source node
redis-cli -h <source_ip> -p <source_port> CLUSTER SETSLOT <slot> NODE <target_node_id>

# On target node
redis-cli -h <target_ip> -p <target_port> CLUSTER SETSLOT <slot> NODE <target_node_id>
```
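
Spotting the bracketed markers by eye is error-prone in a large cluster. A minimal sketch that extracts them, assuming `CLUSTER NODES` output on stdin; `list_open_slots` is a hypothetical helper. Redis reports these markers only on the line flagged `myself`, so the command would need to be run against each node in turn.

```bash
# Reads CLUSTER NODES output on stdin and prints
# "<node_id> <slot> <migrating|importing>" for each open slot marker.
list_open_slots() {
  tr -d '\r' | awk '{
    for (f = 9; f <= NF; f++) {
      if ($f ~ /^\[[0-9]+->-/) {
        state = "migrating"
      } else if ($f ~ /^\[[0-9]+-<-/) {
        state = "importing"
      } else {
        continue
      }
      slot = $f
      sub(/^\[/, "", slot)   # drop the leading bracket
      sub(/-.*/, "", slot)   # drop everything after the slot number
      print $1, slot, state
    }
  }'
}
```

The slot numbers it prints are the values to substitute for `<slot>` in the `CLUSTER SETSLOT` commands.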
### Scenario: Cluster is Down (CLUSTERDOWN)

```bash
# Check state
redis-cli -c CLUSTER INFO

# If cluster_state:fail, find the cause:
# 1. Check slot coverage
# 2. Check master availability
# 3. Check majority

# Quick fix for missing slots
redis-cli --cluster fix <any_node>:6379

# Or with more aggressive repair
redis-cli --cluster fix <any_node>:6379 --cluster-search-multiple-owners
```
## Cluster Management Commands

```bash
# Create cluster
redis-cli --cluster create node1:6379 node2:6379 node3:6379 node4:6379 node5:6379 node6:6379 --cluster-replicas 1

# Add node
redis-cli --cluster add-node new_node:6379 existing_node:6379

# Add node as replica
redis-cli --cluster add-node new_node:6379 existing_node:6379 --cluster-slave --cluster-master-id <master_id>

# Remove node
redis-cli --cluster del-node node:6379 <node_id>

# Reshard
redis-cli --cluster reshard node:6379

# Rebalance
redis-cli --cluster rebalance node:6379

# Check cluster
redis-cli --cluster check node:6379

# Fix cluster
redis-cli --cluster fix node:6379

# Info
redis-cli --cluster info node:6379
```
## Monitoring Script

```bash
#!/bin/bash
# redis_cluster_monitor.sh

NODE="localhost:6379"

# Get cluster state
STATE=$(redis-cli -c -h "${NODE%:*}" -p "${NODE#*:}" CLUSTER INFO | grep cluster_state | cut -d: -f2 | tr -d '\r')

if [ "$STATE" != "ok" ]; then
  echo "CRITICAL: Cluster state is $STATE"
  redis-cli -c -h "${NODE%:*}" -p "${NODE#*:}" CLUSTER INFO
  exit 2
fi

# Check slot coverage
SLOTS=$(redis-cli -c -h "${NODE%:*}" -p "${NODE#*:}" CLUSTER INFO | grep cluster_slots_assigned | cut -d: -f2 | tr -d '\r')

if [ "$SLOTS" != "16384" ]; then
  echo "WARNING: Only $SLOTS slots covered"
  exit 1
fi

# Check failed nodes
FAILED=$(redis-cli -c -h "${NODE%:*}" -p "${NODE#*:}" CLUSTER NODES | grep -c "fail")

if [ "$FAILED" -gt 0 ]; then
  echo "WARNING: $FAILED nodes marked as fail"
  exit 1
fi

echo "OK: Cluster healthy"
exit 0
```
## Prevention

### 1. Proper Cluster Configuration

    # Recommended: 3 masters + 3 replicas minimum
    redis-cli --cluster create \
      master1:6379 master2:6379 master3:6379 \
      replica1:6379 replica2:6379 replica3:6379 \
      --cluster-replicas 1

### 2. Monitor Cluster Health

    # Set up regular monitoring
    redis-cli -c CLUSTER INFO | grep cluster_state

### 3. Balanced Slot Distribution

    # After adding nodes, rebalance
    redis-cli --cluster rebalance <node>:6379

### 4. Document Node IDs and Roles

Keep documentation of:

- Node IDs
- Master-replica relationships
- Slot assignments
- IP addresses and ports
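
Much of this documentation can be generated rather than written by hand. A sketch that turns `CLUSTER NODES` output (on stdin) into a tab-separated inventory of role, node ID, address, master ID, and slots; `cluster_inventory` is a hypothetical helper and the column layout is an arbitrary choice.

```bash
# Reads CLUSTER NODES output on stdin and prints one tab-separated row
# per node: role, node ID, address, master ID ("-" for masters), and a
# comma-joined slot list ("-" when the node owns no slots).
cluster_inventory() {
  tr -d '\r' | awk '{
    role = ($3 ~ /(^|,)master(,|$)/) ? "master" : "replica"
    slots = ""
    for (f = 9; f <= NF; f++) slots = slots (slots == "" ? "" : ",") $f
    printf "%s\t%s\t%s\t%s\t%s\n", role, $1, $2, $4, (slots == "" ? "-" : slots)
  }'
}
```

Saving the output with a date in the filename, for example from a scheduled job, gives a record to compare against when a node goes missing.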
## Related Errors
- [Redis Replication Broken](./fix-redis-replication-broken)
- [Redis Connection Refused](./fix-redis-connection-refused)