The Problem
Resharding a Redis cluster to rebalance slots or add new nodes fails midway. The cluster might show slots in a migrating/importing state, nodes report errors, or the reshard command times out. Incomplete resharding can leave the cluster in an inconsistent state with some keys unreachable.
Understanding Resharding Phases
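Resharding moves each hash slot through these steps:
1. The target node is set to import the slot: `CLUSTER SETSLOT <slot> IMPORTING <source-node-id>`
2. The source node is set to migrate the slot: `CLUSTER SETSLOT <slot> MIGRATING <target-node-id>`
3. Keys in the slot are listed with `CLUSTER GETKEYSINSLOT` and moved with the `MIGRATE` command.
4. The new owner is assigned with `CLUSTER SETSLOT <slot> NODE <target-node-id>`, and the change spreads across the cluster.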
A failure can occur at any phase, leaving different symptoms.
Diagnosing Resharding Failure
Check Slot States
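```bash
redis-cli -c CLUSTER NODES
```
Look for slots marked as migrating or importing. A node shows these markers only on its own (`myself`) line, so query each master in turn:
```
nodeid1 10.0.0.1:6379@16379 myself,master - 0 1609459200000 1 connected 0-5460 [5460->-nodeid2] [5461-<-nodeid3]
```
The bracket notation shows: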
- [slot->-target] - Slot migrating out
- [slot-<-source] - Slot importing in
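When checking several masters, parsing the brackets is less error-prone than reading them by eye. A minimal sketch, assuming a POSIX shell with awk; the function name `list_open_slots` is hypothetical, not part of redis-cli:

```bash
# list_open_slots (hypothetical helper): read CLUSTER NODES output on stdin and
# print one line per open slot, e.g. "5460 migrating nodeid2".
list_open_slots() {
    awk '{
        # Fields 1-8 are id, address, flags, master, ping, pong, epoch, link state;
        # slot ranges and migration brackets start at field 9
        for (i = 9; i <= NF; i++) {
            if (substr($i, 1, 1) != "[") continue      # plain slot or range
            f = substr($i, 2, length($i) - 2)          # strip the brackets
            if (index(f, "->-")) {
                split(f, p, "->-"); print p[1], "migrating", p[2]
            } else if (index(f, "-<-")) {
                split(f, p, "-<-"); print p[1], "importing", p[2]
            }
        }
    }'
}
```

For example: `redis-cli -h <master-node> -p 6379 CLUSTER NODES | list_open_slots`.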
Check for Open Slots
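```bash
redis-cli --cluster check <any-node>:6379
```
`CLUSTER SLOTS` only shows which master owns each slot range, not migration state. `redis-cli --cluster check` queries every node and warns about open slots, meaning slots left in a migrating or importing state.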
Identify Stuck Migration
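```bash
# Open slots (migrating or importing) are reported only by the node itself,
# so run this against each master
redis-cli -h <master-node> -p 6379 CLUSTER NODES | grep myself | grep -E '\[[0-9]+-[<>]-'
```
Check Node Connection State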
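```bash
redis-cli -c CLUSTER INFO | grep cluster_state
```
If this reports `cluster_state:fail` (for example because some slots are not covered or the node cannot reach a majority of masters), fix that before resharding.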
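In a script, matching the whole line is safer than grepping for a substring. A small sketch with a hypothetical `cluster_state_ok` function; it assumes the `CLUSTER INFO` reply may arrive with CRLF line endings when redis-cli output is piped, so it strips carriage returns first:

```bash
# cluster_state_ok (hypothetical helper): succeed only if the CLUSTER INFO
# text on stdin contains exactly the line "cluster_state:ok".
cluster_state_ok() {
    # Remove CR characters so the -x (whole-line) match is not defeated by "\r"
    tr -d '\r' | grep -qx 'cluster_state:ok'
}
```

For example: `redis-cli -h <node> -p 6379 CLUSTER INFO | cluster_state_ok || echo "cluster is not healthy"`.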
Common Resharding Errors
Error 1: Node Not Empty
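```
[ERR] Node 10.0.0.5:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
```
Cause: The target node already holds data or cluster configuration. redis-cli reports this when adding such a node to a cluster, for example with `redis-cli --cluster add-node`.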
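Solution: Reset the node. `CLUSTER RESET` is refused on a master that still holds keys, so empty it first (this deletes the node's data):
```bash
redis-cli -h <target-node> FLUSHALL
redis-cli -h <target-node> CLUSTER RESET SOFT
# Or, to also assign a new node ID and reset the epochs
redis-cli -h <target-node> CLUSTER RESET HARD
```
Error 2: Slot Already Assigned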
```
[ERR] Slot 5460 is already busy
```
Cause: The slot is already assigned to a node, typically because another resharding operation is in progress or an earlier one failed midway.
Solution: Check which node owns the slot:
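```bash
redis-cli -c CLUSTER NODES | grep 5460
```
This matches only when 5460 appears literally, as a single slot or a range boundary; a slot inside a range such as 5000-6000 will not show up. Resolve the conflict by completing or rolling back the migration.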
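To find the owner of a slot that sits inside a range, the ranges have to be compared numerically. A minimal sketch, assuming awk is available; `slot_owner` is a hypothetical helper that reads `CLUSTER NODES` output and prints the ID of the master whose ranges contain the slot:

```bash
# slot_owner SLOT (hypothetical helper): read CLUSTER NODES output on stdin and
# print the node ID of the master whose slot ranges contain SLOT.
slot_owner() {
    awk -v slot="$1" '
        $3 ~ /master/ {
            for (i = 9; i <= NF; i++) {
                if (substr($i, 1, 1) == "[") continue   # skip migration brackets
                n = split($i, r, "-")                   # "0-5460" or a single "5461"
                lo = r[1] + 0
                hi = (n == 2 ? r[2] : r[1]) + 0
                if (slot + 0 >= lo && slot + 0 <= hi) { print $1; exit }
            }
        }'
}
```

For example: `redis-cli -c CLUSTER NODES | slot_owner 5460`. Replicas are skipped because only masters own slots.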
Error 3: Key Migration Timeout
```
[ERR] Timeout migrating keys from 10.0.0.1:6379 to 10.0.0.4:6379
```
Cause: Large keys or a slow network between the nodes.
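Solution: Retry with a longer timeout (in milliseconds). For very large keys, also lower `--cluster-pipeline` so that fewer keys are moved per `MIGRATE` call:
```bash
redis-cli --cluster reshard <node>:6379 --cluster-timeout 120000 --cluster-pipeline 1
```
Error 4: Node Not Reachable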
```
[ERR] Node 10.0.0.5:6379 is not reachable
```
Cause: A network issue, or the node is down.
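Solution: Verify connectivity. Cluster nodes also talk to each other over the cluster bus port (by default the client port plus 10000, e.g. 16379), so make sure firewalls allow that port as well:
```bash
redis-cli -h 10.0.0.5 -p 6379 PING
```
Recovery Procedures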
Recover from Incomplete Migration
#### Step 1: Identify the Problem Slots
```bash
# Run against each master; a node lists only its own open slots
redis-cli -h <master-node> -p 6379 CLUSTER NODES | grep -E '\['
```
#### Step 2: Check Migration State on Source Node
```bash
redis-cli -h <source-node> -p 6379 CLUSTER GETKEYSINSLOT <slot> 100
```
If this returns keys, they have not been migrated yet.
#### Step 3: Manually Complete Migration
```bash
# Get keys in the slot
redis-cli -h <source-node> -p 6379 CLUSTER GETKEYSINSLOT <slot> 100

# Migrate the returned keys in one call; repeat until the slot is empty
redis-cli -h <source-node> -p 6379 MIGRATE <target-ip> <target-port> "" 0 5000 KEYS key1 key2 key3
```
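Repeating those two commands by hand is tedious for a slot with many keys. A sketch of the loop, assuming a POSIX shell and keys without whitespace in their names; `drain_slot` and the `REDIS_CLI` variable (to add options such as `-a <password>`) are hypothetical, not redis-cli features:

```bash
# Hypothetical helper: move every key of one slot from source to target in batches.
# REDIS_CLI may be overridden, e.g. REDIS_CLI="redis-cli -a <password>".
REDIS_CLI=${REDIS_CLI:-redis-cli}

# drain_slot SRC_HOST SRC_PORT DST_HOST DST_PORT SLOT [BATCH] [TIMEOUT_MS]
drain_slot() {
    local src_host=$1 src_port=$2 dst_host=$3 dst_port=$4 slot=$5
    local batch=${6:-100} timeout=${7:-5000} keys reply
    while :; do
        # Up to $batch keys still stored in the slot on the source (one per line)
        keys=$($REDIS_CLI -h "$src_host" -p "$src_port" CLUSTER GETKEYSINSLOT "$slot" "$batch")
        [ -z "$keys" ] && return 0        # slot is empty: all keys have moved
        # $keys is word-split on purpose; keys containing whitespace are not handled
        reply=$($REDIS_CLI -h "$src_host" -p "$src_port" \
            MIGRATE "$dst_host" "$dst_port" "" 0 "$timeout" KEYS $keys)
        case $reply in
            OK|NOKEY) ;;                  # batch moved (or keys already gone): continue
            *) echo "MIGRATE failed: $reply" >&2; return 1 ;;
        esac
    done
}
```

For example: `drain_slot 10.0.0.1 6379 10.0.0.4 6379 5460`. Stopping on the first unexpected reply leaves the slot open, so the failure can be inspected before retrying.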
#### Step 4: Finalize Slot Assignment
```bash
# On each master node
redis-cli -h <node> -p 6379 CLUSTER SETSLOT <slot> NODE <target-node-id>
```
#### Step 5: Verify Slot State
```bash
redis-cli --cluster check <any-node>:6379
```
The check should report that all 16384 slots are covered and that no slots are open.
Abort and Roll Back Migration
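To abandon the migration instead, first move any keys that already reached the target back to the source with `MIGRATE`: once the slot is stable again, requests go to the source and those keys become unreachable. Then clear the migration states: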
#### Step 1: Clear Importing State (Target Node)
```bash
redis-cli -h <target-node> -p 6379 CLUSTER SETSLOT <slot> STABLE
```
#### Step 2: Clear Migrating State (Source Node)
```bash
redis-cli -h <source-node> -p 6379 CLUSTER SETSLOT <slot> STABLE
```
#### Step 3: Verify Stable State
```bash
# Run against both the source and the target node
redis-cli -h <node> -p 6379 CLUSTER NODES | grep -E '\['
```
This should return nothing once all migrations are cleared.
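The rollback is only safe once the target holds no keys for the slot. A sketch that checks this before clearing the states, target first and then source as in the steps above; `rollback_slot`, the `REDIS_CLI` override and the fixed port 6379 are assumptions of this example:

```bash
# Hypothetical helper: abandon the migration of one slot. It refuses to clear
# the states while the target still stores keys for the slot, since those keys
# would be unreachable once the slot is stable again.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# rollback_slot SRC_HOST DST_HOST SLOT   (assumes port 6379 on both nodes)
rollback_slot() {
    local src=$1 dst=$2 slot=$3 left
    # COUNTKEYSINSLOT counts keys stored locally, whether or not the node owns the slot
    left=$($REDIS_CLI -h "$dst" -p 6379 CLUSTER COUNTKEYSINSLOT "$slot")
    if [ "$left" != "0" ]; then
        echo "target still holds ${left:-unknown} key(s) in slot $slot; move them back first" >&2
        return 1
    fi
    # Step 1: clear IMPORTING on the target
    $REDIS_CLI -h "$dst" -p 6379 CLUSTER SETSLOT "$slot" STABLE || return 1
    # Step 2: clear MIGRATING on the source
    $REDIS_CLI -h "$src" -p 6379 CLUSTER SETSLOT "$slot" STABLE
}
```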
Use Cluster Fix Command
For automatic recovery:
```bash
redis-cli --cluster fix <any-node>:6379
```
This command:
- Closes open slots left in a migrating or importing state, moving their keys to a single owner
- Assigns uncovered (orphan) slots to a node
Add `--cluster-search-multiple-owners` to also detect and fix slots claimed by more than one node:
```bash
redis-cli --cluster fix <any-node>:6379 --cluster-search-multiple-owners
```
Manual Resharding Step by Step
When --cluster reshard fails repeatedly, manual resharding gives more control:
Step 1: Identify Slots to Move
```bash
redis-cli -c CLUSTER NODES
```
Step 2: Set Slot Importing on Target
```bash
redis-cli -h <target-node> CLUSTER SETSLOT <slot> IMPORTING <source-node-id>
```
Step 3: Set Slot Migrating on Source
```bash
redis-cli -h <source-node> CLUSTER SETSLOT <slot> MIGRATING <target-node-id>
```
Step 4: Migrate Keys
```bash
# Get count of keys in slot
redis-cli -h <source-node> CLUSTER COUNTKEYSINSLOT <slot>

# Get keys
redis-cli -h <source-node> CLUSTER GETKEYSINSLOT <slot> <count>

# Migrate in batches
redis-cli -h <source-node> MIGRATE <target-ip> <target-port> "" 0 10000 KEYS key1 key2 ...
```
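Steps 2 and 3 need both node IDs, and their order matters: the target should be ready to import before the source starts sending ASK redirections. A sketch that looks the IDs up with `CLUSTER MYID` and issues the two calls in that order; `start_migration`, the `REDIS_CLI` override and the fixed port 6379 are assumptions of this example:

```bash
# Hypothetical helper for steps 2 and 3: look up both node IDs, then set
# IMPORTING on the target before MIGRATING on the source.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# start_migration SRC_HOST DST_HOST SLOT   (assumes port 6379 on both nodes)
start_migration() {
    local src=$1 dst=$2 slot=$3 src_id dst_id id
    src_id=$($REDIS_CLI -h "$src" -p 6379 CLUSTER MYID) || return 1
    dst_id=$($REDIS_CLI -h "$dst" -p 6379 CLUSTER MYID) || return 1
    # Node IDs are hex strings; anything else is probably an error reply
    for id in "$src_id" "$dst_id"; do
        case $id in
            '' | *[!0-9a-f]*) echo "unexpected node ID: '$id'" >&2; return 1 ;;
        esac
    done
    # Step 2: the target accepts requests for the slot that arrive with ASKING
    $REDIS_CLI -h "$dst" -p 6379 CLUSTER SETSLOT "$slot" IMPORTING "$src_id" || return 1
    # Step 3: the source starts redirecting requests for keys it no longer holds
    $REDIS_CLI -h "$src" -p 6379 CLUSTER SETSLOT "$slot" MIGRATING "$dst_id"
}
```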
Step 5: Announce New Owner
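```bash
# Run on every master (replace master1 master2 master3 with your master hosts);
# replicas reject CLUSTER SETSLOT
for node in master1 master2 master3; do
    redis-cli -h "$node" CLUSTER SETSLOT <slot> NODE <target-node-id>
done
```
Step 6: Verify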
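```bash
redis-cli -c GET <key-in-moved-slot>
```
Sent to a node other than the new owner, the request should be redirected to the new node: without `-c` you see the `MOVED` error, and with `-c` redis-cli follows it automatically.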
Preventing Resharding Failures
Pre-Resharding Checklist
1. Ensure the cluster is healthy:
   ```bash
   redis-cli -c CLUSTER INFO | grep cluster_state
   ```
2. Verify all nodes are reachable and slot assignments are consistent:
   ```bash
   redis-cli --cluster check <node>:6379
   ```
3. Check for large keys. The scan covers only the node you connect to, so run it against each master:
   ```bash
   redis-cli -h <master-node> --bigkeys
   ```
4. Ensure sufficient bandwidth between nodes.
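The reachability part of this checklist can be scripted. A sketch that PINGs each node and prints those that do not answer `PONG`; `unreachable_nodes` and the `REDIS_CLI` override are hypothetical, and it checks only the client port, not the cluster bus port:

```bash
# Hypothetical helper: PING each node given as host:port and print the ones
# that do not answer PONG. Returns non-zero if any node failed.
REDIS_CLI=${REDIS_CLI:-redis-cli}

unreachable_nodes() {
    local addr host port reply rc=0
    for addr in "$@"; do
        host=${addr%:*}            # text before the last colon
        port=${addr##*:}           # text after the last colon
        reply=$($REDIS_CLI -h "$host" -p "$port" PING 2>/dev/null)
        if [ "$reply" != "PONG" ]; then
            printf '%s\n' "$addr"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: `unreachable_nodes 10.0.0.1:6379 10.0.0.5:6379 || echo "fix connectivity before resharding"`.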
Configuration Tuning
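In redis.conf:
```
cluster-node-timeout 15000
```
`cluster-node-timeout` (milliseconds; 15000 is the default) is how long a node may be unreachable before it is flagged as failing. Raising it on slow networks or for large datasets makes spurious failovers during a reshard less likely, at the cost of slower failure detection. Redis has no `migrate-timeout` directive: the `MIGRATE` timeout is set per call (the timeout argument, 5000 or 10000 in the commands above) or with `--cluster-timeout` when using `redis-cli --cluster`.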
Batch Size Control
For manual migrations, use smaller batches:
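```bash
redis-cli -h <source> MIGRATE <target-ip> <target-port> "" 0 5000 KEYS key1
```
`MIGRATE` blocks both the source and the target while keys are transferred, so moving very large keys one at a time keeps each call short and within its timeout.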