The Problem

Resharding a Redis cluster to rebalance slots or add new nodes can fail midway. The cluster may show slots stuck in a MIGRATING or IMPORTING state, nodes may report errors, or the reshard command may time out. An incomplete reshard can leave the cluster in an inconsistent state, with some keys unreachable.

Understanding Resharding Phases

Resharding involves these steps:

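  1. The target node is told to import the slot: CLUSTER SETSLOT <slot> IMPORTING <source-node-id>
  2. The source node is told to migrate the slot: CLUSTER SETSLOT <slot> MIGRATING <target-node-id>
  3. Keys are moved from the source to the target with the MIGRATE command
  4. The new owner is announced across the cluster with CLUSTER SETSLOT <slot> NODE <target-node-id>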

A failure can occur at any phase, leaving different symptoms.

Diagnosing Resharding Failure

Check Slot States

bash
redis-cli -c CLUSTER NODES

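Look for bracketed entries that mark migrating or importing slots. They appear only on the line flagged myself, so repeat the command against each master: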

bash
nodeid1 10.0.0.1:6379@16379 myself,master - 0 1609459200000 1 connected 0-5460 [5460->-nodeid2] [5461-<-nodeid3]

The bracket notation shows:

- [slot->-target]: slot is migrating out to the target node
- [slot-<-source]: slot is being imported from the source node
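To make the bracket notation easier to scan, the markers can be pulled out of the CLUSTER NODES output with a small filter. This is a minimal sketch, assuming the slot entries start at the ninth whitespace-separated field as in the sample line above; `list_open_slots` is a hypothetical helper name, not part of redis-cli.

```bash
# list_open_slots: read CLUSTER NODES output on stdin and print one line per
# open slot as "<state> <slot> <peer-node-id>".
# Assumption: slot entries begin at field 9, as in the sample output above.
list_open_slots() {
    tr -d '\r' | awk '
        {
            for (i = 9; i <= NF; i++) {
                if ($i ~ /^\[[0-9]+->-.+\]$/) {
                    # [slot->-target]: strip the brackets, split on the arrow
                    split(substr($i, 2, length($i) - 2), p, "->-")
                    print "migrating", p[1], p[2]
                } else if ($i ~ /^\[[0-9]+-<-.+\]$/) {
                    # [slot-<-source]: strip the brackets, split on the arrow
                    split(substr($i, 2, length($i) - 2), p, "-<-")
                    print "importing", p[1], p[2]
                }
            }
        }'
}
```

A possible invocation is `redis-cli -h <master-node> -p 6379 CLUSTER NODES | list_open_slots`, repeated for each master, since a node reports open slots only for itself.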

Check for Open Slots

bash
redis-cli -c CLUSTER SLOTS

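CLUSTER SLOTS lists only which node currently owns each slot range; it does not show MIGRATING or IMPORTING state. To list open slots, run redis-cli --cluster check <node>:6379, which reports them as warnings.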

Identify Stuck Migration

bash
# Run against each master; matches both migrating ([slot->-id]) and importing ([slot-<-id]) entries
redis-cli -h <master-node> -p 6379 CLUSTER NODES | grep -E '\[[0-9]+-[<>]-'

Check Node Connection State

bash
redis-cli -c CLUSTER INFO | grep cluster_state

If cluster state is fail, resharding cannot proceed.
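The state check can be wrapped in a function so a script can refuse to start a reshard on an unhealthy cluster. The sketch below is hedged: `cluster_info_field` and `cluster_is_ok` are hypothetical helper names, and the carriage returns are stripped because CLUSTER INFO lines end in CRLF, which makes exact string comparisons fail otherwise.

```bash
# cluster_info_field: read CLUSTER INFO output on stdin and print the value
# of the field named in $1 (for example cluster_state).
cluster_info_field() {
    # CLUSTER INFO lines end in \r\n, so drop the \r before comparing
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# cluster_is_ok: exit status 0 only when cluster_state is "ok".
cluster_is_ok() {
    [ "$(cluster_info_field cluster_state)" = "ok" ]
}
```

For example, `redis-cli -h <node> -p 6379 CLUSTER INFO | cluster_is_ok || echo "cluster not healthy"` would print the message whenever the state is anything other than ok.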

Common Resharding Errors

Error 1: Node Is Not Empty

bash
[ERR] Node 10.0.0.5:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

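Cause: The target node already holds keys or already knows other nodes from an earlier cluster configuration. This error is raised when the node is added to the cluster, before any slots move.

Solution: Empty and reset the node. CLUSTER RESET is refused on a master that still holds keys, so flush it first (this deletes all data on that node):

bash
redis-cli -h <target-node> FLUSHALL
redis-cli -h <target-node> CLUSTER RESET SOFT
# Or for a complete reset (also generates a new node ID and resets epochs)
redis-cli -h <target-node> CLUSTER RESET HARD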

Error 2: Slot Already Assigned

bash
[ERR] Slot 5460 is already busy

Cause: Another resharding operation is in progress or failed mid-way.

Solution: Check which node owns the slot:

bash
redis-cli -c CLUSTER NODES | grep 5460

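A plain grep matches only when the slot number appears literally in the output (as a range boundary, a single slot, or an open-slot marker); a slot inside a range such as 0-5470 will not match. Once the owner is known, resolve the conflict by completing or rolling back the migration.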
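Because slots are usually listed as ranges, finding the owner of a single slot means comparing it against each range rather than searching for the number. A rough sketch, assuming the standard CLUSTER NODES layout with flags in the third field and slot entries from the ninth field onward; `slot_owner` is a hypothetical helper name.

```bash
# slot_owner: read CLUSTER NODES output on stdin and print "<node-id> <address>"
# for every master whose assigned slots or ranges include the slot in $1.
slot_owner() {
    awk -v slot="$1" '
        $3 ~ /master/ {
            for (i = 9; i <= NF; i++) {
                if ($i ~ /^\[/) continue            # skip open-slot markers
                n = split($i, r, "-")               # "0-5460" or a single "16000"
                lo = r[1] + 0
                hi = (n == 2 ? r[2] : r[1]) + 0
                if (slot + 0 >= lo && slot + 0 <= hi) print $1, $2
            }
        }'
}
```

Running `redis-cli -h <node> -p 6379 CLUSTER NODES | slot_owner 5460` should print one line in a consistent cluster; two lines would suggest that more than one master claims the slot.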

Error 3: Key Migration Timeout

bash
[ERR] Timeout migrating keys from 10.0.0.1:6379 to 10.0.0.4:6379

Cause: Large keys or slow network.

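Solution: Raise the per-MIGRATE timeout (in milliseconds; redis-cli defaults to 60000) and retry. Lowering --cluster-pipeline (keys sent per MIGRATE call, default 10) also shortens each call:

bash
redis-cli --cluster reshard <node>:6379 --cluster-timeout 120000 --cluster-pipeline 5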

Error 4: Node Not Reachable

bash
[ERR] Node 10.0.0.5:6379 is not reachable

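Cause: A network problem or a node that is down. Nodes must be reachable on both the client port and the cluster bus port (client port + 10000 by default, so 16379 here).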

Solution: Verify connectivity:

bash
redis-cli -h 10.0.0.5 -p 6379 PING

Recovery Procedures

Recover from Incomplete Migration

#### Step 1: Identify the Problem Slots

bash
redis-cli -c CLUSTER NODES | grep -E '\['

#### Step 2: Check Migration State on Source Node

bash
redis-cli -h <source-node> -p 6379 CLUSTER GETKEYSINSLOT <slot> 100

Any keys returned are still on the source node and have not been migrated.

#### Step 3: Manually Complete Migration

```bash
# Get keys in the slot
redis-cli -h <source-node> -p 6379 CLUSTER GETKEYSINSLOT <slot> 100

# Migrate each key
redis-cli -h <source-node> -p 6379 MIGRATE <target-ip> <target-port> "" 0 5000 KEYS key1 key2 key3
```

#### Step 4: Finalize Slot Assignment

bash
# On the target first, then the source, then every other master
redis-cli -h <node> -p 6379 CLUSTER SETSLOT <slot> NODE <target-node-id>

#### Step 5: Verify Slot State

bash
# No open-slot warnings should remain, and the slot should be covered by the target
redis-cli --cluster check <any-node>:6379

Abort and Roll Back Migration

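If you want to abandon the migration, clear the open-slot state on both nodes. If some keys have already moved to the target, migrate them back to the source first (run MIGRATE on the target, pointing at the source); otherwise they stay on a node that does not own the slot and become unreachable.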

#### Step 1: Clear Importing State (Target Node)

bash
redis-cli -h <target-node> -p 6379 CLUSTER SETSLOT <slot> STABLE

#### Step 2: Clear Migrating State (Source Node)

bash
redis-cli -h <source-node> -p 6379 CLUSTER SETSLOT <slot> STABLE

#### Step 3: Verify Stable State

bash
redis-cli -c CLUSTER NODES | grep -E '\['

This should return nothing on every master once all migrations are cleared.
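When several slots are stuck at once, typing a SETSLOT ... STABLE command per slot is error-prone. One option is to generate the commands from the node's own CLUSTER NODES output and review them before running anything. The following is only a sketch: `stable_cmds` is a hypothetical helper, it prints commands rather than executing them, and it assumes the input came from the node named by the host and port arguments.

```bash
# stable_cmds: read one node's CLUSTER NODES output on stdin and print a
# CLUSTER SETSLOT <slot> STABLE command for every open slot on that node.
# $1 = host and $2 = port of the node the output was taken from.
# Nothing is sent to Redis; the output is meant to be reviewed first.
stable_cmds() {
    awk -v host="$1" -v port="$2" '
        $3 ~ /myself/ {
            for (i = 9; i <= NF; i++) {
                if ($i ~ /^\[[0-9]+-[<>]-/) {
                    slot = substr($i, 2)      # drop the leading "["
                    sub(/-.*/, "", slot)      # keep only the slot number
                    printf "redis-cli -h %s -p %s CLUSTER SETSLOT %s STABLE\n", host, port, slot
                }
            }
        }'
}
```

For example, `redis-cli -h 10.0.0.1 -p 6379 CLUSTER NODES | stable_cmds 10.0.0.1 6379` lists the commands for that node; the same would be repeated for the node on the other side of each migration.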

Use Cluster Fix Command

For automatic recovery:

bash
redis-cli --cluster fix <any-node>:6379

This command:

- Detects slots claimed by multiple nodes
- Reassigns orphan slots
- Clears inconsistent states

Add --cluster-search-multiple-owners for thorough checking:

bash
redis-cli --cluster fix <any-node>:6379 --cluster-search-multiple-owners

Manual Resharding Step by Step

When --cluster reshard fails repeatedly, manual resharding gives more control:

Step 1: Identify Slots to Move

bash
redis-cli -c CLUSTER NODES

Step 2: Set Slot Importing on Target

bash
redis-cli -h <target-node> CLUSTER SETSLOT <slot> IMPORTING <source-node-id>

Step 3: Set Slot Migrating on Source

bash
redis-cli -h <source-node> CLUSTER SETSLOT <slot> MIGRATING <target-node-id>

Step 4: Migrate Keys

```bash
# Get count of keys in slot
redis-cli -h <source-node> CLUSTER COUNTKEYSINSLOT <slot>

# Get keys
redis-cli -h <source-node> CLUSTER GETKEYSINSLOT <slot> <count>

# Migrate in batches
redis-cli -h <source-node> MIGRATE <target-ip> <target-port> "" 0 10000 KEYS key1 key2 ...
```
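Batching the keys by hand gets tedious for slots with many keys. The sketch below groups key names into fixed-size batches and prints one MIGRATE command per batch. It is an illustration under assumptions: `migrate_cmds` is a hypothetical helper, it only prints commands, and it assumes key names contain no whitespace or shell metacharacters (such keys would need quoting).

```bash
# migrate_cmds: read key names on stdin (one per line) and print one MIGRATE
# command per batch. Nothing is sent to Redis.
# $1 = source host, $2 = target ip, $3 = target port,
# $4 = keys per batch, $5 = MIGRATE timeout in milliseconds
migrate_cmds() {
    awk -v src="$1" -v ip="$2" -v port="$3" -v n="$4" -v ms="$5" '
        function emit_batch() {
            if (batch != "")
                printf "redis-cli -h %s MIGRATE %s %s \"\" 0 %s KEYS%s\n", src, ip, port, ms, batch
            batch = ""; count = 0
        }
        NF  { batch = batch " " $0; if (++count >= n) emit_batch() }   # skip blank lines
        END { emit_batch() }                                            # flush the last partial batch
    '
}
```

One way to feed it is `redis-cli -h <source-node> CLUSTER GETKEYSINSLOT <slot> <count> | migrate_cmds <source-node> <target-ip> <target-port> 10 10000`, since redis-cli prints one key per line when its output is piped.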

Step 5: Announce New Owner

bash
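# Run on ALL master nodes: target first, then source, then the rest
all_nodes="10.0.0.4 10.0.0.1 10.0.0.2"  # example hosts; replace with your masters
for node in $all_nodes; do
    redis-cli -h "$node" -p 6379 CLUSTER SETSLOT <slot> NODE <target-node-id>
done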

Step 6: Verify

bash
redis-cli -c GET <key-in-moved-slot>

The request should be redirected to the new node.

Preventing Resharding Failures

Pre-Resharding Checklist

  1. Ensure cluster is healthy:
     ```bash
     redis-cli -c CLUSTER INFO | grep cluster_state
     ```
  2. Verify all nodes reachable:
     ```bash
     redis-cli --cluster check <node>:6379
     ```
  3. Check for large keys:
     ```bash
     redis-cli --bigkeys
     ```
  4. Ensure sufficient bandwidth between nodes.

Configuration Tuning

In redis.conf:

bash
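cluster-node-timeout 15000

Increase this for large datasets or slow networks, so that a node blocked on a long MIGRATE is not flagged as failing. The key migration timeout is not a redis.conf directive: it is the timeout argument of each MIGRATE call, or --cluster-timeout when using redis-cli --cluster reshard.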

Batch Size Control

For manual migrations, use smaller batches:

bash
redis-cli -h <source> MIGRATE <target-ip> <target-port> "" 0 5000 KEYS key1

Process one key at a time for very large keys.