Fix Redis Cluster Resharding Failures - Complete Recovery Guide

The Problem

Resharding a Redis cluster, whether to rebalance slots or to add new nodes, can fail midway. The cluster may show slots stuck in a migrating or importing state, nodes may report errors, or the reshard command may time out. An incomplete reshard can leave the cluster in an inconsistent state in which some keys are unreachable.

Understanding Resharding Phases

Resharding involves these steps:

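1. The target node is told to accept the slot: CLUSTER SETSLOT <slot> IMPORTING <source-node-id>
2. The source node is told to release the slot: CLUSTER SETSLOT <slot> MIGRATING <target-node-id>
3. Keys are moved from the source to the target with the MIGRATE command
4. The new owner is announced across the cluster with CLUSTER SETSLOT <slot> NODE <target-node-id>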

A failure can occur at any phase, leaving different symptoms.

Diagnosing Resharding Failure

Check Slot States

bash

redis-cli -c CLUSTER NODES

Look for bracketed entries at the end of a line, which mark slots that are migrating or importing:

bash

nodeid1 10.0.0.1:6379@16379 myself,master - 0 1609459200000 1 connected 0-5460 [5460->-nodeid2] [5461-<-nodeid3]

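The bracket notation shows:

- [slot->-target]: slot migrating out to the target node
- [slot-<-source]: slot importing in from the source node

A node reports these entries only on its own (myself) line, so run the command against each master rather than relying on one node's view.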
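The bracket notation can be turned into one line per open slot with a small filter. This is a minimal sketch, assuming the standard CLUSTER NODES layout in which slot entries start at the ninth field; the function name list_open_slots is hypothetical.

```bash
# Print one line per open slot found in CLUSTER NODES output read from stdin.
# Output format: <node-id> <slot> <migrating|importing> <peer-node-id>
list_open_slots() {
    awk '{
        # Fields 1-8 are id, address, flags, master, ping, pong, epoch, link state.
        for (i = 9; i <= NF; i++) {
            if ($i ~ /^\[[0-9]+->-.+\]$/) {
                split(substr($i, 2, length($i) - 2), p, "->-")
                print $1, p[1], "migrating", p[2]
            } else if ($i ~ /^\[[0-9]+-<-.+\]$/) {
                split(substr($i, 2, length($i) - 2), p, "-<-")
                print $1, p[1], "importing", p[2]
            }
        }
    }'
}
```

Typical use would be `redis-cli -h <node> -p 6379 CLUSTER NODES | list_open_slots`, repeated for each master.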

Check for Open Slots

bash

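redis-cli --cluster check <any-node>:6379

The check reports open slots (slots left in a migrating or importing state) as warnings. CLUSTER SLOTS is not suitable here: it lists slot ownership only and does not show migration state.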

Identify Stuck Migration

bash

# Run against each master: a node lists only its own migrating slots
redis-cli -h <node> -p 6379 CLUSTER NODES | grep -E '\[.*->.*\]'

Check Node Connection State

bash

redis-cli -c CLUSTER INFO | grep cluster_state

If the output shows cluster_state:fail, the cluster is not serving all slots, and resharding cannot proceed until that is resolved.
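For scripted checks, the same test can be wrapped in a function that succeeds only when the cluster is healthy. A minimal sketch; the name cluster_is_ok is hypothetical, and the carriage-return stripping assumes CLUSTER INFO output may arrive with CRLF line endings.

```bash
# Exit status 0 only when CLUSTER INFO text on stdin reports cluster_state:ok.
cluster_is_ok() {
    # Strip carriage returns so the anchored match works on CRLF output.
    tr -d '\r' | grep -q '^cluster_state:ok$'
}
```

It could guard a reshard like this: `redis-cli -h <node> CLUSTER INFO | cluster_is_ok || echo "cluster not healthy, aborting"`.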

Common Resharding Errors

Error 1: Node Is Not Empty

bash

[ERR] Node 10.0.0.5:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

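Cause: The target node already holds keys or already knows other nodes from an earlier cluster configuration. This usually appears when adding the node to the cluster before resharding onto it.

Solution: Empty and reset the node. CLUSTER RESET is refused on a master that still holds keys, so flush it first. Run this only against the new, unused node, because it deletes all data there:

bash

redis-cli -h <target-node> FLUSHALL
redis-cli -h <target-node> CLUSTER RESET SOFT
# Or, to also generate a new node ID and reset the epochs
redis-cli -h <target-node> CLUSTER RESET HARD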

Error 2: Slot Already Assigned

bash

[ERR] Slot 5460 is already busy

Cause: Another resharding operation is in progress or failed mid-way.

Solution: Check which node owns the slot:

bash

redis-cli -c CLUSTER NODES | grep 5460

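Note that grep matches only where 5460 appears literally, that is, at the boundary of a range such as 0-5460 or in an open-slot entry. A slot in the middle of a range will not match, and the number can also match unrelated fields such as timestamps.

Resolve by completing or rolling back the migration.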
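Because slots are listed as ranges, a range-aware lookup is more reliable than a plain grep. The following is a minimal sketch, assuming the standard CLUSTER NODES layout with slot entries from the ninth field onward; the function name slot_owner is hypothetical.

```bash
# Print the ID of every node whose assigned ranges cover the given slot.
# Reads CLUSTER NODES output from stdin; open-slot entries in [brackets] are skipped.
slot_owner() {
    awk -v slot="$1" '{
        for (i = 9; i <= NF; i++) {
            if ($i ~ /^\[/) continue          # migrating/importing marker, not ownership
            n = split($i, r, "-")
            lo = r[1] + 0
            hi = (n > 1 ? r[2] : r[1]) + 0    # a single slot is a range of one
            if (slot + 0 >= lo && slot + 0 <= hi) print $1
        }
    }'
}
```

Usage would look like `redis-cli -h <node> -p 6379 CLUSTER NODES | slot_owner 5460`. More than one ID in the output suggests the slot is claimed by multiple nodes.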

Error 3: Key Migration Timeout

bash

[ERR] Timeout migrating keys from 10.0.0.1:6379 to 10.0.0.4:6379

Cause: Large keys or slow network.

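Solution: Raise the timeout and retry. The value is in milliseconds, and 60000 is already the redis-cli default, so use a larger value. Lowering --cluster-pipeline, the number of keys moved per MIGRATE call (default 10), also helps with large keys:

bash

redis-cli --cluster reshard <node>:6379 --cluster-timeout 120000 --cluster-pipeline 5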

Error 4: Node Not Reachable

bash

[ERR] Node 10.0.0.5:6379 is not reachable

Cause: Network issue or node down.

Solution: Verify connectivity:

bash

redis-cli -h 10.0.0.5 -p 6379 PING

Recovery Procedures

Recover from Incomplete Migration

#### Step 1: Identify the Problem Slots

bash

# Run against each master: a node reports only its own open slots
redis-cli -h <node> -p 6379 CLUSTER NODES | grep -E '\['

#### Step 2: Check Migration State on Source Node

bash

redis-cli -h <source-node> -p 6379 CLUSTER GETKEYSINSLOT <slot> 100

If this returns keys, they weren't migrated.

#### Step 3: Manually Complete Migration

```bash
# Get keys in the slot
redis-cli -h <source-node> -p 6379 CLUSTER GETKEYSINSLOT <slot> 100

# Migrate the returned keys
redis-cli -h <source-node> -p 6379 MIGRATE <target-ip> <target-port> "" 0 5000 KEYS key1 key2 key3
```

#### Step 4: Finalize Slot Assignment

bash

# Run on the target node first, then the source, then every other master
redis-cli -h <node> -p 6379 CLUSTER SETSLOT <slot> NODE <target-node-id>

#### Step 5: Verify Slot State

bash

redis-cli -c CLUSTER SLOTS | grep -A2 <slot>

Abort and Roll Back Migration

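If you want to abandon the migration, clear the open-slot state on both nodes. If some keys have already moved to the target, migrate them back to the source first (run MIGRATE on the target, pointing at the source). Otherwise they stay on the target, where the cluster no longer looks for them.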

#### Step 1: Clear Importing State (Target Node)

bash

redis-cli -h <target-node> -p 6379 CLUSTER SETSLOT <slot> STABLE

#### Step 2: Clear Migrating State (Source Node)

bash

redis-cli -h <source-node> -p 6379 CLUSTER SETSLOT <slot> STABLE

#### Step 3: Verify Stable State

bash

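redis-cli -h <source-node> -p 6379 CLUSTER NODES | grep -E '\['
redis-cli -h <target-node> -p 6379 CLUSTER NODES | grep -E '\['

Both commands should return nothing once all migration state is cleared.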

Use Cluster Fix Command

For automatic recovery:

bash

redis-cli --cluster fix <any-node>:6379

This command:

- Closes slots left in a migrating or importing state
- Reassigns orphan slots that no node covers
- Clears inconsistent states

Add --cluster-search-multiple-owners to also detect slots claimed by more than one node:

bash

redis-cli --cluster fix <any-node>:6379 --cluster-search-multiple-owners

Manual Resharding Step by Step

When --cluster reshard fails repeatedly, manual resharding gives more control:

Step 1: Identify Slots to Move

bash

redis-cli -c CLUSTER NODES

Step 2: Set Slot Importing on Target

bash

redis-cli -h <target-node> CLUSTER SETSLOT <slot> IMPORTING <source-node-id>

Step 3: Set Slot Migrating on Source

bash

redis-cli -h <source-node> CLUSTER SETSLOT <slot> MIGRATING <target-node-id>

Step 4: Migrate Keys

```bash
# Get count of keys in slot
redis-cli -h <source-node> CLUSTER COUNTKEYSINSLOT <slot>

# Get keys
redis-cli -h <source-node> CLUSTER GETKEYSINSLOT <slot> <count>

# Migrate in batches
redis-cli -h <source-node> MIGRATE <target-ip> <target-port> "" 0 10000 KEYS key1 key2 ...
```
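The get-keys and migrate steps can be repeated in a loop until the slot is empty. The following is a minimal sketch, not a replacement for redis-cli --cluster reshard. It assumes key names contain no whitespace or glob characters, that redis-cli prints the bare reply (OK or NOKEY) when its output is captured, and that the slot is already in the MIGRATING/IMPORTING state. The function name migrate_slot and the REDIS_CLI override are hypothetical.

```bash
# Drain one slot from source to target in batches.
# Usage: migrate_slot <src-host> <src-port> <dst-host> <dst-port> <slot> [batch-size]
# REDIS_CLI may name a different client command (hypothetical hook for dry runs).
migrate_slot() {
    src_host=$1 src_port=$2 dst_host=$3 dst_port=$4 slot=$5 batch=${6:-10}
    cli=${REDIS_CLI:-redis-cli}
    while :; do
        keys=$($cli -h "$src_host" -p "$src_port" CLUSTER GETKEYSINSLOT "$slot" "$batch")
        if [ -z "$keys" ]; then
            break                       # slot is empty on the source
        fi
        # Word splitting on $keys is intentional: one argument per key.
        reply=$($cli -h "$src_host" -p "$src_port" \
            MIGRATE "$dst_host" "$dst_port" "" 0 10000 KEYS $keys)
        case $reply in
            OK|NOKEY) ;;                # batch moved, or keys already gone
            *) echo "MIGRATE failed: $reply" >&2; return 1 ;;
        esac
    done
    return 0
}
```

Stopping on any reply other than OK or NOKEY avoids looping forever on the same batch when the target is unreachable or a key is too large for the timeout.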

Step 5: Announce New Owner

bash

# Run on ALL master nodes, target first (the address list is an example)
all_nodes="10.0.0.4 10.0.0.1 10.0.0.2"
for node in $all_nodes; do
    redis-cli -h "$node" CLUSTER SETSLOT <slot> NODE <target-node-id>
done

Step 6: Verify

bash

redis-cli -c GET <key-in-moved-slot>

Should redirect to the new node.

Preventing Resharding Failures

Pre-Resharding Checklist

1. Ensure the cluster is healthy:

```bash
redis-cli -c CLUSTER INFO | grep cluster_state
```

2. Verify all nodes are reachable:

```bash
redis-cli --cluster check <node>:6379
```

3. Check for large keys:

```bash
redis-cli --bigkeys
```

4. Ensure sufficient bandwidth between nodes.

Configuration Tuning

In redis.conf:

bash

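cluster-node-timeout 15000

Increase this for large datasets or slow networks so that nodes busy migrating keys are not marked as failing. The key-migration timeout is not a redis.conf setting: it is the timeout argument of each MIGRATE call (in milliseconds), or --cluster-timeout when using redis-cli --cluster reshard.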

Batch Size Control

For manual migrations, use smaller batches:

bash

redis-cli -h <source> MIGRATE <target-ip> <target-port> "" 0 5000 KEYS key1

Process one key at a time for very large keys.
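To see the effect of a batch size before running anything, the MIGRATE commands can be generated as text first. This is a minimal sketch, assuming key names contain no whitespace, quotes, or backslashes (xargs treats those specially) and that at least one key is supplied. The function name plan_migrate_batches is hypothetical.

```bash
# Read key names (one per line) on stdin and print one MIGRATE command per batch.
# Nothing is executed: review the output before running it.
# Usage: plan_migrate_batches <src-host> <dst-ip> <dst-port> [batch-size] [timeout-ms]
plan_migrate_batches() {
    src_host=$1 dst_ip=$2 dst_port=$3 batch_size=${4:-10} timeout_ms=${5:-5000}
    # xargs -n appends at most batch_size keys to each generated command line.
    xargs -n "$batch_size" echo redis-cli -h "$src_host" \
        MIGRATE "$dst_ip" "$dst_port" '""' 0 "$timeout_ms" KEYS
}
```

For example, `redis-cli -h <source> CLUSTER GETKEYSINSLOT <slot> 100 | plan_migrate_batches <source> <target-ip> <target-port> 1` prints one command per key, which matches the one-key-at-a-time approach for very large keys.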
