## Introduction

During Redis Cluster resharding, slots are migrated from source nodes to target nodes with the `MIGRATE` command. If a transfer exceeds the `MIGRATE` timeout or a network problem interrupts it, the slot can be left in an inconsistent state: some keys exist on both nodes, the slot stays marked as migrating or importing, or the migration hangs entirely.

## Symptoms

- `redis-cli --cluster reshard` hangs on a "Moving slot" message for minutes
- `CLUSTER NODES` shows a slot stuck in the migrating (`[slot->-node_id]`) or importing (`[slot-<-node_id]`) state indefinitely
- Clients receive `MOVED` or `ASK` redirects that loop without resolving
- `CLUSTER INFO` shows `cluster_state:ok` but `cluster_slots_assigned` is less than 16384
- Application logs show `ASK` redirection errors during migration

## Common Causes

- Large keys (big hashes, sorted sets with millions of members) taking too long to migrate
- Network latency between source and target nodes exceeding the `MIGRATE` timeout
- High throughput on the source node during migration competing with migration bandwidth
- Insufficient `MIGRATE` timeout for the data volume being transferred
- Target node running out of memory during import
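The last cause can be checked before a migration starts by comparing the target's `used_memory` with its `maxmemory`, both of which are reported by `INFO memory`. The following is a minimal sketch; the function name `memory_headroom` is hypothetical, and it assumes the output of `INFO memory` is piped in on stdin.

```bash
# Reads `INFO memory` output on stdin and prints the remaining headroom in
# bytes (maxmemory - used_memory), or "unlimited" when maxmemory is 0/unset.
memory_headroom() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max == 0) print "unlimited"
      else printf "%.0f\n", max - used
    }'
}
```

Possible usage against the target node: `redis-cli -p 7001 INFO memory | memory_headroom`.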

## Step-by-Step Fix

1. **Check the current cluster state and stuck migrations**:

```bash
# Look for migrating/importing markers in the slot list of each node
redis-cli CLUSTER NODES
redis-cli CLUSTER INFO
redis-cli CLUSTER SLOTS
```
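Scanning `CLUSTER NODES` output by eye is error-prone on larger clusters. As a rough sketch, the open-slot markers (`[slot->-node_id]` for migrating, `[slot-<-node_id]` for importing) can be extracted with `awk`. The function name `list_open_slots` is hypothetical, and the sketch assumes the standard layout in which slot entries start at the ninth field.

```bash
# Reads CLUSTER NODES output on stdin and prints one line per open slot:
#   <node_address> <slot> migrating|importing <peer_node_id>
list_open_slots() {
  awk '{
    for (i = 9; i <= NF; i++) {
      f = $i
      if (f !~ /^\[[0-9]+-[<>]-[0-9a-f]+\]$/) continue
      f = substr(f, 2, length(f) - 2)            # strip the [ ] brackets
      state = (f ~ /->-/) ? "migrating" : "importing"
      split(f, parts, /-[<>]-/)                  # parts[1]=slot, parts[2]=peer id
      print $2, parts[1], state, parts[2]
    }
  }'
}
```

Possible usage: `redis-cli -p 7000 CLUSTER NODES | list_open_slots`. A node reports these markers only for its own slots, so the check needs to be run against each master in turn.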

2. **Identify the stuck slot and affected keys**:

```bash
# Connect to the source node
redis-cli -p 7000
127.0.0.1:7000> CLUSTER GETKEYSINSLOT <slot_number> 100
```
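To see how far a stuck migration got, it can help to compare how many keys of the slot sit on each side. The sketch below uses `CLUSTER COUNTKEYSINSLOT`, which counts keys in the local data set of the node it is sent to. The function name `slot_progress` and the `REDIS_CLI` override variable are hypothetical conveniences, not part of `redis-cli`.

```bash
REDIS_CLI=${REDIS_CLI:-redis-cli}   # override to add options such as -h or -a

# slot_progress <source_port> <target_port> <slot>
# Prints how many keys of the slot are held by the source and by the target.
slot_progress() {
  remaining=$($REDIS_CLI -p "$1" CLUSTER COUNTKEYSINSLOT "$3") || return 1
  moved=$($REDIS_CLI -p "$2" CLUSTER COUNTKEYSINSLOT "$3") || return 1
  echo "slot $3: $remaining keys left on source, $moved keys on target"
}
```

Possible usage: `slot_progress 7000 7001 <slot_number>`. A non-zero count on both sides means the slot is only partially migrated.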
3. **Manually complete the migration for the stuck slot**:

```bash
# On the TARGET node, mark the slot as importing first
redis-cli -p 7001 CLUSTER SETSLOT <slot_number> IMPORTING <source_node_id>

# On the SOURCE node, mark the slot as migrating
redis-cli -p 7000 CLUSTER SETSLOT <slot_number> MIGRATING <target_node_id>

# Migrate the remaining keys with an explicit timeout (in milliseconds).
# GETKEYSINSLOT returns at most 1000 keys here; repeat until it returns none.
redis-cli -p 7000 CLUSTER GETKEYSINSLOT <slot_number> 1000 | \
  xargs -I {} redis-cli -p 7000 MIGRATE <target_ip> <target_port> "" 0 60000 KEYS {}

# After all keys are migrated, finalize on ALL nodes, target first
redis-cli -p 7001 CLUSTER SETSLOT <slot_number> NODE <target_node_id>
redis-cli -p 7000 CLUSTER SETSLOT <slot_number> NODE <target_node_id>
redis-cli -p 7002 CLUSTER SETSLOT <slot_number> NODE <target_node_id>
```
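Because `CLUSTER GETKEYSINSLOT` returns a bounded batch, the key migration usually has to be repeated until the slot is empty. The loop below is one way to sketch that, under these assumptions: the slot is already in the migrating/importing state on both nodes, key names contain no newlines, and `drain_slot` and the `REDIS_CLI` override are hypothetical names introduced here.

```bash
REDIS_CLI=${REDIS_CLI:-redis-cli}   # override to add options such as -h or -a

# drain_slot <source_port> <slot> <target_host> <target_port> [timeout_ms] [batch_size]
# Moves keys one at a time until the source reports no keys left in the slot.
drain_slot() {
  src_port=$1; slot=$2; dst_host=$3; dst_port=$4
  timeout_ms=${5:-60000}; batch=${6:-100}
  prev=

  while :; do
    keys=$($REDIS_CLI -p "$src_port" CLUSTER GETKEYSINSLOT "$slot" "$batch") || return 1
    if [ -z "$keys" ]; then
      return 0                      # slot is empty on the source
    fi
    if [ "$keys" = "$prev" ]; then  # same batch twice: nothing is moving
      echo "no progress on slot $slot, aborting" >&2
      return 1
    fi
    prev=$keys

    printf '%s\n' "$keys" | while IFS= read -r key; do
      reply=$($REDIS_CLI -p "$src_port" MIGRATE "$dst_host" "$dst_port" \
              "$key" 0 "$timeout_ms" < /dev/null)
      case $reply in
        OK|NOKEY) ;;                # NOKEY: key expired or was deleted meanwhile
        *) echo "MIGRATE failed for '$key': $reply" >&2; exit 1 ;;
      esac
    done || return 1
  done
}
```

Possible usage: `drain_slot 7000 <slot_number> <target_ip> <target_port> 60000`, followed by the `CLUSTER SETSLOT ... NODE` commands once it returns successfully. Migrating one key per call is slower than a multi-key `MIGRATE ... KEYS`, but it makes it obvious which key fails.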

4. **Clear any leftover migrating/importing state (the source of stray `ASK` redirects)**:

```bash
redis-cli -p 7000 CLUSTER SETSLOT <slot_number> STABLE
redis-cli -p 7001 CLUSTER SETSLOT <slot_number> STABLE
```
5. **Retry the reshard with an explicit timeout (in milliseconds)**:

```bash
redis-cli --cluster reshard 127.0.0.1:7000 \
  --cluster-from <source_node_id> \
  --cluster-to <target_node_id> \
  --cluster-slots 500 \
  --cluster-yes \
  --cluster-timeout 60000 \
  --cluster-pipeline 10
```

These values match the defaults in recent `redis-cli` releases (60000 ms, pipeline of 10). Raise `--cluster-timeout` or lower `--cluster-pipeline` if large keys still time out.
6. **For very large keys, migrate them individually before resharding**:

```bash
# Find big keys on the source
redis-cli -p 7000 --bigkeys

# Migrate large keys one by one with a longer timeout (300000 ms = 5 min).
# The key's slot must already be in MIGRATING/IMPORTING state (see step 3).
redis-cli -p 7000 MIGRATE <target_ip> <target_port> "my:large:hash" 0 300000
```
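`--bigkeys` scans the whole keyspace. When only one slot is stuck, a narrower check is to ask for the memory footprint of the keys in that slot. The sketch below combines `CLUSTER GETKEYSINSLOT` with `MEMORY USAGE` (available since Redis 4.0, and an estimate for nested types because it samples their elements). The function name `big_keys_in_slot` and the `REDIS_CLI` override are hypothetical, and key names are assumed to contain no newlines.

```bash
REDIS_CLI=${REDIS_CLI:-redis-cli}   # override to add options such as -h or -a

# big_keys_in_slot <port> <slot> <threshold_bytes> [sample_size]
# Prints "<bytes> <key>" for each sampled key at or above the threshold.
big_keys_in_slot() {
  port=$1; slot=$2; threshold=$3; sample=${4:-1000}

  $REDIS_CLI -p "$port" CLUSTER GETKEYSINSLOT "$slot" "$sample" |
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    bytes=$($REDIS_CLI -p "$port" MEMORY USAGE "$key" < /dev/null)
    case $bytes in
      ''|*[!0-9]*) continue ;;      # nil or error reply: skip this key
    esac
    if [ "$bytes" -ge "$threshold" ]; then
      printf '%s %s\n' "$bytes" "$key"
    fi
  done
}
```

Possible usage: `big_keys_in_slot 7000 <slot_number> 104857600` lists sampled keys of roughly 100 MiB or more, which are candidates for an individual `MIGRATE` with a longer timeout.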

## Prevention

- Run resharding during low-traffic periods
- Pre-identify and migrate big keys before starting slot migration
- Set `--cluster-timeout` explicitly (the value is in milliseconds; 60000 is 60 s) and raise it for clusters with large keys
- Monitor target node memory before migration to ensure capacity
- Use a recent Redis release (7.0+) with a matching `redis-cli`, which moves keys in pipelined `MIGRATE ... KEYS` batches
- Set up monitoring on `cluster_slots_assigned` and alert if it is not 16384
- Test resharding procedures in staging with production-sized datasets
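The `cluster_slots_assigned` check can be scripted for a cron job or a monitoring agent. A minimal sketch, assuming `CLUSTER INFO` output is piped in on stdin; the function name `check_slots_assigned` and the message wording are hypothetical.

```bash
# Reads CLUSTER INFO output on stdin.
# Returns 0 when all 16384 slots are assigned, 1 otherwise.
check_slots_assigned() {
  assigned=$(tr -d '\r' | awk -F: '$1 == "cluster_slots_assigned" { print $2 }')
  if [ "$assigned" = "16384" ]; then
    echo "OK: all 16384 slots assigned"
  else
    echo "ALERT: cluster_slots_assigned=${assigned:-unknown}" >&2
    return 1
  fi
}
```

Possible usage: `redis-cli -p 7000 CLUSTER INFO | check_slots_assigned || <alert command>`, where the alert command is whatever the monitoring setup provides.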