The Problem
Redis diskless replication transfers fail, leaving replicas unable to sync. Errors include:
Replica failed to sync with master: Connection timed out during RDB transfer
Diskless replication transfer failed
Master stream ended unexpectedly
Diskless replication skips disk I/O by streaming the RDB directly over the network, so network problems during the transfer can abort the sync.
Understanding Diskless Replication
Traditional replication:
- Master writes RDB to disk
- Master sends the RDB file from disk to the replica
- Slow disk becomes bottleneck
Diskless replication:
- Master forks child process
- Child writes RDB directly to replica sockets
- No disk involvement on the master
- Faster when the network is faster than the disk
Configuration:
- repl-diskless-sync yes - Enable diskless
- repl-diskless-sync-delay 5 - Wait for replicas before starting
- repl-timeout 60 - Timeout for various replication operations
Diagnosis Commands
Check Diskless Configuration
redis-cli CONFIG GET repl-diskless-sync
redis-cli CONFIG GET repl-diskless-sync-delay
redis-cli CONFIG GET repl-diskless-sync-max-replicas
Check Replication Status
redis-cli INFO replication
Key fields:
role:master
connected_slaves:1
slave0:ip=10.0.0.2,port=6379,state=send_bulk,offset=0,lag=0
master_repl_offset:12345678
repl_backlog_active:1
States during sync:
- wait_bgsave - Waiting for RDB generation
- send_bulk - Sending RDB data (diskless or file-based)
- online - Normal streaming replication
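The states listed above appear in the `slaveN:` lines of `INFO replication` on the master. As a minimal sketch, the helper below pulls out each replica's state from that output; `replica_states` is a hypothetical name, and it assumes the comma-separated `key=value` layout shown in the sample output earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read INFO replication output on stdin and print
# "<replica> <state>" for every slaveN line that carries a state= field.
replica_states() {
  tr -d '\r' | awk -F'[:,]' '
    /^slave[0-9]+:/ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^state=/) {
          sub(/^state=/, "", $i)   # strip the key, keep the value
          print $1, $i
        }
      }
    }'
}
```

Typical use on the master would be `redis-cli INFO replication | replica_states`, which prints one line per connected replica.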
Check Transfer Progress
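```bash
# On master (rdb_bgsave_in_progress is reported in the persistence section,
# so query the full INFO output rather than only the replication section)
redis-cli INFO | grep -E "rdb_bgsave_in_progress|slave.*state"

# On replica
redis-cli INFO replication | grep -E "master_sync_in_progress|master_sync_last_io_seconds"
```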
Check for Timeout Errors
grep -i "diskless\|replica\|sync\|timeout" /var/log/redis/redis-server.log | tail -20
Common Diskless Replication Errors
Error 1: Transfer Timeout
Replica timed out during diskless RDB transfer
Connection timed out during RDB transfer
Diagnosis:
```bash
# Check timeout settings
redis-cli CONFIG GET repl-timeout

# Check network latency
ping replica_host

# Check dataset size
redis-cli INFO memory | grep used_memory_human
```
Solutions:
- 1.Increase replication timeout:
```bash
redis-cli CONFIG SET repl-timeout 120

# To persist the change, set it in redis.conf:
# repl-timeout 120
```
- 2.Reduce delay waiting for replicas:
redis-cli CONFIG SET repl-diskless-sync-delay 0
- 3.Check network bandwidth:
# Transfer time = Dataset size / Bandwidth
# 10GB dataset over 100Mbps = ~13 minutes (800 seconds)
# repl-timeout is checked against gaps in transfer I/O, so make sure the link can sustain a transfer of this length without stalling
Error 2: Replica Disconnection During Transfer
Replica disconnected during RDB transfer
Master stream ended unexpectedly
Diagnosis:
```bash
# On replica, check state
redis-cli INFO replication | grep master_link_status

# On master, list the replica connections it currently holds
redis-cli CLIENT LIST TYPE replica
```
Causes:
- 1.Replica timeout too short
- 2.Network interruption
- 3.Replica crash
Solution:
```bash
# Increase replica-side timeout too
# On replica
redis-cli CONFIG SET repl-timeout 120

# Reconnect replica
redis-cli REPLICAOF NO ONE
redis-cli REPLICAOF master_ip master_port
```
Error 3: No Replicas Ready for Transfer
No replicas ready for diskless transfer
Diagnosis:
```bash
# Check replicas waiting
redis-cli INFO replication | grep slave

# Check delay setting
redis-cli CONFIG GET repl-diskless-sync-delay
```
Explanation:
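A full sync starts when a replica requests one. With diskless sync, the master then waits repl-diskless-sync-delay seconds so that other replicas can join the same transfer. Replicas that connect after the transfer has started cannot join it and must wait for the next one.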
Solution:
- 1.Ensure replicas are connected before trigger:
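```bash
# On each replica, connect within the delay window
redis-cli REPLICAOF master_ip master_port

# The master starts the transfer on its own once the delay expires.
# A manual BGSAVE only writes an RDB file to disk; it does not start a diskless transfer.
```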
- 2.Or reduce delay if replicas always connect quickly:
redis-cli CONFIG SET repl-diskless-sync-delay 1
Error 4: Multiple Replicas with Single Transfer
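When several replicas need a full sync, one diskless transfer serves every replica that is waiting when it starts. Replicas that arrive later wait for the next transfer.
Diagnosis:
redis-cli CONFIG GET repl-diskless-sync-max-replicas
Default is 0 (Redis 7.0+), meaning no expected replica count is set and the master always waits the full repl-diskless-sync-delay.
Solution:
Let the transfer start as soon as the expected number of replicas is waiting:
```bash
redis-cli CONFIG SET repl-diskless-sync-max-replicas 4

# Replicas arriving after the transfer starts wait for the next opportunity
# Or use replication cascade:
# master -> replica1 -> replica2 -> replica3
```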
Error 5: Large Dataset Timeout
Very large datasets exceed transfer timeout.
Diagnosis:
```bash
# Estimate transfer time (used_memory is the in-memory size; the RDB is often smaller)
SIZE=$(redis-cli INFO memory | grep '^used_memory:' | cut -d: -f2 | tr -d '\r')
echo "Dataset: $SIZE bytes"

# Required timeout = Transfer time + buffer
# For 1GB over 10Mbps = ~800 seconds minimum
```
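The size-over-bandwidth arithmetic above can be wrapped in a small helper. This is a rough sketch under stated assumptions: `estimate_transfer_seconds` is a hypothetical name, bandwidth is given in megabits per second with 1 Mbps taken as 1,000,000 bits per second, and protocol overhead, compression, and competing traffic are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the seconds needed to move SIZE_BYTES
# over a link of MBPS megabits per second, rounded up to a whole second.
estimate_transfer_seconds() {
  local size_bytes=$1 mbps=$2
  awk -v b="$size_bytes" -v m="$mbps" 'BEGIN {
    if (m <= 0) exit 1                 # reject a zero or negative bandwidth
    s = (b * 8) / (m * 1000000)        # bits divided by bits per second
    r = int(s); if (r < s) r++         # round up
    print r
  }'
}
```

For example, `estimate_transfer_seconds 10000000000 100` models a 10 GB dataset on a 100 Mbps link. Treat the result as a lower bound when sizing timeouts.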
Solution:
- 1.Use file-based replication for large datasets:
```bash
redis-cli CONFIG SET repl-diskless-sync no
redis-cli CONFIG SET repl-diskless-sync-delay 0

# RDB file can be read by multiple replicas
# And replica can retry from file
```
- 2.Use replication backlog to avoid full sync:
```bash
redis-cli CONFIG SET repl-backlog-size 256mb

# Replica can partial sync if offset in backlog
```
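Whether a replica's offset is still "in backlog" can be checked against the master's `repl_backlog_first_byte_offset` and `repl_backlog_histlen` fields from `INFO replication`. The sketch below is an approximation: `offset_in_backlog` is a hypothetical name, and it ignores the replication ID match that Redis also requires before it accepts a partial sync.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read master-side INFO replication on stdin and
# exit 0 if the given replica offset lies inside the backlog window,
# 1 if it lies outside, 2 if the backlog fields are missing.
offset_in_backlog() {
  local replica_offset=$1
  tr -d '\r' | awk -F: -v off="$replica_offset" '
    $1 == "repl_backlog_first_byte_offset" { first = $2 }
    $1 == "repl_backlog_histlen"           { len = $2 }
    END {
      if (first == "" || len == "") exit 2
      if (off >= first && off <= first + len) exit 0
      exit 1
    }'
}
```

A possible use is `redis-cli INFO replication | offset_in_backlog 12345000 && echo "partial sync likely"`, where the offset comes from the replica's own `INFO replication` output.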
Error 6: Diskless Load on Replicas Fails
The replica-side setting repl-diskless-load (Redis 6.0+) can cause issues.
Diagnosis:
redis-cli CONFIG GET repl-diskless-load
Explanation:
When set to on-empty-db or swapdb, the replica loads the incoming RDB straight from the socket into memory instead of saving it to disk first. If the transfer fails partway, the partially loaded data is discarded: with swapdb the previous dataset is restored at the cost of extra memory, and with on-empty-db the replica stays empty until the next successful sync.
Solution:
Disable for stability:
redis-cli CONFIG SET repl-diskless-load disabled
Or ensure stable network before enabling.
Step-by-Step Recovery
Scenario: Replica Failed During Diskless Sync
#### Step 1: Check Current State
```bash
# On master
redis-cli INFO replication | grep slave

# On replica
redis-cli INFO replication
```
#### Step 2: Disconnect and Reconnect Replica
```bash
# On replica
redis-cli REPLICAOF NO ONE

# Clear any partial data (optional, loses data)
# redis-cli FLUSHALL

# Reconnect
redis-cli REPLICAOF master_ip master_port
```
#### Step 3: Switch to File-Based if Diskless Fails
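```bash
# On master
redis-cli CONFIG SET repl-diskless-sync no

# On replica, reconnect to request a new sync
# (a manual BGSAVE on the master does not start replication)
redis-cli REPLICAOF NO ONE
redis-cli REPLICAOF master_ip master_port
```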
#### Step 4: Monitor Transfer Progress
# Watch sync state
watch -n 1 'redis-cli INFO replication | grep -E "master_sync|slave.*state"'
#### Step 5: Verify Completion
# On replica
redis-cli INFO replication | grep master_link_status
# Should show: master_link_status:up
Configuration Best Practices
For Fast Networks (Gigabit+)
```conf
# redis.conf on master
repl-diskless-sync yes
repl-diskless-sync-delay 5
repl-diskless-sync-max-replicas 10
repl-timeout 60

# On replica
repl-timeout 60
```
For Slow Networks or Large Datasets
```conf
# redis.conf on master
repl-diskless-sync no
repl-timeout 300
repl-backlog-size 512mb

# On replica
repl-timeout 300
```
For Mixed Environments
# Use diskless for initial sync, backlog for reconnect
repl-diskless-sync yes
repl-diskless-sync-delay 5
repl-backlog-size 256mb
repl-backlog-ttl 3600
Monitoring Diskless Transfers
Master-Side Monitoring
```bash
#!/bin/bash
# Monitor diskless replication on master

while true; do
  # Fetch INFO once per cycle and strip the trailing carriage returns
  INFO=$(redis-cli INFO replication | tr -d '\r')
  # Count only slaveN lines, not connected_slaves
  SLAVES=$(echo "$INFO" | grep -c '^slave[0-9]')
  SYNCING=$(echo "$INFO" | grep -c 'state=send_bulk')

  if [ "$SYNCING" -gt 0 ]; then
    echo "$(date): $SYNCING of $SLAVES replicas receiving RDB (send_bulk)"
    echo "$INFO" | grep 'state=send_bulk'
  fi

  sleep 5
done
```
Replica-Side Monitoring
```bash
#!/bin/bash
# Monitor replica sync progress

while true; do
  IN_PROGRESS=$(redis-cli INFO replication | grep master_sync_in_progress | cut -d: -f2 | tr -d '\r')

  if [ "$IN_PROGRESS" = "1" ]; then
    LAST_IO=$(redis-cli INFO replication | grep master_sync_last_io_seconds | cut -d: -f2 | tr -d '\r')
    echo "$(date): Sync in progress, last IO: $LAST_IO seconds ago"

    if [ "$LAST_IO" -gt 30 ]; then
      echo "WARNING: Sync may be stalled"
    fi
  else
    LINK=$(redis-cli INFO replication | grep master_link_status | cut -d: -f2 | tr -d '\r')
    echo "$(date): Link status: $LINK"
  fi

  sleep 5
done
```
Performance Comparison
Diskless vs File-Based:
```bash
# Test transfer time
# Dataset: 10GB

# Diskless (network speed matters)
# 1Gbps network = ~80 seconds
# 100Mbps network = ~800 seconds

# File-based (disk speed matters)
# SSD write = ~20 seconds
# HDD write = ~60 seconds
# Plus network transfer time

# Choose based on your bottleneck:
# Network slower than disk -> use file-based
# Disk slower than network -> use diskless
```
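The bottleneck rule in the comparison above can be written as a tiny helper. This is only a sketch of that rule of thumb: `suggest_sync_mode` is a hypothetical name, both arguments are throughput figures you measure yourself in the same unit (for example MB/s), and treating a tie as file-based is an arbitrary choice the text does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare network and disk throughput (same unit)
# and print the sync mode suggested by the bottleneck rule.
suggest_sync_mode() {
  local net=$1 disk=$2
  awk -v n="$net" -v d="$disk" 'BEGIN {
    if (d < n) print "diskless"      # disk slower than network
    else       print "file-based"    # network is the bottleneck (or a tie)
  }'
}
```

For instance, `suggest_sync_mode 125 500` models a 1 Gbps link (about 125 MB/s) against an SSD writing 500 MB/s.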
Verification
After enabling diskless replication:
```bash
# A full sync starts when a replica attaches; run this on the replica
# (a manual BGSAVE on the master does not start replication)
redis-cli REPLICAOF master_ip master_port

# Watch state transitions on the master
watch -n 1 'redis-cli INFO | grep -E "rdb_bgsave_in_progress|slave.*state"'

# Expected sequence:
# wait_bgsave -> send_bulk -> online

# Verify diskless is active (no temp RDB file during sync)
ls -la /var/lib/redis/temp-*.rdb
# Should show nothing during diskless transfer
```
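On the replica side, the same verification can be reduced to a single status word. A minimal sketch, assuming the `master_link_status` and `master_sync_in_progress` fields of `INFO replication`; `replica_sync_status` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify replica-side INFO replication output
# (stdin) as one of: syncing | synced | down | unknown
replica_sync_status() {
  tr -d '\r' | awk -F: '
    $1 == "master_link_status"      { link_status = $2 }
    $1 == "master_sync_in_progress" { in_progress = $2 }
    END {
      if (in_progress == "1")         print "syncing"
      else if (link_status == "up")   print "synced"
      else if (link_status == "down") print "down"
      else                            print "unknown"
    }'
}
```

Running `redis-cli INFO replication | replica_sync_status` on the replica in a loop gives a compact view of the transition from syncing to synced.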