The Problem

Redis diskless replication transfers fail, leaving replicas unable to sync. Errors include:

```text
Replica failed to sync with master: Connection timed out during RDB transfer
Diskless replication transfer failed
Master stream ended unexpectedly
```

Diskless replication skips disk I/O on the master by streaming the RDB snapshot directly to replica sockets, so a network stall or interruption during the transfer aborts the sync.

Understanding Diskless Replication

Traditional (disk-backed) replication:

  1. Master forks a child that writes the RDB to disk
  2. Master sends the RDB file to each replica over the network
  3. A slow disk on the master becomes the bottleneck

Diskless replication:

  1. Master forks a child process
  2. Child writes the RDB directly to replica sockets
  3. No disk involvement on the master
  4. Faster when the network is faster than the disk

Configuration:

- repl-diskless-sync yes - Enable diskless
- repl-diskless-sync-delay 5 - Wait for replicas before starting
- repl-timeout 60 - Timeout for various replication operations

Diagnosis Commands

Check Diskless Configuration

```bash
redis-cli CONFIG GET repl-diskless-sync
redis-cli CONFIG GET repl-diskless-sync-delay
# Redis 7.0+ (the option name ends in "replicas")
redis-cli CONFIG GET repl-diskless-sync-max-replicas
```

Check Replication Status

```bash
redis-cli INFO replication
```

Key fields:

```text
role:master
connected_slaves:1
slave0:ip=10.0.0.2,port=6379,state=send_bulk,offset=0,lag=0
master_repl_offset:12345678
repl_backlog_active:1
```

States during sync:

- wait_bgsave - Waiting for RDB generation
- send_bulk - Sending RDB data (diskless or file-based)
- online - Normal streaming replication
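To read these states without scanning the raw output by eye, the per-replica `state=` field can be pulled out of the `slaveN:` lines. The following is a minimal sketch: the function name `replica_states` and the sample text are illustrative assumptions, and the function only parses text fed to it on stdin.

```bash
#!/usr/bin/env bash
# Print "<slaveN>=<state>" for every slaveN line of INFO replication text on stdin.
# replica_states is a hypothetical helper name, not part of redis-cli.
replica_states() {
  tr -d '\r' | awk '/^slave[0-9]+:/ {
    sep  = index($0, ":")                 # first colon separates name from fields
    name = substr($0, 1, sep - 1)
    n    = split(substr($0, sep + 1), kv, ",")
    for (i = 1; i <= n; i++)
      if (kv[i] ~ /^state=/) { sub(/^state=/, "", kv[i]); print name "=" kv[i] }
  }'
}

# Sample input modeled on the output shown above (values are illustrative).
sample='role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=send_bulk,offset=0,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=12345678,lag=1
master_repl_offset:12345678'

printf '%s\n' "$sample" | replica_states
```

Against a live master, the same function could be fed with `redis-cli INFO replication | replica_states`.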

Check Transfer Progress

```bash
# On master (rdb_bgsave_in_progress lives in the persistence section, so query the full INFO)
redis-cli INFO | grep -E "rdb_bgsave_in_progress|slave.*state"

# On replica
redis-cli INFO replication | grep -E "master_sync_in_progress|master_sync_last_io_seconds_ago"
```

Check for Timeout Errors

```bash
grep -i "diskless\|replica\|sync\|timeout" /var/log/redis/redis-server.log | tail -20
```

Common Diskless Replication Errors

Error 1: Transfer Timeout

```text
Replica timed out during diskless RDB transfer
Connection timed out during RDB transfer
```

Diagnosis:

```bash
# Check timeout settings
redis-cli CONFIG GET repl-timeout

# Check network latency (-c 5 stops after five probes)
ping -c 5 replica_host

# Check dataset size
redis-cli INFO memory | grep used_memory_human
```

Solutions:

  1. Increase replication timeout:

```bash
redis-cli CONFIG SET repl-timeout 120

# In redis.conf
repl-timeout 120
```

  2. Reduce delay waiting for replicas:

```bash
redis-cli CONFIG SET repl-diskless-sync-delay 0
```

  3. Check network bandwidth:

```bash
# Transfer time = Dataset size / Bandwidth
# 10GB dataset over 100Mbps = ~800 seconds (about 13 minutes)
# repl-timeout measures time without I/O, not total transfer time,
# but a long transfer gives stalls more chances to exceed it
```
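The bandwidth arithmetic above can be scripted for other dataset sizes and link speeds. This is a rough sketch: `estimate_transfer_seconds` is a hypothetical helper, it assumes decimal units (1 Mbps = 1,000,000 bits per second), and it ignores protocol overhead and competing traffic.

```bash
#!/usr/bin/env bash
# Estimate the seconds needed to move SIZE_BYTES over a LINK_MBPS link.
# Integer arithmetic, rounded up. Hypothetical helper for illustration only.
estimate_transfer_seconds() {
  local size_bytes=$1 link_mbps=$2
  local bits=$(( size_bytes * 8 ))
  local bits_per_sec=$(( link_mbps * 1000000 ))
  echo $(( (bits + bits_per_sec - 1) / bits_per_sec ))
}

estimate_transfer_seconds 10000000000 100   # 10 GB over 100 Mbps, prints 800
```

The byte count could come from the `used_memory` field of `redis-cli INFO memory`, keeping in mind that the RDB stream is usually smaller than the in-memory dataset.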

Error 2: Replica Disconnection During Transfer

```text
Replica disconnected during RDB transfer
Master stream ended unexpectedly
```

Diagnosis:

```bash
# On replica, check state
redis-cli INFO replication | grep master_link_status

# On master, list replica connections (replica clients carry the S flag)
redis-cli CLIENT LIST | grep -E 'flags=[A-Za-z]*S'
```

Causes:

  1. Replica timeout too short
  2. Network interruption
  3. Replica crash

Solution:

```bash
# Increase replica-side timeout too
# On replica
redis-cli CONFIG SET repl-timeout 120

# Reconnect replica
redis-cli REPLICAOF NO ONE
redis-cli REPLICAOF master_ip master_port
```

Error 3: No Replicas Ready for Transfer

```text
No replicas ready for diskless transfer
```

Diagnosis:

```bash
# Check replicas waiting
redis-cli INFO replication | grep slave

# Check delay setting
redis-cli CONFIG GET repl-diskless-sync-delay
```

Explanation:

After the first replica requests a sync, the master waits repl-diskless-sync-delay seconds so that other replicas can join the same transfer. Once the transfer has started, replicas that arrive later cannot join it and are queued for the next transfer.
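The start condition can be written down as a small decision function. This is a simplified model based on the redis.conf description of `repl-diskless-sync-delay` and, from Redis 7.0, `repl-diskless-sync-max-replicas` (where 0 means no replica count is set). It is not the server's actual code, and `should_start_transfer` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Return 0 if a pending diskless transfer would start, 1 if the master keeps waiting.
# Args: delay_seconds max_replicas waiting_replicas elapsed_seconds
# Simplified model for illustration; not taken from the Redis source.
should_start_transfer() {
  local delay=$1 max_replicas=$2 waiting=$3 elapsed=$4
  [ "$waiting" -gt 0 ] || return 1                       # nobody asked for a sync
  [ "$elapsed" -ge "$delay" ] && return 0                # delay window has expired
  [ "$max_replicas" -gt 0 ] && [ "$waiting" -ge "$max_replicas" ] && return 0
  return 1
}

# One replica waiting, 2s into a 5s delay, no replica count configured
should_start_transfer 5 0 1 2 && echo "start" || echo "wait"
# Two replicas waiting and the configured count is 2
should_start_transfer 5 2 2 1 && echo "start" || echo "wait"
```

In this model a longer delay only helps when several replicas are expected to connect at about the same time.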

Solution:

  1. Ensure replicas connect within the delay window:

```bash
# On each replica, connect first
redis-cli REPLICAOF master_ip master_port

# No manual BGSAVE is needed on the master: it starts the transfer
# itself once the delay expires (BGSAVE only writes a local snapshot)
```

  2. Or reduce delay if replicas always connect quickly:

```bash
redis-cli CONFIG SET repl-diskless-sync-delay 1
```

Error 4: Multiple Replicas with Single Transfer

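When multiple replicas need sync, they must all be attached before the diskless transfer starts. A replica that arrives after the transfer has begun waits for the next one.

Diagnosis:

```bash
redis-cli CONFIG GET repl-diskless-sync-max-replicas
```

Default is 0 (no replica count is set, so the master always waits the full repl-diskless-sync-delay). The option is available from Redis 7.0.

Solution:

Start the transfer as soon as the expected number of replicas has connected:

```bash
redis-cli CONFIG SET repl-diskless-sync-max-replicas 4

# The transfer starts once 4 replicas are waiting, without waiting out the delay
# Replicas that arrive later will wait for next opportunity
# Or use replication cascade:
# master -> replica1 -> replica2 -> replica3
```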

Error 5: Large Dataset Timeout

Very large datasets keep the transfer running long enough that a stall is likely to exceed repl-timeout.

Diagnosis:

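```bash
# Estimate transfer time (used_memory is an upper-bound guess; the RDB is usually smaller)
SIZE=$(redis-cli INFO memory | grep '^used_memory:' | cut -d: -f2 | tr -d '\r')
echo "Dataset: $SIZE bytes"

# Transfer time = Dataset size / Bandwidth
# For 1GB over 10Mbps = ~800 seconds minimum
# Leave headroom in repl-timeout for stalls on a link that stays busy this long
```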

Solution:

  1. Use file-based replication for large datasets:

```bash
redis-cli CONFIG SET repl-diskless-sync no
redis-cli CONFIG SET repl-diskless-sync-delay 0

# RDB file can be read by multiple replicas
# And replica can retry from file
```

  2. Use replication backlog to avoid full sync:

```bash
redis-cli CONFIG SET repl-backlog-size 256mb

# Replica can partial sync if offset in backlog
```
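Whether a reconnecting replica can still use a partial sync depends on its offset falling inside the backlog window. The sketch below approximates that check using the `repl_backlog_first_byte_offset` and `repl_backlog_histlen` fields from the master's `INFO replication` output. The function name is hypothetical, and the check ignores the replication ID match and the off-by-one in the PSYNC request that the server also takes into account.

```bash
#!/usr/bin/env bash
# Return 0 if REPLICA_OFFSET still lies inside the master's backlog window.
# Args: replica_offset backlog_first_byte_offset backlog_histlen
# Approximation for illustration; the server applies additional conditions.
offset_in_backlog() {
  local replica_offset=$1 first=$2 histlen=$3
  [ "$histlen" -gt 0 ] || return 1                      # empty backlog: full sync
  [ "$replica_offset" -ge "$first" ] &&
    [ "$replica_offset" -le $(( first + histlen )) ]
}

# Illustrative numbers: backlog covers offsets 1000..2000
offset_in_backlog 1500 1000 1000 && echo "partial sync possible" || echo "full sync needed"
offset_in_backlog 500 1000 1000 && echo "partial sync possible" || echo "full sync needed"
```

A larger repl-backlog-size widens this window, so a replica can stay disconnected longer before a full sync is required.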

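Error 6: Diskless Load on Replicas Fails

The replica-side setting repl-diskless-load (Redis 6.0+) can cause issues.

Diagnosis:

```bash
redis-cli CONFIG GET repl-diskless-load
```

Explanation:

When set to on-empty-db or swapdb, replicas parse the incoming RDB directly from the socket into memory instead of saving it to disk first. If the transfer fails partway, the partially loaded data must be discarded. on-empty-db only uses diskless load when the replica has no data to lose, and swapdb keeps the old dataset in memory during the load at the cost of extra RAM.

Solution:

Disable for stability:

```bash
redis-cli CONFIG SET repl-diskless-load disabled
```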

Or ensure stable network before enabling.

Step-by-Step Recovery

Scenario: Replica Failed During Diskless Sync

#### Step 1: Check Current State

```bash
# On master
redis-cli INFO replication | grep slave

# On replica
redis-cli INFO replication
```

#### Step 2: Disconnect and Reconnect Replica

```bash
# On replica
redis-cli REPLICAOF NO ONE

# Clear any partial data (optional, loses data)
# redis-cli FLUSHALL

# Reconnect
redis-cli REPLICAOF master_ip master_port
```

#### Step 3: Switch to File-Based if Diskless Fails

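```bash
# On master
redis-cli CONFIG SET repl-diskless-sync no

# No manual trigger is needed: the replica retries the sync on its own
# (BGSAVE only writes a local snapshot; it does not start a replication sync)
```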

#### Step 4: Monitor Transfer Progress

```bash
# Watch sync state
watch -n 1 'redis-cli INFO replication | grep -E "master_sync|slave.*state"'
```

#### Step 5: Verify Completion

```bash
# On replica
redis-cli INFO replication | grep master_link_status
# Should show: master_link_status:up
```

Configuration Best Practices

For Fast Networks (Gigabit+)

```conf
# redis.conf on master
repl-diskless-sync yes
repl-diskless-sync-delay 5
# Redis 7.0+: start early once this many replicas are waiting
repl-diskless-sync-max-replicas 10
repl-timeout 60

# On replica
repl-timeout 60
```

For Slow Networks or Large Datasets

```conf
# redis.conf on master
repl-diskless-sync no
repl-timeout 300
repl-backlog-size 512mb

# On replica
repl-timeout 300
```

For Mixed Environments

```conf
# Use diskless for initial sync, backlog for reconnect
repl-diskless-sync yes
repl-diskless-sync-delay 5
repl-backlog-size 256mb
repl-backlog-ttl 3600
```

Monitoring Diskless Transfers

Master-Side Monitoring

```bash
#!/bin/bash
# Monitor diskless replication on master

while true; do
  # Count only slaveN lines (a plain "grep slave" would also match connected_slaves)
  SLAVES=$(redis-cli INFO replication | grep -c '^slave[0-9]')
  SYNCING=$(redis-cli INFO replication | grep -c "state=send_bulk")

  if [ "$SYNCING" -gt 0 ]; then
    echo "$(date): $SYNCING of $SLAVES replicas receiving RDB (send_bulk)"
    redis-cli INFO replication | grep "state=send_bulk"
  fi

  sleep 5
done
```

Replica-Side Monitoring

```bash
#!/bin/bash
# Monitor replica sync progress

while true; do
  IN_PROGRESS=$(redis-cli INFO replication | grep master_sync_in_progress | cut -d: -f2 | tr -d '\r')

  if [ "$IN_PROGRESS" = "1" ]; then
    LAST_IO=$(redis-cli INFO replication | grep master_sync_last_io_seconds | cut -d: -f2 | tr -d '\r')
    echo "$(date): Sync in progress, last IO: $LAST_IO seconds ago"

    # Default to 0 so an empty value does not break the numeric test
    if [ "${LAST_IO:-0}" -gt 30 ]; then
      echo "WARNING: Sync may be stalled"
    fi
  else
    LINK=$(redis-cli INFO replication | grep master_link_status | cut -d: -f2 | tr -d '\r')
    echo "$(date): Link status: $LINK"
  fi

  sleep 5
done
```
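The stall check in the replica script can also be expressed as a pure function over `INFO replication` text, which makes the logic easy to test without a running server. This is a sketch under assumptions: `classify_replica_sync` is a hypothetical name, and the 30-second default mirrors the threshold used in the script above rather than any Redis setting.

```bash
#!/usr/bin/env bash
# Read INFO replication text on stdin and print one of: stalled, syncing, up, down.
# Optional arg: idle seconds after which an in-progress sync counts as stalled.
classify_replica_sync() {
  local stall_after=${1:-30}
  tr -d '\r' | awk -F: -v limit="$stall_after" '
    $1 == "master_sync_in_progress"         { syncing = $2 }
    $1 == "master_sync_last_io_seconds_ago" { idle = $2 }
    $1 == "master_link_status"              { link = $2 }
    END {
      if (syncing == 1) print ((idle + 0 > limit + 0) ? "stalled" : "syncing")
      else              print ((link == "up") ? "up" : "down")
    }'
}

# Illustrative input: a sync in progress with no I/O for 45 seconds
printf 'master_link_status:down\nmaster_sync_in_progress:1\nmaster_sync_last_io_seconds_ago:45\n' |
  classify_replica_sync 30
```

On a replica it could be driven with `redis-cli INFO replication | classify_replica_sync 30` from a cron job or alerting hook.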

Performance Comparison

Diskless vs File-Based:

```bash
# Test transfer time
# Dataset: 10GB

# Diskless (network speed matters)
# 1Gbps network = ~80 seconds
# 100Mbps network = ~800 seconds

# File-based (disk speed matters)
# SSD write = ~20 seconds
# HDD write = ~60 seconds
# Plus network transfer time

# Choose based on your bottleneck:
# Network slower than disk -> use file-based
# Disk slower than network -> use diskless
```
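The figures above follow a simple additive model that can be reproduced for other hardware. The sketch below assumes diskless time is size divided by network throughput, and file-based time is the disk write plus the same network transfer. The function name and the throughput inputs are hypothetical, and real syncs also depend on fork cost, compression, and replica load time.

```bash
#!/usr/bin/env bash
# Rough full-sync estimates in seconds under a simple additive model:
#   diskless   = size / network
#   file-based = size / disk_write + size / network
# Args: size_mb network_mb_per_s disk_write_mb_per_s (illustrative inputs)
compare_sync_estimates() {
  local size_mb=$1 net=$2 disk=$3
  local transfer=$(( size_mb / net ))
  local write=$(( size_mb / disk ))
  echo "diskless=${transfer}s file_based=$(( write + transfer ))s"
}

# 10 GB dataset, 1 Gbps link (~125 MB/s), SSD writing ~500 MB/s
compare_sync_estimates 10000 125 500
```

This model captures transfer time only. The case for file-based sync on slow links rests on factors outside it, such as the RDB file being reusable by several replicas.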

Verification

After enabling diskless replication:

```bash
# Trigger a full sync by attaching a replica (BGSAVE alone does not start replication)
# On replica
redis-cli REPLICAOF master_ip master_port

# Watch state transitions on master (rdb_ fields are in the persistence section)
watch -n 1 'redis-cli INFO | grep -E "rdb_|slave.*state"'

# Expected sequence:
# wait_bgsave -> send_bulk -> online

# Verify diskless is active (no temp RDB file during sync)
ls -la /var/lib/redis/temp-*.rdb
# Should show nothing during diskless transfer
```