The Problem

Replicas that disconnect briefly cannot resume replication where they left off. They request a partial synchronization, but the master no longer holds the required data in its replication backlog. This forces a full synchronization, which is expensive and can cause significant latency spikes.

Log lines similar to the following typically appear (exact wording varies by Redis version):

bash
Master does not have a backlog offset 123456789

or

bash
Unable to partial resync with master: no backlog

Understanding the Replication Backlog

The replication backlog is a circular buffer that stores recent write commands. When a replica disconnects and reconnects:

  1. Replica sends its last known offset
  2. Master checks if that offset is still in backlog
  3. If yes: Partial sync - send only missing commands
  4. If no: Full sync - send complete dataset

The backlog size determines how long a replica can be disconnected while still being able to partial sync.
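The check the master performs in steps 2 to 4 can be sketched as a small shell function. This is an illustration only: the name can_partial_sync is hypothetical, its arguments stand for the replica's requested offset plus the repl_backlog_first_byte_offset and repl_backlog_histlen values reported by INFO replication, and it ignores the replication ID comparison that a real master also performs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether an offset is still covered by the backlog.
# Args: <replica_offset> <repl_backlog_first_byte_offset> <repl_backlog_histlen>
can_partial_sync() {
  local offset=$1 first=$2 histlen=$3
  # The backlog covers offsets from first up to first + histlen;
  # anything outside that window needs a full sync.
  if (( offset >= first && offset <= first + histlen )); then
    echo "partial"
  else
    echo "full"
  fi
}

can_partial_sync 500000 1000 1048576   # offset inside the backlog window
can_partial_sync 500 1000 1048576      # offset already overwritten
```

The first call prints partial and the second prints full, because offset 500 is older than the first byte the backlog still holds.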

Diagnosis Commands

Check Backlog Configuration

bash
redis-cli INFO replication | grep -E "repl_backlog_active|repl_backlog_size|repl_backlog_first_byte_offset|repl_backlog_histlen"

Output interpretation:

bash
repl_backlog_active:1                    # Backlog is enabled
repl_backlog_size:1048576               # Configured size (1MB)
repl_backlog_first_byte_offset:1000     # Oldest data in backlog
repl_backlog_histlen:1048576            # Current backlog length

Check Replica Sync Status

bash
redis-cli INFO replication

Look for:

bash
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=wait_bgsave,offset=0,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=12345678,lag=1

States to watch:

- wait_bgsave - Waiting for RDB snapshot
- send_bulk - Sending RDB to replica
- online - Normal replication

Check Current Master Offset

bash
redis-cli INFO replication | grep master_repl_offset

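Compare this with the replica's last known offset: the offset= field on its slaveN line, or slave_repl_offset in the replica's own INFO replication output. If the gap is larger than repl_backlog_size, a partial resync after a disconnect is not possible.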
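Comparing offsets by hand gets tedious with several replicas. The following is a minimal sketch, assuming the slaveN line format shown earlier (ip, port, state, offset, lag). The function name replica_byte_lag is made up for this example. It reads INFO replication output on stdin and prints how many bytes each replica is behind master_repl_offset.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read master INFO replication text on stdin and print
# "<ip>:<port> <state> <bytes_behind>" for every attached replica.
replica_byte_lag() {
  # redis-cli output uses \r\n line endings, so strip the \r first.
  tr -d '\r' | awk -F: '
    /^master_repl_offset:/ { master = $2 }
    /^slave[0-9]+:/ {
      # Split "ip=...,port=...,state=...,offset=...,lag=..." into key/value pairs.
      n = split($2, kv, ",")
      for (i = 1; i <= n; i++) { split(kv[i], p, "="); f[p[1]] = p[2] }
      lines[++count] = f["ip"] ":" f["port"] " " f["state"] " " f["offset"]
    }
    END {
      # master_repl_offset appears after the replica lines, so compute at the end.
      for (i = 1; i <= count; i++) {
        split(lines[i], parts, " ")
        printf "%s %s %.0f\n", parts[1], parts[2], master - parts[3]
      }
    }'
}

# Usage (requires a reachable master):
#   redis-cli INFO replication | replica_byte_lag
```

A replica whose byte gap approaches repl_backlog_size is close to losing the ability to partial sync.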

Calculate Write Rate

```bash
# Check instantaneous command rate (commands/sec, not bytes/sec)
redis-cli INFO stats | grep instantaneous_ops_per_sec

# Check total operations
redis-cli INFO stats | grep total_commands_processed

# Sample over time
redis-cli INFO stats | grep total_commands_processed | cut -d: -f2
sleep 60
redis-cli INFO stats | grep total_commands_processed | cut -d: -f2
```

Determining Appropriate Backlog Size

Formula

bash
Backlog Size = (Master write rate in bytes/sec) × (Maximum replica downtime)

For example:

- Write rate: 10 MB/sec
- Max downtime: 5 minutes
- Required backlog: 10 MB/sec × 300 sec = 3,000 MB (about 3 GB)
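The same arithmetic as a shell function, as a rough sketch. The name required_backlog_bytes and the optional headroom percentage are assumptions added for illustration. The formula itself is only rate × downtime.

```bash
#!/usr/bin/env bash
# Hypothetical helper: backlog bytes needed to cover a disconnect.
# Args: <write_rate_bytes_per_sec> <max_downtime_sec> [headroom_percent]
required_backlog_bytes() {
  local rate=$1 downtime=$2 headroom=${3:-0}
  # rate x downtime, plus optional headroom expressed in percent.
  echo $(( rate * downtime * (100 + headroom) / 100 ))
}

required_backlog_bytes 10000000 300      # 10 MB/sec for 5 minutes
required_backlog_bytes 10000000 300 50   # same, with 50% headroom
```

The first call prints 3000000000 (the 3 GB from the example), and the second prints 4500000000.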

Estimate Write Rate

bash
# Measure over 1 minute (tr strips the trailing \r that redis-cli output carries)
START=$(redis-cli INFO replication | grep '^master_repl_offset:' | cut -d: -f2 | tr -d '\r')
sleep 60
END=$(redis-cli INFO replication | grep '^master_repl_offset:' | cut -d: -f2 | tr -d '\r')
echo "Bytes per second: $(( (END - START) / 60 ))"

Check Current Backlog Usage

bash
redis-cli INFO replication | grep repl_backlog_histlen

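If repl_backlog_histlen equals repl_backlog_size, the backlog is full and wrapping, so the oldest data is being overwritten. On a busy master this is the normal steady state. What matters is how many seconds of writes the buffer covers at the current write rate.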
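To turn a backlog size into a time window, divide it by the write rate. A minimal sketch follows, with a hypothetical function name and integer arithmetic only, so results are rounded down to whole seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: how many seconds of writes a backlog can hold.
# Args: <backlog_bytes> <write_rate_bytes_per_sec>
backlog_window_seconds() {
  local size=$1 rate=$2
  if (( rate <= 0 )); then
    echo "unbounded"   # no writes, so nothing is being overwritten
    return
  fi
  echo $(( size / rate ))
}

backlog_window_seconds 1048576 10000000     # 1 MB backlog at 10 MB/sec
backlog_window_seconds 268435456 10000000   # 256 MB backlog at 10 MB/sec
```

At 10 MB/sec the 1 MB backlog covers less than a second (the function prints 0), and a 256 MB backlog covers about 26 seconds.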

Solutions

Solution 1: Increase Backlog Size

bash
# Increase backlog size at runtime
redis-cli CONFIG SET repl-backlog-size 256mb

CONFIG SET applies the new size immediately, but the change is lost on restart. Also set repl-backlog-size in redis.conf (or run CONFIG REWRITE) so it persists.

Solution 2: Disable Backlog (Not Recommended)

For scenarios where replicas always do full sync:

bash
redis-cli CONFIG SET repl-backlog-size 0

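This is intended to remove the backlog, which prevents partial resync. Note that Redis enforces a small minimum backlog size, so depending on the version the value 0 may be rejected or raised to that minimum instead.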

Solution 3: Increase Backlog TTL

If replicas disconnect for long periods:

bash
# Keep backlog even without replicas (seconds)
redis-cli CONFIG SET repl-backlog-ttl 3600

Default is 3600 seconds. Setting to 0 keeps backlog indefinitely.

Handling Current Backlog Exhaustion

Scenario: Replica Can't Partial Sync

If replicas are failing to partial sync:

#### Step 1: Check If Full Sync is in Progress

bash
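# Replica states as seen by the master
redis-cli INFO replication | grep "state="
# RDB fields live in the persistence section, not the replication section
redis-cli INFO persistence | grep rdb_bgsave_in_progress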

#### Step 2: Monitor Full Sync Progress

```bash
# On master, check RDB generation
redis-cli INFO persistence | grep -E "rdb_last_bgsave_status|rdb_changes_since_last_save"

# On replica, check sync progress
redis-cli INFO replication | grep -E "master_link_status|master_sync_in_progress"
```

#### Step 3: Speed Up Full Sync

If full sync is taking too long:

bash
# Enable diskless replication (faster for slow disks)
redis-cli CONFIG SET repl-diskless-sync yes
redis-cli CONFIG SET repl-diskless-sync-delay 5

#### Step 4: Restart Replica If Stuck

bash
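# On replica: detach (this temporarily promotes it to a standalone master)
# REPLICAOF requires Redis 5+; use SLAVEOF on older versions
redis-cli REPLICAOF NO ONE
# Wait a moment, then reattach; this triggers a full resync
redis-cli REPLICAOF <master-ip> <master-port>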

Scenario: Backlog Too Small for Write Rate

#### Step 1: Increase Backlog Dynamically

bash
redis-cli CONFIG SET repl-backlog-size 1gb

#### Step 2: Trigger Replica Resync

bash
# On replica, force full sync
redis-cli REPLICAOF NO ONE
redis-cli REPLICAOF <master-ip> <master-port>

Scenario: Multiple Replicas Overloading Master

If full syncs are overwhelming the master:

#### Step 1: Stagger Replica Reconnection

Don't reconnect all replicas simultaneously.
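One way to stagger reconnection is to reattach replicas one at a time and wait for each to report a healthy link before starting the next. The sketch below is a hypothetical helper, not a Redis feature. The function name, the POLL_INTERVAL and MAX_POLLS variables, and the host:port argument format are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reattach replicas sequentially so that only one full
# sync hits the master at a time.
# Args: <master_ip> <master_port> <replica_host:port>...
stagger_reattach() {
  local master_ip=$1 master_port=$2; shift 2
  local replica host port tries
  for replica in "$@"; do
    host=${replica%:*}; port=${replica##*:}
    redis-cli -h "$host" -p "$port" REPLICAOF "$master_ip" "$master_port" >/dev/null
    # Poll until the replica reports master_link_status:up, or give up.
    tries=0
    until redis-cli -h "$host" -p "$port" INFO replication \
          | tr -d '\r' | grep -q '^master_link_status:up$'; do
      tries=$(( tries + 1 ))
      if (( tries >= ${MAX_POLLS:-120} )); then
        echo "$replica timed out" >&2
        return 1
      fi
      sleep "${POLL_INTERVAL:-5}"
    done
    echo "$replica synced"
  done
}

# Usage (requires reachable servers):
#   stagger_reattach 10.0.0.1 6379 10.0.0.2:6379 10.0.0.3:6379
```

Waiting on master_link_status keeps the logic simple. A stricter version could also check master_sync_in_progress on the replica.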

#### Step 2: Use Replication Cascade

Configure one replica as source for others:

```bash
# On primary replica
redis-cli REPLICAOF <master-ip> <master-port>

# On secondary replicas
redis-cli REPLICAOF <primary-replica-ip> <primary-replica-port>
```

Configuration Best Practices

Recommended Settings

```conf
# redis.conf

# Backlog size - set based on write rate and downtime tolerance
repl-backlog-size 256mb

# Keep backlog even without replicas
repl-backlog-ttl 0

# Diskless replication for faster sync
repl-diskless-sync yes
repl-diskless-sync-delay 5

# Timeout for sync
repl-timeout 60
```

Calculate Appropriate Size

Monitor your actual needs:

```bash
#!/bin/bash
# Calculate required backlog size
# Run during peak traffic

SAMPLES=60
INTERVAL=1

# Read master_repl_offset, stripping the \r that redis-cli output carries
get_offset() {
  redis-cli INFO replication | grep '^master_repl_offset:' | cut -d: -f2 | tr -d '\r'
}

echo "Collecting write rate samples..."
SUM=0
for i in $(seq 1 $SAMPLES); do
  START=$(get_offset)
  sleep $INTERVAL
  END=$(get_offset)
  RATE=$(( (END - START) / INTERVAL ))
  SUM=$((SUM + RATE))
  echo "Sample $i: $RATE bytes/sec"
done

AVG=$((SUM / SAMPLES))
echo "Average write rate: $AVG bytes/sec"
echo "Recommended backlog for 5min downtime: $((AVG * 300)) bytes"
echo "Recommended backlog for 10min downtime: $((AVG * 600)) bytes"
```

Monitoring and Alerting

Key Metrics

```bash
# Backlog usage
redis-cli INFO replication | grep repl_backlog_histlen

# Replica lag
redis-cli INFO replication | grep lag

# Sync status
redis-cli INFO replication | grep master_link_status
```

Alerting Conditions

Alert when:

  1. repl_backlog_histlen equals repl_backlog_size (backlog full)
  2. Any replica lag exceeds threshold
  3. master_link_status is down
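These three conditions can be checked from INFO replication output with a short filter. The sketch below is an illustration under assumptions. The function name check_replication_alerts, the ALERT line format, and the default lag threshold of 10 seconds are all made up for this example. Feed it master output for the first two checks and replica output for the third.

```bash
#!/usr/bin/env bash
# Hypothetical helper: scan INFO replication text on stdin and print one line
# per alert condition. Arg: <max_lag_seconds> (default 10 is an assumption).
check_replication_alerts() {
  local max_lag=${1:-10}
  tr -d '\r' | awk -F: -v max_lag="$max_lag" '
    /^repl_backlog_size:/    { size = $2 }
    /^repl_backlog_histlen:/ { histlen = $2 }
    /^slave[0-9]+:/ {
      # Parse "ip=...,port=...,state=...,offset=...,lag=..." into f[].
      n = split($2, kv, ",")
      for (i = 1; i <= n; i++) { split(kv[i], p, "="); f[p[1]] = p[2] }
      if (f["lag"] + 0 > max_lag + 0)
        print "ALERT lag " f["ip"] ":" f["port"] " lag=" f["lag"]
    }
    /^master_link_status:down/ { print "ALERT master_link_status down" }
    END {
      if (size != "" && histlen + 0 >= size + 0) print "ALERT backlog full"
    }'
}

# Usage (requires a reachable server):
#   redis-cli INFO replication | check_replication_alerts 10
```

The function prints nothing when no condition is met, so it can be dropped into a cron job or a monitoring wrapper that alerts on non-empty output.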

Verification

After making changes:

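```bash
# Verify backlog size
redis-cli CONFIG GET repl-backlog-size

# Check backlog is being used
redis-cli INFO replication | grep repl_backlog

# Test partial sync by briefly dropping the replication link.
# REPLICAOF NO ONE is not suitable here: it changes the replica's
# replication ID, which forces a full sync when it reattaches.
# On master (use TYPE slave before Redis 5):
redis-cli CLIENT KILL TYPE replica

# Check if partial sync occurred: sync_partial_ok should increase, sync_full should not
redis-cli INFO stats | grep -E "sync_full|sync_partial_ok|sync_partial_err"
```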
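The master's INFO stats section exposes the counters sync_full, sync_partial_ok and sync_partial_err, which show what kind of resync actually happened. A minimal sketch that compares two saved snapshots of that output follows. The function names and the snapshot-file workflow are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: classify a reconnect from two snapshots of INFO stats.

# Print the value of one INFO field. Args: <field_name> <file>
stat_field() {
  tr -d '\r' < "$2" | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Args: <before_file> <after_file>, each holding `redis-cli INFO stats` output.
classify_resync() {
  local before=$1 after=$2 full_b full_a ok_b ok_a
  full_b=$(stat_field sync_full "$before")
  full_a=$(stat_field sync_full "$after")
  ok_b=$(stat_field sync_partial_ok "$before")
  ok_a=$(stat_field sync_partial_ok "$after")
  # A missing counter is treated as 0.
  if (( ${full_a:-0} - ${full_b:-0} > 0 )); then
    echo "full resync"
  elif (( ${ok_a:-0} - ${ok_b:-0} > 0 )); then
    echo "partial resync"
  else
    echo "no resync observed"
  fi
}

# Usage (requires a reachable master):
#   redis-cli INFO stats > before.txt
#   ... interrupt and restore the replication link ...
#   redis-cli INFO stats > after.txt
#   classify_resync before.txt after.txt
```

If any full sync happened between the two snapshots, the function reports a full resync even when partial syncs also occurred.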