Fix Redis Replication Backlog Exceeded - Sync Recovery Guide

The Problem

Replicas that disconnect briefly cannot pick up replication where they left off when they reconnect. They request a partial resynchronization (PSYNC), but the master no longer has the required data in its replication backlog. This forces a full synchronization, in which the master produces and transfers a complete RDB snapshot. That is expensive and can cause significant latency spikes.

The logs typically contain messages similar to the following (exact wording varies between Redis versions):

```text
Master does not have a backlog offset 123456789
Unable to partial resync with master: no backlog
```

Understanding the Replication Backlog

The replication backlog is a fixed-size circular buffer on the master that holds the most recent part of the replication stream, i.e. the write commands propagated to replicas. When a replica disconnects and reconnects:

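1. The replica sends its replication ID and last known offset (the PSYNC command).
2. The master checks that the ID matches and that the offset is still inside the backlog.
3. If yes: partial sync, sending only the missing commands.
4. If no: full sync, sending the complete dataset.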

The backlog size, divided by the master's write rate in bytes per second, determines how long a replica can stay disconnected and still be able to partially resync.
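The offset part of this check can be sketched in shell. This is a simplified model, not Redis's code: psync_decision is a hypothetical helper, it ignores the replication ID comparison, and it assumes the backlog covers offsets from repl_backlog_first_byte_offset through repl_backlog_first_byte_offset + repl_backlog_histlen (the INFO fields shown under Diagnosis Commands).

```bash
#!/usr/bin/env bash
# Simplified model of the master's partial-resync offset check.
# Ignores the replication ID comparison; looks at offsets only.
#   $1 = repl_backlog_first_byte_offset (oldest byte still in the backlog)
#   $2 = repl_backlog_histlen (bytes of history stored)
#   $3 = offset the replica asks to continue from
psync_decision() {
  local first_byte=$1 histlen=$2 requested=$3
  # One past the newest stored byte: a replica asking for exactly this
  # offset is fully caught up and needs nothing from the backlog.
  local next_byte=$(( first_byte + histlen ))
  if [ "$requested" -ge "$first_byte" ] && [ "$requested" -le "$next_byte" ]; then
    echo "partial"
  else
    echo "full"
  fi
}

psync_decision 1000 1048576 500000   # prints: partial
psync_decision 1000 1048576 999      # prints: full (those bytes were overwritten)
```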

Diagnosis Commands

Check Backlog Configuration

```bash
redis-cli INFO replication | grep -E "repl_backlog_active|repl_backlog_size|repl_backlog_first_byte_offset|repl_backlog_histlen"
```

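Output interpretation (the comments are annotations, not part of the Redis output):

```text
repl_backlog_active:1                   # Backlog is enabled
repl_backlog_size:1048576               # Configured size (1MB, the default)
repl_backlog_first_byte_offset:1000     # Replication offset of the oldest byte in the backlog
repl_backlog_histlen:1048576            # Bytes of history currently stored
```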
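A sketch for turning that output into the offset window a replica can resume from and a fill percentage. It assumes bash; info_field is a hypothetical helper, and the INFO text is pasted from the sample above rather than fetched live. Redis ends INFO lines with \r\n, so the helper strips the \r before comparing.

```bash
#!/usr/bin/env bash
# Print the value of one INFO field read from stdin.
# Redis ends INFO lines with \r\n, so strip the \r first.
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Sample taken from the output above; live: INFO=$(redis-cli INFO replication)
INFO='repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1000
repl_backlog_histlen:1048576'

size=$(printf '%s\n' "$INFO" | info_field repl_backlog_size)
first=$(printf '%s\n' "$INFO" | info_field repl_backlog_first_byte_offset)
histlen=$(printf '%s\n' "$INFO" | info_field repl_backlog_histlen)

echo "Partial resync possible from offset $first to $(( first + histlen ))"
echo "Backlog fill: $(( histlen * 100 / size ))%"
```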

Check Replica Sync Status

```bash
redis-cli INFO replication
```

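On the master, look for:

```text
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=wait_bgsave,offset=0,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=12345678,lag=1
```

States to watch:

- wait_bgsave: waiting for the RDB snapshot
- send_bulk: sending the RDB to the replica
- online: normal replication

The offset field is the replication offset the replica last acknowledged, and lag is the number of seconds since that acknowledgement.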
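To flag replicas that are not in the online state, the slaveN lines can be parsed with awk. A minimal sketch, assuming bash and the field layout shown above; replicas_not_online is a hypothetical name, and IPv6 addresses (which contain colons) would need a different split.

```bash
#!/usr/bin/env bash
# List replicas whose state is not "online", from the slaveN lines of the
# master's INFO replication output on stdin.
replicas_not_online() {
  tr -d '\r' | awk -F'[:,=]' '
    /^slave[0-9]+:/ {
      split("", kv)   # clear values left over from the previous line
      # $1 is slaveN; the rest alternate key, value (ip, port, state, ...)
      for (i = 2; i < NF; i += 2) kv[$i] = $(i + 1)
      if (kv["state"] != "online") print $1, kv["ip"] ":" kv["port"], kv["state"]
    }'
}

INFO='role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=wait_bgsave,offset=0,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=12345678,lag=1'

printf '%s\n' "$INFO" | replicas_not_online   # prints: slave0 10.0.0.2:6379 wait_bgsave
```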

Check Current Master Offset

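```bash
redis-cli INFO replication | grep master_repl_offset
```

Compare it with each replica's offset: the offset field in the master's slaveN lines, or slave_repl_offset in the replica's own INFO replication output. The difference is how far behind the replica is, in bytes.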
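The master's slaveN lines combined with master_repl_offset give a per-replica byte lag. A sketch, assuming bash; replica_byte_lag is a hypothetical helper, and the master_repl_offset value in the sample is invented for illustration.

```bash
#!/usr/bin/env bash
# Byte lag per replica = master_repl_offset - offset acknowledged by the replica.
# Reads the master's INFO replication output on stdin.
replica_byte_lag() {
  tr -d '\r' | awk -F'[:,=]' '
    $1 == "master_repl_offset" { master = $2 }
    /^slave[0-9]+:/ {
      for (i = 2; i < NF; i += 2) if ($i == "offset") acked[$1] = $(i + 1)
    }
    END { for (r in acked) print r, master - acked[r] }' | sort
}

# slave1 line from the sample output; the master_repl_offset value is invented
INFO='slave1:ip=10.0.0.3,port=6379,state=online,offset=12345678,lag=1
master_repl_offset:12345700'

printf '%s\n' "$INFO" | replica_byte_lag   # prints: slave1 22
```

END is used because Redis prints master_repl_offset after the slaveN lines.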

Calculate Write Rate

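```bash
# Instantaneous command rate (reads and writes combined)
redis-cli INFO stats | grep instantaneous_ops_per_sec

# Total commands processed since startup
redis-cli INFO stats | grep total_commands_processed

# Sample the total over 60 seconds (tr strips the \r that ends each INFO line)
START=$(redis-cli INFO stats | grep total_commands_processed | cut -d: -f2 | tr -d '\r')
sleep 60
END=$(redis-cli INFO stats | grep total_commands_processed | cut -d: -f2 | tr -d '\r')
echo "Commands per second: $(( (END - START) / 60 ))"
```

These counters include read commands, so they overstate the write rate. For sizing the backlog, the byte-based measurement from master_repl_offset (see Estimate Write Rate) is more accurate.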

Determining Appropriate Backlog Size

Formula

```text
Backlog Size = (Master write rate in bytes/sec) × (Maximum replica downtime in sec)
```

For example:

- Write rate: 10 MB/sec
- Max downtime: 5 minutes (300 sec)
- Required backlog: 10 MB/sec × 300 sec = 3,000 MB (about 3 GB)
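The example arithmetic as a small shell function, assuming bash. backlog_mb is a hypothetical helper; it takes the write rate in bytes/sec, treats 1 MB as 1024 × 1024 bytes to match the mb unit in redis.conf, and rounds up so the result can be passed to CONFIG SET repl-backlog-size.

```bash
#!/usr/bin/env bash
# Backlog size = write rate (bytes/sec) x downtime (sec), rounded up to
# whole megabytes in the "mb" unit Redis accepts (1mb = 1024*1024 bytes).
backlog_mb() {
  local bytes_per_sec=$1 downtime_sec=$2
  local bytes=$(( bytes_per_sec * downtime_sec ))
  local mib=$(( 1024 * 1024 ))
  echo "$(( (bytes + mib - 1) / mib ))mb"
}

backlog_mb 10485760 300   # 10 MB/sec for 5 minutes; prints: 3000mb
```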

Estimate Write Rate

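```bash
# Measure replication bytes over 1 minute (run on the master).
# tr strips the \r that ends each INFO line; without it the arithmetic fails.
START=$(redis-cli INFO replication | grep '^master_repl_offset:' | cut -d: -f2 | tr -d '\r')
sleep 60
END=$(redis-cli INFO replication | grep '^master_repl_offset:' | cut -d: -f2 | tr -d '\r')
echo "Bytes per second: $(( (END - START) / 60 ))"
```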

Check Current Backlog Usage

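```bash
redis-cli INFO replication | grep repl_backlog_histlen
```

If repl_backlog_histlen equals repl_backlog_size, the backlog has filled and is now overwriting its oldest data. On an active master this is the normal steady state, not a fault in itself. What matters is whether the backlog holds enough seconds of writes to cover your longest expected disconnection: repl_backlog_size divided by the write rate.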
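A sketch of that coverage check, assuming bash and a write rate measured as in Estimate Write Rate. backlog_coverage is a hypothetical helper, and the target downtime is your own tolerance, not a Redis setting.

```bash
#!/usr/bin/env bash
# How many seconds of writes the backlog holds, compared with a target.
#   $1 = repl_backlog_size (bytes)
#   $2 = write rate (bytes/sec, from master_repl_offset deltas)
#   $3 = longest disconnection you want to survive (seconds)
backlog_coverage() {
  local size=$1 rate=$2 target=$3
  if [ "$rate" -le 0 ]; then
    echo "no writes measured; coverage unbounded"
    return 0
  fi
  local seconds=$(( size / rate ))
  if [ "$seconds" -ge "$target" ]; then
    echo "OK: backlog covers ${seconds}s (target ${target}s)"
  else
    echo "TOO SMALL: backlog covers ${seconds}s (target ${target}s)"
  fi
}

backlog_coverage 1048576 10485760 300   # default 1mb at 10 MB/sec: TOO SMALL, 0s
backlog_coverage 268435456 524288 300   # 256mb at 512 KB/sec: OK, 512s
```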

Solutions

Solution 1: Increase Backlog Size

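```bash
# Increase backlog size; takes effect on the running instance
redis-cli CONFIG SET repl-backlog-size 256mb
```

repl-backlog-size can be changed at runtime with CONFIG SET on versions that support partial resync (2.8 and later), so no restart is needed. The change is not persistent, however: also set it in redis.conf or run CONFIG REWRITE so it survives a restart.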
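CONFIG GET repl-backlog-size reports the value in bytes, so checking that a setting like 256mb took effect means converting units. A sketch, assuming bash and the unit rules documented at the top of redis.conf (k, m, g are powers of 1000; kb, mb, gb are powers of 1024; case-insensitive); redis_size_to_bytes is a hypothetical helper and only handles whole numbers.

```bash
#!/usr/bin/env bash
# Convert a Redis memory setting such as 256mb to bytes.
redis_size_to_bytes() {
  local input num unit
  input=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  num=${input%%[a-z]*}     # leading digits
  unit=${input#"$num"}     # whatever follows them
  case $unit in
    "") echo "$num" ;;
    k)  echo $(( num * 1000 )) ;;
    kb) echo $(( num * 1024 )) ;;
    m)  echo $(( num * 1000 * 1000 )) ;;
    mb) echo $(( num * 1024 * 1024 )) ;;
    g)  echo $(( num * 1000 * 1000 * 1000 )) ;;
    gb) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *)  echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}

# After CONFIG SET repl-backlog-size 256mb, CONFIG GET should report:
redis_size_to_bytes 256mb   # prints: 268435456
```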

Solution 2: Disable Backlog (Not Recommended)

For scenarios where replicas always do full sync:

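```bash
redis-cli CONFIG SET repl-backlog-size 0
```

This does not actually turn the backlog off: Redis clamps very small values to an internal minimum (and some versions reject 0 outright). The only effect is that almost every reconnect falls back to a full sync, so there is rarely a reason to do this.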

Solution 3: Increase Backlog TTL

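repl-backlog-ttl controls how long a master keeps its backlog after its last replica disconnects. If all replicas may be offline at the same time for longer than that, raise it:

```bash
# Keep the backlog for 2 hours after the last replica disconnects (seconds)
redis-cli CONFIG SET repl-backlog-ttl 7200
```

The default is 3600 seconds, and 0 keeps the backlog indefinitely. Replicas never free their backlog on this timeout, because they may later be promoted to master.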

Handling Current Backlog Exhaustion

Scenario: Replica Can't Partial Sync

If replicas are failing to partial sync:

#### Step 1: Check If Full Sync is in Progress

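```bash
# Works on master or replica: replica states, RDB generation, replica-side sync flag
redis-cli INFO | grep -E "state=|rdb_bgsave_in_progress|master_sync_in_progress"
```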

#### Step 2: Monitor Full Sync Progress

```bash
# On master, check RDB generation
redis-cli INFO persistence | grep -E "rdb_bgsave_in_progress|rdb_current_bgsave_time_sec|rdb_last_bgsave_status"

# On replica, check sync progress
redis-cli INFO replication | grep -E "master_link_status|master_sync_in_progress|master_sync_left_bytes"
```
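While a full sync is running, a replica's INFO replication output also includes master_sync_total_bytes and master_sync_read_bytes, which give a rough progress figure. A sketch, assuming bash; sync_progress is a hypothetical helper, the sample values are invented, and we assume the total is reported as -1 when it is not known in advance (as with diskless transfers), so verify that on your version.

```bash
#!/usr/bin/env bash
# Summarize full-sync progress from a replica's INFO replication output on stdin.
sync_progress() {
  tr -d '\r' | awk -F: '
    $1 == "master_sync_in_progress" { inprog = $2 }
    $1 == "master_sync_total_bytes" { total = $2 }
    $1 == "master_sync_read_bytes"  { got = $2 }
    END {
      if (inprog != 1) { print "no sync in progress"; exit }
      # total is not known up front for some transfers (reported as -1)
      if (total <= 0) { printf "sync in progress, %.0f bytes read (total unknown)\n", got; exit }
      printf "sync in progress, %.1f%% (%.0f of %.0f bytes)\n", 100 * got / total, got, total
    }'
}

# Invented sample; live: redis-cli INFO replication | sync_progress
INFO='master_link_status:down
master_sync_in_progress:1
master_sync_total_bytes:1073741824
master_sync_read_bytes:268435456'

printf '%s\n' "$INFO" | sync_progress   # prints: sync in progress, 25.0% (268435456 of 1073741824 bytes)
```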

#### Step 3: Speed Up Full Sync

If full sync is taking too long:

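```bash
# Stream the RDB to replicas over the socket instead of writing it to disk first
# (helps when the master's disk is slow and the network is fast)
redis-cli CONFIG SET repl-diskless-sync yes
# Seconds to wait before starting a transfer, so more replicas can share it
redis-cli CONFIG SET repl-diskless-sync-delay 5
```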

#### Step 4: Restart Replica If Stuck

```bash
# On replica: detach from the master (the dataset is kept)
redis-cli REPLICAOF NO ONE
# Wait a moment, then reattach; expect a full resync
redis-cli REPLICAOF <master-ip> <master-port>
```

Scenario: Backlog Too Small for Write Rate

#### Step 1: Increase Backlog Dynamically

```bash
redis-cli CONFIG SET repl-backlog-size 1gb
```

#### Step 2: Trigger Replica Resync

```bash
# On replica, force full sync
redis-cli REPLICAOF NO ONE
redis-cli REPLICAOF <master-ip> <master-port>
```

Scenario: Multiple Replicas Overloading Master

If full syncs are overwhelming the master:

#### Step 1: Stagger Replica Reconnection

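Don't reconnect all replicas simultaneously. Reattach them one at a time, waiting for each to reach state=online on the master before starting the next, so the master is not serving several full syncs at once.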

#### Step 2: Use Replication Cascade

Configure one replica as source for others:

```bash
# On primary replica
redis-cli REPLICAOF <master-ip> <master-port>

# On secondary replicas
redis-cli REPLICAOF <primary-replica-ip> <primary-replica-port>
```

Configuration Best Practices

Recommended Settings

```conf
# redis.conf

# Backlog size - set based on write rate and downtime tolerance
repl-backlog-size 256mb

# Keep backlog even without replicas
repl-backlog-ttl 0

# Diskless replication for faster sync
repl-diskless-sync yes
repl-diskless-sync-delay 5

# Timeout for sync
repl-timeout 60
```

Calculate Appropriate Size

Monitor your actual needs:

```bash
#!/bin/bash
# Calculate required backlog size
# Run on the master during peak traffic

SAMPLES=60
INTERVAL=1

# Current master_repl_offset; tr strips the \r that ends each INFO line
get_offset() {
  redis-cli INFO replication | grep '^master_repl_offset:' | cut -d: -f2 | tr -d '\r'
}

echo "Collecting write rate samples..."
SUM=0
for i in $(seq 1 $SAMPLES); do
  START=$(get_offset)
  sleep $INTERVAL
  END=$(get_offset)
  RATE=$(( (END - START) / INTERVAL ))
  SUM=$((SUM + RATE))
  echo "Sample $i: $RATE bytes/sec"
done

AVG=$((SUM / SAMPLES))
echo "Average write rate: $AVG bytes/sec"
echo "Recommended backlog for 5min downtime: $((AVG * 300)) bytes"
echo "Recommended backlog for 10min downtime: $((AVG * 600)) bytes"
```

Monitoring and Alerting

Key Metrics

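```bash
# Backlog usage
redis-cli INFO replication | grep repl_backlog_histlen

# Replica lag (on the master)
redis-cli INFO replication | grep lag

# Sync status (on a replica)
redis-cli INFO replication | grep master_link_status

# Full vs. partial resync counters (on the master)
redis-cli INFO stats | grep -E "sync_full|sync_partial"
```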

Alerting Conditions

Alert when:

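1. The backlog covers less time than your longest expected disconnection (repl_backlog_size divided by the write rate). repl_backlog_histlen equal to repl_backlog_size is not a useful alert on its own, since it stays that way on any active master.
2. Any replica's lag exceeds your threshold.
3. master_link_status is down on a replica.
4. sync_partial_err in the master's INFO stats increases, meaning a replica's partial resync request was refused and it fell back to a full sync.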
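A sketch that evaluates the backlog, lag, and link-status conditions from values you have already collected, assuming bash. replication_alerts and all thresholds are hypothetical; the backlog check compares coverage in seconds (size ÷ write rate) rather than histlen against size, because a backlog that has filled once stays full on an active master.

```bash
#!/usr/bin/env bash
# Check backlog coverage, replica lag and link status against example thresholds.
#   $1 = repl_backlog_size (bytes)
#   $2 = write rate (bytes/sec)
#   $3 = longest disconnection the backlog must cover (seconds)
#   $4 = highest replica lag seen on the master (seconds)
#   $5 = lag threshold (seconds)
#   $6 = master_link_status reported by a replica
replication_alerts() {
  local size=$1 rate=$2 target=$3 lag=$4 max_lag=$5 link=$6
  local alerts=0
  if [ "$rate" -gt 0 ] && [ $(( size / rate )) -lt "$target" ]; then
    echo "ALERT: backlog covers only $(( size / rate ))s of writes (target ${target}s)"
    alerts=$(( alerts + 1 ))
  fi
  if [ "$lag" -gt "$max_lag" ]; then
    echo "ALERT: replica lag ${lag}s exceeds ${max_lag}s"
    alerts=$(( alerts + 1 ))
  fi
  if [ "$link" != "up" ]; then
    echo "ALERT: master_link_status is $link"
    alerts=$(( alerts + 1 ))
  fi
  if [ "$alerts" -eq 0 ]; then
    echo "OK"
    return 0
  fi
  return 1
}

replication_alerts 268435456 524288 300 2 10 up || echo "replication alerts raised"
```

The non-zero exit status on any alert makes it easy to wire into cron or a monitoring agent.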

Verification

After making changes:

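```bash
# Verify backlog size (reported in bytes)
redis-cli CONFIG GET repl-backlog-size

# Check backlog is being used
redis-cli INFO replication | grep repl_backlog

# Test partial sync (on the master): note the counters, drop the replica
# connections, and let the replicas reconnect on their own
# (use TYPE slave on Redis versions before 5.0)
redis-cli INFO stats | grep -E "sync_full|sync_partial_ok"
redis-cli CLIENT KILL TYPE replica
sleep 5

# Check if partial sync occurred: sync_partial_ok should have increased
# while sync_full stays the same
redis-cli INFO stats | grep -E "sync_full|sync_partial_ok"
```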
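To tell partial from full resyncs apart, compare the master's sync_full and sync_partial_ok counters (in INFO stats) before and after a reconnect; master_link_status on its own only says whether the link is up. resync_kind is a hypothetical helper that does the comparison, assuming bash:

```bash
#!/usr/bin/env bash
# Classify what happened between two INFO stats snapshots on the master.
#   $1/$2 = sync_partial_ok before/after
#   $3/$4 = sync_full before/after
resync_kind() {
  local partial_before=$1 partial_after=$2 full_before=$3 full_after=$4
  if [ "$full_after" -gt "$full_before" ]; then
    echo "full"
  elif [ "$partial_after" -gt "$partial_before" ]; then
    echo "partial"
  else
    echo "none"
  fi
}

# Counters can be read with:
#   redis-cli INFO stats | tr -d '\r' | grep -E '^sync_(full|partial_ok):'
resync_kind 4 5 2 2   # prints: partial
```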
