## The Problem

Replicas that disconnect briefly cannot resume replication where they left off. They request a partial synchronization, but the master no longer holds the required data in its replication backlog. This forces a full synchronization, which is expensive and can cause significant latency spikes.

The error in the logs typically reads:

    Master does not have a backlog offset 123456789

or

    Unable to partial resync with master: no backlog

## Understanding the Replication Backlog
The replication backlog is a circular buffer that stores recent write commands. When a replica disconnects and reconnects:
1. The replica sends its last known offset.
2. The master checks whether that offset is still in the backlog.
3. If yes: partial sync, sending only the missing commands.
4. If no: full sync, sending the complete dataset.

The backlog size, together with the master's write rate, determines how long a replica can stay disconnected and still partial sync.
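The decision in steps 2 to 4 can be sketched as a small shell function. This is a simplified model rather than Redis's actual implementation: the name `sync_mode` and its arguments are illustrative, and a real master also compares replication IDs before it accepts a partial sync.

```bash
# sync_mode <requested_offset> <backlog_first_byte_offset> <backlog_histlen>
# Prints "partial" if the requested offset can be served from the backlog,
# otherwise "full". Simplified: the replication ID check is ignored.
sync_mode() {
    requested=$1
    first=$2
    histlen=$3
    # The backlog holds offsets first .. first + histlen - 1. A request for
    # exactly first + histlen means the replica is already caught up.
    if [ "$histlen" -gt 0 ] &&
       [ "$requested" -ge "$first" ] &&
       [ "$requested" -le $((first + histlen)) ]; then
        echo partial
    else
        echo full
    fi
}

# Example: sync_mode 1500 1000 1048576
```

The second and third arguments correspond to `repl_backlog_first_byte_offset` and `repl_backlog_histlen` from `INFO replication`.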
## Diagnosis Commands

### Check Backlog Configuration

    redis-cli INFO replication | grep -E "repl_backlog_active|repl_backlog_size|repl_backlog_first_byte_offset|repl_backlog_histlen"

Output interpretation:

    repl_backlog_active:1                  # Backlog is enabled
    repl_backlog_size:1048576              # Configured size (1 MB)
    repl_backlog_first_byte_offset:1000    # Offset of the oldest data in the backlog
    repl_backlog_histlen:1048576           # Current backlog length

### Check Replica Sync Status

    redis-cli INFO replication

Look for:

    role:master
    connected_slaves:2
    slave0:ip=10.0.0.2,port=6379,state=wait_bgsave,offset=0,lag=0
    slave1:ip=10.0.0.3,port=6379,state=online,offset=12345678,lag=1

States to watch:
- wait_bgsave - Waiting for RDB snapshot
- send_bulk - Sending RDB to replica
- online - Normal replication
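To spot replicas that are still mid-sync, the `slaveN` lines can be filtered by state. The helper below is a sketch: `replicas_not_online` is a hypothetical name, and it assumes the `slaveN:ip=...,state=...` line format shown above.

```bash
# replicas_not_online: read INFO replication output on stdin and print
# "<replica> <state>" for every replica whose state is not "online".
replicas_not_online() {
    # redis-cli emits CRLF line endings when piped, so strip \r first.
    tr -d '\r' | awk -F'[:,]' '
        /^slave[0-9]+:/ {
            state = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^state=/) state = substr($i, 7)
            if (state != "online") print $1, state
        }'
}

# Usage: redis-cli INFO replication | replicas_not_online
```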
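### Check Current Master Offset

    redis-cli INFO replication | grep master_repl_offset

Compare this with the replica's last known offset (the `offset` field on its `slaveN` line, or `slave_repl_offset` in the replica's own INFO output).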
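The comparison can be automated by subtracting each replica's offset from the master's. This is a minimal sketch: `replica_offset_gap` is a hypothetical helper name, and it assumes it is fed the master's `INFO replication` output.

```bash
# replica_offset_gap: read the master's INFO replication output on stdin
# and print "<replica> <bytes_behind>" for each connected replica.
replica_offset_gap() {
    tr -d '\r' | awk -F'[:,]' '
        /^master_repl_offset:/ { master = $2 }
        /^slave[0-9]+:/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^offset=/) offsets[$1] = substr($i, 8)
        }
        # The slaveN lines appear before master_repl_offset, so compute at the end.
        END {
            for (name in offsets)
                printf "%s %.0f\n", name, master - offsets[name]
        }' | sort
}

# Usage: redis-cli INFO replication | replica_offset_gap
```

A gap larger than `repl_backlog_histlen` suggests that replica could not be served from the backlog if it reconnected now.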
### Calculate Write Rate

```bash
# Check instantaneous command rate (counts all commands, not only writes)
redis-cli INFO stats | grep instantaneous_ops_per_sec

# Check total operations
redis-cli INFO stats | grep total_commands_processed

# Sample over time
redis-cli INFO stats | grep total_commands_processed | cut -d: -f2
sleep 60
redis-cli INFO stats | grep total_commands_processed | cut -d: -f2
```
## Determining Appropriate Backlog Size

### Formula

    Backlog Size = (Master write rate in bytes/sec) × (Maximum replica downtime in seconds)

For example:

- Write rate: 10 MB/sec
- Max downtime: 5 minutes (300 sec)
- Required backlog: 10 MB/sec × 300 sec = 3,000 MB ≈ 3 GB
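The formula translates directly into shell arithmetic. In this sketch the function name `required_backlog_bytes` and the optional headroom percentage are illustrative additions, not part of the formula above.

```bash
# required_backlog_bytes <write_rate_bytes_per_sec> <max_downtime_sec> [headroom_percent]
# Prints the backlog size in bytes; headroom_percent (default 0) is an
# assumed safety margin added on top of the plain formula.
required_backlog_bytes() {
    rate=$1
    downtime=$2
    headroom=${3:-0}
    echo $(( rate * downtime * (100 + headroom) / 100 ))
}

# Example from the text: 10 MB/sec (10485760 bytes) for 300 seconds
#   required_backlog_bytes 10485760 300
# The result is a plain byte count, which CONFIG SET repl-backlog-size accepts.
```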
### Estimate Write Rate

    # Measure over 1 minute (tr strips the trailing \r that redis-cli emits,
    # which would otherwise break the shell arithmetic)
    START=$(redis-cli INFO replication | grep '^master_repl_offset' | cut -d: -f2 | tr -d '\r')
    sleep 60
    END=$(redis-cli INFO replication | grep '^master_repl_offset' | cut -d: -f2 | tr -d '\r')
    echo "Bytes per second: $(( (END - START) / 60 ))"

### Check Current Backlog Usage

    redis-cli INFO replication | grep repl_backlog_histlen

If repl_backlog_histlen equals repl_backlog_size, the backlog is full and wrapping: the oldest data is overwritten as new writes arrive.
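A full backlog is the normal steady state on a busy master, so the more useful figure is how many seconds of writes it currently covers. A minimal sketch, assuming you already measured the write rate in bytes per second; `backlog_window_seconds` is a hypothetical name.

```bash
# backlog_window_seconds <histlen_bytes> <write_rate_bytes_per_sec>
# Prints how many whole seconds of writes the backlog currently holds.
backlog_window_seconds() {
    if [ "$2" -le 0 ]; then
        echo "write rate must be positive" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

# Example: a 256 MB backlog at 1 MB/sec of writes
#   backlog_window_seconds 268435456 1048576
```

If the result is shorter than a typical network blip or replica restart, reconnects will fall back to a full sync.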
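## Solutions

### Solution 1: Increase Backlog Size

    # Increase backlog size at runtime
    redis-cli CONFIG SET repl-backlog-size 256mb

For Redis 6.0+, this takes effect at runtime. If CONFIG SET rejects the parameter on an older version, set it in redis.conf and restart. CONFIG SET alone does not persist the value, so also set repl-backlog-size in redis.conf (or run CONFIG REWRITE) to keep it across restarts.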
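### Solution 2: Disable Backlog (Not Recommended)

For scenarios where replicas always do a full sync:

    redis-cli CONFIG SET repl-backlog-size 0

This is meant to disable the backlog, at the cost of preventing partial resync. Redis enforces a minimum backlog size, so depending on the version this value may be rejected or raised to that minimum; check the result with CONFIG GET repl-backlog-size.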
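### Solution 3: Increase Backlog TTL

repl-backlog-ttl controls how long the master keeps the backlog after its last replica disconnects. If replicas disconnect for long periods, raise it:

    # Keep the backlog for 2 hours after the last replica disconnects (seconds)
    redis-cli CONFIG SET repl-backlog-ttl 7200

The default is 3600 seconds. Setting it to 0 keeps the backlog indefinitely.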
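## Handling Current Backlog Exhaustion

### Scenario: Replica Can't Partial Sync

If replicas are failing to partial sync:

#### Step 1: Check If Full Sync is in Progress

    # Replica states are in the replication section, RDB fields in persistence
    redis-cli INFO replication | grep "state="
    redis-cli INFO persistence | grep rdb_bgsave_in_progress

#### Step 2: Monitor Full Sync Progress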
```bash
# On master, check RDB generation
redis-cli INFO persistence | grep -E "rdb_last_bgsave_status|rdb_changes_since_last_save"

# On replica, check sync progress
redis-cli INFO replication | grep -E "master_link_status|master_sync_in_progress"
```
#### Step 3: Speed Up Full Sync
If full sync is taking too long:
    # Enable diskless replication (faster for slow disks)
    redis-cli CONFIG SET repl-diskless-sync yes
    redis-cli CONFIG SET repl-diskless-sync-delay 5

#### Step 4: Restart Replica If Stuck

    # On replica
    redis-cli REPLICAOF NO ONE
    # Wait a moment
    redis-cli REPLICAOF <master-ip> <master-port>

### Scenario: Backlog Too Small for Write Rate
#### Step 1: Increase Backlog Dynamically (Redis 6.0+)
    redis-cli CONFIG SET repl-backlog-size 1gb

#### Step 2: Trigger Replica Resync

    # On replica, force full sync
    redis-cli REPLICAOF NO ONE
    redis-cli REPLICAOF <master-ip> <master-port>

### Scenario: Multiple Replicas Overloading Master
If full syncs are overwhelming the master:
#### Step 1: Stagger Replica Reconnection
Don't reconnect all replicas simultaneously.
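One way to stagger reconnection is to attach replicas one at a time and wait for each link to come up before moving on. The script below is a sketch under assumptions: the function names, the `REDIS_CLI`, `WAIT_STEP`, and `WAIT_MAX` variables, and the `host:port` argument format are all illustrative, and it treats `master_link_status:up` on the replica as the signal that the sync finished.

```bash
REDIS_CLI=${REDIS_CLI:-redis-cli}
WAIT_STEP=${WAIT_STEP:-5}     # seconds between status checks
WAIT_MAX=${WAIT_MAX:-600}     # give up on one replica after this many seconds

# wait_link_up <replica-host> <replica-port>
# Polls the replica until its link to the master is up, or times out.
wait_link_up() {
    waited=0
    while [ "$waited" -le "$WAIT_MAX" ]; do
        if "$REDIS_CLI" -h "$1" -p "$2" INFO replication |
            grep -q '^master_link_status:up'; then
            return 0
        fi
        sleep "$WAIT_STEP"
        waited=$((waited + WAIT_STEP))
    done
    return 1
}

# stagger_reconnect <master-ip> <master-port> <replica-host:port>...
# Attaches each replica in turn, so only one full sync runs at a time.
stagger_reconnect() {
    master_ip=$1
    master_port=$2
    shift 2
    for replica in "$@"; do
        host=${replica%:*}
        port=${replica##*:}
        "$REDIS_CLI" -h "$host" -p "$port" REPLICAOF "$master_ip" "$master_port" >/dev/null || return 1
        if ! wait_link_up "$host" "$port"; then
            echo "timeout waiting for $replica" >&2
            return 1
        fi
        echo "$replica synced"
    done
}

# Usage: stagger_reconnect 10.0.0.1 6379 10.0.0.2:6379 10.0.0.3:6379
```

Stopping at the first timeout is a deliberate choice here, so a stuck replica is investigated before more full syncs are queued behind it.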
#### Step 2: Use Replication Cascade
Configure one replica as source for others:
```bash
# On primary replica
redis-cli REPLICAOF <master-ip> <master-port>

# On secondary replicas
redis-cli REPLICAOF <primary-replica-ip> <primary-replica-port>
```
## Configuration Best Practices

### Recommended Settings

```conf
# redis.conf

# Backlog size - set based on write rate and downtime tolerance
repl-backlog-size 256mb

# Keep backlog even without replicas
repl-backlog-ttl 0

# Diskless replication for faster sync
repl-diskless-sync yes
repl-diskless-sync-delay 5

# Timeout for sync
repl-timeout 60
```
### Calculate Appropriate Size

Monitor your actual needs:

```bash
#!/bin/bash
# Calculate required backlog size
# Run during peak traffic

SAMPLES=60
INTERVAL=1

# Read the master's replication offset (strip the \r that redis-cli emits)
get_offset() {
    redis-cli INFO replication | grep '^master_repl_offset' | cut -d: -f2 | tr -d '\r'
}

echo "Collecting write rate samples..."
SUM=0
for i in $(seq 1 "$SAMPLES"); do
    START=$(get_offset)
    sleep "$INTERVAL"
    END=$(get_offset)
    # Divide by the interval so the rate stays in bytes/sec if INTERVAL changes
    RATE=$(( (END - START) / INTERVAL ))
    SUM=$((SUM + RATE))
    echo "Sample $i: $RATE bytes/sec"
done

AVG=$((SUM / SAMPLES))
echo "Average write rate: $AVG bytes/sec"
echo "Recommended backlog for 5min downtime: $((AVG * 300)) bytes"
echo "Recommended backlog for 10min downtime: $((AVG * 600)) bytes"
```
## Monitoring and Alerting

### Key Metrics

```bash
# Backlog usage
redis-cli INFO replication | grep repl_backlog_histlen

# Replica lag (run on the master)
redis-cli INFO replication | grep lag

# Sync status (run on a replica)
redis-cli INFO replication | grep master_link_status
```
### Alerting Conditions

Alert when:

1. `repl_backlog_histlen` equals `repl_backlog_size` (backlog full)
2. Any replica `lag` exceeds threshold
3. `master_link_status` is `down`
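The three conditions can be checked from a single `INFO replication` snapshot. This is a sketch rather than a production monitor: `check_replication_alerts`, its lag-threshold argument, and the `ALERT ...` message format are all hypothetical.

```bash
# check_replication_alerts <max_lag_seconds>
# Reads INFO replication output on stdin and prints one line per alert.
check_replication_alerts() {
    tr -d '\r' | awk -F'[:,]' -v max_lag="$1" '
        /^repl_backlog_size:/    { size = $2 }
        /^repl_backlog_histlen:/ { histlen = $2 }
        /^master_link_status:/   { if ($2 == "down") print "ALERT link down" }
        /^slave[0-9]+:/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^lag=/ && substr($i, 5) + 0 > max_lag + 0)
                    print "ALERT " $1 " lag " substr($i, 5)
        }
        END {
            if (size != "" && histlen + 0 >= size + 0)
                print "ALERT backlog full"
        }'
}

# Usage: redis-cli INFO replication | check_replication_alerts 10
```

Run it against both the master and each replica, since `master_link_status` only appears in a replica's output.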
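## Verification

After making changes:

```bash
# Verify backlog size
redis-cli CONFIG GET repl-backlog-size

# Check backlog is being used
redis-cli INFO replication | grep repl_backlog

# Test partial sync by briefly dropping the replica connections.
# On master (use "TYPE slave" before Redis 5.0). REPLICAOF NO ONE is avoided
# here because it changes the replica's replication ID, which leads to a full sync.
redis-cli CLIENT KILL TYPE replica
sleep 5

# On master: sync_partial_ok should have increased and sync_full should not
redis-cli INFO stats | grep -E "sync_full|sync_partial_ok|sync_partial_err"

# On replica: the link should be back up
redis-cli INFO replication | grep master_link_status
```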
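`master_link_status` only shows that the link is up, not how it was re-established. One way to tell a partial sync from a full one is to compare the master's `sync_full`, `sync_partial_ok`, and `sync_partial_err` counters from `INFO stats` before and after the reconnect. The helper name `sync_counters` below is hypothetical.

```bash
# sync_counters: read INFO stats output on stdin and print
# "<sync_full> <sync_partial_ok> <sync_partial_err>" on one line.
sync_counters() {
    tr -d '\r' | awk -F: '
        $1 == "sync_full"        { full = $2 }
        $1 == "sync_partial_ok"  { ok = $2 }
        $1 == "sync_partial_err" { err = $2 }
        END { printf "%.0f %.0f %.0f\n", full, ok, err }'
}

# Usage (on the master):
#   before=$(redis-cli INFO stats | sync_counters)
#   ... interrupt and restore the replica connection ...
#   after=$(redis-cli INFO stats | sync_counters)
#   echo "before: $before  after: $after"
```

If only the second number grew, the reconnect was served from the backlog; growth in the first or third number points to a full sync.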