Introduction
The oplog (operations log) is the heartbeat of MongoDB replication. It records every write operation in a capped collection (`local.oplog.rs`) that secondaries tail to stay consistent with the primary. When oplog errors occur, replication can halt entirely, secondaries can fall irrecoverably behind, and emergency intervention may be needed to restore replica set integrity.
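For orientation, each oplog entry is a BSON document describing one operation. A minimal illustrative shape (the field values below are hypothetical) looks like:

```javascript
// Illustrative oplog entry shape (hypothetical values).
// ts = operation timestamp, op = operation type ("i" insert, "u" update, "d" delete),
// ns = namespace (database.collection), o = the operation document.
const oplogEntry = {
  ts: { t: 1700000000, i: 1 }, // Timestamp(seconds, increment)
  op: "i",
  ns: "shop.orders",
  o: { _id: 1, item: "widget", qty: 2 }
};

console.log(oplogEntry.op, oplogEntry.ns);
```

Secondaries replay these entries in order; every error discussed below is ultimately about entries like this being missing, oversized, or unreadable.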
Symptoms
Oplog errors manifest with distinct patterns:
```text
# Oplog exhaustion (secondary too far behind)
Error: Oplog CursorMinKeyNotFound
Replication halted: oplog no longer contains required entries
MongoServerError: cannot sync, oplog too far behind

# Oplog corruption
WiredTiger error reading from oplog
BSONObjectTooLarge: oplog entry exceeds size limit

# Oplog query errors
CursorNotFound: oplog cursor expired
OperationFailed: oplog query failed

# In logs
{"msg":"Replication halt","attr":{"reason":"Oplog Position Lost"}}
{"msg":"Secondary cannot catch up","attr":{"minValid":{"$timestamp":{"t":100,"i":1}}}}

# In rs.status()
"syncSourceHost": "",
"lastHeartbeatMessage": "error RS102 too stale to catch up",
"optimeDate": significantly behind primary
```
Common Causes
1. Oplog size too small - Retention window shorter than secondary downtime
2. Secondary extended downtime - Offline longer than oplog coverage
3. High write volume - Oplog fills faster than secondaries consume
4. Oplog corruption - WiredTiger corruption in oplog collection
5. Large transactions - Single oplog entry exceeds 16MB BSON limit
6. Network instability - Intermittent connectivity causing cursor expiration
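Causes 1-3 are all variations of one arithmetic problem: the oplog's retention window is its size divided by the rate at which writes churn through it. A minimal sketch (assuming a steady churn rate, which real bursty workloads will undershoot):

```javascript
// Rough oplog retention window estimate. Treat the result as a lower
// bound: write spikes shrink the window without warning.
function oplogWindowHours(maxSizeGB, churnGBPerHour) {
  if (churnGBPerHour <= 0) throw new Error("churn rate must be positive");
  return maxSizeGB / churnGBPerHour;
}

// A 10 GB oplog filling at 0.5 GB/hour covers about 20 hours:
console.log(oplogWindowHours(10, 0.5)); // → 20
```

If a secondary is offline longer than this window, it becomes "too stale" and must be resynced.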
Step-by-Step Fix
Step 1: Diagnose Oplog State
Check oplog status on primary:
```javascript
// Connect to the primary: mongosh --host primary:27017
use local

// Check oplog size and stats
db.oplog.rs.stats()

// Calculate time window covered
let first = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next()
let last = db.oplog.rs.find().sort({ ts: -1 }).limit(1).next()
let hoursCovered = (last.ts.t - first.ts.t) / 3600
print("Oplog covers: " + hoursCovered.toFixed(2) + " hours")

// Check oplog size
print("Size: " + (db.oplog.rs.stats().size / 1024 / 1024 / 1024).toFixed(2) + " GB")
print("Max size: " + (db.oplog.rs.stats().maxSize / 1024 / 1024 / 1024).toFixed(2) + " GB")
```
Check secondary's required position:
```javascript
// Connect to the secondary: mongosh --host secondary:27017
use local

// What oplog position the secondary has applied
db.oplog.rs.find().sort({ ts: -1 }).limit(1).next()

// Check whether the required entries still exist on the primary:
// compare the secondary's lastApplied with the primary's oldest entry
```
Step 2: Check Oplog Exhaustion
When a secondary is "too stale":
```javascript
// On primary - check oldest oplog entry
use local
let oldest = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next()
printjson(oldest.ts)

// On secondary - check needed position
// Look for "lastApplied" or "minValid" in rs.status()
rs.status().members.find(m => m.name.includes("secondary"))
```
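With both timestamps in hand, the staleness comparison can be sketched as a small helper (pure JavaScript, with timestamps as `{ t, i }` pairs mirroring mongosh's `Timestamp(seconds, increment)`):

```javascript
// Returns true when the secondary's last applied entry is older than the
// primary's oldest retained oplog entry, i.e. the secondary is too stale.
function isTooStale(secondaryLastApplied, primaryOldest) {
  if (secondaryLastApplied.t !== primaryOldest.t) {
    return secondaryLastApplied.t < primaryOldest.t;
  }
  // Same second: compare the increment counter
  return secondaryLastApplied.i < primaryOldest.i;
}

console.log(isTooStale({ t: 1234567890, i: 1 }, { t: 1234567900, i: 1 })); // → true
```

A `true` result means the primary has already discarded entries the secondary still needs, so only a resync (Step 4) can recover it.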
If the secondary needs entries older than the primary's oldest retained entry, it is too stale:

```text
Secondary needs:  { ts: Timestamp(1234567890, 1) }
Primary's oldest: { ts: Timestamp(1234567900, 1) }
# Secondary's position is 10 seconds older than the primary's oldest entry = stale
```

Step 3: Resize Oplog (Immediate Fix)
Increase oplog size to extend retention window:
```javascript
// On primary (MongoDB 4.0+); size is given in megabytes
db.adminCommand({ replSetResizeOplog: 1, size: 10240 }) // 10 GB

// Check result
use local
db.oplog.rs.stats()

// This takes effect immediately without a restart
```
For older MongoDB versions:
```bash
# Stop primary (careful - this will trigger an election)
sudo systemctl stop mongod

# Edit mongod.conf
sudo nano /etc/mongod.conf

# Add the oplogSizeMB setting:
#   replication:
#     oplogSizeMB: 10240

# Restart
sudo systemctl start mongod
```
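Choosing the new size is the same arithmetic as the retention window, run in reverse. A sketch (the 1.5x safety factor is an assumption, not an official recommendation - tune it to your spike tolerance):

```javascript
// Pick a replSetResizeOplog target (in MB) from a desired retention
// window and an observed churn rate, with headroom for write spikes.
function targetOplogSizeMB(hoursToCover, churnMBPerHour, safetyFactor = 1.5) {
  return Math.ceil(hoursToCover * churnMBPerHour * safetyFactor);
}

// Covering 24 hours at 256 MB/hour with a 1.5x margin:
console.log(targetOplogSizeMB(24, 256)); // → 9216
```

Measure the churn rate from the window calculation in Step 1 (current size divided by hours currently covered) rather than guessing.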
Step 4: Resync Stale Secondary
When resize doesn't help (secondary already too far behind):
```bash
# On the stale secondary
sudo systemctl stop mongod

# Remove all data (double-check the dbPath first!)
sudo rm -rf /var/lib/mongodb/*

# Restart - the member will perform an initial sync
sudo systemctl start mongod

# Monitor sync progress
mongosh --eval "rs.status()"
```
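Before committing to an initial sync, it is worth estimating how long it will take; the oplog window must out-last the sync, or the member goes stale again mid-sync. A back-of-envelope sketch (assumes copy time dominates; index builds add more):

```javascript
// Rough initial sync duration: data size divided by effective copy
// throughput (network or disk, whichever is slower).
function initialSyncHours(dataGB, throughputMBps) {
  const seconds = (dataGB * 1024) / throughputMBps;
  return seconds / 3600;
}

// 500 GB at an effective 100 MB/s is roughly 1.4 hours of copy time:
console.log(initialSyncHours(500, 100).toFixed(2)); // → "1.42"
```

If the estimate approaches the oplog window from Step 1, resize the oplog first or clone from a snapshot instead.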
Alternative: Clone from another secondary:
```bash
# Stop the stale secondary
sudo systemctl stop mongod

# Copy data FROM a healthy (stopped or fsync-locked) secondary TO this node
sudo rsync -avz secondary2:/var/lib/mongodb/ /var/lib/mongodb/
# Or use an LVM snapshot, mongodump, etc.

# Restart with the copied data
sudo systemctl start mongod
```
Step 5: Handle Oplog Corruption
Check for corruption:
```javascript
// Validate the oplog: mongosh --host primary:27017
use local
db.oplog.rs.validate()

// Check for errors in the output ("valid: true" means no corruption found)
```
If corruption found:
```bash
# Stop primary
sudo systemctl stop mongod

# Run repair
mongod --repair --dbpath /var/lib/mongodb

# Or more targeted: extract the oplog and recreate it.
# This is complex - consider resyncing the entire member instead.
```
Step 6: Handle Large Transactions
Find oversized oplog entries:
```javascript
use local
db.oplog.rs.find({
  $where: function() { return Object.bsonsize(this) > 16 * 1024 * 1024 }
}).forEach(o => {
  print("Large entry at " + o.ts.t + " size: " + Object.bsonsize(o))
})

// Typically caused by large array operations or multi-document updates
```
Prevent future large entries:
```javascript
// Avoid one huge multi-update that produces a massive oplog entry:
//   db.collection.updateMany({}, { $set: { field: "value" } })
// Split the work into smaller batches instead.

// Use bulk operations with bounded batch sizes
let bulk = db.collection.initializeOrderedBulkOp()
let count = 0
db.collection.find().forEach(doc => {
  bulk.find({ _id: doc._id }).updateOne({ $set: { field: "value" } })
  count++
  if (count % 1000 === 0) {
    bulk.execute()
    bulk = db.collection.initializeOrderedBulkOp()
  }
})
// Flush the final partial batch
if (count % 1000 !== 0) bulk.execute()
```
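The batching idea itself is independent of the driver API. A minimal, self-contained sketch of splitting a work list into fixed-size chunks (the list of ids here is hypothetical):

```javascript
// Generic batching helper: split items into fixed-size chunks so each
// update (and the oplog entry it generates) stays small.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const batches = chunk([1, 2, 3, 4, 5], 2);
console.log(batches.length); // → 3
```

Each chunk would then drive one bulk operation, keeping every oplog entry well under the BSON size limit.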
Step 7: Monitor Oplog Health
Set up ongoing monitoring:
```javascript
// Script to check oplog health
function checkOplogHealth() {
  let stats = db.oplog.rs.stats()
  let first = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next()
  let last = db.oplog.rs.find().sort({ ts: -1 }).limit(1).next()

  let hours = (last.ts.t - first.ts.t) / 3600
  let usage = stats.size / stats.maxSize

  return {
    hoursCovered: hours,
    sizeGB: stats.size / 1e9,
    maxSizeGB: stats.maxSize / 1e9,
    percentUsed: usage * 100
  }
}

checkOplogHealth()
```
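Monitoring is only useful with alert thresholds attached. A sketch of the classification logic (the 24-hour and 8-hour cutoffs follow the recommendations in this guide; adjust them to your own downtime tolerance):

```javascript
// Classify an oplog window measurement into an alert level.
// Thresholds: < 8 hours critical, < 24 hours warning, otherwise ok.
function oplogAlertLevel(hoursCovered) {
  if (hoursCovered < 8) return "critical";
  if (hoursCovered < 24) return "warning";
  return "ok";
}

console.log(oplogAlertLevel(30)); // → "ok"
console.log(oplogAlertLevel(5));  // → "critical"
```

Feed it `checkOplogHealth().hoursCovered` on a schedule and page on "critical" before a secondary actually goes stale.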
Verification
Verify oplog functioning:
```javascript
// 1. Oplog window sufficient (> 24 hours recommended)
use local
let newest = db.oplog.rs.find().sort({ ts: -1 }).limit(1).next().ts.t
let oldest = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next().ts.t
print("Oplog window: " + ((newest - oldest) / 3600) + " hours")

// 2. No corruption
db.oplog.rs.validate()

// 3. Secondaries catching up
rs.status() // All secondaries should have a recent optimeDate

// 4. Replication lag minimal
rs.printSecondaryReplicationInfo() // Should show lag < 10 seconds
// (rs.printSlaveReplicationInfo() on older versions)

// 5. No oversized entries
db.oplog.rs.find({ $where: "Object.bsonsize(this) > 16777216" }).count() // Should be 0
```
Common Pitfalls
- Sizing the oplog by available disk rather than write rate - Size must match write volume and downtime tolerance
- Not monitoring oplog window - Can silently shrink during traffic spikes
- Resyncing during peak hours - Initial sync consumes resources heavily
- Forgetting to resize after capacity planning - Default size may be insufficient
- Using initial sync for all recoveries - Sometimes cloning is faster
Best Practices
- Size oplog to cover at least 24-72 hours of operations
- Monitor oplog window with alerts at < 8 hours remaining
- Plan for write spikes when sizing oplog
- Document recovery procedures for stale secondary scenarios
- Test oplog resize procedure before needing it
- Use point-in-time recovery for critical data protection
- Schedule initial sync during low-traffic windows
Related Issues
- MongoDB Replica Set Error
- MongoDB Initial Sync Failed
- MongoDB Chunk Migration Error
- MongoDB WiredTiger Error