Introduction

The oplog (operations log) is the heartbeat of MongoDB replication. It stores all write operations in a capped collection that secondaries consume to maintain data consistency. When oplog errors occur, replication can completely halt, causing secondaries to fall irrecoverably behind and requiring emergency intervention to restore replica set integrity.
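Conceptually, a capped collection behaves like a fixed-size ring buffer: once it is full, the oldest entries are silently overwritten to make room for new writes. A toy sketch in plain JavaScript (illustration only; real capped collections are bounded by bytes, not entry count) shows why old oplog entries disappear:

```javascript
// Toy model of a capped collection: fixed capacity, oldest entries evicted.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity
    this.entries = []
  }
  append(op) {
    this.entries.push(op)
    if (this.entries.length > this.capacity) {
      this.entries.shift() // the oldest entry is gone forever
    }
  }
  oldest() { return this.entries[0] }
}

const oplog = new CappedLog(3)
for (let ts = 1; ts <= 5; ts++) oplog.append({ ts })
// Entries ts 1 and 2 have been overwritten. A secondary that still needs
// ts 2 can no longer sync from this oplog - it is "too stale".
console.log(oplog.oldest()) // { ts: 3 }
```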

Symptoms

Oplog errors manifest with distinct patterns:

```text
# Oplog exhaustion (secondary too far behind)
Error: Oplog CursorMinKeyNotFound
Replication halted: oplog no longer contains required entries
MongoServerError: cannot sync, oplog too far behind

# Oplog corruption
WiredTiger error reading from oplog
BSONObjectTooLarge: oplog entry exceeds size limit

# Oplog query errors
CursorNotFound: oplog cursor expired
OperationFailed: oplog query failed

# In logs
{"msg":"Replication halt","attr":{"reason":"Oplog Position Lost"}}
{"msg":"Secondary cannot catch up","attr":{"minValid":{"$timestamp":{"t":100,"i":1}}}}

# In rs.status()
"syncSourceHost": "",
"lastHeartbeatMessage": "error RS102 too stale to catch up"
"optimeDate": significantly behind primary
```

Common Causes

  1. Oplog size too small - Retention window shorter than secondary downtime
  2. Secondary extended downtime - Offline longer than oplog coverage
  3. High write volume - Oplog fills faster than secondaries consume
  4. Oplog corruption - WiredTiger corruption in oplog collection
  5. Large transactions - Single oplog entry exceeds 16MB BSON limit
  6. Network instability - Intermittent connectivity causing cursor expiration
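For cause 1, a rough back-of-the-envelope check: the retention window is oplog size divided by oplog generation rate. A small sketch in plain JavaScript (the 500 MB/hour rate, 48-hour target, and 2x safety factor are hypothetical; measure your own growth rate with `db.oplog.rs.stats()` over time):

```javascript
// Estimate the oplog size needed to cover a target retention window.
// writeRateMBPerHour: observed oplog growth rate (hypothetical number here)
// targetHours: desired coverage window
// safetyFactor: headroom for write spikes
function requiredOplogSizeMB(writeRateMBPerHour, targetHours, safetyFactor = 2) {
  return Math.ceil(writeRateMBPerHour * targetHours * safetyFactor)
}

// Example: 500 MB/hour of oplog growth, 48-hour target, 2x headroom
const sizeMB = requiredOplogSizeMB(500, 48)
console.log(sizeMB + " MB (" + (sizeMB / 1024).toFixed(1) + " GB)") // 48000 MB (46.9 GB)
```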

Step-by-Step Fix

Step 1: Diagnose Oplog State

Check oplog status on primary:

```javascript
// mongosh --host primary:27017

// Check oplog size and window
use local
db.oplog.rs.stats()

// Calculate time window covered
let first = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next()
let last = db.oplog.rs.find().sort({ ts: -1 }).limit(1).next()
let hoursCovered = (last.ts.t - first.ts.t) / 3600
print("Oplog covers: " + hoursCovered.toFixed(2) + " hours")

// Check oplog size
print("Size: " + (db.oplog.rs.stats().size / 1024 / 1024 / 1024).toFixed(2) + " GB")
print("Max size: " + (db.oplog.rs.stats().maxSize / 1024 / 1024 / 1024).toFixed(2) + " GB")
```

Check secondary's required position:

```javascript
// On secondary: mongosh --host secondary:27017

use local

// Secondary's last applied oplog entry (the position it has reached)
db.oplog.rs.find().sort({ ts: -1 }).limit(1).next()

// Check if the required entries still exist on the primary:
// compare the secondary's lastApplied with the primary's oldest entry
```

Step 2: Check Oplog Exhaustion

When a secondary is "too stale":

```javascript
// On primary - check oldest oplog entry
use local
let oldest = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next()
printjson(oldest.ts)

// On secondary - check needed position
// Look for "lastApplied" or "minValid" in rs.status()
rs.status().members.find(m => m.name.includes("secondary"))
```

If secondary needs entries older than primary's oplog:

```text
Secondary needs:  { ts: Timestamp(1234567890, 1) }
Primary's oldest: { ts: Timestamp(1234567900, 1) }
// The entries the secondary still needs are 10 seconds older than
// anything the primary retains = too stale to catch up
```
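This comparison can be scripted. A small helper in plain JavaScript (timestamps represented as `{ t, i }` objects, mirroring the seconds/increment fields of a BSON Timestamp):

```javascript
// A secondary is stale if the entry it needs next is older than the
// oldest entry the primary still retains.
// Compare by seconds (t) first, then by increment (i).
function isStale(secondaryNeeds, primaryOldest) {
  if (secondaryNeeds.t !== primaryOldest.t) {
    return secondaryNeeds.t < primaryOldest.t
  }
  return secondaryNeeds.i < primaryOldest.i
}

console.log(isStale({ t: 1234567890, i: 1 }, { t: 1234567900, i: 1 })) // true - must resync
console.log(isStale({ t: 1234567950, i: 1 }, { t: 1234567900, i: 1 })) // false - can catch up
```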

Step 3: Resize Oplog (Immediate Fix)

Increase oplog size to extend retention window:

```javascript
// On primary (MongoDB 3.6+)
db.adminCommand({ replSetResizeOplog: 1, size: 10240 }) // size is in MB = 10 GB

// Check result
use local
db.oplog.rs.stats()

// Takes effect immediately, no restart required
```

For older MongoDB versions:

```bash
# Stop primary (careful - this will trigger an election)
sudo systemctl stop mongod

# Edit mongod.conf
sudo nano /etc/mongod.conf

# Add the oplogSizeMB setting:
# replication:
#   oplogSizeMB: 10240

# Restart
sudo systemctl start mongod
```

Step 4: Resync Stale Secondary

When resize doesn't help (secondary already too far behind):

```bash
# On the stale secondary
sudo systemctl stop mongod

# Remove all data (the member will be rebuilt from scratch)
sudo rm -rf /var/lib/mongodb/*

# Restart - the member will perform an initial sync
sudo systemctl start mongod

# Monitor sync progress
mongosh --eval "rs.status()"
```

Alternative: Clone from another secondary:

```bash
# Stop the stale secondary
sudo systemctl stop mongod

# Pull data from a healthy secondary (stop that member first for a consistent copy)
sudo rsync -avz secondary2:/var/lib/mongodb/ /var/lib/mongodb/
# Or use an LVM snapshot, mongodump, etc.

# Restart with the copied data
sudo systemctl start mongod
```

Step 5: Handle Oplog Corruption

Check for corruption:

```javascript
// Validate the oplog
// mongosh --host primary:27017
use local
db.oplog.rs.validate()

// Check the "valid" and "errors" fields in the output
```

If corruption found:

```bash
# Stop primary
sudo systemctl stop mongod

# Run repair (as the mongod user, so file ownership is preserved)
sudo -u mongodb mongod --repair --dbpath /var/lib/mongodb

# A more targeted approach (extract and recreate the oplog) is complex -
# consider resyncing the entire member instead
```

Step 6: Handle Large Transactions

Find oversized oplog entries:

```javascript
use local
db.oplog.rs.find({ $where: function() {
  return Object.bsonsize(this) > 16 * 1024 * 1024
} }).forEach(o => {
  print("Large entry at " + o.ts.t + " size: " + Object.bsonsize(o))
})

// Typically caused by large array operations or multi: true updates
```

Prevent future large entries:

```javascript
// Avoid one huge multi-update:
// db.collection.updateMany({}, { $set: { field: "value" } })
// Split into smaller batches instead

// Use bulk operations with bounded batch sizes
let bulk = db.collection.initializeOrderedBulkOp()
let count = 0
db.collection.find().forEach(doc => {
  bulk.find({ _id: doc._id }).updateOne({ $set: { field: "value" } })
  count++
  if (count % 1000 === 0) {
    bulk.execute()
    bulk = db.collection.initializeOrderedBulkOp()
  }
})
if (count % 1000 !== 0) bulk.execute() // flush the final partial batch
```

Step 7: Monitor Oplog Health

Set up ongoing monitoring:

```javascript
// Script to check oplog health
function checkOplogHealth() {
  let stats = db.oplog.rs.stats()
  let first = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next()
  let last = db.oplog.rs.find().sort({ ts: -1 }).limit(1).next()

  let hours = (last.ts.t - first.ts.t) / 3600
  let usage = stats.size / stats.maxSize

  return {
    hoursCovered: hours,
    sizeGB: stats.size / 1e9,
    maxSizeGB: stats.maxSize / 1e9,
    percentUsed: usage * 100
  }
}

checkOplogHealth()
```
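On top of a health check like this, a simple severity classifier can drive alerting. A sketch in plain JavaScript (the thresholds - warn under 24 hours, critical under 8 hours - are illustrative defaults, adjustable per deployment):

```javascript
// Classify the oplog window for alerting purposes.
// hoursCovered: the oplog window in hours (e.g. from a health check)
function oplogAlertLevel(hoursCovered, warnHours = 24, critHours = 8) {
  if (hoursCovered < critHours) return "critical"
  if (hoursCovered < warnHours) return "warning"
  return "ok"
}

console.log(oplogAlertLevel(4))  // critical
console.log(oplogAlertLevel(12)) // warning
console.log(oplogAlertLevel(72)) // ok
```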

Verification

Verify the oplog is functioning correctly:

```javascript
// 1. Oplog window sufficient (> 24 hours recommended)
use local
let newest = db.oplog.rs.find().sort({ ts: -1 }).limit(1).next().ts.t
let oldest = db.oplog.rs.find().sort({ ts: 1 }).limit(1).next().ts.t
print("Oplog window: " + ((newest - oldest) / 3600) + " hours")

// 2. No corruption ("valid" should be true)
db.oplog.rs.validate()

// 3. Secondaries catching up
rs.status() // All secondaries should have a recent optimeDate

// 4. Replication lag minimal
rs.printSecondaryReplicationInfo() // rs.printSlaveReplicationInfo() before MongoDB 4.4
// Should show lag < 10 seconds

// 5. No oversized entries
db.oplog.rs.find({ $where: "Object.bsonsize(this) > 16777216" }).count() // Should be 0
```

Common Pitfalls

  • Oplog size based on disk, not write rate - Size must match write volume and downtime tolerance
  • Not monitoring oplog window - Can silently shrink during traffic spikes
  • Resyncing during peak hours - Initial sync consumes resources heavily
  • Forgetting to resize after capacity planning - Default size may be insufficient
  • Using initial sync for all recoveries - Sometimes cloning is faster

Best Practices

  • Size oplog to cover at least 24-72 hours of operations
  • Monitor oplog window with alerts at < 8 hours remaining
  • Plan for write spikes when sizing oplog
  • Document recovery procedures for stale secondary scenarios
  • Test oplog resize procedure before needing it
  • Use point-in-time recovery for critical data protection
  • Schedule initial sync during low-traffic windows