Introduction

Replica sets are MongoDB's primary mechanism for high availability and data redundancy. When replica set errors occur, they can manifest as election failures, sync stalls, or split-brain scenarios during network partitions. Understanding the replica set protocol and member states is essential for diagnosing and resolving these complex distributed system issues.

Symptoms

Replica set errors appear in various forms:

```text
# Election failures
No electable primary available
Election failed, could not get majority vote
MongoServerSelectionError: No primary available

# Sync issues
Replication lag exceeds threshold
Secondary stuck in RECOVERING state
Initial sync failed

# Network partition
Replica set configuration unreachable
Member removed from config but still running
Heartbeat timeout exceeded

# In rs.status()
"stateStr": "RECOVERING" or "STARTUP2" stuck
"lastHeartbeatMessage": "network error"
"syncSourceHost": "" (empty - no sync source)

# In logs
{"msg":"Heartbeat timeout","attr":{"memberId":2}}
{"msg":"Election failed","attr":{"reason":"Received a step down request"}}
```

Common Causes

  1. Network connectivity loss - Members cannot reach each other for heartbeats
  2. Election timeout - No candidate can obtain majority votes within the timeout window
  3. Oplog exhaustion - A secondary falls so far behind that the entries it needs have rolled off the primary's oplog
  4. Initial sync failure - A new member cannot complete the initial data copy
  5. Arbiter misplacement - Arbiter unreachable, so its vote cannot count toward the majority
  6. Configuration mismatch - Members have different replica set configs
  7. Disk/network performance - Slow I/O preventing timely sync
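Several of these causes come down to the same arithmetic: a primary can only be elected (or stay primary) while a strict majority of voting members is reachable. A minimal sketch of that rule, runnable offline in Node (the function names are illustrative, not MongoDB APIs):

```javascript
// Majority needed for an election, given the number of voting members.
// MongoDB requires a strict majority of *voting* members (at most 7 voters).
function requiredMajority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can fail before elections become impossible?
function faultTolerance(votingMembers) {
  return votingMembers - requiredMajority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} voters: majority=${requiredMajority(n)}, can lose ${faultTolerance(n)}`);
}
```

Note that 4 voters tolerate only the same single failure as 3, which is why odd voting-member counts are recommended.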

Step-by-Step Fix

Step 1: Check Replica Set Status

Diagnose the current state:

```javascript
mongosh --host primary:27017

// Comprehensive status
rs.status()

// Key fields to examine:
// - "stateStr" for each member (PRIMARY, SECONDARY, etc.)
// - "lastHeartbeatMessage" for connectivity issues
// - "optime" and "optimeDurableDate" for sync lag
// - "syncSourceHost" for replication source

// Configuration
rs.conf()

// Check members list
rs.conf().members.forEach(m => printjson(m))
```

Step 2: Diagnose Election Issues

If no primary is available:

```javascript
// Check if any member can become primary.
// Look up config entries by _id rather than array index:
// member _ids are not guaranteed to match array positions.
let conf = rs.conf()
rs.status().members.forEach(m => {
  let c = conf.members.find(cm => cm._id === m._id)
  print(m._id + ": " + m.stateStr +
        " | priority: " + c.priority +
        " | votes: " + c.votes +
        " | healthy: " + m.health)
})

// Count voting members that are healthy
let healthyVoters = rs.status().members.filter(m =>
  m.health === 1 &&
  conf.members.find(c => c._id === m._id)?.votes > 0
).length
print("Healthy voting members: " + healthyVoters)

// Majority is a strict majority of voting members: floor(n/2) + 1
let votingMembers = conf.members.filter(c => c.votes > 0).length
print("Required majority: " + (Math.floor(votingMembers / 2) + 1))
```

Force election if appropriate:

```javascript
// Connect to the current primary and step it down,
// triggering an election among the remaining members
rs.stepDown(60) // Primary will not seek re-election for 60 seconds

// Or reconfigure priorities so the preferred member wins the election
let cfg = rs.conf()
cfg.members[0].priority = 10 // Make member 0 highest priority
cfg.members[1].priority = 5
cfg.version += 1
rs.reconfig(cfg)
```
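The diagnosis above can be folded into one offline check. This is a hypothetical helper (runnable in Node, not a MongoDB API) that takes plain objects shaped like `rs.status().members` and `rs.conf().members` and reports whether an election can currently succeed:

```javascript
// Given status members ({_id, health}) and config members ({_id, votes}),
// decide whether the reachable voting members form a majority.
function canElect(statusMembers, confMembers) {
  const votesById = new Map(confMembers.map(c => [c._id, c.votes]));
  const totalVotes = confMembers.reduce((s, c) => s + c.votes, 0);
  const reachableVotes = statusMembers
    .filter(m => m.health === 1)
    .reduce((s, m) => s + (votesById.get(m._id) || 0), 0);
  const majority = Math.floor(totalVotes / 2) + 1;
  return { reachableVotes, majority, electable: reachableVotes >= majority };
}

// Example: 3-member set with one member down
const status = [
  { _id: 0, health: 1 }, { _id: 1, health: 1 }, { _id: 2, health: 0 },
];
const conf = [
  { _id: 0, votes: 1 }, { _id: 1, votes: 1 }, { _id: 2, votes: 1 },
];
console.log(canElect(status, conf)); // 2 of 3 votes reachable -> electable
```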

Step 3: Fix Network Connectivity

Test connectivity between members:

```bash
# From each member, test connection to others
mongosh --host member2:27017 --eval "db.runCommand({ping:1})"

# Check firewall rules
sudo iptables -L -n | grep 27017
sudo firewall-cmd --list-all

# Verify bindIp allows member connections
grep bindIp /etc/mongod.conf
# Should include all member IPs, not just localhost
```

Update bindIp configuration:

```yaml
# In mongod.conf on each member
net:
  bindIp: localhost,192.168.1.10,192.168.1.11,192.168.1.12
  port: 27017

# Also set the replica set name:
replication:
  replSetName: "myReplicaSet"
```

Restart and verify:

```bash
sudo systemctl restart mongod
mongosh --eval "rs.status()"
```

Step 4: Resolve Sync Stall

Check replication lag:

```javascript
// Get current replication lag
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo() // rs.printSlaveReplicationInfo() before MongoDB 4.4

// Or detailed (capture rs.status() once so all members are compared
// against the same snapshot):
let status = rs.status()
let primary = status.members.find(p => p.stateStr === "PRIMARY")
status.members.forEach(m => {
  if (m.stateStr === "SECONDARY") {
    let lag = primary.optimeDate - m.optimeDate
    print("Member " + m._id + " lag: " + (lag / 1000) + " seconds")
  }
})
```

If secondary is stuck:

```javascript
// Check sync source
rs.status().members.forEach(m => {
  print(m.name + " syncing from: " + m.syncSourceHost)
})

// Force resync from a different source:
// connect to the stuck secondary, then run
rs.syncFrom("other-secondary:27017")
```

If oplog exhausted (lag too large):

```javascript
// Check oplog window
use local
let first = db.oplog.rs.find().sort({ts: 1}).limit(1).next()
let last = db.oplog.rs.find().sort({ts: -1}).limit(1).next()
print("Oplog window: " + ((last.ts.t - first.ts.t) / 60) + " minutes")

// If the secondary needs a full resync:
// Option 1: Increase oplog size (size is in MB)
db.adminCommand({ replSetResizeOplog: 1, size: 5120 }) // 5 GB

// Option 2: Resync the secondary
// Stop the secondary, delete its data, restart
```
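When resizing, the target should cover your worst expected replication outage. A rough sizing sketch (illustrative helper, not a MongoDB API; the churn figure is an assumed example you would measure from `rs.printReplicationInfo()` over time):

```javascript
// Estimate the oplog size (MB) needed to cover a desired replication
// window, given observed oplog churn, with a safety factor for spikes.
function oplogSizeMB(churnMBPerHour, desiredWindowHours, safetyFactor = 2) {
  return Math.ceil(churnMBPerHour * desiredWindowHours * safetyFactor);
}

// e.g. 100 MB/hour of oplog churn, 24-hour window, 2x headroom
console.log(oplogSizeMB(100, 24)); // 4800 MB
```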

Step 5: Fix Initial Sync Failure

For new members failing to sync:

```javascript
// Check initial sync progress on the new member
rs.status()

// Look for "stateStr": "STARTUP2" (initial sync)
// and "lastHeartbeatMessage"

// Common issues:
// 1. Cannot connect to sync source
// 2. Data clone timeout
// 3. Network instability during clone
```

Steps to retry initial sync:

```bash
# On the failing member
sudo systemctl stop mongod

# Remove data directory contents
sudo rm -rf /var/lib/mongodb/*

# Restart - will perform fresh initial sync
sudo systemctl start mongod

# Monitor sync progress
mongosh --eval "rs.status()"
```

Step 6: Handle Network Partition

Recover from split-brain (multiple primaries):

```bash
# Identify which partition has majority
# The minority partition should have stepped down

# Check all members
for host in member1 member2 member3; do
  echo "=== $host ==="
  mongosh --host $host:27017 --quiet --eval "rs.status().members.map(m => m.stateStr)"
done
```

Force correct configuration:

```javascript
// On the legitimate primary (majority side)
let cfg = rs.conf()

// Remove unreachable members if the partition is permanent
cfg.members = cfg.members.filter(m => m.host !== "offline-member:27017")
cfg.version += 1
rs.reconfig(cfg, { force: true }) // force bypasses majority check

// CAUTION: force:true can cause data loss if the minority had writes
```
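Before forcing anything, it helps to confirm which side of the partition should still be writable. A small offline sketch of the rule (hypothetical helper, runnable in Node): only the side holding a strict majority of votes keeps or elects a primary.

```javascript
// Given the voting members reachable on each side of a partition,
// report which side retains write availability.
function partitionOutcome(sideAVotes, sideBVotes) {
  const total = sideAVotes + sideBVotes;
  const majority = Math.floor(total / 2) + 1;
  if (sideAVotes >= majority) return "A keeps primary";
  if (sideBVotes >= majority) return "B keeps primary";
  return "no primary anywhere"; // e.g. an even split
}

console.log(partitionOutcome(2, 1)); // "A keeps primary"
console.log(partitionOutcome(1, 1)); // "no primary anywhere"
```

The even-split case is why two-member sets (or even voter counts without a tiebreaker) lose write availability during any partition.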

Step 7: Fix Configuration Mismatch

When members have different configs:

```javascript
// Check each member's config version
rs.status().members.forEach(m => {
  print(m.name + " configVersion: " + m.configVersion)
})

// All should match rs.conf().version

// If there is a mismatch, sync from the primary:
// connect to the mismatched member, then run
rs.syncFrom("primary:27017")
```

Verification

Verify healthy replica set:

```javascript
// 1. All members reachable
rs.status().members.every(m => m.health === 1)

// 2. One primary, others secondary
let primaries = rs.status().members.filter(m => m.stateStr === "PRIMARY")
let secondaries = rs.status().members.filter(m => m.stateStr === "SECONDARY")
print("Primaries: " + primaries.length + ", Secondaries: " + secondaries.length)

// 3. Replication lag acceptable (< 10 seconds)
rs.printSecondaryReplicationInfo() // rs.printSlaveReplicationInfo() before MongoDB 4.4

// 4. Election test
rs.stepDown(10) // A new primary should be elected within seconds

// 5. Write concern working
db.test.insertOne({ x: 1 }, { writeConcern: { w: "majority" } })
```

System verification:

```bash
# Network connectivity (-c limits ping to 3 packets)
ping -c 3 member1 && ping -c 3 member2 && ping -c 3 member3

# MongoDB listening on all interfaces
ss -tlnp | grep mongod

# Check logs for heartbeat success
grep "Heartbeat" /var/log/mongodb/mongod.log | tail -20
```
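The verification checks above can be folded into one reusable helper. This is an offline sketch (runnable in Node, not a MongoDB API): it takes a plain object shaped like `rs.status()` output and reports whether the set looks healthy. Field names mirror real `rs.status()` fields; the sample document is fabricated for illustration.

```javascript
// Report whether a replica set status looks healthy: all members up,
// exactly one primary, and secondary lag under a threshold.
function checkReplicaSet(status, maxLagSeconds = 10) {
  const members = status.members;
  const primary = members.filter(m => m.stateStr === "PRIMARY");
  const allHealthy = members.every(m => m.health === 1);
  const maxLag = Math.max(0, ...members
    .filter(m => m.stateStr === "SECONDARY" && primary.length === 1)
    .map(m => (primary[0].optimeDate - m.optimeDate) / 1000));
  return {
    allHealthy,
    onePrimary: primary.length === 1,
    lagOk: maxLag <= maxLagSeconds,
  };
}

// Fabricated sample: one primary, two secondaries 1-2 seconds behind
const sample = {
  members: [
    { stateStr: "PRIMARY", health: 1, optimeDate: new Date("2024-01-01T00:00:10Z") },
    { stateStr: "SECONDARY", health: 1, optimeDate: new Date("2024-01-01T00:00:08Z") },
    { stateStr: "SECONDARY", health: 1, optimeDate: new Date("2024-01-01T00:00:09Z") },
  ],
};
console.log(checkReplicaSet(sample)); // all three checks pass
```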

Common Pitfalls

  • Using force:true during normal operation - Should only be for emergency recovery
  • Ignoring arbiter placement - Arbiter must be reachable for majority
  • Not monitoring replication lag - Lag can silently grow during traffic spikes
  • Changing priorities incorrectly - Priority 0 members cannot become primary
  • Removing members without cleanup - Former members may still try to connect

Best Practices

  • Deploy odd number of voting members (3, 5, 7)
  • Monitor replication lag with alerts at 10 and 60 seconds
  • Place arbiters on separate infrastructure from data-bearing members
  • Use same MongoDB version across all members
  • Keep member clocks synchronized (NTP)
  • Test failover scenarios regularly
  • Document expected election time (typically < 10 seconds)

Related Articles

  • MongoDB Oplog Error
  • MongoDB Primary Step Down
  • MongoDB Connection Refused
  • MongoDB Write Concern Error