## Introduction
MongoDB connection and replica set errors occur when applications cannot establish or maintain connections to MongoDB nodes, or when replica set failover causes temporary unavailability. Common causes include connection pool exhaustion from leaked connections, replica set election during primary failure, network partitions splitting replica set members, authentication credential expiration or misconfiguration, SSL/TLS certificate issues, write concern timeouts waiting for secondaries, read preference routing to unavailable nodes, and DNS resolution failures for replica set members. The fix requires understanding MongoDB architecture (replica sets, sharded clusters, connection pools), driver configuration, failover behavior, and recovery procedures. This guide provides production-proven troubleshooting for MongoDB connection issues across standalone, replica set, and sharded cluster deployments.
## Symptoms
- Application throws `MongoServerError: Connection reset by peer`
- `MongoNetworkError: connect ECONNREFUSED` on connection attempt
- `MongoWriteConcernError: waiting for replication timed out`
- `No primary available for writes` during replica set election
- Connection pool exhausted: `None of the servers are connectable`
- `Authentication failed` despite correct credentials
- `SSL handshake failed` with certificate verification error
- Queries hang indefinitely without timeout
- Replica set members show `(not reachable/healthy)` status
- `TopologyDescription` shows all servers as Unknown
- Application works after restart but degrades over time
- Intermittent connection timeouts under load
## Common Causes
- Connection pool size too small for application concurrency
- Connection leaks from unclosed cursors or sessions
- Replica set primary step-down during election
- Secondary nodes too far behind primary (replication lag)
- Network firewall blocking MongoDB port (27017)
- MongoDB service crashed or not started
- Authentication database mismatch in connection string
- SSL certificate expired or hostname mismatch
- Write concern `{ w: "majority" }` with insufficient secondaries
- Read preference `primaryPreferred` with no primary
- DNS SRV records misconfigured for Atlas or cloud deployments
- Max connections reached on MongoDB server
- Client session timeout exceeded (default 30 minutes)
## Step-by-Step Fix
### 1. Diagnose connection issues
Check MongoDB server status:
```bash
# Check if MongoDB is running
systemctl status mongod
# or
ps aux | grep mongod

# Check listening port
netstat -tlnp | grep 27017
# or
ss -tlnp | grep 27017

# Check MongoDB logs
# Debian/Ubuntu:
tail -f /var/log/mongodb/mongod.log
# RHEL/CentOS:
tail -f /var/log/mongodb/mongod.log

# Or use mongosh to check server status
mongosh --host localhost --port 27017

# Inside mongosh:
db.adminCommand({ serverStatus: 1 })

# Check connections
db.serverStatus().connections
# Output:
# {
#   "current": 45,
#   "available": 955,
#   "totalCreated": 12500,
#   "active": 30,
#   "idle": 15
# }
```
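The connections figures above can be turned into an alerting signal. A minimal sketch, assuming the stats object has the same shape as the `db.serverStatus().connections` sample output; the 0.8 warning threshold is an illustrative choice, not a MongoDB default:

```javascript
// Interpret db.serverStatus().connections output programmatically.
// current + available is the configured server-side connection limit.
function connectionUtilization(stats, warnThreshold = 0.8) {
  const limit = stats.current + stats.available;
  const utilization = stats.current / limit;
  return {
    limit,
    utilization: Number(utilization.toFixed(3)),
    nearLimit: utilization >= warnThreshold,
  };
}

// Using the sample output above: 45 of 1000 slots in use
connectionUtilization({ current: 45, available: 955 });
// -> { limit: 1000, utilization: 0.045, nearLimit: false }
```

Running this periodically lets you alert on pool pressure before new connections start failing.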
Test connection from application server:
```bash
# Test basic connectivity
telnet mongodb.example.com 27017
# or
nc -zv mongodb.example.com 27017

# Test with mongosh
mongosh "mongodb://mongodb.example.com:27017/testdb"

# Test with authentication
mongosh "mongodb://user:password@mongodb.example.com:27017/testdb?authSource=admin"

# Test replica set connection
mongosh "mongodb://node1:27017,node2:27017,node3:27017/replicaset?replicaSet=rs0"

# Verbose connection debugging
mongosh --host mongodb.example.com --port 27017 --verbose

# Check SSL/TLS connection
mongosh "mongodb://mongodb.example.com:27017/?tls=true&tlsCAFile=/etc/ssl/certs/ca.pem"
```
### 2. Fix connection pool exhaustion
Connection pool configuration by driver:
```javascript
// Node.js MongoDB Driver
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017', {
  maxPoolSize: 100,                // Max connections in pool (default: 100)
  minPoolSize: 10,                 // Min connections in pool (default: 0)
  maxIdleTimeMS: 30000,            // Close idle connections after 30s
  waitQueueTimeoutMS: 5000,        // Time to wait for a connection from the pool
  serverSelectionTimeoutMS: 30000, // Time to find an available server
  socketTimeoutMS: 45000,          // Time before socket disconnect
  connectTimeoutMS: 10000          // Connection timeout
});

// Fix pool exhaustion:
// 1. Increase maxPoolSize for high-concurrency apps
// 2. Decrease maxIdleTimeMS to release idle connections faster
// 3. Ensure connections are returned to the pool (close cursors, sessions)
```
```python
# Python PyMongo Driver
from pymongo import MongoClient

client = MongoClient(
    'mongodb://localhost:27017',
    maxPoolSize=100,
    minPoolSize=10,
    maxIdleTimeMS=30000,
    waitQueueTimeoutMS=5000,
    serverSelectionTimeoutMS=30000,
    socketTimeoutMS=45000,
    connectTimeoutMS=10000,
    retryWrites=True,
    retryReads=True
)

# Fix pool exhaustion:
# Same as Node.js - adjust pool size and timeouts
```
```java
// Java MongoDB Driver
MongoClientSettings settings = MongoClientSettings.builder()
    .applyToConnectionPoolSettings(builder -> builder
        .maxSize(100)
        .minSize(10)
        .maxConnectionIdleTime(30, TimeUnit.SECONDS)
        .maxWaitTime(5, TimeUnit.SECONDS))
    .applyToSocketSettings(builder -> builder
        .connectTimeout(10, TimeUnit.SECONDS)
        .readTimeout(45, TimeUnit.SECONDS))
    .applyToClusterSettings(builder -> builder
        .serverSelectionTimeout(30, TimeUnit.SECONDS))
    .retryWrites(true)
    .retryReads(true)
    .build();

MongoClient mongoClient = MongoClients.create(settings);
```
Detect connection leaks:
```javascript
// In application code, track connection usage
// Node.js example with connection pool events
const { MongoClient } = require('mongodb');

const client = new MongoClient(uri, { maxPoolSize: 50 });

client.on('connectionPoolCreated', (event) => {
  console.log('Connection pool created:', event);
});

client.on('connectionCheckedOut', (event) => {
  console.log('Connection checked out:', event);
  // Track long-running checkouts
});

client.on('connectionCheckedIn', (event) => {
  console.log('Connection returned:', event);
});

client.on('connectionPoolCleared', (event) => {
  console.log('Connection pool cleared - possible failover:', event);
});

// Always close cursors
const cursor = collection.find({});
try {
  const results = await cursor.toArray();
} finally {
  await cursor.close(); // CRITICAL: Return connection to pool
}

// Always end sessions when done
async function getData() {
  const session = client.startSession();
  try {
    return await session.withTransaction(async () => {
      return await collection.findOne({ _id: 1 }, { session });
    });
  } finally {
    await session.endSession(); // CRITICAL: Release the session
  }
}
```
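The pool-sizing rule of thumb used later in this guide (maxPoolSize ≈ expected concurrency × 1.5) can be sketched as a small helper. The 1.5 headroom multiplier and the 500 cap are illustrative starting points, not driver defaults; validate with a load test:

```javascript
// Rough maxPoolSize suggestion based on expected peak concurrent operations.
function suggestMaxPoolSize(peakConcurrentOps, headroom = 1.5, cap = 500) {
  const suggested = Math.ceil(peakConcurrentOps * headroom);
  // Cap the suggestion so one app instance cannot exhaust mongod's
  // server-side connection limit on its own.
  return Math.min(suggested, cap);
}

suggestMaxPoolSize(40);  // -> 60
suggestMaxPoolSize(400); // -> 500 (capped)
```

Remember the cap applies per client instance: ten app servers each with maxPoolSize 500 can still open 5000 server-side connections.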
### 3. Fix replica set failover issues
Check replica set status:
```javascript
// Connect to the replica set:
// mongosh "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0"

// Check replica set status
rs.status()

// Output (abridged):
// {
//   "set": "rs0",
//   "date": "2026-04-01T12:00:00.000Z",
//   "myState": 1, // 1=PRIMARY, 2=SECONDARY, 3=RECOVERING, 8=DOWN
//   "term": 15,
//   "syncSourceHost": "node1:27017",
//   "heartbeatIntervalMillis": 2000,
//   "optimes": { ... },
//   "members": [
//     {
//       "name": "node1:27017",
//       "health": 1, // 1=healthy, 0=unhealthy
//       "state": 1,  // 1=PRIMARY
//       "stateStr": "PRIMARY",
//       "uptime": 86400,
//       "optime": { ... },
//       "optimeDate": "2026-04-01T12:00:00.000Z",
//       "lastHeartbeat": "2026-04-01T12:00:00.000Z",
//       "lastHeartbeatRecv": "2026-04-01T12:00:00.000Z",
//       "pingMs": 1,
//       "electionTime": "...",
//       "electionDate": "2026-04-01T12:00:00.000Z",
//       "configVersion": 3,
//       "configTerm": 15
//     },
//     {
//       "name": "node2:27017",
//       "health": 1,
//       "state": 2,
//       "stateStr": "SECONDARY",
//       "syncSourceHost": "node1:27017",
//       "replicationLagSeconds": 0
//     },
//     {
//       "name": "node3:27017",
//       "health": 0, // UNHEALTHY
//       "state": 8,
//       "stateStr": "(not reachable/healthy)",
//       "lastHeartbeat": "2026-04-01T11:55:00.000Z",
//       "lastHeartbeatRecv": "2026-04-01T11:55:00.000Z",
//       "errmsg": "Connection reset by peer"
//     }
//   ],
//   "ok": 1
// }

// Check replica set configuration
rs.conf()

// Output:
// {
//   "_id": "rs0",
//   "version": 3,
//   "term": 15,
//   "members": [
//     { "_id": 0, "host": "node1:27017", "priority": 2 },
//     { "_id": 1, "host": "node2:27017", "priority": 1 },
//     { "_id": 2, "host": "node3:27017", "priority": 1 }
//   ],
//   "settings": {
//     "chainingAllowed": true,
//     "heartbeatIntervalMillis": 2000,
//     "heartbeatTimeoutSecs": 10,
//     "electionTimeoutMillis": 10000,
//     "catchUpTimeoutMillis": -1
//   }
// }
```
Fix replica set issues:
```javascript
// Reconfigure the replica set to remove an unhealthy member
config = rs.conf()
config.members = config.members.filter(m => m.host !== "node3:27017")
rs.reconfig(config)

// Or add a new member to replace the failed one
rs.add("node4:27017")

// Force reconfiguration if the primary is unavailable (DANGEROUS)
// Only use when you're sure which node should be primary
config = rs.conf()
config.members[0].priority = 3
config.version++
rs.reconfig(config, { force: true })

// Step down the primary gracefully (for maintenance)
rs.stepDown(60) // Step down for 60 seconds

// Initiate a replica set (new setup)
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "node1:27017" },
    { _id: 1, host: "node2:27017" },
    { _id: 2, host: "node3:27017" }
  ]
})

// Check replication lag
db.adminCommand({ replSetGetStatus: 1 }).members.forEach(m => {
  print(m.name + " - " + m.stateStr + " - lag: " + (m.replicationLagSeconds || 0) + "s")
})
```
Application-side failover handling:
```javascript
// Connection string with proper failover settings
const uri = "mongodb://node1:27017,node2:27017,node3:27017/production" +
  "?replicaSet=rs0" +
  "&serverSelectionTimeoutMS=30000" +
  "&socketTimeoutMS=45000" +
  "&connectTimeoutMS=10000" +
  "&heartbeatFrequencyMS=10000" +
  "&retryWrites=true" +
  "&retryReads=true";

// During failover:
// 1. Driver detects the primary is unavailable
// 2. Waits for a new election (up to serverSelectionTimeoutMS)
// 3. Reconnects to the new primary automatically
// 4. Retryable writes automatically retry once
```
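Because the driver's built-in retry covers only a single attempt, some applications layer a bounded retry of their own on top. A minimal sketch; the error-message patterns in `isTransient` are illustrative, and in production you would match on your driver's error labels instead. Only wrap operations that are safe to repeat (idempotent reads, upserts keyed by `_id`):

```javascript
// Heuristic check for errors that typically resolve once failover completes.
function isTransient(err) {
  return /not (the )?primary|connection reset|ECONNREFUSED|interrupted/i.test(err.message);
}

// Retry a transient-failing async operation with exponential backoff.
async function withRetry(op, { retries = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (!isTransient(err) || attempt >= retries) throw err;
      // Backoff gives the replica set time to elect a new primary.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}

// Usage sketch:
// const doc = await withRetry(() => collection.findOne({ _id: 1 }));
```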
### 4. Fix authentication failures
Authentication error diagnosis:
```bash
# Test authentication
mongosh -u admin -p password --authenticationDatabase admin
```

```javascript
// Inside mongosh, check users and roles:
use admin
db.auth("admin", "password") // Returns 1 on success, 0 on failure

// List users
db.getUsers()

// Check user roles
db.getUser("admin")

// Output:
// {
//   "_id": "admin.admin",
//   "user": "admin",
//   "db": "admin",
//   "roles": [
//     { "role": "root", "db": "admin" }
//   ]
// }
```
Fix authentication configuration:
```javascript
// Connection string with the correct auth database
// WRONG: authSource defaults to the database named in the path
// mongodb://user:password@localhost:27017/myapp

// CORRECT: Specify authSource explicitly
// mongodb://user:password@localhost:27017/myapp?authSource=admin

// Common authSource values:
// admin    - for administrative users
// <dbname> - for database-specific users

// Create a user with proper roles
use admin
db.createUser({
  user: "appuser",
  pwd: "securepassword",
  roles: [
    { role: "readWrite", db: "myapp" },
    { role: "dbAdmin", db: "myapp" }
  ]
})

// Or create a database-specific user
use myapp
db.createUser({
  user: "appuser",
  pwd: "securepassword",
  roles: [
    { role: "readWrite", db: "myapp" }
  ]
})

// Connection string for the database user
// mongodb://appuser:securepassword@localhost:27017/myapp?authSource=myapp
```
Authentication mechanism issues:
```javascript
// SCRAM-SHA-1 vs SCRAM-SHA-256
// MongoDB 4.0+ uses SCRAM-SHA-256 by default for new users

// Check enabled authentication mechanisms
db.runCommand({ getParameter: 1, authenticationMechanisms: 1 })

// Force a specific mechanism in the connection string
// mongodb://user:password@localhost:27017/myapp?authMechanism=SCRAM-SHA-256
// mongodb://user:password@localhost:27017/myapp?authMechanism=SCRAM-SHA-1

// For LDAP authentication
// mongodb://user:password@localhost:27017/myapp?authMechanism=PLAIN&authSource=$external

// For Kerberos (GSSAPI) -- note the URL-encoded @ in the principal
// mongodb://user%40REALM@localhost:27017/myapp?authMechanism=GSSAPI&authSource=$external
```
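Since a missing `authSource` is the most common misconfiguration in this section, it can help to guard against it at startup. A minimal sketch using plain string handling; `ensureAuthSource` is a hypothetical helper, not a driver API:

```javascript
// Append an explicit authSource to a connection string when it is missing,
// since the driver otherwise falls back to the database named in the path.
function ensureAuthSource(uri, authSource = 'admin') {
  if (/[?&]authSource=/.test(uri)) return uri; // already explicit
  const separator = uri.includes('?') ? '&' : '?';
  return `${uri}${separator}authSource=${authSource}`;
}

ensureAuthSource('mongodb://user:pw@localhost:27017/myapp');
// -> 'mongodb://user:pw@localhost:27017/myapp?authSource=admin'
```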
### 5. Fix SSL/TLS connection issues
Test SSL connection:
```bash
# Test with openssl
openssl s_client -connect mongodb.example.com:27017 -servername mongodb.example.com

# Check certificate validity dates
openssl s_client -connect mongodb.example.com:27017 2>/dev/null | openssl x509 -noout -dates

# Test with mongosh
mongosh "mongodb://mongodb.example.com:27017/?tls=true&tlsCAFile=/etc/ssl/certs/ca.pem"

# With client certificate
mongosh "mongodb://mongodb.example.com:27017/?tls=true&tlsCAFile=/etc/ssl/certs/ca.pem&tlsCertificateKeyFile=/etc/ssl/certs/client.pem"

# Skip certificate validation (development only!)
mongosh "mongodb://mongodb.example.com:27017/?tls=true&tlsInsecure=true"
```
MongoDB server SSL configuration:
```yaml
# /etc/mongod.conf
net:
  port: 27017
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/certs/mongodb.pem
    CAFile: /etc/ssl/certs/ca.pem
    allowConnectionsWithoutCertificates: false
    allowInvalidCertificates: false
    allowInvalidHostnames: false
```

```bash
# Generate a self-signed certificate for testing
openssl req -newkey rsa:2048 -new -x509 -days 365 -nodes \
  -out /etc/ssl/certs/mongodb.crt \
  -keyout /etc/ssl/certs/mongodb.key

# Combine key and certificate
cat /etc/ssl/certs/mongodb.crt /etc/ssl/certs/mongodb.key > /etc/ssl/certs/mongodb.pem
chmod 400 /etc/ssl/certs/mongodb.pem
chown mongodb:mongodb /etc/ssl/certs/mongodb.pem

# Restart MongoDB after the TLS configuration change
systemctl restart mongod
```
Connection string SSL parameters:
```javascript
// Production connection with full verification
const uri = "mongodb://mongodb.example.com:27017/production" +
  "?tls=true" +
  "&tlsCAFile=/etc/ssl/certs/ca.pem" +
  "&tlsCertificateKeyFile=/etc/ssl/certs/client.pem" +
  "&tlsAllowInvalidCertificates=false" +
  "&tlsAllowInvalidHostnames=false" +
  "&serverSelectionTimeoutMS=30000";

// Common SSL errors and fixes:

// "unable to verify the first certificate"
// -> Add tlsCAFile with the CA bundle

// "Hostname/IP does not match certificate"
// -> Ensure certificate CN/SAN matches the hostname
// -> Or use tlsAllowInvalidHostnames (not recommended)

// "certificate has expired"
// -> Renew the certificate
// -> Or temporarily use tlsAllowInvalidCertificates (not recommended)
```
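The "certificate has expired" case is best caught before it happens. A small sketch that turns a certificate's `notAfter` date (as printed by `openssl x509 -noout -dates` above) into days-until-expiry for alerting; the date values are illustrative:

```javascript
// Days until a certificate expires; negative means it has already expired.
function daysUntilExpiry(notAfter, now = new Date()) {
  const msLeft = new Date(notAfter).getTime() - now.getTime();
  return Math.floor(msLeft / 86_400_000); // ms per day
}

daysUntilExpiry('2026-06-01T00:00:00Z', new Date('2026-04-01T00:00:00Z')); // -> 61
```

Wiring this into a scheduled job that alerts below, say, 30 days avoids the emergency renewal entirely.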
### 6. Fix write concern errors
Write concern configuration:
```javascript
// Write concern levels:
// w: 0          - No acknowledgment (fire and forget)
// w: 1          - Acknowledged by the primary (default before MongoDB 5.0;
//                 5.0+ defaults to "majority")
// w: "majority" - Acknowledged by a majority of nodes
// w: <number>   - Acknowledged by N nodes

// Write concern error example
db.products.insertOne(
  { name: "Widget", price: 9.99 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

// Error if a majority cannot acknowledge in time:
// MongoWriteConcernError: waiting for replication timed out

// Check replica set write concern capabilities
rs.status().members.forEach(m => {
  print(m.name + " - " + m.stateStr + " - health: " + m.health)
})

// If not enough healthy nodes are available, majority write concern fails
```
Fix write concern issues:
```javascript
// Option 1: Reduce write concern for non-critical operations
db.logs.insertOne(
  { event: "user_login", timestamp: new Date() },
  { writeConcern: { w: 1 } } // Just acknowledge from the primary
)

// Option 2: Increase wtimeout for slow networks
db.orders.insertOne(
  { order_id: 123, amount: 100 },
  {
    writeConcern: {
      w: "majority",
      wtimeout: 30000 // 30 seconds instead of the 5s used above
    }
  }
)

// Option 3: Fix replica set health
// Ensure enough secondaries are available
rs.status()

// If secondaries are lagging, check:
// 1. Network latency between nodes
// 2. Disk I/O on the secondaries
// 3. Load on the secondaries (are they serving reads?)

// Option 4: Use an appropriate write concern per operation
// Critical data (payments, orders):    w: "majority"
// Non-critical data (logs, analytics): w: 1
// Fire-and-forget (metrics, events):   w: 0

// Set the default write concern for all operations
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: "majority", wtimeout: 10000 }
})
```
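"Option 4" above can be centralized so every call site picks a tier instead of hand-writing write concern objects. A minimal sketch; the tier names are illustrative, not a driver API:

```javascript
// Map an operation's criticality tier to a write concern object.
function writeConcernFor(tier) {
  switch (tier) {
    case 'critical':        return { w: 'majority', wtimeout: 10000 }; // payments, orders
    case 'standard':        return { w: 1 };                           // logs, analytics
    case 'fire-and-forget': return { w: 0 };                           // metrics, events
    default: throw new Error(`unknown write concern tier: ${tier}`);
  }
}

// Usage sketch:
// db.orders.insertOne(order, { writeConcern: writeConcernFor('critical') })
```

Centralizing the mapping means a policy change (say, raising the wtimeout) happens in one place.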
### 7. Fix DNS and connection string issues
DNS SRV records for MongoDB Atlas:
```bash
# Check DNS SRV records
nslookup -type=SRV _mongodb._tcp.cluster0.example.mongodb.net

# Output should show all replica set members:
# _mongodb._tcp.cluster0.example.mongodb.net SRV service location:
#   priority = 0
#   weight   = 10
#   port     = 27017
#   svr hostname = shard00-00.example.mongodb.net
#   svr hostname = shard00-01.example.mongodb.net
#   svr hostname = shard00-02.example.mongodb.net

# Check TXT records (contain replica set name and options)
nslookup -type=TXT cluster0.example.mongodb.net

# Output:
# cluster0.example.mongodb.net text = "authSource=admin&replicaSet=atlas-xxx-shard-0"
```
Connection string formats:
```javascript
// Standard connection string
// mongodb://node1:27017,node2:27017,node3:27017/production?replicaSet=rs0

// DNS Seedlist (MongoDB Atlas, cloud deployments)
// mongodb+srv://cluster0.example.mongodb.net/production

// With full options (split across lines for readability; use one line in practice)
// mongodb://user:password@node1:27017,node2:27017,node3:27017/production?
//   replicaSet=rs0&
//   authSource=admin&
//   ssl=true&
//   serverSelectionTimeoutMS=30000&
//   socketTimeoutMS=45000&
//   connectTimeoutMS=10000&
//   retryWrites=true&
//   retryReads=true&
//   readPreference=primaryPreferred&
//   w=majority

// Common connection string errors:

// Missing replica set name -- will only connect to node1
// mongodb://node1:27017,node2:27017/

// Correct:
// mongodb://node1:27017,node2:27017/?replicaSet=rs0

// Wrong authSource -- auth fails if the user is defined in the admin db
// mongodb://user:password@localhost:27017/myapp

// Correct:
// mongodb://user:password@localhost:27017/myapp?authSource=admin
```
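When debugging SRV issues, it can help to build the equivalent standard connection string yourself from the resolved records. A minimal sketch mirroring what `mongodb+srv` resolution produces under the hood; the host and option values are illustrative:

```javascript
// Assemble a standard multi-host connection string from SRV-style records.
function buildSeedlistUri(hosts, db, options = {}) {
  const hostList = hosts.map((h) => `${h.name}:${h.port}`).join(',');
  const query = new URLSearchParams(options).toString();
  return `mongodb://${hostList}/${db}${query ? `?${query}` : ''}`;
}

buildSeedlistUri(
  [{ name: 'node1', port: 27017 }, { name: 'node2', port: 27017 }],
  'production',
  { replicaSet: 'rs0', authSource: 'admin' }
);
// -> 'mongodb://node1:27017,node2:27017/production?replicaSet=rs0&authSource=admin'
```

If the string built from the SRV and TXT records connects while `mongodb+srv://` does not, the problem is DNS resolution on the client, not the cluster.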
### 8. Monitor MongoDB connections and health
Server-side monitoring:
```javascript
// Current connections and operations
db.adminCommand({ currentOp: 1 })

// Filter long-running operations
db.adminCommand({
  currentOp: 1,
  "active": true,
  "secs_running": { "$gt": 10 }
})

// Kill a stuck operation
db.killOp(<opid>)

// Connection statistics
db.serverStatus().connections

// Output interpretation:
// current:      Open connections
// available:    Remaining connection slots
// totalCreated: Cumulative connections since startup
// active:       Connections currently executing operations
// idle:         Connections waiting for commands

// If available approaches zero (current near the configured limit),
// increase maxIncomingConnections in mongod.conf:
// net:
//   maxIncomingConnections: 65536
```
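The long-running-operation filter above can be scripted so kill candidates are selected consistently. A minimal sketch; the field names (`opid`, `secs_running`, `active`, `ns`) follow `currentOp` output, and the 10-second threshold is an illustrative choice:

```javascript
// Pick kill candidates from the inprog array returned by currentOp.
function longRunningOps(inprog, thresholdSecs = 10) {
  return inprog
    .filter((op) => op.active && op.secs_running > thresholdSecs)
    .map((op) => ({ opid: op.opid, secs: op.secs_running, ns: op.ns }));
}

// Each candidate's opid can then be passed to db.killOp(opid).
```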
Prometheus/Grafana monitoring:
```bash
# Install MongoDB exporter
# https://github.com/percona/mongodb_exporter
docker run -d --name mongodb-exporter \
  -p 9216:9216 \
  percona/mongodb_exporter:0.40 \
  --mongodb.uri="mongodb://localhost:27017"
```

```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['localhost:9216']
```

```yaml
# Prometheus alert rules
groups:
  - name: mongodb_health
    rules:
      - alert: MongoDBReplicaSetPrimaryDown
        expr: mongodb_ss_rs_state{state="PRIMARY"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB replica set primary is down"
          description: "No primary available for writes"

      - alert: MongoDBConnectionsHigh
        expr: mongodb_ss_connections_current / mongodb_ss_connections_available > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB connections running high"
          description: "Connection utilization at {{ $value | humanizePercentage }}"

      - alert: MongoDBReplicationLagHigh
        expr: mongodb_ss_rs_replicationLag > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB replication lag high"
          description: "Replication lag is {{ $value }} seconds"

      - alert: MongoDBMemoryHigh
        expr: mongodb_ss_mem_resident / mongodb_ss_mem_system > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB memory usage high"
          description: "Resident memory at {{ $value | humanizePercentage }}"
```
Application-level health checks:
```javascript
// Periodic health check
async function checkMongoHealth(client) {
  const startTime = Date.now();
  try {
    await client.db().admin().ping();
    return { status: 'healthy', latency: Date.now() - startTime };
  } catch (err) {
    return { status: 'unhealthy', error: err.message };
  }
}

// Monitor the connection pool
client.on('connectionPoolCreated', (event) => {
  metrics.gauge('mongodb_pool_size', event.options.maxPoolSize);
});

client.on('connectionCheckedOut', (event) => {
  metrics.increment('mongodb_pool_checked_out');
});

client.on('connectionCheckedIn', (event) => {
  metrics.decrement('mongodb_pool_checked_out');
});

client.on('connectionPoolCleared', (event) => {
  metrics.increment('mongodb_pool_cleared_total');
  logger.warn('MongoDB connection pool cleared - possible failover');
});
```
## Prevention
- Size connection pools appropriately (maxPoolSize = expected concurrency × 1.5)
- Always close cursors and sessions to prevent connection leaks
- Use retryWrites=true and retryReads=true for automatic failover handling
- Configure appropriate timeouts (serverSelectionTimeoutMS, socketTimeoutMS)
- Monitor replica set health with automated alerting
- Use write concern "majority" for critical data, w:1 for non-critical
- Keep MongoDB drivers updated for latest failover improvements
- Test failover scenarios regularly in staging
- Use DNS SRV records for cloud deployments (simpler connection strings)
- Implement connection pool monitoring in application metrics
## Related Errors
- **CursorNotFound**: Cursor timed out or was killed
- **ExceededTimeLimit**: Operation exceeded maxTimeMS
- **ConflictingUpdateOperators**: Multiple operators on same field
- **DuplicateKey**: Unique index constraint violation
- **NamespaceNotFound**: Collection or index doesn't exist