MongoDB Sharded Cluster Balancer Stuck - Diagnosis and Fix

Introduction

The MongoDB balancer automatically migrates chunks between shards to keep data evenly distributed. When the balancer gets stuck (often because of a failed migration, config server inconsistency, or an active maintenance lock), the cluster can become severely imbalanced, with one shard handling a disproportionate share of read and write traffic.

Symptoms

- `sh.isBalancerRunning()` reports an active balancing round for hours without completing
- Chunk distribution is heavily skewed across shards
- Balancer logs show repeated `migration failed` errors for the same chunk
- `config.locks` shows a balancer lock that is never released (on MongoDB 3.4 and later the config server primary holds the `balancer` lock by design, so its presence alone is not a symptom)
- `mongos` logs show `balancer: could not acquire balancer lock`

Common Causes

- A previous migration failed on an oversized chunk, leaving it flagged as `jumbo`
- The config server replica set is unhealthy, which prevents lock management
- The balancer window is too narrow to let migrations finish
- A network partition separates `mongos` from the config servers
- A manual `moveChunk` operation conflicts with the balancer

Step-by-Step Fix

1. **Check the balancer state and current migrations:**

```javascript
// Is the balancer enabled, and is a balancing round in progress?
sh.getBalancerState()
sh.isBalancerRunning()

// Inspect the balancer lock document (who holds it, since when, and why)
db.getSiblingDB("config").locks.find({ _id: "balancer" })

// Count chunks per shard across all sharded collections
db.getSiblingDB("config").chunks.aggregate([
  { $group: { _id: "$shard", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
```
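To turn the aggregation output into a quick verdict, a small helper can compare the fullest and emptiest shards against the chunk-count migration thresholds MongoDB documented for chunk-count balancing (2 for fewer than 20 chunks, 4 for 20 to 79, 8 for 80 or more). `summarizeChunkCounts` is a hypothetical name, and the result is only a rough heuristic: the aggregation counts chunks across all collections while the balancer evaluates each collection separately, and recent MongoDB releases balance by data size rather than chunk count.

```javascript
// Hypothetical helper: summarize the chunk-count aggregation output.
// rows: array of { _id: <shard name>, count: <chunks on that shard> }.
// allShards (optional): every shard name, so shards with zero chunks are included.
function summarizeChunkCounts(rows, allShards = []) {
  const counts = new Map();
  for (const name of allShards) counts.set(name, 0);
  for (const row of rows) counts.set(row._id, row.count);

  const values = [...counts.values()];
  if (values.length === 0) {
    return { total: 0, max: 0, min: 0, spread: 0, threshold: 2, imbalanced: false };
  }

  const total = values.reduce((sum, n) => sum + n, 0);
  const max = Math.max(...values);
  const min = Math.min(...values);

  // Chunk-count thresholds: < 20 chunks -> 2, 20-79 -> 4, >= 80 -> 8
  const threshold = total < 20 ? 2 : total < 80 ? 4 : 8;

  return { total, max, min, spread: max - min, threshold, imbalanced: max - min >= threshold };
}

// In mongosh (not run here):
// summarizeChunkCounts(
//   db.getSiblingDB("config").chunks.aggregate([{ $group: { _id: "$shard", count: { $sum: 1 } } }]).toArray(),
//   db.getSiblingDB("config").shards.find().toArray().map((s) => s._id)
// )
```

Passing the shard list from `config.shards` matters because a shard that owns no chunks never appears in the `$group` output.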

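2. **Stop and restart the balancer:**

```javascript
// Stop the balancer; this waits for an in-progress migration to finish
sh.stopBalancer()

// Verify that it stopped
sh.getBalancerState()   // should be false
sh.isBalancerRunning()  // should report no active round

// Legacy clusters only (before 3.4): remove a stale lock left by a dead mongos.
// On MongoDB 3.4+, the config server primary holds this lock by design; do not delete it.
// db.getSiblingDB("config").locks.deleteOne({ _id: "balancer" })

// Restart the balancer
sh.startBalancer()
```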
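`sh.stopBalancer()` already waits for an in-progress round, so an explicit wait mostly matters for custom automation that must confirm the balancer is idle before touching metadata. The sketch below is a hypothetical helper: `isRunning` and `sleepMs` are injected so the loop can be tested outside the shell, and it accepts either a boolean or a document with an `inBalancerRound` field, since the return type of `sh.isBalancerRunning()` may differ between shell versions.

```javascript
// Hypothetical helper: poll until no balancer round is in progress.
// isRunning: function returning a boolean or a document with `inBalancerRound`.
// sleepMs: function that blocks for the given number of milliseconds.
// Returns true once idle, false if timeoutMs elapses first.
function waitForBalancerIdle(isRunning, sleepMs, { timeoutMs = 60000, intervalMs = 1000 } = {}) {
  let waited = 0;
  for (;;) {
    const state = isRunning();
    const busy = typeof state === "boolean" ? state : Boolean(state && state.inBalancerRound);
    if (!busy) return true;
    if (waited >= timeoutMs) return false;
    sleepMs(intervalMs);
    waited += intervalMs;
  }
}

// In mongosh (not run here):
// waitForBalancerIdle(() => sh.isBalancerRunning(), sleep, { timeoutMs: 300000 })
```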

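3. **Clear jumbo flags that block migration:** clearing the flag only helps if the chunk has since been split or can otherwise be moved. On MongoDB 4.2.3 and later, prefer the per-chunk `clearJumboFlag` admin command over editing `config.chunks` directly.

```javascript
// Stop the balancer and back up the config database first. On MongoDB 5.0+,
// config.chunks uses `uuid` instead of `ns`, so scope the filter accordingly.
db.getSiblingDB("config").chunks.updateMany(
  { jumbo: true },
  { $unset: { jumbo: "" } }
);
// Make the mongos reload its cached routing metadata
db.adminCommand({ flushRouterConfig: 1 })
```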
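Where `clearJumboFlag` is available, the per-chunk commands can be generated from the jumbo chunk documents instead of editing metadata by hand. `buildClearJumboCommands` is a hypothetical helper; it uses the `bounds` form of the command, which names the exact chunk by its lower and upper shard key values.

```javascript
// Hypothetical helper: build one clearJumboFlag command per jumbo chunk.
// namespace: "<database>.<collection>"; jumboChunks: config.chunks documents with min/max.
function buildClearJumboCommands(namespace, jumboChunks) {
  return jumboChunks.map((chunk) => ({
    clearJumboFlag: namespace, // command name must be the first field
    bounds: [chunk.min, chunk.max],
  }));
}

// In mongosh, connected to a mongos (not run here; filter by `uuid` instead of `ns` on 5.0+):
// const jumbo = db.getSiblingDB("config").chunks.find({ ns: "mydb.mycollection", jumbo: true }).toArray();
// buildClearJumboCommands("mydb.mycollection", jumbo).forEach((cmd) => printjson(db.adminCommand(cmd)));
```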
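4. **Manually move chunks off overloaded shards:**

```javascript
// Run with the balancer stopped so manual moves do not conflict with it.
// Identify chunks to move (on MongoDB 5.0+, filter config.chunks by `uuid` instead of `ns`)
var chunks = db.getSiblingDB("config").chunks.find({
  ns: "mydb.mycollection",
  shard: "shard1"
}).limit(5);

chunks.forEach(function (chunk) {
  // `bounds` names the exact chunk and also works for hashed shard keys
  var res = db.adminCommand({
    moveChunk: "mydb.mycollection",
    bounds: [chunk.min, chunk.max],
    to: "shard2",
    _secondaryThrottle: true
  });
  printjson(res);
});
```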
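Rather than moving a fixed number of chunks, the count can be derived from the per-shard totals. The hypothetical `planChunkMoves` below greedily moves one chunk at a time from the fullest shard to the emptiest until their difference is within `tolerance`, then groups the moves by source and target. It treats all chunks as equal in size, which is only an approximation.

```javascript
// Hypothetical planner: counts maps shard name -> chunk count.
// Returns [{ from, to, count }] moves that bring the spread to <= tolerance.
function planChunkMoves(counts, tolerance = 1) {
  const load = { ...counts };
  const shards = Object.keys(load);
  const limit = Math.max(1, tolerance); // a tolerance of 0 could oscillate forever
  const pairs = new Map();
  if (shards.length < 2) return [];

  for (;;) {
    shards.sort((a, b) => load[b] - load[a]); // fullest first
    const from = shards[0];
    const to = shards[shards.length - 1];
    if (load[from] - load[to] <= limit) break;

    load[from] -= 1;
    load[to] += 1;
    const key = `${from}->${to}`;
    const move = pairs.get(key) || { from, to, count: 0 };
    move.count += 1;
    pairs.set(key, move);
  }
  return [...pairs.values()];
}

// Example: planChunkMoves({ shard1: 10, shard2: 2, shard3: 0 })
// moves 6 chunks off shard1 so every shard ends with 4.
```

Each planned `{ from, to, count }` entry can then drive a loop like the one in step 4, with `count` as the `limit()` value.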

5. **Check config server replica set health:**

```javascript
// Connect to a config server member (rs.status() does not run on mongos)
rs.status()
// Ensure all members are healthy and exactly one is PRIMARY
```
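The same check can be scripted against the document `rs.status()` returns. `checkConfigReplicaSet` is a hypothetical helper that reads only the `name`, `health`, and `stateStr` fields of each entry in `members`, and reports a missing or duplicate primary, unreachable members, members in any state other than PRIMARY or SECONDARY (for example RECOVERING), and an even member count.

```javascript
// Hypothetical helper: flag common problems in an rs.status() result.
// Reads only members[].name, members[].health, and members[].stateStr.
function checkConfigReplicaSet(status) {
  const members = (status && status.members) || [];
  const problems = [];

  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  if (primaries.length !== 1) {
    problems.push(`expected exactly 1 PRIMARY, found ${primaries.length}`);
  }

  for (const m of members) {
    if (m.health !== 1) {
      problems.push(`${m.name} is unreachable (health=${m.health})`);
    } else if (m.stateStr !== "PRIMARY" && m.stateStr !== "SECONDARY") {
      problems.push(`${m.name} is in state ${m.stateStr}`);
    }
  }

  if (members.length % 2 === 0) {
    problems.push(`${members.length} members; an odd number is recommended`);
  }
  return { ok: problems.length === 0, problems };
}

// In mongosh, connected to a config server member (not run here):
// printjson(checkConfigReplicaSet(rs.status()))
```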

Prevention

- Monitor balancer state with automated checks every 5 minutes
- Leave the balancer window unrestricted (24/7) unless there is a specific reason to limit it
- Monitor chunk distribution regularly with `sh.status()`
- Keep the config server replica set healthy, with an odd number of members
- Rehearse stopping and starting the balancer (`sh.stopBalancer()` and `sh.startBalancer()`) as part of maintenance procedures
- Choose shard keys that distribute data evenly from the start
- Avoid manual `moveChunk` operations that can conflict with the balancer
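An automated check like the one in the first item above could compare consecutive snapshots of the `balancerStatus` admin command, whose output includes `mode`, `inBalancerRound`, and `numBalancerRounds`. The function below is a hypothetical sketch: it flags the balancer as possibly stuck when it was mid-round at two consecutive checks and the round counter did not advance in between, which is a heuristic rather than a definitive signal.

```javascript
// Hypothetical monitor check comparing two balancerStatus snapshots
// taken one polling interval apart (e.g. 5 minutes).
// prev may be null on the first run. Returns "disabled", "ok", or "possibly-stuck".
function assessBalancerProgress(prev, curr) {
  // numBalancerRounds may arrive as a BSON Long in mongosh; normalize to a number
  const toNum = (v) => (v && typeof v.toNumber === "function" ? v.toNumber() : Number(v));

  if (curr.mode === "off") return "disabled";
  if (!prev) return "ok";

  const advanced = toNum(curr.numBalancerRounds) > toNum(prev.numBalancerRounds);
  if (prev.inBalancerRound && curr.inBalancerRound && !advanced) return "possibly-stuck";
  return "ok";
}

// In mongosh, connected to a mongos (not run here):
// const snapshot = db.adminCommand({ balancerStatus: 1 });
```

Storing each snapshot between runs (for example in a monitoring system) lets the next run pass it back as `prev`.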

