## Introduction

The MongoDB balancer automatically migrates chunks between shards to maintain even data distribution. When the balancer gets stuck (often because of a failed migration, a config server inconsistency, or an active maintenance lock), the cluster can become severely imbalanced, with one shard handling a disproportionate share of read and write traffic.

## Symptoms

- `sh.isBalancerRunning()` returns `true` for hours while no migration completes
- Chunk distribution is heavily skewed across shards
- The balancer log shows repeated `migration failed` entries for the same chunk
- `config.locks` shows a balancer lock that is never released
- `mongos` logs show `balancer: could not acquire balancer lock`

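
To put a number on the skew symptom, the per-shard chunk counts can be compared directly. The following is a minimal sketch, assuming the counts have already been collected as `{ _id: <shard>, count: <n> }` documents (for example by grouping `config.chunks` by shard). The helper name `chunkSkew` and the sample counts are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: summarize how uneven a per-shard chunk count is.
// Each input element has the shape { _id: <shard name>, count: <chunk count> }.
function chunkSkew(counts) {
  if (counts.length === 0) {
    return null; // nothing to compare
  }
  // Sort a copy from most chunks to fewest so the input is left untouched.
  const sorted = [...counts].sort((a, b) => b.count - a.count);
  const most = sorted[0];
  const least = sorted[sorted.length - 1];
  return {
    most: most._id,
    least: least._id,
    difference: most.count - least.count,
  };
}

// Sample data, made up for illustration.
const sample = [
  { _id: "shard1", count: 120 },
  { _id: "shard2", count: 40 },
  { _id: "shard3", count: 44 },
];
const skew = chunkSkew(sample);
console.log(skew); // most: shard1, least: shard2, difference: 80
```

A difference that keeps growing while `sh.isBalancerRunning()` stays `true` suggests the balancer is not making progress. What counts as "too large" depends on the cluster, so no threshold is hard-coded here.
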
## Common Causes

- A previous failed migration left a chunk in a transitional state, or the chunk is flagged as jumbo
- The config server replica set is not healthy, which prevents lock management
- The balancing window is too narrow to leave enough time for migrations to complete
- A network partition separates `mongos` from the config servers
- A manual `moveChunk` operation conflicts with the balancer

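
For the narrow-window cause, it helps to know how long the balancer is actually allowed to run. The following is a minimal sketch, assuming the window is expressed as `"HH:MM"` start and stop strings (the format used by the `activeWindow` setting of the balancer document in `config.settings`). The helper name `windowMinutes` and the sample times are hypothetical.

```javascript
// Hypothetical helper: length of a balancing window in minutes.
// start and stop are "HH:MM" strings.
function windowMinutes(start, stop) {
  const toMinutes = (hhmm) => {
    const [h, m] = hhmm.split(":").map(Number);
    return h * 60 + m;
  };
  const diff = toMinutes(stop) - toMinutes(start);
  // A stop time earlier than the start time means the window crosses midnight.
  return diff > 0 ? diff : diff + 24 * 60;
}

console.log(windowMinutes("02:00", "06:00")); // 240
console.log(windowMinutes("23:00", "01:30")); // 150
```

If individual migrations routinely take a large fraction of the computed window, widening the window is likely a better fix than restarting the balancer.
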
## Step-by-Step Fix

1. **Check balancer state and current migrations**:
   ```javascript
   sh.isBalancerRunning()
   sh.getBalancerState()

   // Check for active migrations
   db.getSiblingDB("config").locks.find({ _id: "balancer" })

   // Check chunk distribution
   db.getSiblingDB("config").chunks.aggregate([
     { $group: { _id: "$shard", count: { $sum: 1 } } },
     { $sort: { count: -1 } }
   ])
   ```

2. **Stop and restart the balancer**:
   ```javascript
   // Stop the balancer
   sh.stopBalancer()

   // Verify it stopped
   sh.getBalancerState()   // Should be false
   sh.isBalancerRunning()  // Should be false

   // Clear any stale migration state.
   // Only do this once no migration is in progress: editing the config
   // database directly is risky.
   db.getSiblingDB("config").locks.deleteOne({ _id: "balancer" })

   // Restart the balancer
   sh.startBalancer()
   ```

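3. **Clear jumbo chunks that block migration**:
   ```javascript
   // Removes the jumbo flag from every flagged chunk by editing config.chunks
   // directly. Where available, the clearJumboFlag admin command is the
   // supported alternative for a single chunk.
   db.getSiblingDB("config").chunks.updateMany(
     { jumbo: true },
     { $unset: { jumbo: "" } }
   );
   ```

4. **Manually move chunks from overloaded shards**:
   ```javascript
   // Stop the balancer first so manual moves do not conflict with it.
   // Identify chunks to move (newer releases identify the collection in
   // config.chunks by uuid instead of ns; adjust the filter accordingly).
   var chunks = db.getSiblingDB("config").chunks.find({
     ns: "mydb.mycollection",
     shard: "shard1"
   }).limit(5);

   // Move each chunk by its exact bounds, which also works for hashed shard keys
   chunks.forEach(function(chunk) {
     db.adminCommand({
       moveChunk: "mydb.mycollection",
       bounds: [chunk.min, chunk.max],
       to: "shard2",
       _secondaryThrottle: true
     });
   });
   ```
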
5. **Check config server replica set health**:
   ```javascript
   // Run this while connected directly to a config server member, not through mongos
   use config
   rs.status()
   // Ensure all config servers are healthy and exactly one is PRIMARY
   ```
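
The health check in the last step can be made less error-prone by scanning the status document programmatically rather than by eye. The following is a minimal sketch, assuming a status document with a `members` array whose entries carry `name`, `health`, and `stateStr` fields, as returned by `rs.status()`. The helper name `configRsProblems` and the sample member names are hypothetical.

```javascript
// Hypothetical helper: list problems found in a replica set status document.
function configRsProblems(status) {
  const problems = [];
  const primaries = status.members.filter((m) => m.stateStr === "PRIMARY");
  if (primaries.length !== 1) {
    problems.push(`expected 1 PRIMARY, found ${primaries.length}`);
  }
  for (const m of status.members) {
    // health is 1 for a reachable member and 0 otherwise.
    if (m.health !== 1) {
      problems.push(`${m.name} is unreachable (health ${m.health})`);
    }
  }
  return problems;
}

// Sample status document, made up for illustration.
const sampleStatus = {
  members: [
    { name: "cfg1:27019", health: 1, stateStr: "PRIMARY" },
    { name: "cfg2:27019", health: 1, stateStr: "SECONDARY" },
    { name: "cfg3:27019", health: 0, stateStr: "(not reachable/healthy)" },
  ],
};
console.log(configRsProblems(sampleStatus));
```

In a shell session connected to a config server, the same function could be called as `configRsProblems(rs.status())`. An empty array means the basic checks passed; it does not rule out other issues such as replication lag.
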