Introduction

MongoDB's balancer migrates chunks between shards to keep data evenly distributed. When a chunk grows beyond the configured chunk size and cannot be split, the balancer marks it as jumbo and skips it. Migrations can also fail outright because of network issues, document size limits, or concurrent modifications. In either case the chunks stay where they are, which leads to data imbalance and hot shards.

Symptoms

- `config.chunks` shows chunks with the `jumbo: true` flag
- Balancer logs show `chunk migration failed` with specific error messages
- One shard holds significantly more data than the others
- Queries routed to the overloaded shard have higher latency
- `db.printShardingStatus()` shows uneven chunk distribution
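The uneven-distribution symptom can be put into numbers once the chunk documents are in hand. The following is a minimal sketch in plain JavaScript, not a MongoDB API: `chunkCountsByShard` and `imbalanceRatio` are hypothetical helper names, and the input is assumed to be an array of `config.chunks` documents, for example from `db.getSiblingDB("config").chunks.find().toArray()`.

```javascript
// Count chunks per shard from an array of config.chunks documents.
function chunkCountsByShard(chunks) {
  const counts = {};
  for (const c of chunks) {
    counts[c.shard] = (counts[c.shard] || 0) + 1;
  }
  return counts;
}

// Ratio between the most and least loaded shard (1 means perfectly even).
// Returns null when there are no chunks at all.
function imbalanceRatio(counts) {
  const values = Object.values(counts);
  if (values.length === 0) return null;
  return Math.max(...values) / Math.min(...values);
}
```

A shard that owns no chunks does not appear in `config.chunks`, so it is missing from the counts. Treat the ratio as a lower bound on the real imbalance.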

Common Causes

- Chunk is too large to migrate, for example because it is packed with documents close to the 16MB BSON size limit
- Network timeout during migration of large chunks
- Concurrent writes to the chunk during migration, causing a version conflict
- Recipient shard running out of disk space during migration
- Balancer window too narrow for migrations to finish

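Step-by-Step Fix

1. **Identify failed migrations and jumbo chunks**:

```javascript
// Find jumbo chunks.
// Note: on MongoDB 5.0+, config.chunks identifies the collection by `uuid`
// instead of `ns`, so c.ns may be undefined there.
db.getSiblingDB("config").chunks.find({ jumbo: true }).forEach(function(c) {
  print("Jumbo chunk: " + c.ns + " | shard: " + c.shard + " | min: " + JSON.stringify(c.min));
});

// Check balancer status
sh.getBalancerState();   // whether the balancer is enabled
sh.isBalancerRunning();  // whether a balancing round is in progress
```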

2. **Clear the jumbo flag and retry migration**:

```javascript
// Clear the jumbo flag for one chunk of the collection.
// Editing config.chunks directly is a last resort; on MongoDB 4.2.3+
// (and 4.0.15+) prefer the clearJumboFlag admin command.
db.getSiblingDB("config").chunks.updateOne(
  { ns: "mydb.mycollection", jumbo: true },
  { $unset: { jumbo: "" } }
);

// Or clear all jumbo flags
db.getSiblingDB("config").chunks.updateMany(
  { jumbo: true },
  { $unset: { jumbo: "" } }
);
```
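MongoDB 4.2.3 and later (and 4.0.15 and later) also provide a `clearJumboFlag` admin command, which avoids writing to `config.chunks` by hand. The sketch below only builds the command documents from chunk metadata; `buildClearJumboCommands` is a hypothetical helper name, and the chunk objects are assumed to carry the `jumbo`, `min`, and `max` fields found in `config.chunks`.

```javascript
// Build one clearJumboFlag command per jumbo chunk.
// `ns` is the namespace ("db.collection") that the chunks belong to;
// `chunks` is an array of config.chunks documents for that collection.
function buildClearJumboCommands(ns, chunks) {
  return chunks
    .filter((c) => c.jumbo === true)
    .map((c) => ({ clearJumboFlag: ns, bounds: [c.min, c.max] }));
}
```

In the shell, each command could then be passed to `db.adminCommand(cmd)` on a `mongos`. Review the list before running it, since the flag only helps once the chunk has actually become small enough to split or move.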

3. **Split the oversized chunk before migration**:

```javascript
// Find the chunk range
var chunk = db.getSiblingDB("config").chunks.findOne({
  ns: "mydb.mycollection",
  jumbo: true
});

// Split the chunk at its median point.
// (Passing chunk.min to sh.splitAt does not work: the lower bound of a
// chunk is not a valid split point.)
sh.splitFind("mydb.mycollection", chunk.min);

// Or split manually by passing the chunk bounds
db.adminCommand({ split: "mydb.mycollection", bounds: [chunk.min, chunk.max] });
```
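When an explicit split point is wanted, the `split` command accepts a `middle` document that must lie strictly between the chunk bounds. As a rough sketch, assuming a single-field shard key with plain numeric bounds, the point halfway through the key range could be computed as follows. `buildMidpointSplit` is a hypothetical helper, and the halfway point of the key range is not necessarily the median of the data.

```javascript
// Build a `split` command that cuts a chunk halfway between its bounds.
// Only handles a single-field shard key whose bounds are plain JS numbers;
// returns null otherwise (for example when a bound is MinKey or MaxKey).
function buildMidpointSplit(ns, chunk, field) {
  const lo = chunk.min[field];
  const hi = chunk.max[field];
  if (typeof lo !== "number" || typeof hi !== "number") return null;
  const mid = Math.floor(lo + (hi - lo) / 2);
  if (mid <= lo || mid >= hi) return null; // no room for a split point
  return { split: ns, middle: { [field]: mid } };
}
```

If the helper returns `null`, fall back to a median split via `find` or `bounds`.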

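4. **Manually move the chunk to a specific shard**:

```javascript
db.adminCommand({
  moveChunk: "mydb.mycollection",
  find: { shardKeyField: "value" },  // any shard key value inside the chunk
  to: "shard2",                      // destination shard name
  _secondaryThrottle: true,          // wait for replication to a secondary while copying
  _waitForDelete: true               // block until the donor has deleted the migrated range
});
```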
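Choosing the destination for a manual move is usually a matter of finding the shard with the fewest chunks. A minimal sketch of that selection, assuming a map of shard name to chunk count is already available (`pickTargetShard` is a hypothetical helper, not part of the shell):

```javascript
// Pick the shard with the fewest chunks, excluding the current owner.
// `counts` maps shard name -> chunk count and should list every shard,
// including empty ones (count 0). Returns null if no candidate exists.
function pickTargetShard(counts, currentShard) {
  let best = null;
  for (const [shard, n] of Object.entries(counts)) {
    if (shard === currentShard) continue;
    if (best === null || n < counts[best]) best = shard;
  }
  return best;
}
```

Chunk count is only a proxy for data size. If chunks differ widely in size, compare per-shard data size instead.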
5. **Check migration logs for specific error causes**:

```javascript
// On the mongos
db.adminCommand({ getLog: "global" }).log.filter(function(line) {
  return line.match(/moveChunk|migrate|jumbo/i);
});
```
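From MongoDB 4.4 onward, log lines are structured JSON, so the filtered lines can be reduced to a timestamp and message for easier reading. The sketch below assumes the documented `t.$date` and `msg` fields and keeps anything that does not parse as raw text; `summarizeMigrationLogs` is a hypothetical helper name.

```javascript
// Keep migration-related log lines and reduce them to timestamp + message.
// Lines that are not valid JSON (pre-4.4 plain-text logs) are kept as raw text.
function summarizeMigrationLogs(lines) {
  const pattern = /moveChunk|migrate|jumbo/i;
  return lines
    .filter((line) => pattern.test(line))
    .map((line) => {
      try {
        const entry = JSON.parse(line);
        return { time: entry.t && entry.t.$date, msg: entry.msg };
      } catch (e) {
        return { time: undefined, msg: line };
      }
    });
}
```

It takes the `log` array returned by `getLog` as input. Migration details are often logged on the donor and recipient shard primaries as well, so checking there may be necessary.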
6. **Adjust balancer settings for the migration window**:

```javascript
// Make sure the balancer is enabled
sh.setBalancerState(true);

// Set a wider balancer window (example times; adjust to your off-peak hours)
db.getSiblingDB("config").settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "22:00", stop: "06:00" } } },
  { upsert: true }
);
```
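A balancer window is expressed as `start` and `stop` times in `HH:MM` form and may wrap past midnight, which makes it easy to misjudge whether a given time is covered. The following sketch checks membership under the assumption that the window is half-open (start inclusive, stop exclusive); `toMinutes` and `inBalancerWindow` are hypothetical helpers, not shell functions.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True if `now` falls inside [start, stop), handling windows that wrap midnight.
function inBalancerWindow(now, start, stop) {
  const n = toMinutes(now), s = toMinutes(start), e = toMinutes(stop);
  if (s === e) return false;          // equal bounds treated as an empty window (assumption)
  if (s < e) return n >= s && n < e;  // window within a single day
  return n >= s || n < e;             // window wraps past midnight
}
```

For example, `inBalancerWindow("23:30", "22:00", "06:00")` is `true`, while `inBalancerWindow("07:00", "22:00", "06:00")` is `false`.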

Prevention

- Use an appropriate shard key that distributes data evenly
- Monitor chunk sizes and split proactively before they grow too large
- Restrict the balancer to off-peak hours by setting `activeWindow` in the `balancer` document of `config.settings`
- Ensure all shards have sufficient disk headroom (at least 20% free)
- Monitor chunk distribution with `db.collection.getShardDistribution()`
- Avoid schema designs that create unsplittable chunks (all documents with the same shard key)
- Set `chunkSize` to an appropriate value (default 128MB; consider 64MB for faster migrations)
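One way to spot a shard key that is likely to produce unsplittable chunks is to sample shard key values and measure how much of the sample a single value accounts for. This is a minimal sketch under that assumption; `topKeyShare` is a hypothetical helper, and what counts as too high a share depends on the workload.

```javascript
// Fraction of sampled documents that share the single most common shard key value.
// `values` is an array of shard key values taken from a sample; values are
// compared by their JSON serialization. Returns 0 for an empty sample.
function topKeyShare(values) {
  if (values.length === 0) return 0;
  const counts = new Map();
  let max = 0;
  for (const v of values) {
    const k = JSON.stringify(v);
    const n = (counts.get(k) || 0) + 1;
    counts.set(k, n);
    if (n > max) max = n;
  }
  return max / values.length;
}
```

A result close to 1 means most documents share one key value and would land in a single chunk that cannot be split.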