## Introduction

MongoDB limits each aggregation pipeline stage to 100MB of RAM. When a `$group` or `$sort` stage processes more data than fits within this limit, the pipeline fails with `MongoServerError: Exceeded memory limit for $group, but didn't allow external sort`. This is a common issue when aggregating large datasets without proper optimization.
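As a rough back-of-the-envelope check, you can estimate whether a `$group` stage's accumulated state will fit under the per-stage limit. The group count and per-group byte figures below are hypothetical assumptions for illustration, not measurements:

```javascript
// Rough estimate of $group memory: unique groups x bytes of state per group.
// Both figures below are assumed values for illustration only.
const MEMORY_LIMIT_BYTES = 100 * 1024 * 1024; // 104857600 bytes, the per-stage limit

const uniqueGroups = 2_000_000; // e.g. distinct customer_id values (assumed)
const bytesPerGroup = 112;      // _id + accumulator state + overhead (assumed)

const estimatedBytes = uniqueGroups * bytesPerGroup;
console.log(estimatedBytes > MEMORY_LIMIT_BYTES
  ? "Likely to exceed the 100MB limit: enable allowDiskUse or filter earlier"
  : "Likely to fit in memory");
```

If the estimate is anywhere near the limit, plan for `allowDiskUse` or an earlier `$match` rather than hoping the stage squeaks through.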

## Symptoms

- `MongoServerError: Exceeded memory limit for $group` or `$sort`
- `Sort exceeded memory limit of 104857600 bytes`
- Aggregation works in development with small datasets but fails in production
- `explain()` output shows a stage with high `nReturned` and `executionTimeMillis`
- Pipeline works with `$limit` but fails on the full collection

## Common Causes

- A `$group` stage producing too many unique groups to fit in 100MB
- `$sort` on a large, unindexed collection without `allowDiskUse`
- `$unwind` on arrays with many elements, creating a Cartesian-product explosion in document count
- A missing early `$match` stage to filter data before expensive stages
- Aggregating collections with large documents that individually consume significant memory
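To see why `$unwind` inflates memory use, consider the document-count arithmetic. This plain-JavaScript sketch (with assumed collection sizes) mimics what `$unwind` does to a single document:

```javascript
// $unwind emits one output document per array element, so the document count
// multiplies. Simulated in plain JavaScript with hypothetical numbers.
const order = {
  _id: 1,
  customer_id: "c42",
  items: Array.from({ length: 500 }, (_, i) => ({ sku: i }))
};

// Equivalent of { $unwind: "$items" } applied to one document:
const unwound = order.items.map(item => ({
  _id: order._id,
  customer_id: order.customer_id,
  items: item // each output doc carries a single array element
}));
console.log(unwound.length); // 500 documents from 1

// Across a collection: 1,000,000 orders x 500 items each = 500,000,000
// documents flowing into the next stage, far beyond the 100MB budget.
```

This is why a `$match` (or `$project` that trims the array) before `$unwind` matters so much.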

## Step-by-Step Fix

1. **Enable disk use for the aggregation:**

   ```javascript
   db.orders.aggregate([
     { $match: { status: "completed" } },
     { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
     { $sort: { total: -1 } }
   ], {
     allowDiskUse: true // Allows stages to spill to disk
   });
   ```

2. **Add an early `$match` stage to reduce data before expensive stages:**

   ```javascript
   // BAD: groups the entire collection, then filters
   db.orders.aggregate([
     { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
     { $match: { total: { $gt: 1000 } } }
   ]);

   // GOOD: filter first, then group
   db.orders.aggregate([
     { $match: { amount: { $gt: 50 }, created_at: { $gte: ISODate("2026-01-01") } } },
     { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
     { $match: { total: { $gt: 1000 } } }
   ]);
   ```

3. **Use `$project` to reduce document size before grouping:**

   ```javascript
   db.orders.aggregate([
     { $match: { status: "completed" } },
     { $project: { customer_id: 1, amount: 1, _id: 0 } }, // Only the needed fields
     { $group: { _id: "$customer_id", total: { $sum: "$amount" } } }
   ]);
   ```

4. **Create an index to support the aggregation:**

   ```javascript
   // Index to support the $match and $sort stages
   db.orders.createIndex({ status: 1, created_at: -1, customer_id: 1, amount: 1 });

   // Check whether the index is being used
   db.orders.explain("executionStats").aggregate([
     { $match: { status: "completed" } }
   ]);
   ```

5. **Break large aggregations into smaller batches:**

   ```javascript
   // Process by date ranges instead of the entire collection
   const startDate = new Date("2026-01-01");
   const endDate = new Date("2026-04-01");
   const results = [];

   for (let d = new Date(startDate); d < endDate; d.setMonth(d.getMonth() + 1)) {
     const monthEnd = new Date(d);
     monthEnd.setMonth(monthEnd.getMonth() + 1);

     const monthResult = db.orders.aggregate([
       { $match: { created_at: { $gte: d, $lt: monthEnd } } },
       { $group: { _id: "$customer_id", total: { $sum: "$amount" } } }
     ]).toArray();

     results.push(...monthResult);
   }
   ```

## Prevention

- Always include `allowDiskUse: true` for aggregations on collections larger than 100MB
- Place `$match` and `$project` stages as early as possible in the pipeline
- Create compound indexes that support the pipeline's filter and sort requirements
- Use `explain()` to analyze pipeline execution plans before deploying
- Monitor aggregation memory usage in production with the profiler
- Consider pre-aggregating data into summary collections for frequently accessed reports
- Use MongoDB 5.0+, which has improved memory management for aggregation
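For the pre-aggregation suggestion above, one common pattern is a periodically run pipeline that ends in a `$merge` stage (available since MongoDB 4.2). The summary-collection name `customer_totals` here is an assumed example, not from this article:

```javascript
// Sketch of a pre-aggregation pipeline; "customer_totals" is an assumed
// summary-collection name used for illustration.
const summaryPipeline = [
  { $match: { status: "completed" } },
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
  // $merge upserts each group into the summary collection, replacing
  // stale totals when the pipeline is re-run.
  { $merge: { into: "customer_totals", whenMatched: "replace", whenNotMatched: "insert" } }
];

// In the mongo shell, run it on a schedule (e.g. from a cron job):
//   db.orders.aggregate(summaryPipeline, { allowDiskUse: true });
// Reports then read from the small customer_totals collection instead of
// re-aggregating the full orders collection on every request.
```

Because the heavy `$group` runs once per refresh interval instead of once per report, the 100MB limit stops being a per-request concern.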