Introduction

Elasticsearch can exceed its aggregation memory limits when a query aggregates on a high-cardinality field or runs a complex pipeline aggregation. This guide walks through diagnosing and resolving the problem step by step.

Symptoms

Typical error output:

```bash
ERROR: Aggregation memory exceeded
fielddata cache usage exceeds [40%] of heap
field "user_id" has too many unique values: 5000000
```

Common Causes

  1. High-cardinality field causing large fielddata
  2. Terms aggregation with too many buckets
  3. Nested or pipeline aggregation complexity
  4. Fielddata cache size insufficient
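
One common way to avoid building too many terms buckets in a single response is to page through them with a composite aggregation. The following is a minimal Python sketch of that idea, not part of the original guide. The aggregation name `by_value`, the helper names, and the `search` callable (a function you supply that sends the body to `<index>/_search` and returns the parsed JSON) are all assumptions made for illustration.

```python
from typing import Callable, Iterator, Optional


def composite_body(field: str, page_size: int, after: Optional[dict] = None) -> dict:
    """Build a search body that pages through the terms of `field`."""
    composite = {
        "size": page_size,  # buckets per page, not total buckets
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    if after is not None:
        composite["after"] = after  # resume from the previous page's after_key
    return {"size": 0, "aggs": {"by_value": {"composite": composite}}}


def iter_buckets(
    search: Callable[[dict], dict], field: str, page_size: int = 1000
) -> Iterator[dict]:
    """Yield every bucket one page at a time.

    `search` is a hypothetical caller-supplied function that posts the
    body to <index>/_search and returns the parsed JSON response.
    """
    after = None
    while True:
        response = search(composite_body(field, page_size, after))
        agg = response["aggregations"]["by_value"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        # Stop when the page is empty or no resume key was returned.
        if after is None or not agg["buckets"]:
            break
```

Each request holds at most `page_size` buckets in memory, so heap usage stays bounded no matter how many unique values the field has.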

Step-by-Step Fix

Step 1: Check Current State

```bash
curl -XGET 'localhost:9200/_nodes/stats/indices?pretty'
# Per-field fielddata usage on each node
curl -XGET 'localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'
curl -XGET 'localhost:9200/_cat/fielddata?v'
```
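
To see which field is responsible, it can help to total the fielddata size per field across nodes. The sketch below assumes the rows come from `_cat/fielddata?format=json&bytes=b`, which returns one object per node and field with `field` and `size` keys. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fielddata_by_field(rows: List[dict]) -> List[Tuple[str, int]]:
    """Sum fielddata bytes per field across all nodes, largest first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        # With bytes=b the cat API reports sizes as digit strings.
        totals[row["field"]] += int(row["size"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample rows, not real cluster output.
sample_rows = [
    {"node": "node-1", "field": "user_id", "size": "900"},
    {"node": "node-2", "field": "user_id", "size": "600"},
    {"node": "node-1", "field": "status", "size": "50"},
]
ranked = fielddata_by_field(sample_rows)
```

The first entry of `ranked` is the field holding the most fielddata, which is usually the one to remap or stop aggregating on.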

Step 2: Identify Root Cause

```bash
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_nodes/stats?pretty'
```
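
The node stats response includes a `breakers` section for each node, and its `fielddata` entry shows how close each node is to the limit and how often the breaker has tripped. The following sketch summarises that section. It assumes the parsed JSON from `_nodes/stats` (or `_nodes/stats/breaker`) is passed in as `stats`. The function name and the sample values are hypothetical.

```python
from typing import List


def fielddata_breaker_report(stats: dict) -> List[dict]:
    """Summarise the fielddata circuit breaker per node, fullest first."""
    report = []
    for node_id, node in stats["nodes"].items():
        breaker = node["breakers"]["fielddata"]
        limit = breaker["limit_size_in_bytes"]
        used = breaker["estimated_size_in_bytes"]
        report.append({
            "node": node.get("name", node_id),
            "used_fraction": used / limit if limit else 0.0,
            "tripped": breaker["tripped"],  # times the breaker has fired
        })
    return sorted(report, key=lambda r: r["used_fraction"], reverse=True)


# Hypothetical sample, trimmed to the fields the function reads.
sample_stats = {
    "nodes": {
        "abc": {"name": "node-1", "breakers": {"fielddata": {
            "limit_size_in_bytes": 1000, "estimated_size_in_bytes": 250, "tripped": 0}}},
        "def": {"name": "node-2", "breakers": {"fielddata": {
            "limit_size_in_bytes": 1000, "estimated_size_in_bytes": 750, "tripped": 3}}},
    }
}
breaker_report = fielddata_breaker_report(sample_stats)
```

A node with a non-zero `tripped` count and a high `used_fraction` points to fielddata as the root cause rather than a one-off heavy query.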

Step 3: Apply Primary Fix

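```bash
# Increase fielddata cache size.
# indices.fielddata.cache.size is a static node setting and cannot be changed
# through the cluster settings API. Set it in elasticsearch.yml on each data
# node, then restart the node:
#   indices.fielddata.cache.size: 40%

# Use a keyword field instead of text for aggregations.
# An existing text field cannot be changed to keyword in place, so (assuming
# user_id is currently mapped as text) add a keyword sub-field and aggregate
# on user_id.keyword. Existing documents must be reindexed to populate it.
curl -XPUT 'localhost:9200/my-index/_mapping' -H 'Content-Type: application/json' -d '
{
  "properties": {
    "user_id": {
      "type": "text",
      "fields": { "keyword": { "type": "keyword" } }
    }
  }
}'
```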

Step 4: Apply Alternative Fix

```bash
# Alternative fix: check node stats
curl -XGET 'localhost:9200/_nodes/stats?pretty'

# Update specific index settings
curl -XPUT 'localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '
{
  "index": {
    "refresh_interval": "30s"
  }
}'

# Verify the fix
curl -XGET 'localhost:9200/_cat/indices/my-index?v'
```

Step 5: Verify the Fix

```bash
curl -XGET 'localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'
curl -XGET 'localhost:9200/_cat/fielddata?v'
# Fielddata size should be within the limit
```
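
The "within limit" check can be made concrete by comparing each node's fielddata memory with its maximum heap. The sketch below is one way to do that. It assumes `stats` is the parsed response of `_nodes/stats/indices,jvm`, and that the limit is the 40% of heap shown in the error message above. The function name and the sample numbers are illustrative only.

```python
from typing import List


def nodes_over_limit(stats: dict, limit_fraction: float = 0.40) -> List[str]:
    """Return names of nodes whose fielddata exceeds limit_fraction of heap."""
    offenders = []
    for node_id, node in stats["nodes"].items():
        used = node["indices"]["fielddata"]["memory_size_in_bytes"]
        heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
        if heap_max and used / heap_max > limit_fraction:
            offenders.append(node.get("name", node_id))
    return sorted(offenders)


# Hypothetical sample, trimmed to the fields the function reads.
sample_node_stats = {
    "nodes": {
        "abc": {"name": "node-1",
                "indices": {"fielddata": {"memory_size_in_bytes": 300}},
                "jvm": {"mem": {"heap_max_in_bytes": 1000}}},
        "def": {"name": "node-2",
                "indices": {"fielddata": {"memory_size_in_bytes": 500}},
                "jvm": {"mem": {"heap_max_in_bytes": 1000}}},
    }
}
over = nodes_over_limit(sample_node_stats)
```

An empty list after the fix suggests fielddata is back under the configured share of heap on every node.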

Common Pitfalls

  • Not waiting for cluster state propagation after settings change
  • Using text field for aggregations instead of keyword
  • Setting circuit breaker limits too low for the production workload
  • Ignoring disk watermark warnings until the cluster blocks writes

Best Practices

  • Monitor cluster health regularly with _cluster/health API
  • Use keyword fields for aggregations to avoid fielddata
  • Set circuit breaker limits based on heap size
  • Configure ILM policies for automated index management