## Introduction

Elasticsearch circuit breakers prevent operations from causing `OutOfMemoryError`s by estimating memory usage and rejecting requests that would exceed configured thresholds. When a circuit breaker trips, indexing and search operations fail with `ElasticsearchException: Data too large`, disrupting the application until memory pressure is relieved.

## Symptoms

- `circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be larger than limit`
- `circuit_breaking_exception: [request] Data too large` on bulk indexing requests
- Indexing rate drops to zero during breaker trips
- `GET /_nodes/stats/breaker` shows an increasing `tripped` count
- Search queries return HTTP 429 or 503 errors

## Common Causes

- Bulk requests that are too large, exceeding the request circuit breaker limit
- Aggregation queries with too many buckets consuming excessive memory
- Too many concurrent heavy queries overwhelming the parent breaker
- A heap that is too small for the workload
- The field data cache growing unbounded when aggregating on `text` fields

## Step-by-Step Fix

1. **Check current circuit breaker status:**

   ```bash
   curl -s localhost:9200/_nodes/stats/breaker | jq '.nodes[].breakers'
   # Key fields:
   # "tripped"                 - number of times the breaker has tripped
   # "limit_size_in_bytes"     - the memory limit
   # "estimated_size_in_bytes" - current estimated usage
   ```
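The stats payload can also be interpreted programmatically, which is handy for alerting. A minimal sketch; the `breaker_utilization` helper and the sample numbers are illustrative, not pulled from a real cluster:

```python
def breaker_utilization(breakers):
    """Map each breaker name to (used_fraction, tripped_count)."""
    out = {}
    for name, b in breakers.items():
        limit = b["limit_size_in_bytes"]
        est = b["estimated_size_in_bytes"]
        out[name] = (est / limit if limit else 0.0, b["tripped"])
    return out

# Illustrative sample of the per-node "breakers" object from
# GET /_nodes/stats/breaker (values are made up for the example)
sample = {
    "parent": {"limit_size_in_bytes": 7_516_192_768,
               "estimated_size_in_bytes": 6_012_954_214, "tripped": 3},
    "request": {"limit_size_in_bytes": 6_442_450_944,
                "estimated_size_in_bytes": 104_857_600, "tripped": 0},
}

usage = breaker_utilization(sample)
# here the parent breaker sits near 80% of its limit and has tripped 3 times
```

A `tripped` count that keeps growing between polls is the signal to act, regardless of the instantaneous utilization.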

2. **Increase circuit breaker limits temporarily:**

   ```bash
   # Parent breaker (defaults to 70% of heap, or 95% when use_real_memory is true)
   curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
     "persistent": {
       "indices.breaker.total.use_real_memory": true,
       "indices.breaker.total.limit": "80%"
     }
   }'

   # Request breaker (default 60% of heap)
   curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
     "persistent": {
       "indices.breaker.request.limit": "70%"
     }
   }'
   ```

3. **Reduce bulk request size:**

   ```python
   import json

   from elasticsearch import Elasticsearch

   es = Elasticsearch(["http://localhost:9200"])

   # Keep each bulk request under ~5MB of payload
   MAX_BATCH_BYTES = 5 * 1024 * 1024

   def index_in_batches(docs, batch_size=500):
       for i in range(0, len(docs), batch_size):
           batch = docs[i:i + batch_size]
           # Check total payload size, not just document count
           total_size = sum(len(json.dumps(d)) for d in batch)
           if total_size > MAX_BATCH_BYTES and len(batch) > 1:
               # Batch is too large: split it in half and recurse
               half = len(batch) // 2
               index_in_batches(batch[:half], batch_size)
               index_in_batches(batch[half:], batch_size)
           else:
               actions = []
               for doc in batch:
                   actions.append({"index": {"_index": "my_index", "_id": doc["id"]}})
                   actions.append(doc)
               es.bulk(operations=actions)
   ```
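The splitting strategy can be sanity-checked without a live cluster by stubbing out the bulk call. `fake_bulk` and the parameterized `bulk` argument below are test scaffolding for this sketch, not part of the client API:

```python
import json

# Records how many documents each simulated bulk request carried
sent_batches = []

def fake_bulk(actions):
    # Each document contributes two entries: an action line and a source line
    sent_batches.append(len(actions) // 2)

def index_in_batches(docs, batch_size=500, max_bytes=5 * 1024 * 1024, bulk=fake_bulk):
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        total_size = sum(len(json.dumps(d)) for d in batch)
        if total_size > max_bytes and len(batch) > 1:
            # Oversized batch: split in half and recurse
            half = len(batch) // 2
            index_in_batches(batch[:half], batch_size, max_bytes, bulk)
            index_in_batches(batch[half:], batch_size, max_bytes, bulk)
        else:
            actions = []
            for doc in batch:
                actions.append({"index": {"_index": "my_index", "_id": doc["id"]}})
                actions.append(doc)
            bulk(actions)

# 1000 small docs with batch_size=4 should produce 250 bulk calls of 4 docs each
docs = [{"id": i, "body": "x" * 10} for i in range(1000)]
index_in_batches(docs, batch_size=4)
```

Injecting the bulk callable also makes it easy to swap in the real `es.bulk` in production while keeping the batching logic unit-testable.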

4. **Clear the field data cache to free memory:**

   ```bash
   # Clear field data for all indices
   curl -X POST 'localhost:9200/_cache/clear?fielddata=true'

   # Or for a specific index
   curl -X POST 'localhost:9200/my_index/_cache/clear?fielddata=true'
   ```

5. **Set the fielddata circuit breaker limit for text fields:**

   ```bash
   curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
     "persistent": {
       "indices.breaker.fielddata.limit": "40%"
     }
   }'
   ```

## Prevention

- Keep bulk request sizes under 5-10MB of total payload (byte size, not document count)
- Aggregate on `keyword` fields instead of `text` fields
- Monitor circuit breaker `tripped` counts and alert on increases
- Size the heap at 50% of available RAM, but never above 32GB (the compressed-oops threshold)
- Use `doc_values` instead of `fielddata` for aggregatable fields
- Implement circuit breaker handling in application code with exponential backoff
- Feed `_nodes/stats/breaker` metrics into Grafana/Prometheus dashboards
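The exponential-backoff recommendation can be sketched as follows. `CircuitBreakerRejection` and `flaky_request` are hypothetical stand-ins for the client's actual 429/`circuit_breaking_exception` error and your real bulk or search call:

```python
import time

class CircuitBreakerRejection(Exception):
    """Stand-in for the client error raised on HTTP 429 / circuit_breaking_exception."""

def with_backoff(do_request, max_retries=5, base_delay=0.1):
    """Retry a rejected request with exponentially growing delays."""
    for attempt in range(max_retries):
        try:
            return do_request()
        except CircuitBreakerRejection:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))

# Simulated request that is rejected twice before succeeding
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise CircuitBreakerRejection()
    return "ok"

result = with_backoff(flaky_request)
```

Backing off gives the node time to release memory, which is usually more effective than hammering a tripped breaker with immediate retries.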