# How to Fix Elasticsearch Thread Pool Rejection

Your Elasticsearch operations are being rejected with thread pool errors. Search queries fail, bulk indexing gets rejected, and your application logs show EsRejectedExecutionException. Let's diagnose and fix thread pool saturation.

## Recognizing Thread Pool Rejection

The error message is clear:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "rejected_execution_exception",
        "reason": "rejected execution of org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportRequestHandler"
      }
    ],
    "type": "rejected_execution_exception",
    "reason": "rejected execution (queue capacity: 200)"
  },
  "status": 429
}
```

Or in Elasticsearch logs:

```bash
[WARN ][o.e.t.TransportService] [node-1] rejected execution for thread pool [search] queue is full
[WARN ][o.e.t.TransportService] [node-1] rejected execution for thread pool [write] queue is full
```

The HTTP 429 (Too Many Requests) status indicates the node is too busy to accept the request.

## Understanding Thread Pools

Elasticsearch has several thread pools for different operations:

| Thread Pool | Purpose | Default Size | Default Queue |
|---|---|---|---|
| `search` | Search queries | `(int)((available_processors * 3) / 2) + 1` | 1000 |
| `write` | Index/delete/update | `available_processors` | 200 |
| `bulk` | Bulk operations | `available_processors` | 200 |
| `index` | Indexing | `available_processors` | 200 |
| `get` | Get operations | `available_processors` | 1000 |
| `refresh` | Refresh operations | `available_processors` | 200 |
| `listener` | Listener callbacks | `available_processors / 2` | 0 |
| `management` | Cluster management | 5 | 0 |

Note that in Elasticsearch 6.3 and later, the `bulk` and `index` pools were merged into `write`, so on modern clusters only the `write` pool applies to indexing traffic.
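As a sanity check, the default search pool formula can be computed directly; on an 8-core node it yields 13 threads, which matches the `"threads" : 13` value in the stats output later in this guide. A minimal sketch:

```python
def search_pool_size(processors: int) -> int:
    """Default search thread pool size: int((processors * 3) / 2) + 1."""
    return (processors * 3) // 2 + 1

print(search_pool_size(8))   # 8-core node  -> 13
print(search_pool_size(16))  # 16-core node -> 25
```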

Check current thread pool settings:

```bash
curl -X GET "localhost:9200/_nodes/stats/thread_pool?pretty"
```

```json
{
  "nodes" : {
    "node-1" : {
      "thread_pool" : {
        "search" : {
          "threads" : 13,
          "queue" : 850,
          "active" : 13,
          "rejected" : 42
        },
        "write" : {
          "threads" : 8,
          "queue" : 200,
          "active" : 8,
          "rejected" : 150
        }
      }
    }
  }
}
```

The `rejected` counter is cumulative: it shows how many operations each pool has rejected since the node started.

## Diagnosing the Root Cause

### High Active Threads

When all threads are constantly active, the queue fills:

```bash
curl -X GET "localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected&s=rejected:desc"
```

```bash
node_name name   active queue rejected
node-1    search 13     1000  500
node-1    write  8      200   1200
node-2    search 13     800   50
```

### Slow Operations

Thread pools fill when operations take too long. Check for slow searches:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```

Look at query latency:

```json
{
  "indices" : {
    "search" : {
      "query_time_in_millis" : 45000,
      "query_total" : 1000
    }
  }
}
```

Average query time = 45000 / 1000 = 45 ms per query. These counters are cumulative since node start, so compare snapshots over time; a rising average means queries are getting slower.
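Computing the average from the stats counters is straightforward; a minimal sketch using the sample numbers above:

```python
def avg_query_ms(search_stats: dict) -> float:
    """Average query latency in ms from cumulative _nodes/stats counters."""
    total = search_stats["query_total"]
    return search_stats["query_time_in_millis"] / total if total else 0.0

stats = {"query_time_in_millis": 45000, "query_total": 1000}
print(avg_query_ms(stats))  # 45.0
```

Because the counters are cumulative, diffing two snapshots taken a minute apart gives the recent average rather than the lifetime one.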

### Check for Hot Threads

See what's consuming threads:

```bash
curl -X GET "localhost:9200/_nodes/hot_threads?threads=10&interval=500ms"
```

```bash
Hot threads at 2024-01-15T10:30:00, interval=500ms:
  85.4% (427ms/500ms) cpu usage by thread 'elasticsearch[node-1][search][T#2]'
    2/10 snapshots sharing following 427ms of execution:
      org.elasticsearch.search.SearchService.executeQueryPhase(...)
```

This shows search queries are consuming CPU.

## Solution 1: Increase Queue Size

For burst-heavy workloads, increase queue capacity:

Thread pool settings are static node settings, so they cannot be changed through the cluster settings API. Set them in `elasticsearch.yml` on each node and perform a rolling restart:

```yaml
# elasticsearch.yml (apply on every node, then rolling-restart)
thread_pool.search.queue_size: 2000
thread_pool.write.queue_size: 500
```

Larger queues absorb traffic spikes. However, they increase latency for queued operations.

## Solution 2: Increase Thread Count

For sustained high load, add more threads:

As with queue sizes, thread counts are static node settings configured in `elasticsearch.yml`:

```yaml
# elasticsearch.yml (apply on every node, then rolling-restart)
thread_pool.search.size: 20
thread_pool.write.size: 12
# note: recent versions cap the write pool size near the processor count
```

Be careful: more threads increase CPU and memory pressure. Only add threads if CPU isn't saturated.

## Solution 3: Optimize Slow Operations

Find and fix slow queries:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```

Identify problematic queries with profiling:

```bash
curl -X GET "localhost:9200/your-index/_search?profile=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "content": "search terms"
    }
  }
}
'
```

Optimize queries by:

  • Using filter context for non-scoring queries
  • Reducing aggregation complexity
  • Avoiding deep pagination
  • Using keyword fields for exact matches

```bash
# Before: slow scoring query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "status": "active" }
  }
}
'

# After: fast filter query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status.keyword": "active" } }
      ]
    }
  }
}
'
```

## Solution 4: Reduce Request Rate

Throttle client requests:

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def bulk_index_with_retry(actions, batch_size=500, max_retries=3):
    """Index in batches, backing off when the cluster returns 429."""
    for start in range(0, len(actions), batch_size):
        batch = actions[start:start + batch_size]
        for retry in range(max_retries):
            try:
                helpers.bulk(es, batch)
                break  # batch succeeded, move to the next one
            except Exception as e:
                if "429" in str(e) and retry < max_retries - 1:
                    time.sleep((retry + 1) * 5)  # linear backoff: 5s, 10s, ...
                else:
                    raise
```

Use exponential backoff for rejected requests:

```python
import time

def index_with_backoff(es, index, doc, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return es.index(index=index, body=doc)
        except Exception as e:
            if 'rejected_execution' in str(e):
                wait = min(2 ** attempt, 30)  # 1s, 2s, 4s, ... capped at 30s
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")
```
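The capped exponential schedule used above, `min(2 ** attempt, 30)`, produces predictable wait times; a quick sketch for tuning the cap and attempt count:

```python
def backoff_schedule(max_attempts: int = 5, cap: int = 30) -> list:
    """Wait times for capped exponential backoff: 1, 2, 4, ... up to cap."""
    return [min(2 ** attempt, cap) for attempt in range(max_attempts)]

print(backoff_schedule())   # [1, 2, 4, 8, 16]
print(backoff_schedule(7))  # [1, 2, 4, 8, 16, 30, 30]
```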

## Solution 5: Add More Nodes

Scale horizontally to distribute load:

```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,cpu,load_1m,load_5m,load_15m"
```

```bash
name   cpu load_1m load_5m load_15m
node-1 95  12.5    10.2    8.5
node-2 88  11.0    9.5     7.2
```

Sustained high load across all nodes indicates a need for more capacity. New nodes take over relocated shards and absorb part of the request load.

## Solution 6: Use Adaptive Replica Selection

Let Elasticsearch choose less-loaded nodes:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
'
```

This routes search requests to replica shards on less-loaded nodes. It has been enabled by default since Elasticsearch 6.1, so on modern clusters just verify it has not been switched off.

## Solution 7: Split Bulk Operations

Large bulk requests overwhelm the queue:

```bash
# Don't send 10,000 documents in one request; split into
# batches of 500-1000 documents per bulk call
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_index": "test" } }
{ "field": "value1" }
{ "index": { "_index": "test" } }
{ "field": "value2" }
'
```
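Client-side, batching reduces to a small helper that slices the action list before each `_bulk` call; a minimal sketch (the 500-document default is an assumption to tune for your document size):

```python
def chunk_actions(actions, chunk_size=500):
    """Yield successive fixed-size batches of bulk actions."""
    for start in range(0, len(actions), chunk_size):
        yield actions[start:start + chunk_size]

batches = list(chunk_actions(list(range(10_000)), chunk_size=500))
print(len(batches))      # 20
print(len(batches[-1]))  # 500
```

The official Python client's `helpers.streaming_bulk` covers the same ground with its `chunk_size` and `max_retries` parameters, handling both batching and backoff for you.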

## Monitoring Thread Pool Health

Track thread pool metrics:

```bash
#!/bin/bash
while true; do
  echo "=== Thread Pool Status ==="
  curl -s "localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected,completed&s=name:desc"
  echo ""
  sleep 5
done
```

Set up alerts for rejected operations:

```bash
# Check rejection count
curl -X GET "localhost:9200/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.*.rejected&pretty"
```
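Since the `rejected` counters only ever increase, an alert should fire on the delta between snapshots rather than the absolute value; a minimal sketch:

```python
def rejection_deltas(prev: dict, curr: dict) -> dict:
    """Return pools whose rejected count grew between two snapshots."""
    return {
        pool: curr[pool] - prev.get(pool, 0)
        for pool in curr
        if curr[pool] > prev.get(pool, 0)
    }

prev = {"search": 42, "write": 150}
curr = {"search": 42, "write": 175}
print(rejection_deltas(prev, curr))  # {'write': 25}
```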

## Thread Pool Configuration Best Practices

| Scenario | Configuration |
|---|---|
| Search-heavy | Increase `search` queue (2000-5000) |
| Write-heavy | Increase `write`/`bulk` queues |
| Burst traffic | Larger queues, keep thread count |
| Sustained high load | Add threads and nodes |
| Slow queries | Optimize queries first |
| Mixed workload | Balance all thread pools |

## Verification Steps

After making changes:

1. Check that thread pool settings were applied:

   ```bash
   curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep thread_pool
   ```

2. Monitor queue and rejection counts:

   ```bash
   curl -X GET "localhost:9200/_cat/thread_pool?v&h=name,active,queue,rejected"
   ```

3. Test under load:

   ```bash
   # Simulate search load
   for i in {1..100}; do
     curl -s "localhost:9200/logs/_search?q=*&size=10" > /dev/null &
   done
   wait

   # Check whether anything was rejected
   curl -X GET "localhost:9200/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.search.rejected"
   ```

4. Verify that application retry logic works.

## Summary

Thread pool rejection indicates cluster overload. Fix by:

1. Increasing queue sizes for traffic spikes
2. Adding threads for sustained load (if CPU headroom exists)
3. Optimizing slow queries and operations
4. Implementing client-side backoff and retry
5. Scaling horizontally with more nodes
6. Using adaptive replica selection
7. Splitting large bulk operations

Monitor thread pool metrics continuously and adjust configuration based on your workload patterns.