# How to Fix Elasticsearch Thread Pool Rejection

Your Elasticsearch operations are being rejected with thread pool errors. Search queries fail, bulk indexing gets rejected, and your application logs show EsRejectedExecutionException. Let's diagnose and fix thread pool saturation.

## Recognizing Thread Pool Rejection

The error message is clear:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "rejected_execution_exception",
        "reason": "rejected execution of org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportRequestHandler"
      }
    ],
    "type": "rejected_execution_exception",
    "reason": "rejected execution (queue capacity: 200)"
  },
  "status": 429
}
```

Or in Elasticsearch logs:

```bash
[WARN ][o.e.t.TransportService] [node-1] rejected execution for thread pool [search] queue is full
[WARN ][o.e.t.TransportService] [node-1] rejected execution for thread pool [write] queue is full
```

The HTTP 429 (Too Many Requests) status indicates the node is too busy to accept the request.

## Understanding Thread Pools

Elasticsearch has several thread pools for different operations:

| Thread Pool | Purpose | Default Size | Default Queue |
|---|---|---|---|
| `search` | Search queries | `(int)((available_processors * 3) / 2) + 1` | 1000 |
| `write` | Index/delete/update | `available_processors` | 200 |
| `bulk` | Bulk operations | `available_processors` | 200 |
| `index` | Indexing | `available_processors` | 200 |
| `get` | Get operations | `available_processors` | 1000 |
| `refresh` | Refresh operations | `available_processors` | 200 |
| `listener` | Listener callbacks | `available_processors / 2` | 0 |
| `management` | Cluster management | 5 | 0 |

Note that in Elasticsearch 6.3 and later, the `bulk` and `index` pools were merged into `write`, so on modern clusters only the `write` pool applies to indexing traffic.
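As a sanity check, the default search pool formula can be computed directly; on an 8-core node it yields 13 threads, which matches the `"threads" : 13` value in the stats output later in this guide. A minimal sketch:

```python
def search_pool_size(processors: int) -> int:
    """Default search thread pool size: int((processors * 3) / 2) + 1."""
    return (processors * 3) // 2 + 1

print(search_pool_size(8))   # 8-core node  -> 13
print(search_pool_size(16))  # 16-core node -> 25
```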

Check current thread pool settings:

```bash
curl -X GET "localhost:9200/_nodes/stats/thread_pool?pretty"
```

```json
{
  "nodes" : {
    "node-1" : {
      "thread_pool" : {
        "search" : {
          "threads" : 13,
          "queue" : 850,
          "active" : 13,
          "rejected" : 42
        },
        "write" : {
          "threads" : 8,
          "queue" : 200,
          "active" : 8,
          "rejected" : 150
        }
      }
    }
  }
}
```

The `rejected` counter is cumulative: it shows how many operations each pool has rejected since the node started.

## Diagnosing the Root Cause

### High Active Threads

When all threads are constantly active, the queue fills:

```bash
curl -X GET "localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected&s=rejected:desc"
```

```bash
node_name name   active queue rejected
node-1    search 13     1000  500
node-1    write  8      200   1200
node-2    search 13     800   50
```

### Slow Operations

Thread pools fill when operations take too long. Check for slow searches:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```

Look at query latency:

```json
{
  "indices" : {
    "search" : {
      "query_time_in_millis" : 45000,
      "query_total" : 1000
    }
  }
}
```

Average query time = 45000 / 1000 = 45 ms per query. These counters are cumulative since node start, so compare snapshots over time; a rising average means queries are getting slower.
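Computing the average from the stats counters is straightforward; a minimal sketch using the sample numbers above:

```python
def avg_query_ms(search_stats: dict) -> float:
    """Average query latency in ms from cumulative _nodes/stats counters."""
    total = search_stats["query_total"]
    return search_stats["query_time_in_millis"] / total if total else 0.0

stats = {"query_time_in_millis": 45000, "query_total": 1000}
print(avg_query_ms(stats))  # 45.0
```

Because the counters are cumulative, diffing two snapshots taken a minute apart gives the recent average rather than the lifetime one.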

### Check for Hot Threads

See what's consuming threads:

```bash
curl -X GET "localhost:9200/_nodes/hot_threads?threads=10&interval=500ms"
```

```bash
Hot threads at 2024-01-15T10:30:00, interval=500ms:
  85.4% (427ms/500ms) cpu usage by thread 'elasticsearch[node-1][search][T#2]'
    2/10 snapshots sharing following 427ms of execution:
      org.elasticsearch.search.SearchService.executeQueryPhase(...)
```

This shows search queries are consuming CPU.

## Solution 1: Increase Queue Size

For burst-heavy workloads, increase queue capacity:

Thread pool settings are static node settings, so they cannot be changed through the cluster settings API. Set them in `elasticsearch.yml` on each node and perform a rolling restart:

```yaml
# elasticsearch.yml (apply on every node, then rolling-restart)
thread_pool.search.queue_size: 2000
thread_pool.write.queue_size: 500
```

Larger queues absorb traffic spikes. However, they increase latency for queued operations.

## Solution 2: Increase Thread Count

For sustained high load, add more threads:

As with queue sizes, thread counts are static node settings configured in `elasticsearch.yml`:

```yaml
# elasticsearch.yml (apply on every node, then rolling-restart)
thread_pool.search.size: 20
thread_pool.write.size: 12
# note: recent versions cap the write pool size near the processor count
```

Be careful: more threads increase CPU and memory pressure. Only add threads if CPU isn't saturated.

## Solution 3: Optimize Slow Operations

Find and fix slow queries:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```

Identify problematic queries with profiling:

```bash
curl -X GET "localhost:9200/your-index/_search?profile=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "content": "search terms"
    }
  }
}
'
```

Optimize queries by:

  • Using filter context for non-scoring queries
  • Reducing aggregation complexity
  • Avoiding deep pagination
  • Using keyword fields for exact matches

```bash
# Before: slow scoring query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "status": "active" }
  }
}
'

# After: fast filter query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status.keyword": "active" } }
      ]
    }
  }
}
'
```

## Solution 4: Reduce Request Rate

Throttle client requests:

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def bulk_index_with_retry(actions, batch_size=500, max_retries=3):
    """Index in batches, backing off when the cluster returns 429."""
    for start in range(0, len(actions), batch_size):
        batch = actions[start:start + batch_size]
        for retry in range(max_retries):
            try:
                helpers.bulk(es, batch)
                break  # batch succeeded, move to the next one
            except Exception as e:
                if "429" in str(e) and retry < max_retries - 1:
                    time.sleep((retry + 1) * 5)  # linear backoff: 5s, 10s, ...
                else:
                    raise
```

Use exponential backoff for rejected requests:

```python
import time

def index_with_backoff(es, index, doc, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return es.index(index=index, body=doc)
        except Exception as e:
            if 'rejected_execution' in str(e):
                wait = min(2 ** attempt, 30)  # 1s, 2s, 4s, ... capped at 30s
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")
```
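The capped exponential schedule used above, `min(2 ** attempt, 30)`, produces predictable wait times; a quick sketch for tuning the cap and attempt count:

```python
def backoff_schedule(max_attempts: int = 5, cap: int = 30) -> list:
    """Wait times for capped exponential backoff: 1, 2, 4, ... up to cap."""
    return [min(2 ** attempt, cap) for attempt in range(max_attempts)]

print(backoff_schedule())   # [1, 2, 4, 8, 16]
print(backoff_schedule(7))  # [1, 2, 4, 8, 16, 30, 30]
```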

## Solution 5: Add More Nodes

Scale horizontally to distribute load:

```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,cpu,load_1m,load_5m,load_15m"
```

```bash
name   cpu load_1m load_5m load_15m
node-1 95  12.5    10.2    8.5
node-2 88  11.0    9.5     7.2
```

Sustained high load across all nodes indicates a need for more capacity. New nodes take over relocated shards and absorb part of the request load.

## Solution 6: Use Adaptive Replica Selection

Let Elasticsearch choose less-loaded nodes:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
'
```

This routes search requests to replica shards on less-loaded nodes. It has been enabled by default since Elasticsearch 6.1, so on modern clusters just verify it has not been switched off.

## Solution 7: Split Bulk Operations

Large bulk requests overwhelm the queue:

```bash
# Don't send 10,000 documents in one request; split into
# batches of 500-1000 documents per bulk call
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_index": "test" } }
{ "field": "value1" }
{ "index": { "_index": "test" } }
{ "field": "value2" }
'
```
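Client-side, batching reduces to a small helper that slices the action list before each `_bulk` call; a minimal sketch (the 500-document default is an assumption to tune for your document size):

```python
def chunk_actions(actions, chunk_size=500):
    """Yield successive fixed-size batches of bulk actions."""
    for start in range(0, len(actions), chunk_size):
        yield actions[start:start + chunk_size]

batches = list(chunk_actions(list(range(10_000)), chunk_size=500))
print(len(batches))      # 20
print(len(batches[-1]))  # 500
```

The official Python client's `helpers.streaming_bulk` covers the same ground with its `chunk_size` and `max_retries` parameters, handling both batching and backoff for you.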

## Monitoring Thread Pool Health

Track thread pool metrics:

```bash
#!/bin/bash
while true; do
  echo "=== Thread Pool Status ==="
  curl -s "localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected,completed&s=name:desc"
  echo ""
  sleep 5
done
```

Set up alerts for rejected operations:

```bash
# Check rejection count
curl -X GET "localhost:9200/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.*.rejected&pretty"
```
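Since the `rejected` counters only ever increase, an alert should fire on the delta between snapshots rather than the absolute value; a minimal sketch:

```python
def rejection_deltas(prev: dict, curr: dict) -> dict:
    """Return pools whose rejected count grew between two snapshots."""
    return {
        pool: curr[pool] - prev.get(pool, 0)
        for pool in curr
        if curr[pool] > prev.get(pool, 0)
    }

prev = {"search": 42, "write": 150}
curr = {"search": 42, "write": 175}
print(rejection_deltas(prev, curr))  # {'write': 25}
```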

## Thread Pool Configuration Best Practices

| Scenario | Configuration |
|---|---|
| Search-heavy | Increase `search` queue (2000-5000) |
| Write-heavy | Increase `write`/`bulk` queues |
| Burst traffic | Larger queues, keep thread count |
| Sustained high load | Add threads and nodes |
| Slow queries | Optimize queries first |
| Mixed workload | Balance all thread pools |

## Verification Steps

After making changes:

1. Check that thread pool settings were applied:

   ```bash
   curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep thread_pool
   ```

2. Monitor queue and rejection counts:

   ```bash
   curl -X GET "localhost:9200/_cat/thread_pool?v&h=name,active,queue,rejected"
   ```

3. Test under load:

   ```bash
   # Simulate search load
   for i in {1..100}; do
     curl -s "localhost:9200/logs/_search?q=*&size=10" > /dev/null &
   done
   wait

   # Check whether anything was rejected
   curl -X GET "localhost:9200/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.search.rejected"
   ```

4. Verify that application retry logic works.

## Summary

Thread pool rejection indicates cluster overload. Fix by:

1. Increasing queue sizes for traffic spikes
2. Adding threads for sustained load (if CPU headroom exists)
3. Optimizing slow queries and operations
4. Implementing client-side backoff and retry
5. Scaling horizontally with more nodes
6. Using adaptive replica selection
7. Splitting large bulk operations

Monitor thread pool metrics continuously and adjust configuration based on your workload patterns.