# How to Fix Elasticsearch Thread Pool Rejection
Your Elasticsearch operations are being rejected with thread pool errors: search queries fail, bulk indexing is rejected, and your application logs show `EsRejectedExecutionException`. Let's diagnose and fix thread pool saturation.
## Recognizing Thread Pool Rejection

The error message is clear:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "rejected_execution_exception",
        "reason": "rejected execution of org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportRequestHandler"
      }
    ],
    "type": "rejected_execution_exception",
    "reason": "rejected execution (queue capacity: 200)"
  },
  "status": 429
}
```

Or in the Elasticsearch logs:

```
[WARN ][o.e.t.TransportService] [node-1] rejected execution for thread pool [search] queue is full
[WARN ][o.e.t.TransportService] [node-1] rejected execution for thread pool [write] queue is full
```

HTTP status 429 indicates the server is too busy to handle the request.
## Understanding Thread Pools

Elasticsearch has several thread pools for different operations. Note that defaults vary by version, and in 6.x+ the `bulk` and `index` pools were folded into `write`:

| Thread Pool | Purpose | Default Size | Default Queue |
|---|---|---|---|
| search | Search queries | ((available_processors * 3) / 2) + 1 | 1000 |
| write | Index/delete/update | available_processors | 200 |
| bulk | Bulk operations | available_processors | 200 |
| index | Indexing | available_processors | 200 |
| get | Get operations | available_processors | 1000 |
| refresh | Refresh operations | available_processors | 200 |
| listener | Listener callbacks | available_processors / 2 | 0 |
| management | Cluster management | 5 | 0 |
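As a sanity check, the default pool sizes above can be reproduced from the processor count. A small Python sketch of the formulas from the table (not taken from the Elasticsearch source):

```python
def default_search_threads(processors):
    # search pool: ((processors * 3) / 2) + 1, using integer division
    return (processors * 3) // 2 + 1

def default_write_threads(processors):
    # write pool: one thread per processor
    return processors

# An 8-core node gets 13 search threads and 8 write threads,
# which matches the stats output shown later in this article.
print(default_search_threads(8), default_write_threads(8))  # 13 8
```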
Check current thread pool statistics:

```bash
curl -X GET "localhost:9200/_nodes/stats/thread_pool?pretty"
```

```json
{
  "nodes" : {
    "node-1" : {
      "thread_pool" : {
        "search" : {
          "threads" : 13,
          "queue" : 850,
          "active" : 13,
          "rejected" : 42
        },
        "write" : {
          "threads" : 8,
          "queue" : 200,
          "active" : 8,
          "rejected" : 150
        }
      }
    }
  }
}
```

The `rejected` counter shows how many operations were rejected.
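Pulling those counters out programmatically is straightforward. A Python sketch against the response shape above (`stats` would be the parsed JSON from the `_nodes/stats/thread_pool` call):

```python
def rejections_by_pool(stats):
    # Sum the rejected count per thread pool across all nodes.
    totals = {}
    for node in stats["nodes"].values():
        for pool, metrics in node["thread_pool"].items():
            totals[pool] = totals.get(pool, 0) + metrics.get("rejected", 0)
    return totals

stats = {"nodes": {"node-1": {"thread_pool": {
    "search": {"threads": 13, "queue": 850, "active": 13, "rejected": 42},
    "write": {"threads": 8, "queue": 200, "active": 8, "rejected": 150},
}}}}
print(rejections_by_pool(stats))  # {'search': 42, 'write': 150}
```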
## Diagnosing the Root Cause

### High Active Threads

When all threads are constantly active, the queue fills:

```bash
curl -X GET "localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected&s=rejected:desc"
```

```
node_name name   active queue rejected
node-1    search 13     1000  500
node-1    write  8      200   1200
node-2    search 13     800   50
```

### Slow Operations
Thread pools fill when operations take too long. Check for slow searches:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```

Look at query latency:

```json
{
  "indices" : {
    "search" : {
      "query_time_in_millis" : 45000,
      "query_total" : 1000
    }
  }
}
```

Average query time = 45000 / 1000 = 45 ms per query. If this is high, queries are slow.
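Keep in mind these counters are cumulative since node start. A small helper (a sketch matching the field names above) can compute the lifetime average, or diff two snapshots to get the average over a recent window:

```python
def avg_query_ms(stats, prev=None):
    # Counters are cumulative; pass a previous snapshot
    # to measure the average over a recent window instead.
    base = prev or {"query_time_in_millis": 0, "query_total": 0}
    queries = stats["query_total"] - base["query_total"]
    if queries == 0:
        return 0.0
    return (stats["query_time_in_millis"] - base["query_time_in_millis"]) / queries

now = {"query_time_in_millis": 45000, "query_total": 1000}
print(avg_query_ms(now))  # 45.0 ms lifetime average
```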
### Check for Hot Threads

See what's consuming threads:

```bash
curl -X GET "localhost:9200/_nodes/hot_threads?threads=10&interval=500ms"
```

```
Hot threads at 2024-01-15T10:30:00, interval=500ms:
85.4% (427ms/500ms) cpu usage by thread 'elasticsearch[node-1][search][T#2]'
  2/10 snapshots sharing following 427ms of execution:
    org.elasticsearch.search.SearchService.executeQueryPhase(...)
```

This output shows search queries consuming CPU.
## Solution 1: Increase Queue Size

For burst-heavy workloads, increase queue capacity. Thread pool settings are static node-level settings, so they go in each node's `elasticsearch.yml` (they cannot be changed through the cluster settings API) and require a restart:

```yaml
thread_pool.search.queue_size: 2000
thread_pool.write.queue_size: 500
```

Larger queues absorb traffic spikes. However, they increase latency for queued operations.
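A rough way to reason about that trade-off: an operation entering a full queue waits about queue_depth × average_service_time / thread_count before it runs. This is a back-of-envelope sketch, not an Elasticsearch formula:

```python
def worst_case_wait_ms(queue_depth, avg_op_ms, threads):
    # Each thread drains one operation per avg_op_ms, so a full queue
    # takes roughly queue_depth / threads "rounds" to clear.
    return queue_depth * avg_op_ms / threads

# 2000 queued searches at 45 ms each across 13 search threads:
print(round(worst_case_wait_ms(2000, 45, 13)))  # ~6923 ms of added latency
```

So a 2000-entry search queue on a 13-thread pool can add nearly 7 seconds of latency to the last request in line, which is why bigger queues are not a free fix.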
## Solution 2: Increase Thread Count

For sustained high load, add more threads, again via `elasticsearch.yml` on each node:

```yaml
thread_pool.search.size: 20
thread_pool.write.size: 12
```

Be careful: more threads increase CPU and memory pressure. Only add threads if CPU isn't saturated.
## Solution 3: Optimize Slow Operations

Find slow queries via the search stats:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```

Identify problematic queries with profiling:

```bash
curl -X GET "localhost:9200/your-index/_search?profile=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "content": "search terms"
    }
  }
}
'
```

Optimize queries by:
- Using filter context for non-scoring queries
- Reducing aggregation complexity
- Avoiding deep pagination
- Using keyword fields for exact matches
```bash
# Before: slow scoring query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "status": "active" }
  }
}
'

# After: fast filter query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status.keyword": "active" } }
      ]
    }
  }
}
'
```
## Solution 4: Reduce Request Rate

Throttle client requests and retry on rejection (using the `elasticsearch-py` bulk helper):

```python
import time
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(['http://localhost:9200'])

def bulk_index_with_retry(actions, batch_size=500, max_retries=3):
    # Retry the batch with a linear backoff when the cluster
    # responds with 429 (thread pool rejection).
    for retry in range(max_retries):
        try:
            success, errors = bulk(es, actions, chunk_size=batch_size)
            return True
        except Exception as e:
            if '429' in str(e):
                time.sleep((retry + 1) * 5)
            else:
                raise
    return False
```
Use exponential backoff for rejected requests:

```python
def index_with_backoff(es, index, doc, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return es.index(index=index, body=doc)
        except Exception as e:
            if 'rejected_execution' in str(e):
                wait = min(2 ** attempt, 30)  # cap the backoff at 30s
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")
```

## Solution 5: Add More Nodes
Scale horizontally to distribute load:
```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,cpu,load_1m,load_5m,load_15m"
```

```
name   cpu load_1m load_5m load_15m
node-1 95  12.5    10.2    8.5
node-2 88  11.0    9.5     7.2
```

High load across all nodes indicates the need for more nodes. New nodes receive shards and absorb a share of the requests.
## Solution 6: Use Adaptive Replica Selection

Let Elasticsearch route searches to less-loaded nodes (this is enabled by default on recent versions):

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
'
```

This directs search requests to replicas on nodes with shorter queues and faster response times.
## Solution 7: Split Bulk Operations

Large bulk requests overwhelm the queue. Keep batches to roughly 500-1000 documents per request:
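Client-side, the split is a simple chunking loop. A Python sketch (`actions` stands in for your prepared bulk action/source lines):

```python
def chunked(actions, batch_size=500):
    # Yield successive batches of at most batch_size actions,
    # each of which becomes one _bulk request.
    batch = []
    for action in actions:
        batch.append(action)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# 1200 documents become three bulk requests: 500 + 500 + 200.
sizes = [len(b) for b in chunked(range(1200), batch_size=500)]
print(sizes)  # [500, 500, 200]
```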
```bash
# Don't send 10000 documents in one bulk request.
# Split into smaller batches of 500-1000 documents:
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_index": "test" } }
{ "field": "value1" }
{ "index": { "_index": "test" } }
{ "field": "value2" }
'
```

## Monitoring Thread Pool Health
Track thread pool metrics:
```bash
#!/bin/bash
while true; do
  echo "=== Thread Pool Status ==="
  curl -s "localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected,completed&s=name:desc"
  echo ""
  sleep 5
done
```

Set up alerts for rejected operations:
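Because the `rejected` counter is cumulative since node start, alerts should fire on its growth between polls, not on its absolute value. A Python sketch (the snapshot shape mirrors the per-pool counters shown above):

```python
def new_rejections(prev, curr):
    # Return the pools whose rejected counter grew since the last poll.
    return {pool: curr[pool] - prev.get(pool, 0)
            for pool in curr if curr[pool] > prev.get(pool, 0)}

prev = {"search": 42, "write": 150}
curr = {"search": 42, "write": 175}
print(new_rejections(prev, curr))  # {'write': 25}
```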
```bash
# Check rejection counts
curl -X GET "localhost:9200/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.*.rejected&pretty"
```

## Thread Pool Configuration Best Practices
| Scenario | Configuration |
|---|---|
| Search-heavy | Increase search queue (2000-5000) |
| Write-heavy | Increase write/bulk queues |
| Burst traffic | Larger queues, keep thread count |
| Sustained high load | Add threads and nodes |
| Slow queries | Optimize queries first |
| Mixed workload | Balance all thread pools |
## Verification Steps
After making changes:
1. Check that the thread pool settings were applied:

```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep thread_pool
```

2. Monitor queue and rejection counts:

```bash
curl -X GET "localhost:9200/_cat/thread_pool?v&h=name,active,queue,rejected"
```

3. Test under load:
```bash
# Simulate search load
for i in {1..100}; do
  curl -s "localhost:9200/logs/_search?q=*&size=10" > /dev/null &
done
wait

# Check for rejections
curl -X GET "localhost:9200/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.search.rejected"
```
4. Verify that application retry logic works.
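Retry logic can be exercised without a live cluster by stubbing the client. A sketch in which `FlakyClient` is a hypothetical test double and the backoff function mirrors the one from Solution 4, with the sleep made injectable so tests run instantly:

```python
import time

class FlakyClient:
    """Test double that raises a rejection error for the first N calls."""
    def __init__(self, failures=2):
        self.failures = failures
        self.calls = 0

    def index(self, index, body):
        self.calls += 1
        if self.calls <= self.failures:
            raise Exception("rejected_execution_exception (429)")
        return {"result": "created"}

def index_with_backoff(es, index, doc, max_attempts=5, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return es.index(index=index, body=doc)
        except Exception as e:
            if 'rejected_execution' not in str(e):
                raise
            sleep(min(2 ** attempt, 30))  # capped exponential backoff
    raise Exception("Max retries exceeded")

client = FlakyClient(failures=2)
resp = index_with_backoff(client, "logs", {"msg": "hi"}, sleep=lambda s: None)
print(resp["result"], client.calls)  # created 3
```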
## Summary
Thread pool rejection indicates cluster overload. Fix by:
1. Increasing queue sizes for traffic spikes
2. Adding threads for sustained load (if CPU is available)
3. Optimizing slow queries and operations
4. Implementing client-side backoff and retry
5. Scaling horizontally with more nodes
6. Using adaptive replica selection
7. Splitting large bulk operations
Monitor thread pool metrics continuously and adjust configuration based on your workload patterns.