# How to Fix Elasticsearch Search Timeout Errors
Your Elasticsearch queries are timing out. Clients receive timeout errors, search operations fail, and user experience suffers. Let's diagnose why queries are slow and fix timeout issues.
## Recognizing Search Timeout Errors
The error appears in responses:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query_fetch",
    "caused_by": {
      "type": "timeout_exception",
      "reason": "Timeout occurred while executing query"
    }
  },
  "status": 408
}
```

Or partial results with a timeout indication:

```json
{
  "took": 60000,
  "timed_out": true,
  "hits": {
    "total": {
      "value": 500,
      "relation": "gte"
    },
    "hits": [...]
  }
}
```

The `timed_out: true` field indicates the query hit the timeout limit and returned whatever results each shard had collected so far.
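Client code should check this flag rather than assume a complete result set. A minimal sketch (the sample response is inlined so the snippet is self-contained; in practice, pipe the actual search response into `jq`):

```bash
# Sample response stands in for real curl output; jq flags partial results.
response='{"took": 60000, "timed_out": true, "hits": {"total": {"value": 500, "relation": "gte"}}}'
timed_out=$(echo "$response" | jq -r '.timed_out')
took=$(echo "$response" | jq -r '.took')
if [ "$timed_out" = "true" ]; then
  echo "partial results: query hit its timeout after ${took}ms"
fi
```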
## Understanding Query Timeouts
Elasticsearch queries have several timeout options:
| Timeout Type | Purpose | Default |
|---|---|---|
| Search timeout | Overall query limit | No limit |
| Shard timeout | Per-shard execution | No limit |
| Fetch timeout | Fetch phase limit | No limit |
| Scroll timeout | Scroll context expiry | 1 minute |
Check current timeout settings:
```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep timeout
```

## Diagnosing Slow Queries
### Find Slow Queries
Enable slow query logging:
```bash
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "search.slowlog.threshold.query.warn": "10s",
    "search.slowlog.threshold.query.info": "5s",
    "search.slowlog.threshold.query.debug": "2s",
    "search.slowlog.threshold.fetch.warn": "5s"
  }
}
'
```

Check the slow query log:
```bash
grep "slowlog" /var/log/elasticsearch/elasticsearch.log
```

```
[WARN ][i.s.slog ] [node-1] [logs-2024-01][0] took[15.2s], took_millis[15200], total_hits[1000000], types[], stats[]
```

### Profile the Query
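To triage slow-log entries mechanically, you can pull `took_millis` out of each line with `sed` (the sample line is inlined here so the snippet is self-contained; point `grep` at your real log instead):

```bash
# Extract the took_millis value from a slow-log line
line='[WARN ][i.s.slog ] [node-1] [logs-2024-01][0] took[15.2s], took_millis[15200], total_hits[1000000]'
took_ms=$(echo "$line" | sed -n 's/.*took_millis\[\([0-9]*\)\].*/\1/p')
echo "query took ${took_ms}ms"
```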
Use query profiling to see where time is spent:
```bash
curl -X GET "localhost:9200/logs/_search?profile=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error warning critical"
    }
  }
}
'
```

```json
{
  "profile" : {
    "shards" : [
      {
        "id" : "[logs-2024-01][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "message:error message:warning message:critical",
                "time_in_nanos" : 25000000000,
                "breakdown" : {
                  "create_weight" : 500000000,
                  "next_doc" : 15000000000,
                  "score" : 5000000000
                }
              }
            ]
          }
        ]
      }
    ]
  }
}
```

The breakdown shows where time is consumed.
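Rather than eyeballing nanosecond counts, you can rank the breakdown entries with `jq` to find the dominant phase (breakdown values from the sample above, inlined here):

```bash
# Find the most expensive phase in a profile breakdown
breakdown='{"create_weight": 500000000, "next_doc": 15000000000, "score": 5000000000}'
hot=$(echo "$breakdown" | jq -r 'to_entries | max_by(.value) | "\(.key): \(.value / 1000000)ms"')
echo "hottest phase: $hot"
```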
### Check Query Statistics
```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```

```json
{
  "indices" : {
    "search" : {
      "query_time_in_millis" : 120000,
      "query_total" : 1500,
      "fetch_time_in_millis" : 30000,
      "fetch_total" : 1500,
      "scroll_time_in_millis" : 500000,
      "scroll_total" : 50
    }
  }
}
```

Calculate averages:
- Average query time: 120000 / 1500 = 80ms
- Average fetch time: 30000 / 1500 = 20ms
- Average scroll time: 500000 / 50 = 10s per scroll context
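These averages can be computed directly from the stats payload (sample inlined; in practice, pipe the curl output through `jq`):

```bash
# Derive average latencies from the search stats
stats='{"indices": {"search": {"query_time_in_millis": 120000, "query_total": 1500, "fetch_time_in_millis": 30000, "fetch_total": 1500}}}'
avg_query=$(echo "$stats" | jq '.indices.search.query_time_in_millis / .indices.search.query_total')
avg_fetch=$(echo "$stats" | jq '.indices.search.fetch_time_in_millis / .indices.search.fetch_total')
echo "avg query: ${avg_query}ms, avg fetch: ${avg_fetch}ms"
```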
## Solution 1: Increase Timeout Values
For legitimate long-running queries, extend the timeout:
```bash
curl -X GET "localhost:9200/logs/_search?timeout=60s" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error"
    }
  }
}
'
```

Set a cluster-level default:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "search.default_search_timeout": "30s",
    "search.default_keep_alive": "60s"
  }
}
'
```

Note that the search `timeout` is enforced per shard: each shard stops collecting documents when it hits the limit and returns partial results, so the overall request can still take somewhat longer than the configured value.

## Solution 2: Optimize Query Structure
### Use Filter Instead of Query
Filters are faster and cacheable:
```bash
# Before: slow scoring query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "status": "error" }
  }
}
'

# After: fast filter (no scoring)
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status.keyword": "error" } }
      ]
    }
  }
}
'
```
### Reduce Result Size
Limit returned fields and documents:
```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "size": 50,
  "_source": ["timestamp", "message", "level"],
  "fields": ["timestamp", "message", "level"]
}
'
```

### Avoid Deep Pagination
Don't use large `from` values; each shard has to collect and sort `from + size` documents, so the cost grows with pagination depth:

```bash
# Bad: deep pagination
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "from": 10000,
  "size": 10
}
'
```

Use `search_after` instead, feeding the sort values of the last hit into the next request:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "size": 10,
  "sort": [
    { "timestamp": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2024-01-15T10:30:00", "doc123"]
}
'
```

### Simplify Aggregations
Large aggregations are slow:

```bash
# Before: complex multi-level aggregation
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 10000 },
      "aggs": {
        "by_date": {
          "date_histogram": { "field": "timestamp", "interval": "hour" },
          "aggs": {
            "avg_duration": { "avg": { "field": "duration" } }
          }
        }
      }
    }
  }
}
'
```

Reduce the aggregation size:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 100 }
    }
  }
}
'
```

## Solution 3: Optimize Index Settings
### Force Merge Segments
Fewer segments mean faster searches:
```bash
curl -X POST "localhost:9200/logs-2024-01/_forcemerge?max_num_segments=1"
```

Only use this on indices that are no longer receiving writes.
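A common pattern is to merge a whole month of daily indices once they have rolled over. A sketch (the index names and date range are illustrative, and the curl call is commented out so the loop is safe to dry-run):

```bash
# Dry-run: list last month's daily indices that would be force-merged
merged=""
for day in 01 02 03; do
  index="logs-2024-01-${day}"
  # curl -X POST "localhost:9200/${index}/_forcemerge?max_num_segments=1"
  merged="${merged}${index} "
done
echo "would force-merge: ${merged}"
```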
### Increase Refresh Interval
Reduce refresh overhead for write-heavy indices:
```bash
curl -X PUT "localhost:9200/logs/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s"
  }
}
'
```

### Enable Query Cache
The query cache is sized per node, not per index. Set `indices.queries.cache.size` in `elasticsearch.yml` (it defaults to 10% of the heap):

```yaml
indices.queries.cache.size: 10%
```

The per-index toggle, `index.queries.cache.enabled`, is on by default and can only be set at index creation time.

## Solution 4: Scale the Cluster
### Add More Shards
Distribute query load:
```bash
# Check current shard count
curl -X GET "localhost:9200/logs/_settings?pretty" | grep number_of_shards

# For new indices, increase shards
curl -X PUT "localhost:9200/logs-new" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 10
  }
}
'
```
More shards parallelize queries but increase overhead.
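A rough way to pick a shard count is to divide the expected index size by a target shard size. 10-50 GB per shard is a widely cited rule of thumb, not an Elasticsearch-enforced limit, and the 300 GB figure below is a made-up example:

```bash
# Back-of-the-envelope shard sizing (ceiling division)
data_gb=300          # expected index size (assumption)
target_shard_gb=30   # target size per shard (rule of thumb)
shards=$(( (data_gb + target_shard_gb - 1) / target_shard_gb ))
echo "suggested number_of_shards: $shards"
```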
### Add More Nodes
```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,cpu,heap.percent"
```

Consistently high CPU or heap usage indicates a need for more capacity.
### Use Warm Nodes
Move cold indices to warm nodes:
```bash
curl -X PUT "localhost:9200/logs-2023-*/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.include._tier_preference": "data_warm"
}
'
```

## Solution 5: Use Async Search
For long-running queries, use async search (ES 7.7+):
```bash
curl -X POST "localhost:9200/logs/_async_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error"
    }
  },
  "keep_on_completion": true,
  "keep_alive": "1h"
}
'
```

```json
{
  "id" : "FmRlE...",
  "is_running" : true,
  "expiration_time_in_millis" : 3600000
}
```

Retrieve the results later using the returned ID:
```bash
curl -X GET "localhost:9200/_async_search/FmRlE..."
```

## Solution 6: Cancel Running Queries
Cancel queries taking too long:
```bash
# List running search tasks
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*search*&pretty"
```

```json
{
  "tasks" : {
    "node-1:12345" : {
      "action" : "indices:data/read/search",
      "description" : "indices[logs], types[], search_type[QUERY_THEN_FETCH]",
      "start_time_in_millis" : 1705305000000,
      "running_time_in_nanos" : 60000000000
    }
  }
}
```

Cancel a specific task:
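To decide which task ID to cancel, you can filter the task list for long runners, e.g. anything over 30 seconds (sample `_tasks` payload inlined; in practice, pipe the real curl output into `jq`):

```bash
# Find search tasks that have been running longer than 30s
tasks='{"tasks": {"node-1:12345": {"action": "indices:data/read/search", "running_time_in_nanos": 60000000000}}}'
long_running=$(echo "$tasks" | jq -r '.tasks | to_entries[] | select(.value.running_time_in_nanos > 30000000000) | .key')
echo "cancel candidates: $long_running"
```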
```bash
curl -X POST "localhost:9200/_tasks/node-1:12345/_cancel"
```

Cancel all search tasks:
```bash
curl -X POST "localhost:9200/_tasks/_cancel?actions=*search*"
```

## Monitoring Query Performance
Track query metrics:
```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?filter_path=nodes.*.indices.search&pretty"
```

Set up continuous monitoring:
```bash
#!/bin/bash
# Poll per-node search stats every minute
while true; do
  echo "=== Search Stats ==="
  curl -s "localhost:9200/_nodes/stats/indices/search" | jq '
    .nodes | to_entries[] | {
      node: .value.name,
      query_total: .value.indices.search.query_total,
      query_time_ms: .value.indices.search.query_time_in_millis,
      avg_query_ms: (.value.indices.search.query_time_in_millis / .value.indices.search.query_total)
    }
  '
  echo ""
  sleep 60
done
```

## Timeout Configuration Matrix
| Workload Type | Recommended Timeout |
|---|---|
| Interactive UI | 5-10 seconds |
| API queries | 30 seconds |
| Analytics/reports | 60-300 seconds |
| Bulk exports | Use scroll or async |
| Real-time alerts | 2-5 seconds |
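These recommendations can be encoded once and reused when building requests. A sketch (the mapping mirrors the table above; `timeout_for` is a hypothetical helper name):

```bash
# Map a workload type to a request timeout per the matrix above
timeout_for() {
  case "$1" in
    interactive) echo "10s" ;;
    api)         echo "30s" ;;
    analytics)   echo "300s" ;;
    alerts)      echo "5s" ;;
    *)           echo "30s" ;;  # conservative default
  esac
}
t=$(timeout_for interactive)
echo "using timeout=${t}"
# curl -X GET "localhost:9200/logs/_search?timeout=${t}" ...
```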
## Verification Steps
After optimization:
1. Re-run the previously slow query with profiling:

   ```bash
   curl -X GET "localhost:9200/logs/_search?profile=true" -H 'Content-Type: application/json' -d'
   { your query }
   '
   ```

2. Check the slow log for improvement:

   ```bash
   grep "slowlog" /var/log/elasticsearch/elasticsearch.log | tail -20
   ```

3. Monitor query latency:

   ```bash
   curl -X GET "localhost:9200/_nodes/stats/indices/search"
   ```

4. Test under load:

   ```bash
   # Run 50 concurrent queries
   for i in {1..50}; do
     curl -s "localhost:9200/logs/_search?q=error&size=100" > /dev/null &
   done
   wait
   ```

## Summary
Search timeout issues are resolved by:
1. Setting appropriate timeout values for your workload
2. Optimizing query structure (filters, limited fields, `search_after`)
3. Reducing aggregation complexity
4. Optimizing index settings (segment merging, refresh interval)
5. Scaling the cluster (more shards, more nodes)
6. Using async search for long-running queries
7. Implementing query cancellation for runaway queries
8. Monitoring query performance continuously
Focus on query optimization before increasing timeouts. A fast query is better than a slow one with a longer timeout.