# How to Fix Elasticsearch Search Timeout Errors

Your Elasticsearch queries are timing out. Clients receive timeout errors, search operations fail, and user experience suffers. Let's diagnose why queries are slow and fix timeout issues.

## Recognizing Search Timeout Errors

The error appears in responses:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query_fetch",
    "caused_by": {
      "type": "timeout_exception",
      "reason": "Timeout occurred while executing query"
    }
  },
  "status": 408
}
```

Or partial results with timeout indication:

```json
{
  "took": 60000,
  "timed_out": true,
  "hits": {
    "total": {
      "value": 500,
      "relation": "gte"
    },
    "hits": [...]
  }
}
```

The `"timed_out": true` field indicates the query hit the timeout limit and returned partial results.
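The `timed_out` flag is easy to miss because the request still returns HTTP 200. A minimal client-side check might look like this sketch; the sample response stands in for real `_search` output, and the warning text is our own:

```shell
# Sample _search response (stand-in for real cluster output)
response='{"took":60000,"timed_out":true,"hits":{"total":{"value":500,"relation":"gte"}}}'

timed_out=$(echo "$response" | jq -r '.timed_out')
took=$(echo "$response" | jq -r '.took')

if [ "$timed_out" = "true" ]; then
  # Partial results: log it so silent truncation is noticed
  echo "WARNING: search timed out after ${took}ms; results may be partial"
fi
```

Note that `"relation": "gte"` in the same response means the hit count is a lower bound, not an exact total.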

## Understanding Query Timeout

Elasticsearch queries have several timeout options:

| Timeout Type | Purpose | Default |
|---|---|---|
| Search timeout | Overall query limit | No limit |
| Shard timeout | Per-shard execution | No limit |
| Fetch timeout | Fetch phase limit | No limit |
| Scroll timeout | Scroll context expiry | 1 minute |

Check current timeout settings:

```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep timeout
```

## Diagnosing Slow Queries

### Find Slow Queries

Enable slow query logging:

```bash
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "search.slowlog.threshold.query.warn": "10s",
    "search.slowlog.threshold.query.info": "5s",
    "search.slowlog.threshold.query.debug": "2s",
    "search.slowlog.threshold.fetch.warn": "5s"
  }
}
'
```

Check the slow query log:

```bash
grep "slowlog" /var/log/elasticsearch/elasticsearch.log
```

Sample output:

```
[WARN ][i.s.slog                ] [node-1] [logs-2024-01][0] took[15.2s], took_millis[15200], total_hits[1000000], types[], stats[]
```
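The `took_millis[...]` value in each slow-log line can be extracted for trend analysis. A small sketch, assuming the log format shown above (the sample line is hard-coded here rather than read from the log file):

```shell
# A sample slow-log line, standing in for real log output
line='[WARN ][i.s.slog] [node-1] [logs-2024-01][0] took[15.2s], took_millis[15200], total_hits[1000000], types[], stats[]'

# Pull the numeric value out of took_millis[...]
took_millis=$(echo "$line" | sed -n 's/.*took_millis\[\([0-9]*\)\].*/\1/p')
echo "query took ${took_millis}ms"
```

Piping the real log through the same `sed` expression yields one latency value per slow query.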

### Profile the Query

Use query profiling to see where time is spent:

```bash
curl -X GET "localhost:9200/logs/_search?profile=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error warning critical"
    }
  }
}
'
```
```json
{
  "profile" : {
    "shards" : [
      {
        "id" : "[logs-2024-01][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "message:error message:warning message:critical",
                "time_in_nanos" : 25000000000,
                "breakdown" : {
                  "create_weight" : 500000000,
                  "next_doc" : 15000000000,
                  "score" : 5000000000
                }
              }
            ]
          }
        ]
      }
    ]
  }
}
```

The breakdown shows where time is consumed.
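To see which phase dominates, the nanosecond breakdown can be converted to percentages with `jq`. A sketch using the sample breakdown from the profile output above:

```shell
# Sample profile breakdown (values in nanoseconds)
breakdown='{"create_weight":500000000,"next_doc":15000000000,"score":5000000000}'

# Show each phase as a share of the total time
echo "$breakdown" | jq -r '
  (add) as $total
  | to_entries[]
  | "\(.key): \((.value / $total * 100 | floor))%"'
```

Here `next_doc` dominates (iterating matching documents), which typically points at a broad query matching many documents rather than an expensive scoring function.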

### Check Query Statistics

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```
```json
{
  "indices" : {
    "search" : {
      "query_time_in_millis" : 120000,
      "query_total" : 1500,
      "fetch_time_in_millis" : 30000,
      "fetch_total" : 1500,
      "scroll_time_in_millis" : 500000,
      "scroll_total" : 50
    }
  }
}
```

Calculate averages:

- Average query time: 120000 / 1500 = 80ms
- Average fetch time: 30000 / 1500 = 20ms
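The same arithmetic can be scripted, which is useful when polling stats on a schedule. A sketch using the sample numbers above:

```shell
# Compute average latencies from the node stats (sample values from above)
stats='{"query_time_in_millis":120000,"query_total":1500,"fetch_time_in_millis":30000,"fetch_total":1500}'

avg_query=$(echo "$stats" | jq '.query_time_in_millis / .query_total')
avg_fetch=$(echo "$stats" | jq '.fetch_time_in_millis / .fetch_total')
echo "avg query: ${avg_query}ms, avg fetch: ${avg_fetch}ms"
```

Note these counters are cumulative since node start, so comparing two samples over an interval gives a more useful recent average than the lifetime figure.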

## Solution 1: Increase Timeout Values

For legitimate long-running queries, extend the timeout:

```bash
curl -X GET "localhost:9200/logs/_search?timeout=60s" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error"
    }
  }
}
'
```

Set a cluster-level default:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "search.default_search_timeout": "30s",
    "search.default_keep_alive": "60s"
  }
}
'
```

Note that the `timeout` parameter is enforced per shard: each shard stops collecting hits once its local budget is exhausted and returns what it has so far. This means a `timed_out: true` response can still contain partial results, and the overall request can take longer than the value you set, since some phases are not covered by the shard-level check.

## Solution 2: Optimize Query Structure

### Use Filter Instead of Query

Filters are faster and cacheable:

```bash
# Before: slow scoring query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "status": "error" }
  }
}
'

# After: fast filter (no scoring)
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status.keyword": "error" } }
      ]
    }
  }
}
'
```

### Reduce Result Size

Limit returned fields and documents:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "size": 50,
  "_source": ["timestamp", "message", "level"],
  "fields": ["timestamp", "message", "level"]
}
'
```

### Avoid Deep Pagination

Don't use large `from` values:

```bash
# Bad: deep pagination
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "from": 10000,
  "size": 10
}
'
```
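Deep pagination also fails outright once `from + size` exceeds `index.max_result_window`, which defaults to 10,000. A client-side guard (a sketch; the variables mirror the bad example above) might look like this:

```shell
# index.max_result_window defaults to 10000; from + size beyond it is rejected
MAX_RESULT_WINDOW=10000
from=10000
size=10

# Refuse to issue the request rather than let Elasticsearch reject it
if [ $((from + size)) -gt "$MAX_RESULT_WINDOW" ]; then
  echo "refusing deep pagination: from+size=$((from + size)) exceeds $MAX_RESULT_WINDOW; use search_after"
fi
```

Raising `index.max_result_window` is possible but costly, since every page still forces each shard to sort `from + size` documents; `search_after` avoids that work entirely.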

Use `search_after` for pagination:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "size": 10,
  "sort": [
    { "timestamp": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2024-01-15T10:30:00", "doc123"]
}
'
```

### Simplify Aggregations

Large aggregations are slow:

```bash
# Before: complex multi-level aggregation
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 10000 },
      "aggs": {
        "by_date": {
          "date_histogram": { "field": "timestamp", "calendar_interval": "hour" },
          "aggs": {
            "avg_duration": { "avg": { "field": "duration" } }
          }
        }
      }
    }
  }
}
'
```

Reduce aggregation size:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 100 }
    }
  }
}
'
```

## Solution 3: Optimize Index Settings

### Force Merge Segments

Fewer segments mean faster searches:

```bash
curl -X POST "localhost:9200/logs-2024-01/_forcemerge?max_num_segments=1"
```

Only use on indices no longer receiving writes.

### Increase Refresh Interval

Reduce refresh overhead for write-heavy indices:

```bash
curl -X PUT "localhost:9200/logs/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s"
  }
}
'
```

### Enable Query Cache

The query cache (which caches filter results) is sized at the node level, not per index. Set it in `elasticsearch.yml` and restart the node:

```yaml
indices.queries.cache.size: 10%
```

The per-index toggle is the static setting `index.queries.cache.enabled`, which is on by default and can only be changed on a closed index.

## Solution 4: Scale the Cluster

### Add More Shards

Distribute query load:

```bash
# Check current shard count
curl -X GET "localhost:9200/logs/_settings?pretty" | grep number_of_shards

# For new indices, increase shards
curl -X PUT "localhost:9200/logs-new" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 10
  }
}
'
```

More shards parallelize queries but increase overhead.
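A common rule of thumb (a community heuristic, not an official Elasticsearch limit) is to keep individual shards in the tens of gigabytes. A sketch of ceiling-division sizing under that assumption, with illustrative numbers:

```shell
# Hypothetical inputs: total index size and a ~30 GB per-shard target
index_size_gb=250
target_shard_gb=30

# Ceiling division: fewest shards that keep each under the target size
shards=$(( (index_size_gb + target_shard_gb - 1) / target_shard_gb ))
echo "suggested number_of_shards: $shards"
```

Since `number_of_shards` cannot be changed on an existing index without a reindex or split, it pays to run this estimate before creating the index.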

### Add More Nodes

```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,cpu,heap.percent"
```

Sustained high CPU or heap usage indicates the cluster needs more capacity.

### Use Warm Nodes

Move cold indices to warm nodes:

```bash
curl -X PUT "localhost:9200/logs-2023-*/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
'
```

## Solution 5: Use Async Search

For long-running queries, use async search (Elasticsearch 7.7+):

```bash
curl -X POST "localhost:9200/logs/_async_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error"
    }
  },
  "keep_on_completion": true,
  "keep_alive": "1h"
}
'
```
```json
{
  "id" : "FmRlE...",
  "is_running" : true,
  "expiration_time_in_millis" : 3600000
}
```

Retrieve results later:

```bash
curl -X GET "localhost:9200/_async_search/FmRlE..."
```

## Solution 6: Cancel Running Queries

Cancel queries taking too long:

```bash
# List running tasks
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*search*&pretty"
```
```json
{
  "tasks" : {
    "node-1:12345" : {
      "action" : "indices:data/read/search",
      "description" : "indices[logs], types[], search_type[QUERY_THEN_FETCH]",
      "start_time_in_millis" : 1705305000000,
      "running_time_in_nanos" : 60000000000
    }
  }
}
```
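The `running_time_in_nanos` value is easier to reason about in seconds. A sketch that flags long runners against a hypothetical 30-second cutoff, using the sample task above:

```shell
# Sample value from the task listing above (nanoseconds)
running_nanos=60000000000
running_secs=$((running_nanos / 1000000000))

# Hypothetical policy: flag anything running longer than 30 seconds
if [ "$running_secs" -gt 30 ]; then
  echo "task node-1:12345 has run ${running_secs}s; consider cancelling"
fi
```

The same conversion, applied to real `_tasks` output with `jq`, makes a simple watchdog for runaway searches.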

Cancel a specific task:

```bash
curl -X POST "localhost:9200/_tasks/node-1:12345/_cancel"
```

Cancel all search tasks:

```bash
curl -X POST "localhost:9200/_tasks/_cancel?actions=*search*"
```

## Monitoring Query Performance

Track query metrics:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?filter_path=nodes.*.indices.search&pretty"
```

Set up monitoring:

```bash
#!/bin/bash
while true; do
  echo "=== Search Stats ==="
  curl -s "localhost:9200/_nodes/stats/indices/search" | jq '
    .nodes | to_entries[] | {
      node: .value.name,
      query_total: .value.indices.search.query_total,
      query_time_ms: .value.indices.search.query_time_in_millis,
      avg_query_ms: (.value.indices.search.query_time_in_millis / .value.indices.search.query_total)
    }
  '
  echo ""
  sleep 60
done
```

## Timeout Configuration Matrix

| Workload Type | Recommended Timeout |
|---|---|
| Interactive UI | 5-10 seconds |
| API queries | 30 seconds |
| Analytics/reports | 60-300 seconds |
| Bulk exports | Use scroll or async |
| Real-time alerts | 2-5 seconds |
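If clients set timeouts programmatically, the matrix can be encoded as a small helper. A sketch with hypothetical workload labels:

```shell
# Map a workload type from the matrix above to a timeout parameter value
timeout_for() {
  case "$1" in
    interactive) echo "10s" ;;   # interactive UI: upper end of 5-10s
    api)         echo "30s" ;;   # API queries
    analytics)   echo "300s" ;;  # analytics/reports: upper end of 60-300s
    alerts)      echo "5s" ;;    # real-time alerts
    *)           echo "30s" ;;   # fall back to the API default
  esac
}

t=$(timeout_for analytics)
echo "using timeout=$t"
```

The value then plugs straight into the search URL, e.g. `"localhost:9200/logs/_search?timeout=$t"`.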

## Verification Steps

After optimization:

1. Run the previously slow query:

   ```bash
   curl -X GET "localhost:9200/logs/_search?profile=true" -H 'Content-Type: application/json' -d'
   { your query }
   '
   ```

2. Check the slow log for improvement:

   ```bash
   grep "slowlog" /var/log/elasticsearch/elasticsearch.log | tail -20
   ```

3. Monitor query latency:

   ```bash
   curl -X GET "localhost:9200/_nodes/stats/indices/search"
   ```

4. Test under load:

   ```bash
   # Run concurrent queries
   for i in {1..50}; do
     curl -s "localhost:9200/logs/_search?q=error&size=100" > /dev/null &
   done
   wait
   ```

## Summary

Search timeout issues are resolved by:

1. Setting appropriate timeout values for your workload
2. Optimizing query structure (filters, limited fields, `search_after`)
3. Reducing aggregation complexity
4. Optimizing index settings (segment merging, refresh interval)
5. Scaling the cluster (more shards, more nodes)
6. Using async search for long-running queries
7. Implementing query cancellation for runaway queries
8. Monitoring query performance continuously

Focus on query optimization before increasing timeouts. A fast query is better than a slow one with a longer timeout.