# How to Fix Elasticsearch Search Timeout Errors

Your Elasticsearch queries are timing out. Clients receive timeout errors, search operations fail, and user experience suffers. Let's diagnose why queries are slow and fix timeout issues.

## Recognizing Search Timeout Errors

The error appears in responses:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query_fetch",
    "caused_by": {
      "type": "timeout_exception",
      "reason": "Timeout occurred while executing query"
    }
  },
  "status": 408
}
```

Or partial results with timeout indication:

```json
{
  "took": 60000,
  "timed_out": true,
  "hits": {
    "total": {
      "value": 500,
      "relation": "gte"
    },
    "hits": [...]
  }
}
```

The `"timed_out": true` field indicates the query hit the timeout limit and returned partial results.
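The `timed_out` flag is easy to miss because the request still returns HTTP 200. A minimal client-side check might look like this sketch; the sample response stands in for real `_search` output, and the warning text is our own:

```shell
# Sample _search response (stand-in for real cluster output)
response='{"took":60000,"timed_out":true,"hits":{"total":{"value":500,"relation":"gte"}}}'

timed_out=$(echo "$response" | jq -r '.timed_out')
took=$(echo "$response" | jq -r '.took')

if [ "$timed_out" = "true" ]; then
  # Partial results: log it so silent truncation is noticed
  echo "WARNING: search timed out after ${took}ms; results may be partial"
fi
```

Note that `"relation": "gte"` in the same response means the hit count is a lower bound, not an exact total.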

## Understanding Query Timeout

Elasticsearch queries have several timeout options:

| Timeout Type | Purpose | Default |
|---|---|---|
| Search timeout | Overall query limit | No limit |
| Shard timeout | Per-shard execution | No limit |
| Fetch timeout | Fetch phase limit | No limit |
| Scroll timeout | Scroll context expiry | 1 minute |

Check current timeout settings:

```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep timeout
```

## Diagnosing Slow Queries

### Find Slow Queries

Enable slow query logging:

```bash
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "search.slowlog.threshold.query.warn": "10s",
    "search.slowlog.threshold.query.info": "5s",
    "search.slowlog.threshold.query.debug": "2s",
    "search.slowlog.threshold.fetch.warn": "5s"
  }
}
'
```

Check the slow query log:

```bash
grep "slowlog" /var/log/elasticsearch/elasticsearch.log
```

Sample output:

```
[WARN ][i.s.slog                ] [node-1] [logs-2024-01][0] took[15.2s], took_millis[15200], total_hits[1000000], types[], stats[]
```
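The `took_millis[...]` value in each slow-log line can be extracted for trend analysis. A small sketch, assuming the log format shown above (the sample line is hard-coded here rather than read from the log file):

```shell
# A sample slow-log line, standing in for real log output
line='[WARN ][i.s.slog] [node-1] [logs-2024-01][0] took[15.2s], took_millis[15200], total_hits[1000000], types[], stats[]'

# Pull the numeric value out of took_millis[...]
took_millis=$(echo "$line" | sed -n 's/.*took_millis\[\([0-9]*\)\].*/\1/p')
echo "query took ${took_millis}ms"
```

Piping the real log through the same `sed` expression yields one latency value per slow query.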

### Profile the Query

Use query profiling to see where time is spent:

```bash
curl -X GET "localhost:9200/logs/_search?profile=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error warning critical"
    }
  }
}
'
```
```json
{
  "profile" : {
    "shards" : [
      {
        "id" : "[logs-2024-01][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "message:error message:warning message:critical",
                "time_in_nanos" : 25000000000,
                "breakdown" : {
                  "create_weight" : 500000000,
                  "next_doc" : 15000000000,
                  "score" : 5000000000
                }
              }
            ]
          }
        ]
      }
    ]
  }
}
```

The breakdown shows where time is consumed.
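To see which phase dominates, the nanosecond breakdown can be converted to percentages with `jq`. A sketch using the sample breakdown from the profile output above:

```shell
# Sample profile breakdown (values in nanoseconds)
breakdown='{"create_weight":500000000,"next_doc":15000000000,"score":5000000000}'

# Show each phase as a share of the total time
echo "$breakdown" | jq -r '
  (add) as $total
  | to_entries[]
  | "\(.key): \((.value / $total * 100 | floor))%"'
```

Here `next_doc` dominates (iterating matching documents), which typically points at a broad query matching many documents rather than an expensive scoring function.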

### Check Query Statistics

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?pretty"
```
```json
{
  "indices" : {
    "search" : {
      "query_time_in_millis" : 120000,
      "query_total" : 1500,
      "fetch_time_in_millis" : 30000,
      "fetch_total" : 1500,
      "scroll_time_in_millis" : 500000,
      "scroll_total" : 50
    }
  }
}
```

Calculate averages:

- Average query time: 120000 / 1500 = 80ms
- Average fetch time: 30000 / 1500 = 20ms
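The same arithmetic can be scripted, which is useful when polling stats on a schedule. A sketch using the sample numbers above:

```shell
# Compute average latencies from the node stats (sample values from above)
stats='{"query_time_in_millis":120000,"query_total":1500,"fetch_time_in_millis":30000,"fetch_total":1500}'

avg_query=$(echo "$stats" | jq '.query_time_in_millis / .query_total')
avg_fetch=$(echo "$stats" | jq '.fetch_time_in_millis / .fetch_total')
echo "avg query: ${avg_query}ms, avg fetch: ${avg_fetch}ms"
```

Note these counters are cumulative since node start, so comparing two samples over an interval gives a more useful recent average than the lifetime figure.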

## Solution 1: Increase Timeout Values

For legitimate long-running queries, extend the timeout:

```bash
curl -X GET "localhost:9200/logs/_search?timeout=60s" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error"
    }
  }
}
'
```

Set a cluster-level default:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "search.default_search_timeout": "30s",
    "search.default_keep_alive": "60s"
  }
}
'
```

Note that the `timeout` parameter is enforced per shard: each shard stops collecting hits once its local budget is exhausted and returns what it has so far. This means a `timed_out: true` response can still contain partial results, and the overall request can take longer than the value you set, since some phases are not covered by the shard-level check.

## Solution 2: Optimize Query Structure

### Use Filter Instead of Query

Filters are faster and cacheable:

```bash
# Before: slow scoring query
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "status": "error" }
  }
}
'

# After: fast filter (no scoring)
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status.keyword": "error" } }
      ]
    }
  }
}
'
```

### Reduce Result Size

Limit returned fields and documents:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "size": 50,
  "_source": ["timestamp", "message", "level"],
  "fields": ["timestamp", "message", "level"]
}
'
```

### Avoid Deep Pagination

Don't use large `from` values:

```bash
# Bad: deep pagination
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "from": 10000,
  "size": 10
}
'
```
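Deep pagination also fails outright once `from + size` exceeds `index.max_result_window`, which defaults to 10,000. A client-side guard (a sketch; the variables mirror the bad example above) might look like this:

```shell
# index.max_result_window defaults to 10000; from + size beyond it is rejected
MAX_RESULT_WINDOW=10000
from=10000
size=10

# Refuse to issue the request rather than let Elasticsearch reject it
if [ $((from + size)) -gt "$MAX_RESULT_WINDOW" ]; then
  echo "refusing deep pagination: from+size=$((from + size)) exceeds $MAX_RESULT_WINDOW; use search_after"
fi
```

Raising `index.max_result_window` is possible but costly, since every page still forces each shard to sort `from + size` documents; `search_after` avoids that work entirely.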

Use `search_after` for pagination:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {...},
  "size": 10,
  "sort": [
    { "timestamp": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2024-01-15T10:30:00", "doc123"]
}
'
```

### Simplify Aggregations

Large aggregations are slow:

```bash
# Before: complex multi-level aggregation
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 10000 },
      "aggs": {
        "by_date": {
          "date_histogram": { "field": "timestamp", "calendar_interval": "hour" },
          "aggs": {
            "avg_duration": { "avg": { "field": "duration" } }
          }
        }
      }
    }
  }
}
'
```

Reduce aggregation size:

```bash
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 100 }
    }
  }
}
'
```

## Solution 3: Optimize Index Settings

### Force Merge Segments

Fewer segments mean faster searches:

```bash
curl -X POST "localhost:9200/logs-2024-01/_forcemerge?max_num_segments=1"
```

Only use on indices no longer receiving writes.

### Increase Refresh Interval

Reduce refresh overhead for write-heavy indices:

```bash
curl -X PUT "localhost:9200/logs/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s"
  }
}
'
```

### Enable Query Cache

The query cache (which caches filter results) is sized at the node level, not per index. Set it in `elasticsearch.yml` and restart the node:

```yaml
indices.queries.cache.size: 10%
```

The per-index toggle is the static setting `index.queries.cache.enabled`, which is on by default and can only be changed on a closed index.

## Solution 4: Scale the Cluster

### Add More Shards

Distribute query load:

```bash
# Check current shard count
curl -X GET "localhost:9200/logs/_settings?pretty" | grep number_of_shards

# For new indices, increase shards
curl -X PUT "localhost:9200/logs-new" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 10
  }
}
'
```

More shards parallelize queries but increase overhead.
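A common rule of thumb (a community heuristic, not an official Elasticsearch limit) is to keep individual shards in the tens of gigabytes. A sketch of ceiling-division sizing under that assumption, with illustrative numbers:

```shell
# Hypothetical inputs: total index size and a ~30 GB per-shard target
index_size_gb=250
target_shard_gb=30

# Ceiling division: fewest shards that keep each under the target size
shards=$(( (index_size_gb + target_shard_gb - 1) / target_shard_gb ))
echo "suggested number_of_shards: $shards"
```

Since `number_of_shards` cannot be changed on an existing index without a reindex or split, it pays to run this estimate before creating the index.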

### Add More Nodes

```bash
curl -X GET "localhost:9200/_cat/nodes?v&h=name,cpu,heap.percent"
```

Sustained high CPU or heap usage indicates the cluster needs more capacity.

### Use Warm Nodes

Move cold indices to warm nodes:

```bash
curl -X PUT "localhost:9200/logs-2023-*/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
'
```

## Solution 5: Use Async Search

For long-running queries, use async search (Elasticsearch 7.7+):

```bash
curl -X POST "localhost:9200/logs/_async_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "error"
    }
  },
  "keep_on_completion": true,
  "keep_alive": "1h"
}
'
```
```json
{
  "id" : "FmRlE...",
  "is_running" : true,
  "expiration_time_in_millis" : 3600000
}
```

Retrieve results later:

```bash
curl -X GET "localhost:9200/_async_search/FmRlE..."
```

## Solution 6: Cancel Running Queries

Cancel queries taking too long:

```bash
# List running tasks
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*search*&pretty"
```
```json
{
  "tasks" : {
    "node-1:12345" : {
      "action" : "indices:data/read/search",
      "description" : "indices[logs], types[], search_type[QUERY_THEN_FETCH]",
      "start_time_in_millis" : 1705305000000,
      "running_time_in_nanos" : 60000000000
    }
  }
}
```
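The `running_time_in_nanos` value is easier to reason about in seconds. A sketch that flags long runners against a hypothetical 30-second cutoff, using the sample task above:

```shell
# Sample value from the task listing above (nanoseconds)
running_nanos=60000000000
running_secs=$((running_nanos / 1000000000))

# Hypothetical policy: flag anything running longer than 30 seconds
if [ "$running_secs" -gt 30 ]; then
  echo "task node-1:12345 has run ${running_secs}s; consider cancelling"
fi
```

The same conversion, applied to real `_tasks` output with `jq`, makes a simple watchdog for runaway searches.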

Cancel a specific task:

```bash
curl -X POST "localhost:9200/_tasks/node-1:12345/_cancel"
```

Cancel all search tasks:

```bash
curl -X POST "localhost:9200/_tasks/_cancel?actions=*search*"
```

## Monitoring Query Performance

Track query metrics:

```bash
curl -X GET "localhost:9200/_nodes/stats/indices/search?filter_path=nodes.*.indices.search&pretty"
```

Set up monitoring:

```bash
#!/bin/bash
while true; do
  echo "=== Search Stats ==="
  curl -s "localhost:9200/_nodes/stats/indices/search" | jq '
    .nodes | to_entries[] | {
      node: .value.name,
      query_total: .value.indices.search.query_total,
      query_time_ms: .value.indices.search.query_time_in_millis,
      avg_query_ms: (.value.indices.search.query_time_in_millis / .value.indices.search.query_total)
    }
  '
  echo ""
  sleep 60
done
```

## Timeout Configuration Matrix

| Workload Type | Recommended Timeout |
|---|---|
| Interactive UI | 5-10 seconds |
| API queries | 30 seconds |
| Analytics/reports | 60-300 seconds |
| Bulk exports | Use scroll or async |
| Real-time alerts | 2-5 seconds |
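If clients set timeouts programmatically, the matrix can be encoded as a small helper. A sketch with hypothetical workload labels:

```shell
# Map a workload type from the matrix above to a timeout parameter value
timeout_for() {
  case "$1" in
    interactive) echo "10s" ;;   # interactive UI: upper end of 5-10s
    api)         echo "30s" ;;   # API queries
    analytics)   echo "300s" ;;  # analytics/reports: upper end of 60-300s
    alerts)      echo "5s" ;;    # real-time alerts
    *)           echo "30s" ;;   # fall back to the API default
  esac
}

t=$(timeout_for analytics)
echo "using timeout=$t"
```

The value then plugs straight into the search URL, e.g. `"localhost:9200/logs/_search?timeout=$t"`.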

## Verification Steps

After optimization:

1. Run the previously slow query:

   ```bash
   curl -X GET "localhost:9200/logs/_search?profile=true" -H 'Content-Type: application/json' -d'
   { your query }
   '
   ```

2. Check the slow log for improvement:

   ```bash
   grep "slowlog" /var/log/elasticsearch/elasticsearch.log | tail -20
   ```

3. Monitor query latency:

   ```bash
   curl -X GET "localhost:9200/_nodes/stats/indices/search"
   ```

4. Test under load:

   ```bash
   # Run concurrent queries
   for i in {1..50}; do
     curl -s "localhost:9200/logs/_search?q=error&size=100" > /dev/null &
   done
   wait
   ```

## Summary

Search timeout issues are resolved by:

1. Setting appropriate timeout values for your workload
2. Optimizing query structure (filters, limited fields, `search_after`)
3. Reducing aggregation complexity
4. Optimizing index settings (segment merging, refresh interval)
5. Scaling the cluster (more shards, more nodes)
6. Using async search for long-running queries
7. Implementing query cancellation for runaway queries
8. Monitoring query performance continuously

Focus on query optimization before increasing timeouts. A fast query is better than a slow one with a longer timeout.