What's Actually Happening

Grafana Loki queries time out when they cover large time ranges or high-cardinality log data. Log exploration in Grafana then fails with timeout errors.

The Error You'll See

Query timeout:

```json
{
  "status": "failed",
  "error": "query execution: context deadline exceeded"
}
```

Grafana error:

```bash
Error: failed to query logs: context deadline exceeded
```

Loki API error:

```bash
$ curl "http://loki:3100/loki/api/v1/query_range?query={app=\"myapp\"}&start=now-24h"

{
  "status": "error",
  "error": "timeout: query execution took longer than 30s"
}
```

Why This Happens

  1. Large time range - querying days or weeks of logs
  2. High cardinality - too many label combinations
  3. Insufficient resources - Loki components are under-provisioned
  4. Slow storage - object store latency
  5. Query complexity - regex or complex LogQL
  6. No query limits - limits not configured properly
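When the first two causes dominate, a common workaround is to break one huge query into several small ones and stitch the results together. A rough sketch (the `split_range` helper is hypothetical, not a Loki tool):

```shell
# Split a large epoch-second time range into fixed-size chunks so each
# sub-query stays well under Loki's query timeout. Hypothetical helper.
split_range() {
  local start=$1 end=$2 step=$3
  local s=$start e
  while [ "$s" -lt "$end" ]; do
    e=$((s + step))
    [ "$e" -gt "$end" ] && e=$end
    echo "$s $e"
    s=$e
  done
}

# A 24h range in 1h chunks yields 24 start/end pairs; each pair can then be
# passed to /loki/api/v1/query_range as its start and end parameters.
split_range 0 86400 3600
```

Each sub-query finishes fast even when the whole range would have timed out, at the cost of more round trips.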

Step 1: Check Loki Status

```bash
# Check Loki is running:
systemctl status loki

# Check Loki metrics:
curl http://localhost:3100/metrics | grep -E "loki_(query|ingester|distributor)"

# Check Loki config:
curl http://localhost:3100/config | jq

# Check Loki readiness:
curl http://localhost:3100/ready

# Check ingester status:
curl http://localhost:3100/ingester/ring

# Check querier status:
curl http://localhost:3100/querier/ring
```
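If a restart is involved, it helps to script the readiness check rather than eyeball it. A minimal sketch (the `wait_ready` helper is an assumption, not part of Loki; adjust the URL for your deployment):

```shell
# Poll Loki's /ready endpoint until it answers 2xx or we run out of tries.
# wait_ready is a hypothetical helper.
wait_ready() {
  local base_url=$1 tries=$2 i
  for i in $(seq 1 "$tries"); do
    if curl -sf --max-time 2 "$base_url/ready" >/dev/null; then
      echo "up"
      return 0
    fi
    sleep 1
  done
  echo "down"
  return 1
}

# Usage: wait_ready http://localhost:3100 30
```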

Step 2: Check Query Limits

```bash
# Check current limits:
curl http://localhost:3100/config | jq '.limits'

# Key limits:
# max_query_length: 721h (30 days)
# max_query_parallelism: 32
# max_entries_limit_per_query: 5000
# max_streams_per_user: 10000
```

```yaml
# In loki-config.yaml:
limits_config:
  max_query_length: 168h              # Max time range: 7 days
  max_query_parallelism: 32           # Parallel query workers
  max_entries_limit_per_query: 5000   # Max log lines
  max_streams_per_user: 10000         # Max unique streams
  max_line_size: 256kb                # Max line size
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 10m

  # Query timeout
  query_timeout: 1m

  # Cardinality limit
  max_streams_matchers_per_query: 1000
```

```bash
# Restart Loki:
systemctl restart loki
```
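Whether a query will be rejected by `max_query_length` is simple arithmetic; this sketch shows the comparison (the values are illustrative, read the real limit from `/config | jq '.limits'`):

```shell
# Convert an "Nh"-style limit to seconds and compare a requested range to it.
hours_to_seconds() { echo $(( ${1%h} * 3600 )); }

max_query_length="168h"             # illustrative; read from /config
requested_range_s=$(( 24 * 3600 ))  # a 24h query

limit_s=$(hours_to_seconds "$max_query_length")
if [ "$requested_range_s" -le "$limit_s" ]; then
  echo "within max_query_length"
else
  echo "would be rejected: shrink the range or raise the limit"
fi
```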

Step 3: Adjust Timeout Settings

```yaml
# In loki-config.yaml:
limits_config:
  # Increase query timeout
  query_timeout: 5m

  # Increase query length
  max_query_length: 720h

# For Loki simple scalable mode:
query_range:
  align_queries_with_step: true
  max_retries: 5
  parallelism: 32
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached:11211

# Frontend worker config:
frontend:
  max_outstanding_per_tenant: 100
  log_queries_longer_than: 10s
  compress_responses: true

frontend_worker:
  frontend_address: 127.0.0.1:9095
  grpc_client_config:
    max_recv_msg_size: 104857600
```

```bash
# Restart Loki:
systemctl restart loki
```

Step 4: Optimize LogQL Queries

```bash
# Slow query (full scan):
{app="myapp"}

# Faster query (add line filters):
{app="myapp"} |= "error"

# Use label filters (indexed):
{app="myapp", level="error"}

# Avoid regex when possible.
# Slow:
{app="myapp"} |~ "error.*timeout"

# Fast:
{app="myapp"} |= "error" |= "timeout"

# Limit the time range (Grafana time picker, or start/end on the API):
# query the last 5 minutes instead of the last 24 hours

# Aggregate instead of listing every log line:
sum(count_over_time({app="myapp"}[5m]))
```
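Braces, quotes, and pipes in LogQL must be URL-encoded when testing queries with curl; Grafana does this for you, but on the command line a small helper avoids mangled queries. A sketch using jq's `@uri` filter (the `encode_logql` name is made up):

```shell
# URL-encode a LogQL expression for use in a query_range URL.
# Requires jq; @uri percent-encodes everything outside the unreserved set.
encode_logql() { printf '%s' "$1" | jq -sRr @uri; }

q=$(encode_logql '{app="myapp"} |= "error"')
echo "$q"
# Then: curl "http://localhost:3100/loki/api/v1/query_range?query=$q&start=...&end=..."
```

Alternatively, `curl -G --data-urlencode "query={app=\"myapp\"}"` achieves the same without jq.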

Step 5: Check Cardinality

```bash
# List labels:
curl "http://localhost:3100/loki/api/v1/labels" | jq

# Count values for a label:
curl "http://localhost:3100/loki/api/v1/label/hostname/values" | jq '.data | length'

# High-cardinality labels cause slow queries.
# Avoid hostname, request_id, session_id as labels.

# Count active streams:
curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query=sum(count_over_time({job=~".+"}[5m]))'
```

```yaml
# Limit cardinality in loki-config.yaml:
limits_config:
  max_streams_per_user: 10000
  max_global_streams_per_user: 50000
  per_stream_rate_limit: 10MB
  per_stream_rate_limit_burst: 20MB
```

```yaml
# Use structured metadata instead of labels for high-cardinality values.
# In promtail:
pipeline_stages:
  - labels:
      level:
      app:
  - structured_metadata:
      request_id:
      trace_id:
```
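The reason hostname-like labels hurt: stream count is multiplicative. Each unique combination of label values is a separate stream, so the worst case is the product of per-label value counts. A quick illustration (the cardinalities below are made up):

```shell
# Worst-case stream count = product of distinct values per label.
apps=50; levels=4; hostnames=1000   # illustrative cardinalities

without_hostname=$(( apps * levels ))
with_hostname=$(( apps * levels * hostnames ))

echo "streams without hostname label: $without_hostname"
echo "streams with hostname label:    $with_hostname"
```

One extra label with 1000 values turns 200 streams into 200,000, which is why such values belong in structured metadata instead.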

Step 6: Enable Query Caching

```yaml
# In loki-config.yaml:
query_range:
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached:11211
        timeout: 500ms

chunk_store_config:
  chunk_cache_config:
    memcached_client:
      addresses: dns+memcached:11211
      timeout: 500ms

# Index cache:
index_gateway:
  mode: simple

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
```

```bash
# Start memcached:
docker run -d --name memcached -p 11211:11211 memcached:latest
```
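Once caching is on, verify it is actually being hit. This sketch computes a hit rate from hit/miss counters; the sample numbers are hard-coded, in practice feed it values scraped from `/metrics`:

```shell
# Hit rate (%) from cache hit/miss counters.
hit_rate() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    if (h + m == 0) { print 0; exit }
    printf "%.0f\n", 100 * h / (h + m)
  }'
}

hit_rate 750 250   # -> 75
```

Anything well below 50% suggests the cache is undersized, queries are never repeated, or `cache_results` is not actually enabled.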

Step 7: Check Loki Resources

```bash
# Check Loki memory:
ps aux | grep loki

# Check Go runtime memory (Loki is a Go binary):
curl http://localhost:3100/metrics | grep go_memstats

# Check disk usage for chunks:
df -h /var/lib/loki/chunks

# For Kubernetes:
kubectl top pods -n loki
```

```yaml
# Increase resources.
# For the ingester:
resources:
  limits:
    memory: 16Gi
    cpu: 4
  requests:
    memory: 8Gi
    cpu: 2

# For the querier:
resources:
  limits:
    memory: 8Gi
    cpu: 2

# For the query-frontend:
resources:
  limits:
    memory: 4Gi
    cpu: 2
```

Step 8: Scale Loki Components

```yaml
# Loki simple scalable mode deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-read
spec:
  replicas: 3  # Scale queriers
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=read
          resources:
            limits:
              memory: 8Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-write
spec:
  replicas: 3  # Scale ingesters
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=write
          resources:
            limits:
              memory: 16Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-backend
spec:
  replicas: 2  # Backend components (compactor, ruler, index gateway)
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=backend
```

```bash
# Scale based on load:
kubectl scale deployment loki-read --replicas=5
```
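How many read replicas are enough is roughly Little's-law arithmetic: peak queries per second times average query duration, divided by how many queries one querier runs concurrently. A back-of-envelope sketch with assumed workload numbers:

```shell
# replicas ~= ceil(peak_qps * avg_query_seconds / concurrency_per_querier)
peak_qps=10; avg_query_s=4; concurrency=8   # assumed workload figures

replicas=$(awk -v q="$peak_qps" -v d="$avg_query_s" -v c="$concurrency" \
  'BEGIN { v = q * d / c; r = (v == int(v)) ? v : int(v) + 1; print r }')

echo "suggested read replicas: $replicas"
# kubectl scale deployment loki-read --replicas="$replicas"
```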

Step 9: Check Storage Performance

```bash
# For S3 storage:
aws s3api head-bucket --bucket my-loki-bucket

# Check S3 latency:
aws s3 ls s3://my-loki-bucket/loki/chunks/ --recursive | head

# For GCS:
gsutil ls gs://my-loki-bucket/loki/

# Check storage config:
curl http://localhost:3100/config | jq '.storage_config'
```

```yaml
# Optimize storage:
storage_config:
  aws:
    s3: s3://region/bucket-name
    region: us-east-1
    sse_encryption: true
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    cache_location: /var/lib/loki/cache
    cache_ttl: 24h

# Chunk cache:
chunk_store_config:
  max_look_back_period: 0s
  chunk_cache_config:
    memcached_client:
      addresses: dns+memcached:11211
```

Step 10: Monitor Query Performance

```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-loki.sh
#!/bin/bash

echo "=== Loki Query Stats ==="
curl -s http://localhost:3100/metrics | grep -E "loki_query_(duration|total)"

echo ""
echo "=== Loki Ingestion Rate ==="
curl -s http://localhost:3100/metrics | grep loki_distributor_lines_received_total

echo ""
echo "=== Active Streams ==="
curl -s http://localhost:3100/metrics | grep loki_ingester_memory_streams

echo ""
echo "=== Cache Hit Rate ==="
curl -s http://localhost:3100/metrics | grep -E "loki_cache_(hits|misses)"

echo ""
echo "=== Chunk Store ==="
curl -s http://localhost:3100/metrics | grep loki_chunk_store_index_lookups
EOF

chmod +x /usr/local/bin/monitor-loki.sh
```

```yaml
# Prometheus alerts:
- alert: LokiQuerySlow
  expr: histogram_quantile(0.95, rate(loki_query_duration_seconds_bucket[5m])) > 10
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Loki queries are slow (>10s at 95th percentile)"

- alert: LokiQueryTimeout
  expr: rate(loki_query_failures_total{reason="timeout"}[5m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Loki query timeouts detected"
```

Loki Query Timeout Checklist

| Check | Command | Expected |
|-------|---------|----------|
| Query timeout | config | Adequate |
| Time range | query param | Within limits |
| Cardinality | label values | Low |
| Cache enabled | config | Yes |
| Querier count | replicas | Sufficient |
| Storage latency | metrics | Low |

Verify the Fix

```bash
# After optimizing Loki

# 1. Test query
curl "http://localhost:3100/loki/api/v1/query_range?query={app=\"myapp\"}&start=now-1h&end=now"
# Returns results within timeout

# 2. Check query duration
curl http://localhost:3100/metrics | grep loki_query_duration_seconds
# P95 < 5 seconds

# 3. Test in Grafana
# Open Explore > Logs
# Queries complete successfully

# 4. Check cache hit rate
curl http://localhost:3100/metrics | grep cache_hits
# > 50% hit rate

# 5. Monitor under load
# Run multiple queries
# All complete

# 6. Check logs
journalctl -u loki | grep timeout
# No timeout errors
```

- [Fix Prometheus Query Timeout](/articles/fix-prometheus-query-timeout)
- [Fix Grafana Panel Rendering Timeout](/articles/fix-grafana-panel-rendering-timeout)
- [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)