What's Actually Happening

Grafana Loki queries time out when they cover large time ranges or high-cardinality log data. Log exploration in Grafana then fails with timeout errors.

The Error You'll See

Query timeout:

```json
{
  "status": "failed",
  "error": "query execution: context deadline exceeded"
}
```

Grafana error:

```bash
Error: failed to query logs: context deadline exceeded
```

Loki API error:

```bash
$ curl "http://loki:3100/loki/api/v1/query_range?query={app=\"myapp\"}&start=now-24h"

{
  "status": "error",
  "error": "timeout: query execution took longer than 30s"
}
```

Why This Happens

  1. Large time range - querying days or weeks of logs
  2. High cardinality - too many label combinations
  3. Insufficient resources - Loki components are under-provisioned
  4. Slow storage - object store latency
  5. Query complexity - regex or complex LogQL
  6. No query limits - limits not configured properly
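When the first two causes dominate, a common workaround is to break one huge query into several small ones and stitch the results together. A rough sketch (the `split_range` helper is hypothetical, not a Loki tool):

```shell
# Split a large epoch-second time range into fixed-size chunks so each
# sub-query stays well under Loki's query timeout. Hypothetical helper.
split_range() {
  local start=$1 end=$2 step=$3
  local s=$start e
  while [ "$s" -lt "$end" ]; do
    e=$((s + step))
    [ "$e" -gt "$end" ] && e=$end
    echo "$s $e"
    s=$e
  done
}

# A 24h range in 1h chunks yields 24 start/end pairs; each pair can then be
# passed to /loki/api/v1/query_range as its start and end parameters.
split_range 0 86400 3600
```

Each sub-query finishes fast even when the whole range would have timed out, at the cost of more round trips.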

Step 1: Check Loki Status

```bash
# Check Loki is running:
systemctl status loki

# Check Loki metrics:
curl http://localhost:3100/metrics | grep -E "loki_(query|ingester|distributor)"

# Check Loki config:
curl http://localhost:3100/config | jq

# Check Loki readiness:
curl http://localhost:3100/ready

# Check ingester status:
curl http://localhost:3100/ingester/ring

# Check querier status:
curl http://localhost:3100/querier/ring
```
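If a restart is involved, it helps to script the readiness check rather than eyeball it. A minimal sketch (the `wait_ready` helper is an assumption, not part of Loki; adjust the URL for your deployment):

```shell
# Poll Loki's /ready endpoint until it answers 2xx or we run out of tries.
# wait_ready is a hypothetical helper.
wait_ready() {
  local base_url=$1 tries=$2 i
  for i in $(seq 1 "$tries"); do
    if curl -sf --max-time 2 "$base_url/ready" >/dev/null; then
      echo "up"
      return 0
    fi
    sleep 1
  done
  echo "down"
  return 1
}

# Usage: wait_ready http://localhost:3100 30
```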

Step 2: Check Query Limits

```bash
# Check current limits:
curl http://localhost:3100/config | jq '.limits'

# Key limits:
# max_query_length: 721h (30 days)
# max_query_parallelism: 32
# max_entries_limit_per_query: 5000
# max_streams_per_user: 10000
```

```yaml
# In loki-config.yaml:
limits_config:
  max_query_length: 168h              # Max time range: 7 days
  max_query_parallelism: 32           # Parallel query workers
  max_entries_limit_per_query: 5000   # Max log lines
  max_streams_per_user: 10000         # Max unique streams
  max_line_size: 256kb                # Max line size
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 10m

  # Query timeout
  query_timeout: 1m

  # Cardinality limit
  max_streams_matchers_per_query: 1000
```

```bash
# Restart Loki:
systemctl restart loki
```
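Whether a query will be rejected by `max_query_length` is simple arithmetic; this sketch shows the comparison (the values are illustrative, read the real limit from `/config | jq '.limits'`):

```shell
# Convert an "Nh"-style limit to seconds and compare a requested range to it.
hours_to_seconds() { echo $(( ${1%h} * 3600 )); }

max_query_length="168h"             # illustrative; read from /config
requested_range_s=$(( 24 * 3600 ))  # a 24h query

limit_s=$(hours_to_seconds "$max_query_length")
if [ "$requested_range_s" -le "$limit_s" ]; then
  echo "within max_query_length"
else
  echo "would be rejected: shrink the range or raise the limit"
fi
```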

Step 3: Adjust Timeout Settings

```yaml
# In loki-config.yaml:
limits_config:
  # Increase query timeout
  query_timeout: 5m

  # Increase query length
  max_query_length: 720h

# For Loki simple scalable mode:
query_range:
  align_queries_with_step: true
  max_retries: 5
  parallelism: 32
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached:11211

# Frontend worker config:
frontend:
  max_outstanding_per_tenant: 100
  log_queries_longer_than: 10s
  compress_responses: true

frontend_worker:
  frontend_address: 127.0.0.1:9095
  grpc_client_config:
    max_recv_msg_size: 104857600
```

```bash
# Restart Loki:
systemctl restart loki
```

Step 4: Optimize LogQL Queries

```bash
# Slow query (full scan):
{app="myapp"}

# Faster query (add line filters):
{app="myapp"} |= "error"

# Use label filters (indexed):
{app="myapp", level="error"}

# Avoid regex when possible.
# Slow:
{app="myapp"} |~ "error.*timeout"

# Fast:
{app="myapp"} |= "error" |= "timeout"

# Limit the time range (Grafana time picker, or start/end on the API):
# query the last 5 minutes instead of the last 24 hours

# Aggregate instead of listing every log line:
sum(count_over_time({app="myapp"}[5m]))
```
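Braces, quotes, and pipes in LogQL must be URL-encoded when testing queries with curl; Grafana does this for you, but on the command line a small helper avoids mangled queries. A sketch using jq's `@uri` filter (the `encode_logql` name is made up):

```shell
# URL-encode a LogQL expression for use in a query_range URL.
# Requires jq; @uri percent-encodes everything outside the unreserved set.
encode_logql() { printf '%s' "$1" | jq -sRr @uri; }

q=$(encode_logql '{app="myapp"} |= "error"')
echo "$q"
# Then: curl "http://localhost:3100/loki/api/v1/query_range?query=$q&start=...&end=..."
```

Alternatively, `curl -G --data-urlencode "query={app=\"myapp\"}"` achieves the same without jq.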

Step 5: Check Cardinality

```bash
# List labels:
curl "http://localhost:3100/loki/api/v1/labels" | jq

# Count values for a label:
curl "http://localhost:3100/loki/api/v1/label/hostname/values" | jq '.data | length'

# High-cardinality labels cause slow queries.
# Avoid hostname, request_id, session_id as labels.

# Count active streams:
curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query=sum(count_over_time({job=~".+"}[5m]))'
```

```yaml
# Limit cardinality in loki-config.yaml:
limits_config:
  max_streams_per_user: 10000
  max_global_streams_per_user: 50000
  per_stream_rate_limit: 10MB
  per_stream_rate_limit_burst: 20MB
```

```yaml
# Use structured metadata instead of labels for high-cardinality values.
# In promtail:
pipeline_stages:
  - labels:
      level:
      app:
  - structured_metadata:
      request_id:
      trace_id:
```
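The reason hostname-like labels hurt: stream count is multiplicative. Each unique combination of label values is a separate stream, so the worst case is the product of per-label value counts. A quick illustration (the cardinalities below are made up):

```shell
# Worst-case stream count = product of distinct values per label.
apps=50; levels=4; hostnames=1000   # illustrative cardinalities

without_hostname=$(( apps * levels ))
with_hostname=$(( apps * levels * hostnames ))

echo "streams without hostname label: $without_hostname"
echo "streams with hostname label:    $with_hostname"
```

One extra label with 1000 values turns 200 streams into 200,000, which is why such values belong in structured metadata instead.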

Step 6: Enable Query Caching

```yaml
# In loki-config.yaml:
query_range:
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached:11211
        timeout: 500ms

chunk_store_config:
  chunk_cache_config:
    memcached_client:
      addresses: dns+memcached:11211
      timeout: 500ms

# Index cache:
index_gateway:
  mode: simple

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
```

```bash
# Start memcached:
docker run -d --name memcached -p 11211:11211 memcached:latest
```
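Once caching is on, verify it is actually being hit. This sketch computes a hit rate from hit/miss counters; the sample numbers are hard-coded, in practice feed it values scraped from `/metrics`:

```shell
# Hit rate (%) from cache hit/miss counters.
hit_rate() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    if (h + m == 0) { print 0; exit }
    printf "%.0f\n", 100 * h / (h + m)
  }'
}

hit_rate 750 250   # -> 75
```

Anything well below 50% suggests the cache is undersized, queries are never repeated, or `cache_results` is not actually enabled.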

Step 7: Check Loki Resources

```bash
# Check Loki memory:
ps aux | grep loki

# Check Go runtime memory (Loki is a Go binary):
curl http://localhost:3100/metrics | grep go_memstats

# Check disk usage for chunks:
df -h /var/lib/loki/chunks

# For Kubernetes:
kubectl top pods -n loki
```

```yaml
# Increase resources.
# For the ingester:
resources:
  limits:
    memory: 16Gi
    cpu: 4
  requests:
    memory: 8Gi
    cpu: 2

# For the querier:
resources:
  limits:
    memory: 8Gi
    cpu: 2

# For the query-frontend:
resources:
  limits:
    memory: 4Gi
    cpu: 2
```

Step 8: Scale Loki Components

```yaml
# Loki simple scalable mode deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-read
spec:
  replicas: 3  # Scale queriers
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=read
          resources:
            limits:
              memory: 8Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-write
spec:
  replicas: 3  # Scale ingesters
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=write
          resources:
            limits:
              memory: 16Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-backend
spec:
  replicas: 2  # Backend components (compactor, ruler, index gateway)
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=backend
```

```bash
# Scale based on load:
kubectl scale deployment loki-read --replicas=5
```
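How many read replicas are enough is roughly Little's-law arithmetic: peak queries per second times average query duration, divided by how many queries one querier runs concurrently. A back-of-envelope sketch with assumed workload numbers:

```shell
# replicas ~= ceil(peak_qps * avg_query_seconds / concurrency_per_querier)
peak_qps=10; avg_query_s=4; concurrency=8   # assumed workload figures

replicas=$(awk -v q="$peak_qps" -v d="$avg_query_s" -v c="$concurrency" \
  'BEGIN { v = q * d / c; r = (v == int(v)) ? v : int(v) + 1; print r }')

echo "suggested read replicas: $replicas"
# kubectl scale deployment loki-read --replicas="$replicas"
```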

Step 9: Check Storage Performance

```bash
# For S3 storage:
aws s3api head-bucket --bucket my-loki-bucket

# Check S3 latency:
aws s3 ls s3://my-loki-bucket/loki/chunks/ --recursive | head

# For GCS:
gsutil ls gs://my-loki-bucket/loki/

# Check storage config:
curl http://localhost:3100/config | jq '.storage_config'
```

```yaml
# Optimize storage:
storage_config:
  aws:
    s3: s3://region/bucket-name
    region: us-east-1
    sse_encryption: true
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    cache_location: /var/lib/loki/cache
    cache_ttl: 24h

# Chunk cache:
chunk_store_config:
  max_look_back_period: 0s
  chunk_cache_config:
    memcached_client:
      addresses: dns+memcached:11211
```

Step 10: Monitor Query Performance

```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-loki.sh
#!/bin/bash

echo "=== Loki Query Stats ==="
curl -s http://localhost:3100/metrics | grep -E "loki_query_(duration|total)"

echo ""
echo "=== Loki Ingestion Rate ==="
curl -s http://localhost:3100/metrics | grep loki_distributor_lines_received_total

echo ""
echo "=== Active Streams ==="
curl -s http://localhost:3100/metrics | grep loki_ingester_memory_streams

echo ""
echo "=== Cache Hit Rate ==="
curl -s http://localhost:3100/metrics | grep -E "loki_cache_(hits|misses)"

echo ""
echo "=== Chunk Store ==="
curl -s http://localhost:3100/metrics | grep loki_chunk_store_index_lookups
EOF

chmod +x /usr/local/bin/monitor-loki.sh
```

```yaml
# Prometheus alerts:
- alert: LokiQuerySlow
  expr: histogram_quantile(0.95, rate(loki_query_duration_seconds_bucket[5m])) > 10
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Loki queries are slow (>10s at 95th percentile)"

- alert: LokiQueryTimeout
  expr: rate(loki_query_failures_total{reason="timeout"}[5m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Loki query timeouts detected"
```

Loki Query Timeout Checklist

| Check | Command | Expected |
|-------|---------|----------|
| Query timeout | config | Adequate |
| Time range | query param | Within limits |
| Cardinality | label values | Low |
| Cache enabled | config | Yes |
| Querier count | replicas | Sufficient |
| Storage latency | metrics | Low |

Verify the Fix

```bash
# After optimizing Loki

# 1. Test query
curl "http://localhost:3100/loki/api/v1/query_range?query={app=\"myapp\"}&start=now-1h&end=now"
# Returns results within timeout

# 2. Check query duration
curl http://localhost:3100/metrics | grep loki_query_duration_seconds
# P95 < 5 seconds

# 3. Test in Grafana
# Open Explore > Logs
# Queries complete successfully

# 4. Check cache hit rate
curl http://localhost:3100/metrics | grep cache_hits
# > 50% hit rate

# 5. Monitor under load
# Run multiple queries
# All complete

# 6. Check logs
journalctl -u loki | grep timeout
# No timeout errors
```

- [Fix Prometheus Query Timeout](/articles/fix-prometheus-query-timeout)
- [Fix Grafana Panel Rendering Timeout](/articles/fix-grafana-panel-rendering-timeout)
- [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)