What's Actually Happening
Grafana Loki queries time out when they span large time ranges or touch high-cardinality log data. Log exploration in Grafana then fails with timeout errors.
The Error You'll See
Query timeout:

```json
{
  "status": "failed",
  "error": "query execution: context deadline exceeded"
}
```

Grafana error:

```
Error: failed to query logs: context deadline exceeded
```

Loki API error:

```bash
$ curl "http://loki:3100/loki/api/v1/query_range?query={app=\"myapp\"}&start=now-24h"
{
  "status": "error",
  "error": "timeout: query execution took longer than 30s"
}
```
Why This Happens
1. Large time range - Querying days or weeks of logs
2. High cardinality - Too many label combinations
3. Insufficient resources - Loki components under-provisioned
4. Slow storage - Object store latency
5. Query complexity - Regex or complex LogQL
6. No query limits - Limits not configured properly
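To narrow down which of these is the culprit, it can help to time the same query over progressively larger ranges and watch where latency jumps. A minimal sketch, assuming Loki at `localhost:3100` and a `{app="myapp"}` stream (both assumptions; adjust `LOKI_URL` and `SELECTOR` for your environment):

```shell
#!/bin/sh
# Time the same query over growing ranges; the range where latency
# jumps points at the bottleneck. LOKI_URL and SELECTOR are assumptions.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"
SELECTOR='{app="myapp"}'

# Convert a duration like 5m / 6h / 7d to seconds.
to_seconds() {
  n="${1%?}"; unit="${1#"$n"}"
  case "$unit" in
    m) echo $(( n * 60 )) ;;
    h) echo $(( n * 3600 )) ;;
    d) echo $(( n * 86400 )) ;;
    *) echo "$n" ;;
  esac
}

now=$(date +%s)
for range in 5m 1h 6h 24h 7d; do
  start=$(( (now - $(to_seconds "$range")) * 1000000000 ))  # ns epoch
  t0=$(date +%s)
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 35 \
    -G "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$SELECTOR" \
    --data-urlencode "start=$start" \
    --data-urlencode "limit=100" || true)
  echo "range=$range http=$code elapsed=$(( $(date +%s) - t0 ))s"
done
```

If latency is fine at 1h but explodes at 24h, the problem is range/volume; if even 5m is slow, look at cardinality, resources, or storage instead.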
Step 1: Check Loki Status
```bash
# Check Loki is running:
systemctl status loki

# Check query/ingester/distributor metrics:
curl http://localhost:3100/metrics | grep -E "loki_(query|ingester|distributor)"

# Check the running config:
curl http://localhost:3100/config | jq

# Check readiness:
curl http://localhost:3100/ready

# Check ingester ring status:
curl http://localhost:3100/ingester/ring

# Check querier ring status:
curl http://localhost:3100/querier/ring
```
Step 2: Check Query Limits
```bash
# Check current limits:
curl http://localhost:3100/config | jq '.limits_config'

# Key defaults to review:
#   max_query_length: 721h (30 days + 1h)
#   max_query_parallelism: 32
#   max_entries_limit_per_query: 5000
#   max_streams_per_user: 10000
```

```yaml
# In loki-config.yaml:
limits_config:
  max_query_length: 168h               # Max time range: 7 days
  max_query_parallelism: 32            # Parallel query workers
  max_entries_limit_per_query: 5000    # Max log lines returned
  max_streams_per_user: 10000          # Max unique streams
  max_line_size: 256kb                 # Max log line size
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 10m

  # Query timeout
  query_timeout: 1m

  # Cardinality limit
  max_streams_matchers_per_query: 1000
```

```bash
# Restart Loki:
systemctl restart loki
```
Step 3: Adjust Timeout Settings
```yaml
# In loki-config.yaml:
limits_config:
  # Increase the query timeout
  query_timeout: 5m

  # Increase the maximum query length
  max_query_length: 720h

# For Loki simple scalable mode:
query_range:
  align_queries_with_step: true
  max_retries: 5
  parallelism: 32
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached:11211

# Query frontend config:
frontend:
  max_outstanding_per_tenant: 100
  log_queries_longer_than: 10s
  compress_responses: true

frontend_worker:
  frontend_address: 127.0.0.1:9095
  grpc_client_config:
    max_recv_msg_size: 104857600  # 100 MiB
```

```bash
# Restart Loki:
systemctl restart loki
```
Step 4: Optimize LogQL Queries
```bash
# Slow query (full scan):
{app="myapp"}

# Faster query (add a line filter):
{app="myapp"} |= "error"

# Use label filters (indexed):
{app="myapp", level="error"}

# Avoid regex when possible.
# Slow:
{app="myapp"} |~ "error.*timeout"
# Fast:
{app="myapp"} |= "error" |= "timeout"

# Limit the time range in metric queries (last 5 minutes only):
count_over_time({app="myapp"}[5m])

# Aggregate instead of listing every log line:
sum(count_over_time({app="myapp"}[5m]))
```
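The regex-vs-chained-filter rule above can be applied mechanically. A small sketch; `chain_filters` is a hypothetical helper, not part of Loki or logcli:

```shell
#!/bin/sh
# Hypothetical helper: build chained |= line filters from a selector
# plus keywords; Loki evaluates these faster than a single regex.
chain_filters() {
  q="$1"; shift
  for w in "$@"; do
    q="$q |= \"$w\""
  done
  printf '%s\n' "$q"
}

chain_filters '{app="myapp"}' error timeout
# -> {app="myapp"} |= "error" |= "timeout"
```

The generated string can be passed straight to the query_range endpoint via `curl --data-urlencode "query=..."`.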
Step 5: Check Cardinality
```bash
# List label names:
curl "http://localhost:3100/loki/api/v1/labels" | jq

# Count values for one label:
curl "http://localhost:3100/loki/api/v1/label/hostname/values" | jq '.data | length'

# High-cardinality labels cause slow queries.
# Avoid hostname, request_id, session_id as labels.

# Count active streams (instant metric query):
curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query=count(count_over_time({job=~".+"}[5m]))'
```

```yaml
# Limit cardinality in config:
limits_config:
  max_streams_per_user: 10000
  max_global_streams_per_user: 50000
  per_stream_rate_limit: 10MB
  per_stream_rate_limit_burst: 20MB

# Use structured metadata instead of labels for high-cardinality fields.
# In promtail:
pipeline_stages:
  - labels:
      level:
      app:
  - structured_metadata:
      request_id:
      trace_id:
```
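The two per-label commands above can be combined into a loop that ranks every label by value count, which makes high-cardinality offenders obvious. A sketch, assuming Loki at `localhost:3100` and `jq` on PATH:

```shell
#!/bin/sh
# Rank labels by distinct-value count, highest cardinality first.
# LOKI_URL is an assumption; override it for your environment.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Keep the top N "count label" lines, highest count first.
top() { sort -rn | head -n "${1:-10}"; }

for label in $(curl -s "$LOKI_URL/loki/api/v1/labels" | jq -r '.data[]?'); do
  n=$(curl -s "$LOKI_URL/loki/api/v1/label/$label/values" | jq '.data | length' 2>/dev/null || echo 0)
  printf '%s %s\n' "${n:-0}" "$label"
done | top 10
```

Any label near the top with hundreds or thousands of values is a candidate for structured metadata instead.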
Step 6: Enable Query Caching
```yaml
# In loki-config.yaml:
query_range:
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached:11211
        timeout: 500ms

chunk_store_config:
  chunk_cache_config:
    memcached_client:
      addresses: dns+memcached:11211
      timeout: 500ms

# Index gateway:
index_gateway:
  mode: simple

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
```

```bash
# Start memcached:
docker run -d --name memcached -p 11211:11211 memcached:latest
```
Step 7: Check Loki Resources
```bash
# Check Loki memory usage:
ps aux | grep loki

# Loki is a Go binary; check Go memory stats:
curl http://localhost:3100/metrics | grep go_memstats

# Check disk usage where chunks are cached locally:
df -h /var/lib/loki/chunks

# For Kubernetes:
kubectl top pods -n loki
```

```yaml
# Increase resources (Kubernetes):
# For the ingester:
resources:
  limits:
    memory: 16Gi
    cpu: 4
  requests:
    memory: 8Gi
    cpu: 2

# For the querier:
resources:
  limits:
    memory: 8Gi
    cpu: 2

# For the query-frontend:
resources:
  limits:
    memory: 4Gi
    cpu: 2
```
Step 8: Scale Loki Components
```yaml
# Loki simple scalable mode deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-read
spec:
  replicas: 3  # Scale queriers here
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=read
          resources:
            limits:
              memory: 8Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-write
spec:
  replicas: 3  # Scale ingesters here
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=write
          resources:
            limits:
              memory: 16Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-backend
spec:
  replicas: 2  # Backend components (compactor, ruler, index gateway)
  template:
    spec:
      containers:
        - name: loki
          args:
            - -target=backend
```

```bash
# Scale based on load:
kubectl scale deployment loki-read --replicas=5
```
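If query load is bursty, the read path can be scaled automatically rather than by hand. A sketch HorizontalPodAutoscaler targeting the `loki-read` Deployment above; the 70% CPU target and replica bounds are arbitrary starting points, not values from the Loki docs:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loki-read
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loki-read
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```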
Step 9: Check Storage Performance
```bash
# For S3 storage:
aws s3api head-bucket --bucket my-loki-bucket

# Rough S3 latency check (time a listing):
aws s3 ls s3://my-loki-bucket/loki/chunks/ --recursive | head

# For GCS:
gsutil ls gs://my-loki-bucket/loki/

# Check the storage config:
curl http://localhost:3100/config | jq '.storage_config'
```

```yaml
# Optimize storage:
storage_config:
  aws:
    s3: s3://region/bucket-name
    region: us-east-1
    sse_encryption: true
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    cache_location: /var/lib/loki/cache
    cache_ttl: 24h

# Chunk cache:
chunk_store_config:
  max_look_back_period: 0s
  chunk_cache_config:
    memcached_client:
      addresses: dns+memcached:11211
```
Step 10: Monitor Query Performance
```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-loki.sh
#!/bin/bash

echo "=== Loki Query Stats ==="
curl -s http://localhost:3100/metrics | grep -E "loki_query_(duration|total)"

echo ""
echo "=== Loki Ingestion Rate ==="
curl -s http://localhost:3100/metrics | grep loki_distributor_lines_received_total

echo ""
echo "=== Active Streams ==="
curl -s http://localhost:3100/metrics | grep loki_ingester_memory_streams

echo ""
echo "=== Cache Hit Rate ==="
curl -s http://localhost:3100/metrics | grep -E "loki_cache_(hits|misses)"

echo ""
echo "=== Chunk Store ==="
curl -s http://localhost:3100/metrics | grep loki_chunk_store_index_lookups
EOF

chmod +x /usr/local/bin/monitor-loki.sh
```

```yaml
# Prometheus alerts:
- alert: LokiQuerySlow
  expr: histogram_quantile(0.95, rate(loki_query_duration_seconds_bucket[5m])) > 10
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Loki queries are slow (>10s at 95th percentile)"

- alert: LokiQueryTimeout
  expr: rate(loki_query_failures_total{reason="timeout"}[5m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Loki query timeouts detected"
```
Loki Query Timeout Checklist
| Check | Where to look | Expected |
|---|---|---|
| Query timeout | `query_timeout` in limits_config | Adequate for your ranges |
| Time range | query `start`/`end` parameters | Within `max_query_length` |
| Cardinality | label value counts | Low |
| Cache enabled | `cache_results` in query_range config | Yes |
| Querier count | read-path replicas | Sufficient for query load |
| Storage latency | object store metrics | Low |
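Parts of this checklist can be spot-checked from the metrics endpoint. A rough sketch; `hit_rate` is a hypothetical helper, the 50% target is arbitrary, and Loki at `localhost:3100` is an assumption:

```shell
#!/bin/sh
# Compute a cache hit rate (%) from hit and miss counts.
hit_rate() {
  awk -v h="$1" -v m="$2" 'BEGIN { t = h + m; printf "%d\n", (t ? 100 * h / t : 0) }'
}

LOKI_URL="${LOKI_URL:-http://localhost:3100}"
metrics=$(curl -s "$LOKI_URL/metrics" || true)

# Sum the hit/miss counters this article greps for above.
hits=$(printf '%s\n' "$metrics"   | awk '/^loki_cache_hits/   { s += $2 } END { print s + 0 }')
misses=$(printf '%s\n' "$metrics" | awk '/^loki_cache_misses/ { s += $2 } END { print s + 0 }')

rate=$(hit_rate "$hits" "$misses")
echo "cache hit rate: ${rate}%"
if [ "$rate" -ge 50 ]; then echo "PASS"; else echo "CHECK: enable or inspect caching"; fi
```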
Verify the Fix
```bash
# After optimizing Loki

# 1. Test a query:
curl "http://localhost:3100/loki/api/v1/query_range?query={app=\"myapp\"}&start=now-1h&end=now"
# Returns results within the timeout

# 2. Check query duration:
curl http://localhost:3100/metrics | grep loki_query_duration_seconds
# P95 < 5 seconds

# 3. Test in Grafana:
#    Open Explore > Logs
#    Queries complete successfully

# 4. Check cache hit rate:
curl http://localhost:3100/metrics | grep cache_hits
# > 50% hit rate

# 5. Monitor under load:
#    Run multiple concurrent queries; all should complete

# 6. Check logs:
journalctl -u loki | grep timeout
# No timeout errors
```
Related Issues
- [Fix Prometheus Query Timeout](/articles/fix-prometheus-query-timeout)
- [Fix Grafana Panel Rendering Timeout](/articles/fix-grafana-panel-rendering-timeout)
- [Fix Tempo Trace Not Found](/articles/fix-tempo-trace-not-found)