The Problem
Prometheus is failing to query data from remote storage backends. You see errors like:
level=error ts=2026-04-04T01:05:22.345Z caller=storage.go:234 msg="Error querying remote storage" err="Post \"https://thanos-query:19192/api/v1/read\": dial tcp 10.0.0.50:19192: i/o timeout"
level=error ts=2026-04-04T01:05:23.456Z caller=engine.go:567 msg="Error evaluating query" err="remote read: unexpected status code 500"
level=warn ts=2026-04-04T01:05:24.567Z caller=rule_manager.go:789 msg="Error reading external timestamps" err="remote read returned no data"

Remote read failures break queries for historical data, affecting dashboards and alerting.
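As a rough triage aid, the error messages above can be sorted into the categories that the solutions in this guide address. The sketch below is a hypothetical helper (`classify_remote_read_error` is not part of Prometheus). The category names and the patterns for `401`/`403`, `connection refused`, and `no such host` are assumptions added for illustration; only the `i/o timeout`, `status code 500`, and `no data` strings come from the log lines shown.

```bash
# Hypothetical helper: map the err="..." text of a remote read log line
# to a likely cause. Patterns are matched as plain substrings.
classify_remote_read_error() {
  case "$1" in
    *"i/o timeout"*|*"context deadline exceeded"*|*"connection refused"*|*"no such host"*)
      echo "network" ;;   # connection or timeout problem
    *"status code 401"*|*"status code 403"*)
      echo "auth" ;;      # assumed wording for rejected credentials
    *"status code 5"*)
      echo "backend" ;;   # the remote answered with a 5xx error
    *"no data"*)
      echo "empty" ;;     # the remote answered but returned nothing
    *)
      echo "unknown" ;;
  esac
}
```

For example, `classify_remote_read_error 'dial tcp 10.0.0.50:19192: i/o timeout'` prints `network`, which points at the connection timeout fixes in solution 1.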
Diagnosis
Check Remote Read Metrics
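```promql
# Note: recent Prometheus releases expose remote read metrics under the
# prometheus_remote_storage_ prefix (for example
# prometheus_remote_storage_read_queries_total). If the queries below
# return nothing, check /metrics for the names your version uses.

# Failed remote read requests
rate(prometheus_remote_read_read_queries_failed_total[5m])

# Query duration (95th percentile)
histogram_quantile(0.95, rate(prometheus_remote_read_request_duration_seconds_bucket[5m]))

# Data read
rate(prometheus_remote_read_samples_total[5m])

# Pending queries
prometheus_remote_read_pending_queries
```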
Check Remote Read Configuration
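```bash
# View remote read config (the API returns the loaded config as one YAML string)
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A 20 '^remote_read:'

# Check flags (.data is an object keyed by flag name)
curl -s http://localhost:9090/api/v1/status/flags | jq '.data | with_entries(select(.key | startswith("storage.remote")))'
```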
Test Backend Connectivity
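```bash
# Test direct query to remote backend
curl -s 'https://victoria-metrics:8480/api/v1/query?query=up' | jq .

# Test read endpoint (for Thanos, etc.)
# query.pb must contain a snappy-compressed protobuf ReadRequest
curl -X POST 'https://thanos-query:19192/api/v1/read' \
  -H 'Content-Type: application/x-protobuf' \
  -H 'Content-Encoding: snappy' \
  -H 'X-Prometheus-Remote-Read-Version: 0.1.0' \
  --data-binary @query.pb

# Check Thanos/VM health (print the body and the HTTP status code;
# the response may be plain text rather than JSON)
curl -s -w '\n%{http_code}\n' 'https://victoria-metrics:8480/health'
```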
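The `dial tcp ... i/o timeout` error is raised before any HTTP exchange, so it can help to test the TCP connection on its own. The following is a minimal sketch, assuming `bash` (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command are installed; `tcp_reachable` is a hypothetical helper name, not an existing tool.

```bash
# Return 0 if a TCP connection to host:port succeeds within the time limit.
# Arguments: host, port, optional limit in seconds (default 5).
tcp_reachable() {
  host=$1 port=$2 limit=${3:-5}
  # The inner bash opens the socket on file descriptor 3; `timeout` kills it
  # if the connection attempt hangs, as it does when packets are dropped.
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

A possible invocation from the Prometheus host is `tcp_reachable thanos-query 19192 3 && echo reachable || echo "blocked or timing out"`. A failure here suggests a firewall, routing, or DNS problem rather than a Prometheus setting.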
Solutions
1. Fix Connection Timeouts
Network or timeout issues:
```yaml
# prometheus.yml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    name: "victoria-metrics"
    # Increase timeout for slow networks or large queries
    remote_timeout: 2m

    # Adjust read consistency
    # true: also query the remote for recent time ranges that local storage
    # already covers (local storage is always queried)
    read_recent: true
```
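When reproducing a timeout by hand, it can be useful to give `curl` the same time budget as `remote_timeout`. The sketch below converts a single-unit duration such as `2m` into seconds. `duration_to_seconds` is a hypothetical helper, and it deliberately rejects compound values like `1h30m`, which Prometheus itself accepts.

```bash
# Convert a single-unit duration (30s, 2m, 1h, 1d) to whole seconds.
# Prints the result, or returns 1 for anything it does not understand.
duration_to_seconds() {
  value=${1%[smhd]}    # strip one trailing unit letter, if present
  unit=${1#"$value"}   # whatever was stripped (empty if no unit)
  case "$value" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}
```

With this in place, `curl --max-time "$(duration_to_seconds 2m)" ...` gives the manual request the same 120-second limit as the configuration above.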
Server-wide remote read limits are set with command-line flags:
prometheus \
  --storage.remote.read-max-bytes-in-frame=104857600 \
  --storage.remote.read-concurrent-limit=100

2. Fix Authentication Errors
Missing credentials for remote:
```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Basic auth
    basic_auth:
      username: prometheus
      password: your_password
      # password_file: /etc/prometheus/remote_password

    # Bearer token
    # bearer_token: "your-token-here"
    # bearer_token_file: /etc/prometheus/bearer_token

    # TLS configuration
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      # insecure_skip_verify: true  # Not recommended for production
```
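Authentication often fails simply because a referenced file is missing, empty, or not readable by the Prometheus process. The sketch below checks the paths used in the configuration above; `check_credential_files` is a hypothetical helper, and it only tests access for the user running it, so it is best run as the same user as Prometheus.

```bash
# Print one line per unusable file and return non-zero if any was found.
check_credential_files() {
  rc=0
  for file in "$@"; do
    if [ ! -e "$file" ]; then
      echo "missing: $file"; rc=1
    elif [ ! -r "$file" ]; then
      echo "unreadable: $file"; rc=1
    elif [ ! -s "$file" ]; then
      echo "empty: $file"; rc=1
    fi
  done
  return "$rc"
}
```

A possible invocation is `check_credential_files /etc/prometheus/remote_password /etc/prometheus/certs/ca.crt /etc/prometheus/certs/client.crt /etc/prometheus/certs/client.key`. No output and a zero exit status mean every file exists, is readable, and is non-empty; the contents themselves are not validated.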
3. Fix Query Performance
Slow queries timing out:
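```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    remote_timeout: 5m

    # Filter what's read from remote
    filter_external_labels: true

    # Equality matchers a selector must contain before the remote is queried
    # (a label-to-value map; regular expressions are not supported)
    required_matchers:
      job: "node-exporter"

    # There is no per-endpoint parallelism setting; concurrent remote reads
    # are capped globally with --storage.remote.read-concurrent-limit
```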
Reduce query complexity:
```promql
# Instead of: High cardinality query
sum by (pod) (rate(container_cpu_usage_seconds_total[1h]))

# Use: Aggregate first, then remote read
sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))
```
4. Handle Partial Data
Missing data or gaps in remote:
```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Also query the remote for recent time ranges that local storage covers
    read_recent: true

    # Or fail fast so a slow remote does not stall the whole query
    remote_timeout: 30s
```
Query with tolerance for gaps:
```promql
# Use vector(0) as fallback
sum(rate(http_requests_total[5m])) or vector(0)

# Use on() for combining queries
# (set operators such as "or" do not accept group_left)
sum(rate(http_requests_total[5m])) or on() sum(increase(http_requests_total[5m]))
```
5. Fix Label Mismatch
Labels don't match between local and remote:
```yaml
# External labels are set globally, not per remote_read entry
global:
  external_labels:
    cluster: "production"
    region: "us-east-1"

remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Filter queries by external labels
    filter_external_labels: true
```
Handle label differences in queries:
```promql
# Query across label variations
sum by (namespace) (
  {__name__="container_cpu_usage_seconds_total", cluster="production"}
  or
  {__name__="container_cpu_usage_seconds_total", cluster=~"prod.*"}
)

# Use on() for label matching (group_left is not valid with "or")
sum by (namespace) (
  container_cpu_usage_seconds_total{cluster="production"}
  or on(namespace, pod)
  container_cpu_usage_seconds_total{cluster=~"prod.*"}
)
```
6. Fix Protocol Issues
Incompatible remote read protocol:
```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Remote read sends snappy-compressed protobuf over HTTP.
    # Headers that Prometheus sets itself (such as Content-Type)
    # cannot be overridden, so a JSON-only backend needs an adapter.

    # Custom headers if needed
    headers:
      X-Custom-Header: "value"
```
Verification
Verify Remote Read is Working
```promql
# Successful read queries
rate(prometheus_remote_read_samples_total[5m])

# Query duration (average)
rate(prometheus_remote_read_request_duration_seconds_sum[5m]) / rate(prometheus_remote_read_request_duration_seconds_count[5m])

# No failures
rate(prometheus_remote_read_read_queries_failed_total[5m]) == 0
```
Test Historical Queries
```bash
# Query data older than local retention
curl -s 'http://localhost:9090/api/v1/query?query=up&time=2026-03-01T00:00:00Z' | jq .

# Range query across remote
curl -s 'http://localhost:9090/api/v1/query_range?query=up&start=2026-03-01T00:00:00Z&end=2026-03-02T00:00:00Z&step=1h' | jq .
```
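To judge whether a range query came back complete, the number of returned points per series can be compared with the number of evaluation steps the query covers. A range query is evaluated at `start`, `start + step`, and so on up to `end`, so the expected count is `floor((end - start) / step) + 1`. The sketch below is a hypothetical helper that takes Unix timestamps and a step in seconds.

```bash
# Number of evaluation steps in a range query: floor((end - start) / step) + 1.
# Arguments: start and end as Unix timestamps, step in seconds.
expected_points() {
  start=$1 end=$2 step=$3
  if [ "$step" -le 0 ] || [ "$end" -lt "$start" ]; then
    echo "invalid range" >&2
    return 1
  fi
  echo $(( (end - start) / step + 1 ))
}
```

For the one-day query above at a one-hour step, `expected_points 1772323200 1772409600 3600` prints `25`. That figure can be compared with `jq '.data.result[0].values | length'` on the response; a lower number suggests gaps in the remote data for that series.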
Check Backend Response
```bash
# Query Victoria Metrics directly
curl -s 'https://victoria-metrics:8480/api/v1/query?query=up' | jq .

# Check data range
curl -s 'https://victoria-metrics:8480/api/v1/query_range?query=up&start=2026-03-01T00:00:00Z&end=2026-04-01T00:00:00Z&step=1h' | jq .
```
Prevention
Add monitoring for remote read:
```yaml
groups:
  - name: remote_read_alerts
    rules:
      - alert: RemoteReadFailing
        expr: rate(prometheus_remote_read_read_queries_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Remote read is failing"
          description: "Remote read from {{ $labels.url }} is failing at {{ $value }} queries/sec"

      - alert: RemoteReadSlow
        expr: histogram_quantile(0.95, rate(prometheus_remote_read_request_duration_seconds_bucket[5m])) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Remote read queries are slow"
          description: "P95 query duration is {{ $value }}s"

      - alert: RemoteReadQueueFull
        expr: prometheus_remote_read_pending_queries > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Remote read queue is backing up"
          description: "{{ $value }} pending read queries"

      - alert: RemoteReadNoData
        expr: rate(prometheus_remote_read_samples_total[5m]) == 0 and prometheus_remote_read_read_queries_total > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Remote read returning no data"
```
Configuration Template
Complete remote read configuration:
```yaml
# prometheus.yml
global:
  external_labels:
    cluster: 'production'
    replica: 'prometheus-1'

remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    name: "victoria-metrics-long-term"
    remote_timeout: 5m
    read_recent: true
    filter_external_labels: true

    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote_password

    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key

    # Equality matchers only, as a label-to-value map (one value per label)
    required_matchers:
      job: "node-exporter"

  # Secondary remote for redundancy
  - url: "https://thanos-query:19192/api/v1/read"
    name: "thanos-query"
    remote_timeout: 3m
    read_recent: false
    filter_external_labels: true
```
Query Best Practices
1. Limit time range: Query smaller time ranges for faster responses
2. Reduce cardinality: Aggregate before querying
3. Use recording rules: Pre-compute expensive queries
4. Set read_recent deliberately: With `read_recent: false` (the default), recent time ranges are served from local storage only; use `read_recent: true` when the remote should also be queried for them
5. Filter early: Use `required_matchers` to limit data scanned