Fix Prometheus Remote Read Failures

The Problem

Prometheus is failing to query data from remote storage backends. You see errors like:

```text
level=error ts=2026-04-04T01:05:22.345Z caller=storage.go:234 msg="Error querying remote storage" err="Post \"https://thanos-query:19192/api/v1/read\": dial tcp 10.0.0.50:19192: i/o timeout"
level=error ts=2026-04-04T01:05:23.456Z caller=engine.go:567 msg="Error evaluating query" err="remote read: unexpected status code 500"
level=warn ts=2026-04-04T01:05:24.567Z caller=rule_manager.go:789 msg="Error reading external timestamps" err="remote read returned no data"
```

Remote read failures break queries for historical data, affecting dashboards and alerting.
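The `err=` text in each log line usually hints at which of the solution areas applies. As a rough triage aid, the sketch below maps a single log line to a likely cause. The function name `classify_remote_read_error` and the category labels are illustrative assumptions, not part of Prometheus, and the patterns only cover error shapes like the ones shown above.

```bash
# Hypothetical helper: guess the failure category from one Prometheus log line.
classify_remote_read_error() {
  local line="$1"
  case "$line" in
    *"i/o timeout"*|*"connection refused"*|*"no such host"*|*"context deadline exceeded"*)
      echo "network-or-timeout" ;;
    *"status code 401"*|*"status code 403"*|*"x509"*)
      echo "auth-or-tls" ;;
    *"status code 5"[0-9][0-9]*)
      echo "backend-error" ;;
    *"returned no data"*)
      echo "partial-data" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example (assumes the logs were saved to a file named prometheus.log):
#   while IFS= read -r l; do classify_remote_read_error "$l"; done < prometheus.log | sort | uniq -c
```

A count per category gives a quick indication of whether to start with connectivity, credentials, or the backend itself.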

Diagnosis

Check Remote Read Metrics

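Metric names differ between Prometheus versions (recent releases expose the remote read client metrics under the `prometheus_remote_storage_` prefix), so confirm the exact names on your server's `/metrics` endpoint before relying on these queries.

```promql
# Failed remote read requests
rate(prometheus_remote_read_read_queries_failed_total[5m])

# 95th percentile query duration
histogram_quantile(0.95, rate(prometheus_remote_read_request_duration_seconds_bucket[5m]))

# Samples read from remote storage
rate(prometheus_remote_read_samples_total[5m])

# Pending queries
prometheus_remote_read_pending_queries
```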

Check Remote Read Configuration

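```bash
# View remote read config (the API returns the whole config as one YAML string)
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A 20 'remote_read:'

# Check flags (the API returns a flat map of flag name to value)
curl -s http://localhost:9090/api/v1/status/flags | jq '.data | with_entries(select(.key | startswith("storage.remote")))'
```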

Test Backend Connectivity

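```bash
# Test direct query to remote backend
curl -s 'https://victoria-metrics:8480/api/v1/query?query=up' | jq .

# Test read endpoint (for Thanos, etc.). query.pb must contain a
# snappy-compressed protobuf ReadRequest.
curl -X POST 'https://thanos-query:19192/api/v1/read' \
  -H 'Content-Type: application/x-protobuf' \
  -H 'Content-Encoding: snappy' \
  -H 'X-Prometheus-Remote-Read-Version: 0.1.0' \
  --data-binary @query.pb

# Check Thanos/VM health (expect HTTP 200; the body may not be JSON)
curl -s -o /dev/null -w '%{http_code}\n' 'https://victoria-metrics:8480/health'
```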

Solutions

1. Fix Connection Timeouts

Network or timeout issues:

```yaml
# prometheus.yml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    name: "victoria-metrics"
    # Increase timeout for slow networks or large queries (default: 1m)
    remote_timeout: 2m

    # Also query the remote for time ranges that local storage
    # should already cover completely (default: false)
    read_recent: true
```

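Related limits are set on the command line. These flags govern the remote read endpoint that Prometheus itself serves, so they apply when the backend being read is another Prometheus server:

```bash
prometheus \
  --storage.remote.read-max-bytes-in-frame=104857600 \
  --storage.remote.read-concurrent-limit=100
```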
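To judge whether a given `remote_timeout` is realistic, one option is to replay a request against the backend with the same time budget. The sketch below converts a Prometheus-style duration into whole seconds so it can be passed to `curl --max-time`. The function name `duration_to_seconds` is hypothetical, and this simplified parser handles only the `s`, `m`, `h`, `d`, `w`, and `y` units (no `ms`).

```bash
# Convert a duration such as "2m" or "1h30m" to seconds.
duration_to_seconds() {
  local rest="$1" total=0 num unit
  if [ -z "$rest" ]; then
    echo "invalid duration: (empty)" >&2
    return 1
  fi
  while [[ "$rest" =~ ^([0-9]+)([smhdwy])(.*)$ ]]; do
    num="${BASH_REMATCH[1]}"
    unit="${BASH_REMATCH[2]}"
    rest="${BASH_REMATCH[3]}"
    case "$unit" in
      s) total=$(( total + 10#$num )) ;;
      m) total=$(( total + 10#$num * 60 )) ;;
      h) total=$(( total + 10#$num * 3600 )) ;;
      d) total=$(( total + 10#$num * 86400 )) ;;
      w) total=$(( total + 10#$num * 604800 )) ;;
      y) total=$(( total + 10#$num * 31536000 )) ;;
    esac
  done
  # Anything left over means the input was not a valid duration.
  if [ -n "$rest" ]; then
    echo "invalid duration: $1" >&2
    return 1
  fi
  echo "$total"
}

# Example: give curl the same budget as remote_timeout: 2m
#   curl -s --max-time "$(duration_to_seconds 2m)" 'https://victoria-metrics:8480/api/v1/query?query=up'
```

If the replayed request regularly runs close to the limit, raising the timeout only postpones the failure, and narrowing the query is usually the better fix.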

2. Fix Authentication Errors

Missing credentials for remote:

```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Basic auth
    basic_auth:
      username: prometheus
      password: your_password
      # password_file: /etc/prometheus/remote_password

    # Bearer token
    # bearer_token: "your-token-here"
    # bearer_token_file: /etc/prometheus/bearer_token

    # TLS configuration
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      # insecure_skip_verify: true  # Not recommended for production
```
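Authentication failures are sometimes caused by a credential or certificate file that is missing, empty, or unreadable by the account Prometheus runs under. The check below is a minimal sketch with a hypothetical function name. It should be run as the same user as the Prometheus process, and it does not validate the file contents.

```bash
# Report whether a file referenced by password_file, bearer_token_file,
# or tls_config exists, is readable, and is non-empty.
check_credential_file() {
  local path="$1"
  if [ ! -e "$path" ]; then
    echo "missing: $path"
    return 1
  fi
  if [ ! -r "$path" ]; then
    echo "unreadable: $path"
    return 1
  fi
  if [ ! -s "$path" ]; then
    echo "empty: $path"
    return 1
  fi
  echo "ok: $path"
}

# Example, using the paths from the configuration above:
#   for f in /etc/prometheus/remote_password /etc/prometheus/certs/ca.crt \
#            /etc/prometheus/certs/client.crt /etc/prometheus/certs/client.key; do
#     check_credential_file "$f"
#   done
```

Once the files check out, the same credentials can be tried by hand with curl's `-u`, `--cacert`, `--cert`, and `--key` options to confirm that the backend accepts them.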

3. Fix Query Performance

Slow queries timing out:

```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    remote_timeout: 5m

    # Use the external labels as selectors so the remote returns less data
    filter_external_labels: true

    # Only selectors that contain these equality matchers are sent to
    # the remote. This is a label map, not a list, and regular
    # expressions are not supported.
    required_matchers:
      job: "node-exporter"
```

Reduce query complexity:

```promql
# Instead of: High cardinality query
sum by (pod) (rate(container_cpu_usage_seconds_total[1h]))

# Use: Aggregate first, then remote read
sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))
```

4. Handle Partial Data

Missing data or gaps in remote:

```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Also query the remote for recent time ranges, so it can fill
    # gaps in local storage
    read_recent: true

    # Fail fast so a slow backend does not stall queries
    remote_timeout: 30s
```

Query with tolerance for gaps:

```promql
# Use vector(0) as fallback
sum(rate(http_requests_total[5m])) or vector(0)

# Use on() for combining queries (set operators such as "or" do not accept group_left)
sum(rate(http_requests_total[5m])) or on() sum(increase(http_requests_total[5m]))
```

5. Fix Label Mismatch

Labels don't match between local and remote:

```yaml
# External labels are defined globally, not per remote_read entry
global:
  external_labels:
    cluster: "production"
    region: "us-east-1"

remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Use the external labels as selectors for remote queries
    filter_external_labels: true
```

Handle label differences in queries:

```promql
# Query across label variations
sum by (namespace) (
  {__name__="container_cpu_usage_seconds_total", cluster="production"}
  or
  {__name__="container_cpu_usage_seconds_total", cluster=~"prod.*"}
)

# Use on() for label matching (set operators such as "or" do not accept group_left)
sum by (namespace) (
  container_cpu_usage_seconds_total{cluster="production"}
  or on(namespace, pod)
  container_cpu_usage_seconds_total{cluster=~"prod.*"}
)
```

6. Fix Protocol Issues

Incompatible remote read protocol:

```yaml
remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    # Remote read requests are always snappy-compressed protobuf.
    # There is no JSON mode, so the backend must implement the
    # Prometheus remote read API.

    # Custom headers if needed. Headers that Prometheus sets itself
    # cannot be overridden.
    headers:
      X-Custom-Header: "value"
```

Verification

Verify Remote Read is Working

```promql
# Successful read queries
rate(prometheus_remote_read_samples_total[5m])

# Average query duration
rate(prometheus_remote_read_request_duration_seconds_sum[5m])
/
rate(prometheus_remote_read_request_duration_seconds_count[5m])

# No failures
rate(prometheus_remote_read_read_queries_failed_total[5m]) == 0
```

Test Historical Queries

```bash
# Query data older than local retention
curl -s 'http://localhost:9090/api/v1/query?query=up&time=2026-03-01T00:00:00Z' | jq .

# Range query across remote
curl -s 'http://localhost:9090/api/v1/query_range?query=up&start=2026-03-01T00:00:00Z&end=2026-03-02T00:00:00Z&step=1h' | jq .
```

Check Backend Response

```bash
# Query Victoria Metrics directly
curl -s 'https://victoria-metrics:8480/api/v1/query?query=up' | jq .

# Check data range
curl -s 'https://victoria-metrics:8480/api/v1/query_range?query=up&start=2026-03-01T00:00:00Z&end=2026-04-01T00:00:00Z&step=1h' | jq .
```

Prevention

Add monitoring for remote read:

```yaml
groups:
  - name: remote_read_alerts
    rules:
      - alert: RemoteReadFailing
        expr: rate(prometheus_remote_read_read_queries_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Remote read is failing"
          description: "Remote read from {{ $labels.url }} is failing at {{ $value }} queries/sec"

      - alert: RemoteReadSlow
        expr: histogram_quantile(0.95, rate(prometheus_remote_read_request_duration_seconds_bucket[5m])) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Remote read queries are slow"
          description: "P95 query duration is {{ $value }}s"

      - alert: RemoteReadQueueFull
        expr: prometheus_remote_read_pending_queries > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Remote read queue is backing up"
          description: "{{ $value }} pending read queries"

      - alert: RemoteReadNoData
        expr: rate(prometheus_remote_read_samples_total[5m]) == 0 and prometheus_remote_read_read_queries_total > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Remote read returning no data"
```

Configuration Template

Complete remote read configuration:

```yaml
# prometheus.yml
global:
  external_labels:
    cluster: 'production'
    replica: 'prometheus-1'

remote_read:
  - url: "https://victoria-metrics:8480/api/v1/read"
    name: "victoria-metrics-long-term"
    remote_timeout: 5m
    # Also query the remote for ranges local storage should cover
    read_recent: true
    filter_external_labels: true

    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote_password

    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key

    # Label map of equality matchers; only selectors that include
    # job="node-exporter" are sent to this remote
    required_matchers:
      job: "node-exporter"

  # Secondary remote for redundancy
  - url: "https://thanos-query:19192/api/v1/read"
    name: "thanos-query"
    remote_timeout: 3m
    read_recent: false
    filter_external_labels: true
```
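Prometheus rejects configuration files that contain unknown fields, so it is worth validating a changed file before reloading. The wrapper below is a minimal sketch around promtool's `check config` and `check rules` subcommands. The function name and the file paths in the example are assumptions, and `promtool` must be on the `PATH`.

```bash
# Validate a Prometheus config file and a rules file with promtool.
# Returns 127 if promtool is not installed.
validate_prometheus_files() {
  local config="$1" rules="$2"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check config "$config" && promtool check rules "$rules"
}

# Example (hypothetical paths):
#   validate_prometheus_files /etc/prometheus/prometheus.yml /etc/prometheus/rules/remote_read.yml
```

Running this as a pre-reload step catches indentation mistakes and misspelled option names before they take the server down.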

Query Best Practices

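1. Limit time range: Query smaller time ranges for faster responses
2. Reduce cardinality: Aggregate before querying
3. Use recording rules: Pre-compute expensive queries
4. Set read_recent deliberately: Leave read_recent: false (the default) so recent ranges are served from local storage only; set it to true only when the remote must fill gaps in local data
5. Filter early: Use required_matchers so that only selectors containing those matchers are sent to the remote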
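One way to apply the "limit time range" advice is to break a long range query into smaller windows and issue them one at a time. The helper below is a sketch with a hypothetical function name. It works on Unix timestamps in seconds and prints one `start end` pair per window.

```bash
# Print "start end" pairs that cover [start, end] in windows of at most
# `window` seconds. Adjacent windows share a boundary timestamp, so a
# caller that merges results should de-duplicate that point.
split_time_range() {
  local start="$1" end="$2" window="$3" s e
  if [ "$window" -le 0 ] || [ "$start" -ge "$end" ]; then
    echo "usage: split_time_range START END WINDOW_SECONDS (START < END, WINDOW > 0)" >&2
    return 1
  fi
  s="$start"
  while [ "$s" -lt "$end" ]; do
    e=$(( s + window ))
    if [ "$e" -gt "$end" ]; then
      e="$end"
    fi
    echo "$s $e"
    s="$e"
  done
}

# Example: query one day at a time instead of a whole month
#   split_time_range 1772323200 1775001600 86400 | while read -r s e; do
#     curl -s "http://localhost:9090/api/v1/query_range?query=up&start=$s&end=$e&step=1h"
#   done
```

Smaller windows keep each remote read response small, which helps a slow backend stay within `remote_timeout`.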