Fix Prometheus Remote Write Failures

The Problem

Prometheus is failing to send data to a remote storage backend such as VictoriaMetrics, Thanos, Cortex, or Mimir. The logs show errors like:

```bash
level=error ts=2026-04-04T02:10:30.456Z caller=queue_manager.go:456 component="remote queue" msg="Failed to send remote write request" err="Post \"https://remote-storage:8480/api/v1/write\": dial tcp: lookup remote-storage: no such host"
level=error ts=2026-04-04T02:10:31.123Z caller=queue_manager.go:457 component="remote queue" msg="Remote write storage shutdown" err="context canceled"
level=warn ts=2026-04-04T02:10:32.789Z caller=queue_manager.go:789 msg="Remote write queue full, dropping samples"
```

Remote write failures mean samples never reach long-term storage, which leaves gaps in historical analysis and breaks any alerting that is evaluated against the remote backend.
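When the log is long, it can help to bucket error lines by likely cause before digging in. The following is a minimal sketch, not part of Prometheus: the category names are hypothetical, the first three patterns come from the sample lines above, and the `connection` pattern assumes the usual Go network error strings.

```python
import re

# Hypothetical categories. Patterns are matched in order; first hit wins.
PATTERNS = [
    ("dns", re.compile(r"no such host")),
    ("queue_full", re.compile(r"queue full", re.IGNORECASE)),
    ("shutdown", re.compile(r"context canceled")),
    # Assumption: typical Go net error text, not taken from the sample log.
    ("connection", re.compile(r"connection refused|i/o timeout")),
]


def classify(line: str) -> str:
    """Return a coarse failure category for one Prometheus log line."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "other"


def summarize(lines: list[str]) -> dict[str, int]:
    """Count how many lines fall into each category."""
    counts: dict[str, int] = {}
    for line in lines:
        category = classify(line)
        counts[category] = counts.get(category, 0) + 1
    return counts
```

Feeding it the output of `journalctl -u prometheus` (one line per list entry) gives a quick picture of whether the problem is mostly DNS, connectivity, or queue pressure.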

Diagnosis

Check Remote Write Metrics

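```promql
# Metric names are those exposed by recent Prometheus releases; older versions may differ.

# Failed samples per second
rate(prometheus_remote_storage_samples_failed_total[5m])

# Queue capacity usage (capacity is counted per shard)
prometheus_remote_storage_samples_pending
  / (prometheus_remote_storage_shards * prometheus_remote_storage_shard_capacity) > 0.8

# Samples sent per second
rate(prometheus_remote_storage_samples_total[5m])

# Pending samples in queue
prometheus_remote_storage_samples_pending

# Seconds the remote is behind the newest ingested sample
prometheus_remote_storage_highest_timestamp_in_seconds
  - ignoring(remote_name, url) group_right
    prometheus_remote_storage_queue_highest_sent_timestamp_seconds
```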
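The same checks can be scripted against the Prometheus HTTP API. This is a sketch under stated assumptions: the helper names are hypothetical, the response shape is the standard instant-query JSON (`status`, `data.result[].metric`, `data.result[].value`), and the `url` label is assumed to be present on the remote write series.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, expr: str) -> str:
    """Build an /api/v1/query URL with the expression safely encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def nonzero_series(response: dict) -> list[tuple[str, float]]:
    """Return (url label, value) pairs for series whose value is above zero."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    found = []
    for series in response["data"]["result"]:
        # Instant-query values arrive as [timestamp, "value-as-string"].
        value = float(series["value"][1])
        if value > 0:
            found.append((series["metric"].get("url", "unknown"), value))
    return found
```

Fetch the URL with any HTTP client (for example `urllib.request.urlopen`), decode the JSON body, and pass it to `nonzero_series` together with a failure-rate expression; an empty list means no queue is currently failing.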

Check Remote Write Status

```bash
# View TSDB head stats (series and chunk counts) via API
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.headStats'

# Check the loaded remote write configuration (the API returns the config as one YAML string)
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A 20 '^remote_write:'
```

Check Network Connectivity

```bash
# Test basic connectivity (any HTTP response, even a 4xx, proves the endpoint is reachable)
curl -v https://remote-storage:8480/api/v1/write

# DNS resolution test
nslookup remote-storage
dig remote-storage

# Port connectivity
nc -zv remote-storage 8480

# From within the Prometheus pod/container (use wget if the image does not ship curl)
kubectl exec -it prometheus-pod -- curl -v https://remote-storage:8480/api/v1/write
```
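Where `nslookup` or `nc` are not installed, the DNS and port checks can be approximated with the Python standard library. A minimal sketch with hypothetical helper names; it only tests name resolution and the TCP handshake, not TLS or HTTP.

```python
import socket


def resolve(host: str) -> list[str]:
    """Return the sorted IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Same condition that produces "no such host" in the Prometheus log.
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the example above, `resolve("remote-storage")` returning an empty list points to DNS, while a resolved address with `can_connect("remote-storage", 8480)` returning `False` points to a firewall, network policy, or a backend that is not listening.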

Solutions

1. Fix Connection Errors

Network connectivity issues:

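```yaml
# prometheus.yml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    # Increase the per-request timeout for slow networks (default is 30s)
    remote_timeout: 60s

    # Retry backoff for recoverable errors
    queue_config:
      min_backoff: 30ms
      max_backoff: 5s

    # Metadata is sent separately from samples
    metadata_config:
      send_interval: 1m
      max_samples_per_send: 500
```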

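If the backend is reached through Kubernetes service DNS, use the fully qualified service name:

```yaml
remote_write:
  - url: "https://victoria-metrics.monitoring.svc.cluster.local:8480/api/v1/write"
    # Use IP if DNS is unreliable (the TLS certificate must then cover the IP)
    # url: "https://10.0.0.100:8480/api/v1/write"
```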

2. Fix Authentication Errors

Missing or incorrect credentials:

```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    # Basic auth
    basic_auth:
      username: prometheus
      password: your_password
      # Or from file:
      # password_file: /etc/prometheus/remote_password

    # Bearer token
    # bearer_token: "your-token-here"
    # bearer_token_file: /etc/prometheus/bearer_token

    # TLS configuration
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      # For self-signed certs (not recommended for production)
      # insecure_skip_verify: true
```
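Authentication errors deserve quick attention because they are not retried. The sketch below models the retry rules as described in the Prometheus remote write specification (2xx accepted, 5xx retried, 4xx dropped, 429 retried only when opted in). The function name and return strings are hypothetical, and this is a simplification, not Prometheus code.

```python
def remote_write_outcome(status: int, retry_on_http_429: bool = False) -> str:
    """Classify what a sender does with a batch after an HTTP response."""
    if 200 <= status < 300:
        return "accepted"
    if status == 429:
        # Rate limiting is retried only when retry_on_http_429 is enabled.
        return "retried" if retry_on_http_429 else "dropped"
    if 500 <= status < 600:
        return "retried"
    # Remaining 4xx responses, including 401 and 403, are treated as permanent.
    return "dropped"
```

Under this model a wrong password (`401`) or a missing permission (`403`) results in dropped batches rather than a growing queue, so the failure shows up in the failed-samples rate and not in pending samples.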

3. Fix Queue Overflow

Remote write queue filling up:

```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    queue_config:
      # Increase capacity (capacity is counted per shard)
      capacity: 100000
      max_shards: 200
      max_samples_per_send: 5000

      # Adjust timing
      batch_send_deadline: 5s
      min_shards: 10
      min_backoff: 30ms
      max_backoff: 1s

      # Also retry when the backend rate-limits with HTTP 429
      retry_on_http_429: true
```
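To see what `min_backoff` and `max_backoff` mean in practice, the retry delays can be listed out. A small sketch, assuming the delay starts at `min_backoff` and doubles after every failed attempt until it reaches `max_backoff`; the function name is hypothetical.

```python
def backoff_schedule(min_backoff_ms: float, max_backoff_ms: float, retries: int) -> list[float]:
    """Return the delay in milliseconds before each of the first `retries` retries."""
    delays = []
    delay = min_backoff_ms
    for _ in range(retries):
        delays.append(delay)
        # Double the delay, but never exceed the configured ceiling.
        delay = min(delay * 2, max_backoff_ms)
    return delays


# With min_backoff: 30ms and max_backoff: 1s
print(backoff_schedule(30, 1000, 7))  # [30, 60, 120, 240, 480, 960, 1000]
```

With these settings a shard reaches the one-second ceiling after six failed attempts, so a short backend outage costs only a few seconds of delay per shard.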

Monitor queue health:

```promql
# Queue utilization (pending samples relative to total shard capacity)
prometheus_remote_storage_samples_pending
  / (prometheus_remote_storage_shards * prometheus_remote_storage_shard_capacity)

# Should be < 0.8 (80%)
```
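The utilization figure is simple arithmetic once the inputs are known. A worked sketch, assuming `capacity` is counted per shard so the total buffer is shards times capacity; the function name and the sample numbers are illustrative only.

```python
def queue_utilization(pending: float, shards: int, shard_capacity: int) -> float:
    """Return pending samples as a fraction of total queue capacity."""
    total_capacity = shards * shard_capacity
    if total_capacity <= 0:
        raise ValueError("shards and shard_capacity must be positive")
    return pending / total_capacity


# Illustrative numbers: 10 shards of 20,000 samples each, 180,000 samples pending
utilization = queue_utilization(180_000, 10, 20_000)
print(f"{utilization:.0%}")  # 90%
print(utilization > 0.8)     # True
```

In this example the queue is above the 80% threshold, so either the backend is too slow or the queue needs more shards or larger batches.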

4. Fix High Latency

Remote write experiencing high latency:

```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    queue_config:
      # Optimize for latency
      capacity: 50000
      max_shards: 100
      max_samples_per_send: 10000
      batch_send_deadline: 2s

    # Reduce metadata overhead
    metadata_config:
      send: true
      send_interval: 2m
      max_samples_per_send: 1000
```

5. Handle Metric Relabeling

Sending unwanted metrics to remote storage:

```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    # Rules run in order; a series removed by one rule never reaches the next
    write_relabel_configs:
      # Only send specific metrics
      - source_labels: [__name__]
        regex: '(http_.*|process_.*|node_.*)'
        action: keep

      # Drop high-cardinality metrics
      # (redundant after the keep rule above; useful when no allowlist is used)
      - source_labels: [__name__]
        regex: 'unwanted_metric_.*'
        action: drop

      # Reduce cardinality
      - source_labels: [__name__]
        regex: 'http_request_duration_seconds_bucket'
        action: drop
```
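Before reloading Prometheus, it can be useful to dry-run a rule list against a few metric names. The sketch below is a simplified model, not the Prometheus relabeling engine: it handles only `keep` and `drop` on `__name__`, and it uses Python's `re.fullmatch` to approximate the fully anchored matching Prometheus applies to relabel regexes.

```python
import re


def apply_write_relabel(metric_names: list[str], rules: list[tuple[str, str]]) -> list[str]:
    """Return the metric names that survive an ordered list of (regex, action) rules."""
    survivors = []
    for name in metric_names:
        kept = True
        for regex, action in rules:
            matched = re.fullmatch(regex, name) is not None
            if (action == "keep" and not matched) or (action == "drop" and matched):
                kept = False
                break  # a removed series is not evaluated by later rules
        if kept:
            survivors.append(name)
    return survivors


rules = [
    (r"(http_.*|process_.*|node_.*)", "keep"),
    (r"unwanted_metric_.*", "drop"),
    (r"http_request_duration_seconds_bucket", "drop"),
]
names = ["http_requests_total", "http_request_duration_seconds_bucket",
         "node_load1", "unwanted_metric_foo", "up"]
print(apply_write_relabel(names, rules))  # ['http_requests_total', 'node_load1']
```

The output shows that `up` and `unwanted_metric_foo` are already removed by the `keep` rule, while the histogram buckets pass the allowlist and are removed by the final `drop`.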

6. Fix Protocol Errors

Incompatible protocol versions:

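```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    # Per-request timeout
    remote_timeout: 30s

    # For Prometheus remote write v2 (recent Prometheus releases only; the backend must support it)
    # protobuf_message: io.prometheus.write.v2.Request

    # Prometheus sets the X-Prometheus-Remote-Write-Version header itself;
    # do not try to override it under `headers`.

    # Exemplars are not sent by default; enable only if the backend accepts them
    # send_exemplars: true
```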

Verification

Verify Remote Write is Working

```promql
# Samples sent per second
rate(prometheus_remote_storage_samples_total[5m])

# No failures
rate(prometheus_remote_storage_samples_failed_total[5m]) == 0

# Remote is less than 60 seconds behind
(
  prometheus_remote_storage_highest_timestamp_in_seconds
  - ignoring(remote_name, url) group_right
  prometheus_remote_storage_queue_highest_sent_timestamp_seconds
) < 60
```

Check Logs

```bash
# Check for remote write errors
journalctl -u prometheus --since "1 hour ago" | grep -i "remote write"

# Check queue status
journalctl -u prometheus --since "1 hour ago" | grep -i "queue"
```

Verify Data at Remote

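```bash
# Query the backend directly (adjust host, port, and path to your deployment;
# a VictoriaMetrics cluster serves queries from vmselect, not from the write endpoint)
curl -s 'https://victoria-metrics:8480/api/v1/query?query=up' | jq .

# Check the send rate as reported by Prometheus itself
# (returns data only if prometheus_* metrics are not dropped by write_relabel_configs)
curl -sG 'https://victoria-metrics:8480/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_remote_storage_samples_total[5m])' | jq .
```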

Prevention

Add monitoring for remote write:

```yaml
groups:
  - name: remote_write_alerts
    rules:
      - alert: RemoteWriteFailing
        expr: rate(prometheus_remote_storage_samples_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Remote write is failing"
          description: "Remote write to {{ $labels.url }} is failing at {{ $value }} samples/sec"

      - alert: RemoteWriteQueueFull
        expr: |
          prometheus_remote_storage_samples_pending
            / (prometheus_remote_storage_shards * prometheus_remote_storage_shard_capacity) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Remote write queue is nearly full"
          description: "Queue is at {{ $value | humanizePercentage }} capacity"

      - alert: RemoteWriteLag
        expr: |
          (
            prometheus_remote_storage_highest_timestamp_in_seconds
            - ignoring(remote_name, url) group_right
            prometheus_remote_storage_queue_highest_sent_timestamp_seconds
          ) > 300
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Remote write is more than 5 minutes behind"

      - alert: RemoteWriteShardingHigh
        expr: prometheus_remote_storage_shards > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Remote write sharding is high"
          description: "Current shards: {{ $value }}, consider increasing queue capacity"
```

Configuration Template

Complete remote write configuration:

```yaml
# prometheus.yml
global:
  external_labels:
    cluster: 'production'
    replica: 'prometheus-1'

remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    name: "victoria-metrics"
    remote_timeout: 30s
    queue_config:
      capacity: 100000
      max_shards: 200
      min_shards: 10
      max_samples_per_send: 5000
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 1s
      retry_on_http_429: true
    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 500
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'up|scrape_.*|prometheus_.*'
        action: drop
```