The Problem
Prometheus is failing to send data to a remote storage backend such as VictoriaMetrics, Thanos, Cortex, or Mimir. You see errors like:
level=error ts=2026-04-04T02:10:30.456Z caller=queue_manager.go:456 component="remote queue" msg="Failed to send remote write request" err="Post \"https://remote-storage:8480/api/v1/write\": dial tcp: lookup remote-storage: no such host"
level=error ts=2026-04-04T02:10:31.123Z caller=queue_manager.go:457 component="remote queue" msg="Remote write storage shutdown" err="context canceled"
level=warn ts=2026-04-04T02:10:32.789Z caller=queue_manager.go:789 msg="Remote write queue full, dropping samples"

Remote write failures mean your long-term metrics are being lost, breaking historical analysis and alerting.
Diagnosis
Check Remote Write Metrics
```promql
# Rate of samples that failed to send
rate(prometheus_remote_storage_samples_failed_total[5m])

# Queue capacity usage above 80%
prometheus_remote_storage_samples_pending
  / (prometheus_remote_storage_shards * prometheus_remote_storage_shard_capacity) > 0.8

# Samples sent per second
rate(prometheus_remote_storage_samples_total[5m])

# Pending samples in queue
prometheus_remote_storage_samples_pending

# Seconds the remote write queue lags behind the newest ingested sample
prometheus_remote_storage_highest_timestamp_in_seconds
  - ignoring(remote_name, url) group_right
    prometheus_remote_storage_queue_highest_sent_timestamp_seconds
```
Check Remote Write Status
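```bash
# TSDB head stats (series and chunk counts). Prometheus has no dedicated
# remote write status endpoint, so queue state comes from the metrics above.
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.headStats'

# Check the loaded remote write configuration
# (the API returns the whole config as a YAML string in .data.yaml)
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A 20 '^remote_write:'
```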
Check Network Connectivity
```bash
# Test basic connectivity
curl -v https://remote-storage:8480/api/v1/write

# DNS resolution test
nslookup remote-storage
dig remote-storage

# Port connectivity
nc -zv remote-storage 8480

# From within Prometheus pod/container
kubectl exec -it prometheus-pod -- curl -v https://remote-storage:8480/api/v1/write
```
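The checks above can be chained so that the first failing layer (DNS, TCP, or HTTP) is reported. This is a minimal sketch, not an official tool: the function names `remote_write_host`, `remote_write_port`, and `check_remote_write` are hypothetical, the URL parsing assumes a plain `scheme://host[:port]/path` form (no IPv6 literals or embedded credentials), and `nslookup`, `nc`, and `curl` are assumed to be installed.

```bash
# Hypothetical helpers: split a remote write URL into host and port,
# then run the DNS, TCP, and HTTP checks in order.

# Print the host part of a URL such as https://remote-storage:8480/api/v1/write
remote_write_host() {
  printf '%s\n' "$1" | sed -E 's#^[a-z]+://##; s#[:/].*$##'
}

# Print the port, falling back to 443 for https and 80 otherwise
remote_write_port() {
  rest=$(printf '%s\n' "$1" | sed -E 's#^[a-z]+://##; s#/.*$##')  # host[:port]
  case "$rest" in
    *:*) printf '%s\n' "${rest##*:}" ;;
    *)   case "$1" in https://*) echo 443 ;; *) echo 80 ;; esac ;;
  esac
}

# Run the checks in order and stop at the first failing layer
check_remote_write() {
  url=$1
  host=$(remote_write_host "$url")
  port=$(remote_write_port "$url")

  echo "DNS:  $host"
  nslookup "$host" >/dev/null 2>&1 || { echo "DNS lookup failed"; return 1; }

  echo "TCP:  $host:$port"
  nc -z -w 5 "$host" "$port" || { echo "port unreachable"; return 2; }

  # A plain GET is not a valid write request, so an HTTP error status
  # here still shows that the endpoint is reachable.
  echo "HTTP: $url"
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 "$url"
}
```

Calling `check_remote_write "https://remote-storage:8480/api/v1/write"` from the same host or pod as Prometheus narrows down which layer produces the `no such host` or connection error.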
Solutions
1. Fix Connection Errors
Network connectivity issues:
```yaml
# prometheus.yml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    # Increase the request timeout for slow networks (the default is 30s)
    remote_timeout: 60s

    # Retry behaviour is controlled by the backoff settings in queue_config
    queue_config:
      min_backoff: 30ms
      max_backoff: 5s
```
If the backend is reached through Kubernetes service DNS, use the fully qualified service name:

    remote_write:
      - url: "https://victoria-metrics.monitoring.svc.cluster.local:8480/api/v1/write"
        # Use IP if DNS is unreliable
        # url: "https://10.0.0.100:8480/api/v1/write"

2. Fix Authentication Errors
Missing or incorrect credentials:
```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    # Basic auth
    basic_auth:
      username: prometheus
      password: your_password
      # Or from file:
      # password_file: /etc/prometheus/remote_password

    # Bearer token
    # bearer_token: "your-token-here"
    # bearer_token_file: /etc/prometheus/bearer_token

    # TLS configuration
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      # For self-signed certs (not recommended for production)
      # insecure_skip_verify: true
```
3. Fix Queue Overflow
Remote write queue filling up:
```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    queue_config:
      # Increase capacity (capacity is counted per shard)
      capacity: 100000
      max_shards: 200
      max_samples_per_send: 5000

      # Adjust timing
      batch_send_deadline: 5s
      min_shards: 10
      min_backoff: 30ms
      max_backoff: 1s

      # Retry when the backend responds with HTTP 429
      retry_on_http_429: true
```
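To see what `min_backoff` and `max_backoff` mean in practice, the following sketch prints the wait before each retry, assuming the delay starts at `min_backoff` and doubles on every retry until it reaches `max_backoff`. The function name `backoff_schedule` is hypothetical, and the sketch ignores any `Retry-After` header the backend may send.

```bash
# Print the delay in milliseconds before each of the first N retries,
# assuming the delay doubles after every attempt and is capped at max_backoff.
backoff_schedule() {
  min_ms=$1      # min_backoff in milliseconds
  max_ms=$2      # max_backoff in milliseconds
  retries=$3     # number of retries to list
  delay=$min_ms
  i=1
  while [ "$i" -le "$retries" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_ms" ]; then
      delay=$max_ms
    fi
    i=$((i + 1))
  done
}

# Example: min_backoff 30ms, max_backoff 1s, first 8 retries
# backoff_schedule 30 1000 8
# prints 30 60 120 240 480 960 1000 1000 (one value per line)
```

With these values the cap is reached after six retries, so a backend that stays down is retried roughly once per second per shard.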
Monitor queue health:
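```promql
# Queue utilization: pending samples divided by total capacity across all shards
prometheus_remote_storage_samples_pending
  / (prometheus_remote_storage_shards * prometheus_remote_storage_shard_capacity)

# Should be < 0.8 (80%)
```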
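The same ratio can be computed by hand from three numbers read off the Prometheus UI, which helps when sanity-checking a threshold. This is an illustrative sketch: `queue_utilization` and `queue_is_healthy` are hypothetical helper names, and the inputs are assumed to be the pending sample count, the current shard count, and the per-shard capacity.

```bash
# Utilization = pending samples / (shards * per-shard capacity)
queue_utilization() {
  # $1 = pending samples, $2 = shards, $3 = per-shard capacity
  awk -v p="$1" -v s="$2" -v c="$3" 'BEGIN { printf "%.2f\n", p / (s * c) }'
}

# Succeeds (exit status 0) while utilization is below 80%
queue_is_healthy() {
  awk -v p="$1" -v s="$2" -v c="$3" 'BEGIN { exit !(p / (s * c) < 0.8) }'
}

# Example: 8000 pending samples, 4 shards, capacity 2500 per shard
# queue_utilization 8000 4 2500
# prints 0.80
```

Because capacity is per shard, the denominator grows as Prometheus adds shards, so the ratio can fall even while the absolute number of pending samples rises.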
4. Fix High Latency
Remote write experiencing high latency:
```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    queue_config:
      # Optimize for latency
      capacity: 50000
      max_shards: 100
      max_samples_per_send: 10000
      batch_send_deadline: 2s

    # Reduce metadata overhead
    metadata_config:
      send: true
      send_interval: 2m
      max_samples_per_send: 1000
```
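A rough way to reason about these settings is to assume each shard has at most one request in flight, so a shard cannot send more than `max_samples_per_send` samples per round trip. Under that assumption, the sketch below estimates the throughput ceiling and the number of shards needed for a given ingest rate. The function names are hypothetical and the model ignores retries, compression, and batching delays.

```bash
# Upper bound on samples/sec: shards * batch size / request latency
max_throughput() {
  # $1 = shards, $2 = max_samples_per_send, $3 = request latency in ms
  awk -v s="$1" -v b="$2" -v l="$3" 'BEGIN { printf "%d\n", s * b * 1000 / l }'
}

# Smallest shard count that keeps up with a given ingest rate
shards_needed() {
  # $1 = samples/sec to send, $2 = max_samples_per_send, $3 = request latency in ms
  awk -v r="$1" -v b="$2" -v l="$3" 'BEGIN {
    n = r * l / 1000 / b          # shards required, possibly fractional
    c = int(n)
    if (c < n) c++                # round up to a whole shard
    print c
  }'
}

# Example: 500000 samples/sec, batches of 10000, 200 ms per request
# shards_needed 500000 10000 200
# prints 10
```

If the estimate comes out above `max_shards`, the queue will fall behind no matter how large `capacity` is, and the fix is lower latency, larger batches, or more shards.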
5. Handle Metric Relabeling
Sending unwanted metrics to remote storage:
```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    write_relabel_configs:
      # Only send specific metrics
      - source_labels: [__name__]
        regex: '(http_.*|process_.*|node_.*)'
        action: keep

      # Drop high-cardinality metrics
      - source_labels: [__name__]
        regex: 'unwanted_metric_.*'
        action: drop

      # Reduce cardinality
      - source_labels: [__name__]
        regex: 'http_request_duration_seconds_bucket'
        action: drop
```
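Relabeling regexes are anchored at both ends, so a pattern has to match the whole metric name rather than a substring. A quick way to preview which names a `keep` or `drop` rule would hit is to emulate that anchoring with `grep -E -x`. This is an approximation: `relabel_matches` is a hypothetical helper, and `grep` uses POSIX extended regexes rather than the RE2 syntax Prometheus uses, so it is only reliable for simple patterns like the ones above.

```bash
# Print the metric names (one per line on stdin) that fully match a regex.
# -x forces a whole-line match, mirroring how relabel regexes are anchored.
relabel_matches() {
  grep -E -x -- "$1"
}

# Example: preview the keep rule
# printf '%s\n' http_requests_total node_cpu_seconds_total my_http_metric up \
#   | relabel_matches '(http_.*|process_.*|node_.*)'
# prints:
#   http_requests_total
#   node_cpu_seconds_total
```

In the example, `my_http_metric` is not matched because the pattern must match from the first character, which is a common surprise when a `keep` rule silently drops more than intended.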
6. Fix Protocol Errors
Incompatible protocol versions:
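```yaml
remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    # Request timeout
    remote_timeout: 30s

    # For Prometheus remote write v2 (recent Prometheus versions only;
    # the backend must support it). The version header is set by
    # Prometheus itself and cannot be overridden through `headers`.
    # protobuf_message: io.prometheus.write.v2.Request

    # Forward exemplars (requires exemplar storage to be enabled)
    # send_exemplars: true

    # For older backends, leave protobuf_message unset to use the v1 format
```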
Verification
Verify Remote Write is Working
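```promql
# Samples sent successfully
rate(prometheus_remote_storage_samples_total[5m])

# No failures
rate(prometheus_remote_storage_samples_failed_total[5m]) == 0

# Remote write is less than 60 seconds behind
(
  prometheus_remote_storage_highest_timestamp_in_seconds
  - ignoring(remote_name, url) group_right
    prometheus_remote_storage_queue_highest_sent_timestamp_seconds
) < 60
```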
Check Logs
```bash
# Check for remote write errors
journalctl -u prometheus --since "1 hour ago" | grep -i "remote write"

# Check queue status
journalctl -u prometheus --since "1 hour ago" | grep -i "queue"
```
Verify Data at Remote
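```bash
# Query VictoriaMetrics directly
curl -s 'https://victoria-metrics:8480/api/v1/query?query=up' | jq .

# Check data ingestion rate. This only returns data if Prometheus's own
# metrics are forwarded and not dropped by write_relabel_configs.
# -G with --data-urlencode avoids curl treating [5m] as a URL glob.
curl -sG 'https://victoria-metrics:8480/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_remote_storage_samples_total[5m])' | jq .
```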
Prevention
Add monitoring for remote write:
```yaml
groups:
  - name: remote_write_alerts
    rules:
      - alert: RemoteWriteFailing
        expr: rate(prometheus_remote_storage_samples_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Remote write is failing"
          description: "Remote write to {{ $labels.url }} is failing at {{ $value }} samples/sec"

      - alert: RemoteWriteQueueFull
        expr: |
          prometheus_remote_storage_samples_pending
            / (prometheus_remote_storage_shards * prometheus_remote_storage_shard_capacity) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Remote write queue is nearly full"
          description: "Queue is at {{ $value | humanizePercentage }} capacity"

      - alert: RemoteWriteLag
        expr: |
          (
            prometheus_remote_storage_highest_timestamp_in_seconds
            - ignoring(remote_name, url) group_right
              prometheus_remote_storage_queue_highest_sent_timestamp_seconds
          ) > 300
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Remote write is more than 5 minutes behind"

      - alert: RemoteWriteShardingHigh
        expr: prometheus_remote_storage_shards > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Remote write sharding is high"
          description: "Current shards: {{ $value }}, consider increasing queue capacity"
```
Configuration Template
Complete remote write configuration:
```yaml
# prometheus.yml
global:
  external_labels:
    cluster: 'production'
    replica: 'prometheus-1'

remote_write:
  - url: "https://victoria-metrics:8480/api/v1/write"
    name: "victoria-metrics"
    remote_timeout: 30s
    queue_config:
      capacity: 100000
      max_shards: 200
      min_shards: 10
      max_samples_per_send: 5000
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 1s
      retry_on_http_429: true
    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 500
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'up|scrape_.*|prometheus_.*'
        action: drop
```