The Problem
Your Prometheus targets are showing as "down" in the UI, and you see errors like:
```
level=warn ts=2026-04-04T07:20:15.456Z caller=scrape.go:1456 component="scrape manager" scrape_pool=kubernetes-pods target=http://10.0.0.5:8080/metrics msg="Scrape failed" err="Get \"http://10.0.0.5:8080/metrics\": dial tcp 10.0.0.5:8080: connect: connection refused"
```

Or in the web UI under Status → Targets:
```
State: DOWN
Error: Get "http://target:9090/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

Scrape errors prevent metric collection and break your monitoring pipeline.
Diagnosis
Check Target Status
Navigate to http://prometheus:9090/targets or use the API:
```bash
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "down") | {job: .labels.job, instance: .labels.instance, error: .lastError}'
```

Common Error Types
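When many targets fail at once, tallying the error strings shows which failure mode dominates. A minimal sketch, assuming jq is installed and Prometheus listens on localhost:9090:

```shell
# Tally down targets by their last scrape error, most common first.
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | select(.health == "down") | .lastError' \
  | sort | uniq -c | sort -rn
```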
```promql
# Count targets by health status
count by (job) (up)

# Targets currently down
up == 0

# Scrape duration for slow targets
scrape_duration_seconds > 10
```
Test Connectivity
```bash
# Test basic connectivity
curl -v http://target-host:9090/metrics

# Test with timeout
curl -v --connect-timeout 10 http://target-host:9090/metrics

# Test from Prometheus container/pod
kubectl exec -it prometheus-pod -- curl http://target-service:9090/metrics
```
Solutions
1. Connection Refused
The target is not listening on the expected port:
```bash
# Check if service is running
systemctl status myapp

# Check listening ports
ss -tlnp | grep 9090
netstat -tlnp | grep 9090

# Check firewall
sudo iptables -L -n | grep 9090
sudo firewall-cmd --list-ports
```
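If curl isn't installed on the host, bash's built-in /dev/tcp gives a quick reachability probe. A sketch, with a placeholder host and port:

```shell
# Returns "open" if something accepts TCP connections on host:port, else "closed".
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null && echo open || echo closed
}

port_open myapp-host 8080  # placeholder target
```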
Fix by ensuring the target is running and listening:
```yaml
# prometheus.yml - verify correct port
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp-host:8080']  # verify port matches app config
```

2. Timeout Errors
Increase scrape timeout:
```yaml
# prometheus.yml
global:
  scrape_timeout: 15s
  scrape_interval: 30s

scrape_configs:
  - job_name: 'slow-app'
    scrape_timeout: 30s   # job-specific override
    scrape_interval: 60s
    static_configs:
      - targets: ['slow-app:9090']
```
Note: scrape_timeout must not exceed scrape_interval; Prometheus rejects configurations where it does.
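To pick a sensible timeout, measure how long the endpoint actually takes to respond, then set scrape_timeout comfortably above that. A sketch using curl's timing output, with a placeholder URL:

```shell
# Prints total request time in seconds for one scrape of the endpoint.
scrape_time() {
  curl -o /dev/null -s -w '%{time_total}\n' "$1"
}

scrape_time http://slow-app:9090/metrics || true  # placeholder target; may be unreachable
```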
3. DNS Resolution Errors
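Before changing the config, confirm from the Prometheus host whether the name resolves at all; a minimal check, with "myapp" as a placeholder hostname:

```shell
# getent uses the same NSS resolution path most applications use.
getent hosts myapp || echo "DNS lookup failed for myapp"
```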
```
Get "http://myapp:9090/metrics": lookup myapp on 10.0.0.1:53: no such host
```

Fix DNS or use IP addresses:
```yaml
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['10.0.0.5:9090']  # use IP instead of hostname
    # or use DNS-based discovery
    dns_sd_configs:
      - names:
          - myapp.namespace.svc.cluster.local
        type: A
        port: 9090
```

4. TLS Certificate Errors
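Certificate-authority errors usually mean Prometheus doesn't trust the CA that signed the target's certificate. You can inspect which CA that is before touching Prometheus; a sketch with a placeholder host, assuming openssl is installed:

```shell
# Print subject, issuer, and validity window of the certificate the target serves.
show_cert() {
  openssl s_client -connect "$1" -servername "${1%%:*}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates 2>/dev/null \
    || echo "no certificate from $1"
}

show_cert secure-app:9090  # placeholder target
```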
```
Get "https://secure-app:9090/metrics": x509: certificate signed by unknown authority
```

Configure TLS in scrape config:
```yaml
scrape_configs:
  - job_name: 'secure-app'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      # Skip verification for self-signed (not recommended for production)
      # insecure_skip_verify: true
    static_configs:
      - targets: ['secure-app:9090']
```

5. Authentication Errors
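First verify the credentials work outside Prometheus; a sketch with a placeholder URL and placeholder credentials:

```shell
# Prints the HTTP status: 200 means the credentials are accepted, 401 means rejected.
check_auth() {
  curl -s -o /dev/null -w '%{http_code}\n' -u "$2" "$1"
}

check_auth http://protected-app:9090/metrics prometheus:secret_password || true  # placeholder host
```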
```
Get "http://protected-app:9090/metrics": 401 Unauthorized
```

Add basic auth or bearer token:
```yaml
scrape_configs:
  - job_name: 'protected-app'
    basic_auth:
      username: prometheus
      password: secret_password
    # or bearer token
    # bearer_token: "your-token-here"
    static_configs:
      - targets: ['protected-app:9090']
```

6. Kubernetes Service Discovery Issues
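When expected pods don't appear under /targets, query the dropped targets and inspect their discovered labels to see which relabel rule excluded them. In a real cluster you would curl `http://localhost:9090/api/v1/targets?state=dropped`; the sketch below runs the same jq query against a sample payload with a hypothetical pod:

```shell
# Each dropped target keeps its pre-relabeling labels in discoveredLabels.
sample='{"data":{"droppedTargets":[{"discoveredLabels":{"__meta_kubernetes_pod_name":"myapp-abc","__meta_kubernetes_pod_annotation_prometheus_io_scrape":"false"}}]}}'
echo "$sample" | jq -r '.data.droppedTargets[].discoveredLabels
  | "\(.__meta_kubernetes_pod_name) scrape=\(.__meta_kubernetes_pod_annotation_prometheus_io_scrape)"'
# prints: myapp-abc scrape=false
```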
Targets not discovered:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - monitoring
    relabel_configs:
      # Keep only pods with annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use annotation for port: combine the pod IP from __address__
      # with the annotated port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Ensure pods have correct annotations:
```yaml
# In your pod/deployment spec
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
```

Verification
Check targets are now up:
```bash
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "up") | .labels.job' | sort | uniq -c
```

Verify in Prometheus UI at /targets - all should show "UP" state.
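For automation (e.g. gating a deploy pipeline on monitoring health), a small polling loop can wait until everything reports healthy. A sketch, assuming jq and a placeholder Prometheus URL:

```shell
# Poll the targets API until no target is unhealthy.
wait_for_targets() {
  until [ "$(curl -s "$1/api/v1/targets" \
      | jq '[.data.activeTargets[] | select(.health != "up")] | length')" = "0" ]; do
    echo "waiting for targets to recover..."
    sleep 5
  done
  echo "all targets up"
}

# wait_for_targets http://localhost:9090
```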
Prevention
Add alerts for scrape failures:
```yaml
groups:
  - name: scrape_alerts
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "{{ $labels.job }} target {{ $labels.instance }} has been down for 5 minutes."

      - alert: ScrapeSlow
        expr: scrape_duration_seconds > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow scrape for {{ $labels.job }}"

      - alert: ScrapeSamplesHigh
        expr: scrape_samples_scraped > 100000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High sample count from {{ $labels.instance }}"
```