The Problem

Your Prometheus targets are showing as "down" in the UI, and you see errors like:

```bash
level=warn ts=2026-04-04T07:20:15.456Z caller=scrape.go:1456 component="scrape manager" scrape_pool=kubernetes-pods target=http://10.0.0.5:8080/metrics msg="Scrape failed" err="Get \"http://10.0.0.5:8080/metrics\": dial tcp 10.0.0.5:8080: connect: connection refused"
```

Or in the web UI under Status → Targets:

```bash
State: DOWN
Error: Get "http://target:9090/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

Scrape errors prevent metric collection and break your monitoring pipeline.

Diagnosis

Check Target Status

Navigate to http://prometheus:9090/targets or use the API:

```bash
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "down") | {job: .labels.job, instance: .labels.instance, error: .lastError}'
```
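If you script this check regularly, the same filter is easy to express without `jq`. A minimal Python sketch; the sample payload below is illustrative, shaped like the real `/api/v1/targets` response:

```python
def down_targets(payload):
    """Extract (job, instance, error) for every unhealthy target
    from a /api/v1/targets API response."""
    return [
        (t["labels"]["job"], t["labels"]["instance"], t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t["health"] == "down"
    ]

# Illustrative payload in the shape the API returns
sample = {
    "data": {
        "activeTargets": [
            {"labels": {"job": "myapp", "instance": "10.0.0.5:8080"},
             "health": "down", "lastError": "connection refused"},
            {"labels": {"job": "node", "instance": "10.0.0.6:9100"},
             "health": "up", "lastError": ""},
        ]
    }
}

for job, instance, err in down_targets(sample):
    print(f"{job} {instance}: {err}")
```

Feed it the decoded JSON from the API call above to get a compact list of failing targets and their last errors.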

Common Error Types

```promql
# Count targets by health status
count by (job) (up)

# Targets currently down
up == 0

# Scrape duration for slow targets
scrape_duration_seconds > 10
```

Test Connectivity

```bash
# Test basic connectivity
curl -v http://target-host:9090/metrics

# Test with timeout
curl -v --connect-timeout 10 http://target-host:9090/metrics

# Test from Prometheus container/pod
kubectl exec -it prometheus-pod -- curl http://target-service:9090/metrics
```

Solutions

1. Connection Refused

The target is not listening on the expected port:

```bash
# Check if service is running
systemctl status myapp

# Check listening ports
ss -tlnp | grep 9090
netstat -tlnp | grep 9090

# Check firewall
sudo iptables -L -n | grep 9090
sudo firewall-cmd --list-ports
```
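The same listening-port check can be scripted. A small Python sketch using only the standard library (the host and port are placeholders for your target):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds,
    i.e. something is accepting connections there."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder target matching the scrape config below:
# port_open("myapp-host", 8080)
```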

Fix by ensuring the target is running and listening:

```yaml
# prometheus.yml - verify correct port
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp-host:8080']  # verify port matches app config
```

2. Timeout Errors

Increase scrape timeout:

```yaml
# prometheus.yml
global:
  scrape_timeout: 15s
  scrape_interval: 30s

scrape_configs:
  - job_name: 'slow-app'
    scrape_timeout: 30s    # job-specific override
    scrape_interval: 60s
    static_configs:
      - targets: ['slow-app:9090']
```

Note: scrape_timeout must not be greater than scrape_interval; Prometheus rejects a config where the timeout exceeds the interval.
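That constraint can be checked mechanically before reloading. A hedged Python sketch with a deliberately simple duration parser (it handles only `s` and `m` suffixes; real Prometheus durations allow more units):

```python
import re

def to_seconds(duration):
    """Parse a simple Prometheus-style duration such as '30s' or '1m'.
    Only 's' and 'm' suffixes are supported in this sketch."""
    m = re.fullmatch(r"(\d+)([sm])", duration)
    if not m:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = int(m.group(1)), m.group(2)
    return value * 60 if unit == "m" else value

def timeout_ok(scrape_interval, scrape_timeout):
    """scrape_timeout must not be greater than scrape_interval."""
    return to_seconds(scrape_timeout) <= to_seconds(scrape_interval)

print(timeout_ok("60s", "30s"))   # valid pairing
print(timeout_ok("30s", "1m"))    # timeout exceeds interval
```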

3. DNS Resolution Errors

```bash
Get "http://myapp:9090/metrics": lookup myapp on 10.0.0.1:53: no such host
```

Fix DNS or use IP addresses:

```yaml
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['10.0.0.5:9090']  # use IP instead of hostname
    # or use DNS-based discovery
    dns_sd_configs:
      - names:
          - myapp.namespace.svc.cluster.local
        type: A
        port: 9090
```
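You can confirm what Prometheus's resolver would see with a quick standard-library check (the cluster-local name below is a placeholder):

```python
import socket

def resolve(host):
    """Return the sorted IPv4 addresses a hostname resolves to,
    or an empty list when resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
        return sorted({info[4][0] for info in infos})
    except socket.gaierror:
        return []

# resolve("myapp.namespace.svc.cluster.local")  # placeholder service name
print(resolve("localhost"))
```

Run this from the same host (or pod) as Prometheus, since that is the resolver that matters for scraping.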

4. TLS Certificate Errors

```bash
Get "https://secure-app:9090/metrics": x509: certificate signed by unknown authority
```

Configure TLS in scrape config:

```yaml
scrape_configs:
  - job_name: 'secure-app'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      # Skip verification for self-signed (not recommended for production)
      # insecure_skip_verify: true
    static_configs:
      - targets: ['secure-app:9090']
```

5. Authentication Errors

```bash
Get "http://protected-app:9090/metrics": 401 Unauthorized
```

Add basic auth or bearer token:

```yaml
scrape_configs:
  - job_name: 'protected-app'
    basic_auth:
      username: prometheus
      password: secret_password
    # or bearer token
    # bearer_token: "your-token-here"
    static_configs:
      - targets: ['protected-app:9090']
```
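To test credentials independently of Prometheus, you can build the same `Authorization` header value it would send and pass it to `curl` by hand. A Python sketch (the username, password, and token are the placeholders from the config above):

```python
import base64

def basic_auth_header(username, password):
    """Header value Prometheus sends for a basic_auth block."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

def bearer_header(token):
    """Header value for bearer-token authentication."""
    return f"Bearer {token}"

# e.g. curl -H "Authorization: <value>" http://protected-app:9090/metrics
print(basic_auth_header("prometheus", "secret_password"))
```

If a hand-built request with this header still returns 401, the credentials themselves are wrong, not the scrape config.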

6. Kubernetes Service Discovery Issues

Targets not discovered:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - monitoring
    relabel_configs:
      # Keep only pods with annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use the port annotation to rewrite the scrape address
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```
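Relabeling joins the `source_labels` values with `;`, matches the regex against the joined string, and substitutes capture groups into `replacement`. A Python sketch of the standard pattern that combines `__address__` with the port annotation (the function name and inputs are illustrative, and the simple regex does not handle IPv6 addresses):

```python
import re

def rewrite_address(address, port_annotation):
    """Mimic relabeling: join source label values with ';', apply the
    regex, and build the new __address__ from the capture groups."""
    joined = f"{address};{port_annotation}"
    m = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)
    if m is None:
        return address  # no match: the label is left unchanged
    return f"{m.group(1)}:{m.group(2)}"

print(rewrite_address("10.0.0.5:8080", "9090"))
print(rewrite_address("10.0.0.5", "9090"))
```

Tracing a sample address through the regex like this is a quick way to debug relabel rules before reloading Prometheus.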

Ensure pods have correct annotations:

```yaml
# In your pod/deployment spec
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
```

Verification

Check targets are now up:

```bash
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "up") | .labels.job' | sort | uniq -c
```

Verify in Prometheus UI at /targets - all should show "UP" state.

Prevention

Add alerts for scrape failures:

```yaml
groups:
  - name: scrape_alerts
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "{{ $labels.job }} target {{ $labels.instance }} has been down for more than 5 minutes."

      - alert: ScrapeSlow
        expr: scrape_duration_seconds > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow scrape for {{ $labels.job }}"

      - alert: ScrapeSamplesHigh
        expr: scrape_samples_scraped > 100000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High sample count from {{ $labels.instance }}"
```