The Problem
Your Prometheus targets are showing as "down" in the UI, and you see errors like:
```
level=warn ts=2026-04-04T07:20:15.456Z caller=scrape.go:1456 component="scrape manager" scrape_pool=kubernetes-pods target=http://10.0.0.5:8080/metrics msg="Scrape failed" err="Get \"http://10.0.0.5:8080/metrics\": dial tcp 10.0.0.5:8080: connect: connection refused"
```

Or in the web UI under Status → Targets:
```
State: DOWN
Error: Get "http://target:9090/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

Scrape errors prevent metric collection and break your monitoring pipeline.
Diagnosis
Check Target Status
Navigate to http://prometheus:9090/targets or use the API:
```bash
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "down") | {job: .labels.job, instance: .labels.instance, error: .lastError}'
```

Common Error Types
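When many targets fail at once, tallying the error strings shows which failure mode dominates. A minimal sketch, assuming jq is installed and Prometheus listens on localhost:9090:

```shell
# Tally down targets by their last scrape error, most common first.
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | select(.health == "down") | .lastError' \
  | sort | uniq -c | sort -rn
```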
```promql
# Count targets by health status
count by (job) (up)

# Targets currently down
up == 0

# Scrape duration for slow targets
scrape_duration_seconds > 10
```
Test Connectivity
```bash
# Test basic connectivity
curl -v http://target-host:9090/metrics

# Test with timeout
curl -v --connect-timeout 10 http://target-host:9090/metrics

# Test from Prometheus container/pod
kubectl exec -it prometheus-pod -- curl http://target-service:9090/metrics
```
Solutions
1. Connection Refused
The target is not listening on the expected port:
```bash
# Check if service is running
systemctl status myapp

# Check listening ports
ss -tlnp | grep 9090
netstat -tlnp | grep 9090

# Check firewall
sudo iptables -L -n | grep 9090
sudo firewall-cmd --list-ports
```
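If curl isn't installed on the host, bash's built-in /dev/tcp gives a quick reachability probe. A sketch, with a placeholder host and port:

```shell
# Returns "open" if something accepts TCP connections on host:port, else "closed".
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null && echo open || echo closed
}

port_open myapp-host 8080  # placeholder target
```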
Fix by ensuring the target is running and listening:
```yaml
# prometheus.yml - verify correct port
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp-host:8080']  # verify port matches app config
```

2. Timeout Errors
Increase scrape timeout:
```yaml
# prometheus.yml
global:
  scrape_timeout: 15s
  scrape_interval: 30s

scrape_configs:
  - job_name: 'slow-app'
    scrape_timeout: 30s   # job-specific override
    scrape_interval: 60s
    static_configs:
      - targets: ['slow-app:9090']
```
Note: scrape_timeout must not exceed scrape_interval; Prometheus rejects configurations where it does.
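To pick a sensible timeout, measure how long the endpoint actually takes to respond, then set scrape_timeout comfortably above that. A sketch using curl's timing output, with a placeholder URL:

```shell
# Prints total request time in seconds for one scrape of the endpoint.
scrape_time() {
  curl -o /dev/null -s -w '%{time_total}\n' "$1"
}

scrape_time http://slow-app:9090/metrics || true  # placeholder target; may be unreachable
```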
3. DNS Resolution Errors
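Before changing the config, confirm from the Prometheus host whether the name resolves at all; a minimal check, with "myapp" as a placeholder hostname:

```shell
# getent uses the same NSS resolution path most applications use.
getent hosts myapp || echo "DNS lookup failed for myapp"
```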
```
Get "http://myapp:9090/metrics": lookup myapp on 10.0.0.1:53: no such host
```

Fix DNS or use IP addresses:
```yaml
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['10.0.0.5:9090']  # use IP instead of hostname
    # or use DNS-based discovery
    dns_sd_configs:
      - names:
          - myapp.namespace.svc.cluster.local
        type: A
        port: 9090
```

4. TLS Certificate Errors
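Certificate-authority errors usually mean Prometheus doesn't trust the CA that signed the target's certificate. You can inspect which CA that is before touching Prometheus; a sketch with a placeholder host, assuming openssl is installed:

```shell
# Print subject, issuer, and validity window of the certificate the target serves.
show_cert() {
  openssl s_client -connect "$1" -servername "${1%%:*}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates 2>/dev/null \
    || echo "no certificate from $1"
}

show_cert secure-app:9090  # placeholder target
```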
```
Get "https://secure-app:9090/metrics": x509: certificate signed by unknown authority
```

Configure TLS in scrape config:
```yaml
scrape_configs:
  - job_name: 'secure-app'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      # Skip verification for self-signed (not recommended for production)
      # insecure_skip_verify: true
    static_configs:
      - targets: ['secure-app:9090']
```

5. Authentication Errors
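First verify the credentials work outside Prometheus; a sketch with a placeholder URL and placeholder credentials:

```shell
# Prints the HTTP status: 200 means the credentials are accepted, 401 means rejected.
check_auth() {
  curl -s -o /dev/null -w '%{http_code}\n' -u "$2" "$1"
}

check_auth http://protected-app:9090/metrics prometheus:secret_password || true  # placeholder host
```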
```
Get "http://protected-app:9090/metrics": 401 Unauthorized
```

Add basic auth or bearer token:
```yaml
scrape_configs:
  - job_name: 'protected-app'
    basic_auth:
      username: prometheus
      password: secret_password
    # or bearer token
    # bearer_token: "your-token-here"
    static_configs:
      - targets: ['protected-app:9090']
```

6. Kubernetes Service Discovery Issues
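When expected pods don't appear under /targets, query the dropped targets and inspect their discovered labels to see which relabel rule excluded them. In a real cluster you would curl `http://localhost:9090/api/v1/targets?state=dropped`; the sketch below runs the same jq query against a sample payload with a hypothetical pod:

```shell
# Each dropped target keeps its pre-relabeling labels in discoveredLabels.
sample='{"data":{"droppedTargets":[{"discoveredLabels":{"__meta_kubernetes_pod_name":"myapp-abc","__meta_kubernetes_pod_annotation_prometheus_io_scrape":"false"}}]}}'
echo "$sample" | jq -r '.data.droppedTargets[].discoveredLabels
  | "\(.__meta_kubernetes_pod_name) scrape=\(.__meta_kubernetes_pod_annotation_prometheus_io_scrape)"'
# prints: myapp-abc scrape=false
```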
Targets not discovered:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - monitoring
    relabel_configs:
      # Keep only pods with annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use annotation for port: combine the pod IP from __address__
      # with the annotated port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Ensure pods have correct annotations:
```yaml
# In your pod/deployment spec
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
```

Verification
Check targets are now up:
```bash
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "up") | .labels.job' | sort | uniq -c
```

Verify in Prometheus UI at /targets - all should show "UP" state.
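For automation (e.g. gating a deploy pipeline on monitoring health), a small polling loop can wait until everything reports healthy. A sketch, assuming jq and a placeholder Prometheus URL:

```shell
# Poll the targets API until no target is unhealthy.
wait_for_targets() {
  until [ "$(curl -s "$1/api/v1/targets" \
      | jq '[.data.activeTargets[] | select(.health != "up")] | length')" = "0" ]; do
    echo "waiting for targets to recover..."
    sleep 5
  done
  echo "all targets up"
}

# wait_for_targets http://localhost:9090
```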
Prevention
Add alerts for scrape failures:
```yaml
groups:
  - name: scrape_alerts
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "{{ $labels.job }} target {{ $labels.instance }} has been down for 5 minutes."

      - alert: ScrapeSlow
        expr: scrape_duration_seconds > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow scrape for {{ $labels.job }}"

      - alert: ScrapeSamplesHigh
        expr: scrape_samples_scraped > 100000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High sample count from {{ $labels.instance }}"
```