The Problem
Prometheus fails to discover targets via Consul, or discovered targets are showing errors:
```
level=error ts=2026-04-04T21:35:22.123Z caller=consul.go:234 msg="Consul service discovery failed" err="Get \"http://consul:8500/v1/catalog/services\": dial tcp: connection refused"
level=error ts=2026-04-04T21:35:23.234Z caller=consul.go:235 msg="Consul refresh failed" err="Permission denied: ACL token invalid"
level=warn ts=2026-04-04T21:35:24.345Z caller=consul.go:236 msg="No services found in Consul" services=""
```

Consul service discovery errors prevent dynamic target discovery, requiring manual configuration updates.
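The three log lines above point at different causes: an unreachable Consul server, a rejected ACL token, and a filter that matches nothing. As a rough triage aid, the sketch below sorts a log line into one of those buckets by matching the message text shown above. `classify_consul_sd_error` is a hypothetical helper, not part of Prometheus or Consul, and the matched strings are assumptions taken from the sample logs.

```bash
# Hypothetical helper: map a Prometheus log line to a likely cause.
# The patterns are taken from the sample log lines in this article.
classify_consul_sd_error() {
  case "$1" in
    *"connection refused"*)               echo "network" ;;  # see "Fix Network Connectivity"
    *"Permission denied"*|*"ACL token"*)  echo "acl" ;;      # see "Fix ACL Token Configuration"
    *"No services found"*)                echo "filter" ;;   # see "Fix Service Tag Filtering"
    *)                                    echo "unknown" ;;
  esac
}

classify_consul_sd_error 'err="Permission denied: ACL token invalid"'
```

To summarize recent errors, one option is to feed each line of the `journalctl -u prometheus` output through the function in a `while read` loop and pipe the result to `sort | uniq -c`.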
Diagnosis
Check Consul Connectivity
```bash
# Test Consul API directly
curl -s http://consul:8500/v1/catalog/services | jq .

# Check that the Consul cluster has a leader
curl -s http://consul:8500/v1/status/leader | jq .

# Check specific service
curl -s http://consul:8500/v1/catalog/service/myapp | jq .
```
Check Prometheus Consul SD Status
```bash
# Check discovered targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job | contains("consul"))'

# Check Prometheus logs for Consul errors
journalctl -u prometheus --since "1 hour ago" | grep -i consul
```
Check Consul ACL Token
```bash
# Verify ACL token
curl -s -H "X-Consul-Token: your-token" http://consul:8500/v1/acl/token/self | jq .

# Check token permissions
consul acl token read -id your-token-id
```
Check Service Tags
```bash
# List services with tags
curl -s http://consul:8500/v1/catalog/services | jq .

# Get detailed service info
curl -s "http://consul:8500/v1/catalog/service/myapp?tag=prometheus" | jq .
```
Solutions
1. Fix Basic Consul Configuration
Incorrect Consul SD configuration:
```yaml
# prometheus.yml
scrape_configs:
  # WRONG: Missing required configuration
  # - job_name: 'consul-services'
  #   consul_sd_configs: []

  # CORRECT: Proper Consul SD configuration
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'
        services: []  # All services
        # Or specific services
        # services: ['myapp', 'node-exporter']
        tags: ['prometheus']  # Filter by tag
        node_meta:
          environment: 'production'
        refresh_interval: 30s
```
2. Fix ACL Token Configuration
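When ACLs are enabled, Consul requires a token for catalog access:

```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'
        services: ['myapp']
        token: 'your-consul-acl-token'
        # Or read the token from a file via the generic HTTP authorization
        # settings (sent as a Bearer token; use instead of token)
        # authorization:
        #   credentials_file: /etc/prometheus/consul-token
```

Create a proper ACL policy and token: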
```bash
# Create ACL policy for Prometheus
consul acl policy create -name "prometheus-read" \
  -rules 'service_prefix "" { policy = "read" } node_prefix "" { policy = "read" }'

# Create token; -secret is optional and must be a UUID when set
consul acl token create -description "prometheus-token" \
  -policy-name "prometheus-read" \
  -secret "your-secret-token"
```
Token file approach:
```bash
# Store token securely
echo "your-consul-acl-token" > /etc/prometheus/consul-token
chmod 600 /etc/prometheus/consul-token
```

3. Fix Service Tag Filtering
Services not discovered due to tag mismatch:
```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'
        tags: ['prometheus']  # Only discover services with this tag
```

Register services with correct tags, for example with a service definition that carries the prometheus tag:

```json
{
  "Service": {
    "Name": "myapp",
    "Tags": ["prometheus", "production"],
    "Address": "10.0.0.5",
    "Port": 9090,
    "Meta": {
      "prometheus_scrape_path": "/metrics",
      "prometheus_scrape_port": "9090"
    }
  }
}
```

Register service via Consul API:
```bash
curl -X PUT http://consul:8500/v1/agent/service/register -d '{
  "Name": "myapp",
  "Tags": ["prometheus"],
  "Address": "10.0.0.5",
  "Port": 9090,
  "Meta": {"prometheus_scrape_path": "/metrics"}
}'
```

4. Configure Relabeling
Use service metadata for configuration:
```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'
        services: []
        tags: ['prometheus']
    relabel_configs:
      # Default every target to port 9090, keeping the host from Consul
      - source_labels: [__address__]
        target_label: __address__
        regex: '([^:]+):\d+'
        replacement: '${1}:9090'

      # Use custom scrape path from metadata
      - source_labels: [__meta_consul_service_metadata_prometheus_scrape_path]
        target_label: __metrics_path__
        regex: '(.+)'
        replacement: '${1}'
        action: replace

      # Use custom port from metadata, keeping the host part of the address
      - source_labels: [__address__, __meta_consul_service_metadata_prometheus_scrape_port]
        target_label: __address__
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '${1}:${2}'

      # Keep only healthy services
      - source_labels: [__meta_consul_health]
        regex: 'passing'
        action: keep

      # Set job from service name
      - source_labels: [__meta_consul_service]
        target_label: job
```
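To see what the first address rewrite in the relabeling block does to a target, the same pattern can be tried outside Prometheus. This is only an approximation: Prometheus uses RE2 and anchors the regex at both ends, which the `^` and `$` below imitate with `sed`. `rewrite_port` is a hypothetical helper used for illustration.

```bash
# Hypothetical helper: imitates regex '([^:]+):\d+' with replacement '${1}:9090'.
# Input that does not match is passed through unchanged, as in a replace action.
rewrite_port() {
  printf '%s\n' "$1" | sed -E 's/^([^:]+):[0-9]+$/\1:9090/'
}

rewrite_port "10.0.0.5:8080"   # prints 10.0.0.5:9090
rewrite_port "10.0.0.5"        # prints 10.0.0.5 (no port, so no match)
```

Because the rule rewrites every `host:port` address, services registered on other ports are scraped on 9090 unless a later rule overrides it.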
5. Fix Network Connectivity
Consul unreachable from Prometheus:
```bash
# Check network connectivity
ping consul
nc -zv consul 8500

# From Prometheus container
kubectl exec -it prometheus-pod -- curl -s http://consul:8500/v1/catalog/services
```

Use a server address that resolves from where Prometheus runs:
```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      # Kubernetes DNS
      - server: 'consul.default.svc.cluster.local:8500'

      # External Consul (use instead of the entry above)
      # - server: 'consul.example.com:8500'
      #   # Add authentication
      #   basic_auth:
      #     username: prometheus
      #     password: secret
```
6. Handle TLS Configuration
Secure Consul connections:
```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'  # host:port only; the protocol goes in scheme
        scheme: https
        token: 'your-acl-token'
        tls_config:
          ca_file: /etc/prometheus/consul/ca.crt
          cert_file: /etc/prometheus/consul/client.crt
          key_file: /etc/prometheus/consul/client.key
          insecure_skip_verify: false
```

7. Fix Watch Issues
Consul watch not updating:
```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'
        refresh_interval: 15s  # Check every 15 seconds
```

Force refresh via API:

```bash
# Reload the configuration, which restarts service discovery
# (requires Prometheus to run with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
```

Verification
Check Discovered Targets
```bash
# List Consul-discovered targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.discoveredLabels.job == "consul-services") | {job: .labels.job, instance: .labels.instance}'
```

Verify Services
```promql
# Count discovered Consul services
count by (job) (up{job="consul-services"})

# Number of discovered targets that are failing scrapes
count(up{job="consul-services"} == 0)
```
Test Consul SD
```bash
# Manual query of Consul
curl -s "http://consul:8500/v1/catalog/services" | jq 'keys'

# Check specific service instances
curl -s "http://consul:8500/v1/catalog/service/myapp" | jq '.[] | {Address: .ServiceAddress, Port: .ServicePort, Tags: .ServiceTags}'
```
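One way to spot instances that Consul knows about but Prometheus has not picked up is to compare the two address lists. The sketch below assumes you have already reduced each side to newline-separated `host:port` values, for example by adapting the `jq` queries above. `missing_targets` is a hypothetical helper, not a Prometheus or Consul command.

```bash
# Hypothetical helper: print every address in $1 (expected, from Consul)
# that does not appear as a whole line in $2 (actual, from Prometheus).
missing_targets() {
  printf '%s\n' "$1" | while IFS= read -r addr; do
    [ -n "$addr" ] || continue
    # -F: fixed string, -x: match the whole line, -q: quiet
    printf '%s\n' "$2" | grep -Fxq -- "$addr" || printf '%s\n' "$addr"
  done
}

consul_addrs="$(printf '10.0.0.5:9090\n10.0.0.6:9090')"
prom_addrs="10.0.0.5:9090"
missing_targets "$consul_addrs" "$prom_addrs"   # prints 10.0.0.6:9090
```

An empty result suggests every registered instance was discovered; any address printed is worth checking against the tag filter and health relabeling rules.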
Prevention
Add monitoring for Consul SD:
```yaml
groups:
  - name: consul_sd_alerts
    rules:
      # Assumes a scrape job named "consul" that scrapes the Consul servers
      - alert: ConsulSDDown
        expr: up{job="consul"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Consul service discovery is down"
          description: "Cannot connect to Consul server at {{ $labels.instance }}"

      # absent() is used because count() over an empty result returns nothing, not 0
      - alert: ConsulSDNoServices
        expr: absent(up{job="consul-services"})
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "No services discovered from Consul"
          description: "No targets found via Consul service discovery"

      - alert: ConsulSDRefreshFailed
        expr: increase(prometheus_sd_consul_rpc_failures_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consul SD refresh failed"
          description: "{{ $value }} Consul RPC failures in last 5 minutes"

      - alert: ConsulServiceUnhealthy
        expr: up{job="consul-services"} == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consul-discovered service is down"
          description: "Service {{ $labels.instance }} discovered via Consul is not responding"
```
Consul Service Registration Template
Register services properly for Prometheus:
```json
{
  "Service": {
    "Name": "myapp",
    "ID": "myapp-10.0.0.5-9090",
    "Tags": ["prometheus", "app", "production"],
    "Address": "10.0.0.5",
    "Port": 9090,
    "Meta": {
      "prometheus_scrape_path": "/metrics",
      "prometheus_scrape_interval": "30s",
      "environment": "production"
    },
    "Check": {
      "HTTP": "http://10.0.0.5:9090/health",
      "Interval": "10s",
      "Timeout": "5s"
    }
  }
}
```