The Problem
Prometheus fails to discover Kubernetes pods, services, or endpoints, or discovered targets show errors:
```
level=error ts=2026-04-04T20:30:15.234Z caller=kubernetes.go:234 msg="Kubernetes service discovery failed" err="failed to list pods: Unauthorized"
level=error ts=2026-04-04T20:30:16.345Z caller=kubernetes.go:235 msg="Failed to refresh kubernetes targets" err="Get \"https://kubernetes.default/api/v1/pods\": dial tcp: i/o timeout"
level=warn ts=2026-04-04T20:30:17.456Z caller=kubernetes.go:236 msg="No endpoints found" service="myapp-service"
```

Kubernetes service discovery errors prevent automatic target discovery, breaking pod and service monitoring.
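As a rough triage aid, the three symptoms above can be mapped to the area most likely at fault. The sketch below is a heuristic, not an exhaustive classifier; `classify_sd_error` is a hypothetical helper name, and the match strings are taken from the sample log lines only.

```bash
# Hypothetical helper: map a Prometheus log line to a likely failure area.
# The patterns mirror the sample log lines above; real logs may differ.
classify_sd_error() {
  case "$1" in
    *Unauthorized*)
      echo "auth: check the service account token and RBAC rules" ;;
    *"i/o timeout"*)
      echo "network: check connectivity to the Kubernetes API server" ;;
    *"No endpoints found"*)
      echo "endpoints: check the Service selector and annotations" ;;
    *)
      echo "unknown: inspect the full log line" ;;
  esac
}
```

One way to use it is to pipe `kubectl logs prometheus-pod -n monitoring` through a `while IFS= read -r line` loop and call the function on each line.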
Diagnosis
Check RBAC Permissions
```bash
# Verify Prometheus service account
kubectl get serviceaccount prometheus -n monitoring

# Check cluster role
kubectl get clusterrole prometheus

# Check role binding
kubectl get clusterrolebinding prometheus

# Test API access
kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus
kubectl auth can-i list endpoints --as=system:serviceaccount:monitoring:prometheus
kubectl auth can-i list services --as=system:serviceaccount:monitoring:prometheus
```
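The individual `kubectl auth can-i` checks can be looped over every verb and resource that service discovery needs. This is a minimal sketch: `check_rbac` is a hypothetical helper, and the resource list (pods, services, endpoints, nodes) is an assumption based on the roles used later in this article.

```bash
# Hypothetical helper: report every verb/resource pair the service account lacks.
# Usage: check_rbac <namespace> <service-account>
check_rbac() {
  sa="system:serviceaccount:${1:-monitoring}:${2:-prometheus}"
  missing=0
  for resource in pods services endpoints nodes; do
    for verb in get list watch; do
      # "kubectl auth can-i" exits non-zero when the action is denied.
      if ! kubectl auth can-i "$verb" "$resource" --as="$sa" >/dev/null 2>&1; then
        echo "MISSING: $verb $resource"
        missing=$((missing + 1))
      fi
    done
  done
  # Succeed only when nothing is missing.
  [ "$missing" -eq 0 ]
}
```

Running `check_rbac monitoring prometheus` prints one `MISSING:` line per denied permission and returns a non-zero status if any are denied.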
Check Kubernetes API Connectivity
```bash
# Test from Prometheus pod
kubectl exec -it prometheus-pod -n monitoring -- curl -k https://kubernetes.default/api/v1/pods

# Check API server
kubectl cluster-info

# Verify endpoints
kubectl get endpoints -A | grep prometheus
```
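The HTTP status returned by the API server usually narrows the cause more quickly than the response body. The sketch below assumes `curl` is available wherever it runs (for example in a debug container) and that the service account token is mounted at the standard path; `explain_api_status` and `probe_api` are hypothetical helper names.

```bash
# Hypothetical helper: translate an API server HTTP status code into a hint.
explain_api_status() {
  case "$1" in
    200) echo "ok: API reachable and request authorized" ;;
    401) echo "unauthorized: token missing, expired, or invalid" ;;
    403) echo "forbidden: token accepted but RBAC denies the request" ;;
    000) echo "unreachable: no HTTP response (DNS, network policy, or timeout)" ;;
    *)   echo "unexpected: HTTP $1" ;;
  esac
}

# Probe the pod list endpoint with the mounted service account credentials.
# curl reports the status code as 000 when it receives no HTTP response.
probe_api() {
  sa_dir=/var/run/secrets/kubernetes.io/serviceaccount
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    --cacert "$sa_dir/ca.crt" \
    -H "Authorization: Bearer $(cat "$sa_dir/token")" \
    https://kubernetes.default/api/v1/pods)
  explain_api_status "$code"
}
```

A 401 or 403 points toward the RBAC fixes below, while 000 points toward the connectivity fixes.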
Check Pod Annotations
```bash
# Check pod annotations for prometheus scraping
kubectl get pods -A -o jsonpath='{range .items[*]}{@.metadata.name}{"\t"}{@.metadata.annotations.prometheus\.io/scrape}{"\n"}{end}'

# Detailed check
kubectl get pod myapp-pod -o yaml | grep -A5 annotations
```
Check Prometheus SD Configuration
```bash
# View current config
curl -s http://localhost:9090/api/v1/status/config | jq '.data.scrape_configs[] | select(.kubernetes_sd_configs)'

# Check discovered targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.discoveredLabels.__meta_kubernetes_namespace)'
```
Solutions
1. Fix RBAC Permissions
Missing or insufficient RBAC permissions:
```yaml
# prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```

Apply RBAC:

```bash
kubectl apply -f prometheus-rbac.yaml
```

2. Fix Kubernetes SD Configuration
Incorrect service discovery configuration:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - monitoring
            - default
            - production
    relabel_configs:
      # Keep pods with prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

      # Use annotation for metrics path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

      # Use annotation for port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
        target_label: __address__

      # Add labels from pod metadata
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)

      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
```
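The two relabel rules that most often surprise people are the `keep` filter and the `__address__` rewrite. The sketch below mimics both in shell so the behaviour can be tried without a cluster. It is an approximation: `should_scrape` and `rewrite_address` are hypothetical helpers, and because `sed -E` has no non-capturing groups or `\d`, the regex is rewritten as an equivalent ERE in which the port annotation is capture group 3 rather than 2.

```bash
# Mimic the "keep" rule: Prometheus anchors relabel regexes, so only the
# exact string "true" passes ("True" or "yes" would drop the target).
should_scrape() {
  [ "$1" = "true" ]
}

# Mimic the __address__ rewrite. Prometheus joins the source label values
# with ";" and matches the whole string against ([^:]+)(?::\d+)?;(\d+).
# Usage: rewrite_address <__address__> <prometheus.io/port annotation>
rewrite_address() {
  joined="$1;$2"
  rewritten=$(printf '%s\n' "$joined" |
    sed -E -n 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/p')
  # When the regex does not match, the replace action leaves the label as is.
  printf '%s\n' "${rewritten:-$1}"
}
```

For example, `rewrite_address 10.0.0.5:8080 9090` prints `10.0.0.5:9090`, while an empty port annotation leaves `10.0.0.5:8080` unchanged.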
3. Fix Pod Annotations
Pods not properly annotated for scraping:
```yaml
# Pod/deployment spec with annotations
# (excerpt: selector and pod labels omitted for brevity)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 9090
              name: metrics
```

4. Fix Service Discovery Role
Incorrect role type for Kubernetes SD:
```yaml
scrape_configs:
  # Pod discovery
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod

  # Service discovery (for service-level monitoring)
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service

  # Endpoints discovery
  - job_name: 'kubernetes-endpoints'
    kubernetes_sd_configs:
      - role: endpoints

  # Node discovery
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: __address__
        replacement: '${1}:9100'
```
5. Fix Namespace Filtering
Services/pods not in discovered namespaces:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          own_namespace: false
          names:
            - monitoring
            - production
            - staging
        # To discover all namespaces, omit the namespaces block entirely
```

6. Handle API Server Connectivity
Cannot reach Kubernetes API:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        # Use in-cluster config (default)
        # Or specify API server
        # api_server: https://kubernetes.default
```

From outside cluster:
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        # Option A: API server URL with client certificates
        api_server: https://kubernetes.example.com
        tls_config:
          ca_file: /etc/prometheus/k8s/ca.crt
          cert_file: /etc/prometheus/k8s/client.crt
          key_file: /etc/prometheus/k8s/client.key
        # Option B: kubeconfig file instead
        # (api_server and kubeconfig_file are mutually exclusive)
        # kubeconfig_file: /etc/prometheus/kubeconfig
```

7. Fix Endpoint Discovery
Endpoints not properly discovered:
```yaml
scrape_configs:
  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints with prometheus.io/scrape annotation on service
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true

      # Use annotation for path
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

      # Use annotation for port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
        target_label: __address__
```
Annotate services:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
spec:
  ports:
    - port: 9090
      name: metrics
```

Verification
Check Discovered Targets
```bash
# List Kubernetes discovered targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.discoveredLabels.__meta_kubernetes_pod_name) | {pod: .discoveredLabels.__meta_kubernetes_pod_name, namespace: .discoveredLabels.__meta_kubernetes_namespace}'
```

Verify RBAC Works
```bash
# Test access from Prometheus pod
# (jq runs locally, so it does not need to exist inside the pod)
kubectl exec prometheus-pod -n monitoring -- sh -c 'curl -s -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default/api/v1/namespaces/monitoring/pods' | jq -r '.items[].metadata.name'
```

Check Pods Are Scraped
```promql
# Count scraped pods by namespace
count by (namespace) (up{job="kubernetes-pods"})

# Discovered pods whose scrapes are failing (0 when none are failing)
count(up{job="kubernetes-pods"} == 0) or vector(0)
```
Prevention
Add monitoring for Kubernetes SD:
```yaml
groups:
  - name: kubernetes_sd_alerts
    rules:
      - alert: KubernetesSDFailed
        # Confirm this metric name against your server's /metrics output;
        # SD metric names differ across Prometheus versions.
        expr: increase(prometheus_sd_kubernetes_refresh_failures_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Kubernetes service discovery failed"
          description: "{{ $value }} SD refresh failures"

      - alert: KubernetesAPIServerDown
        expr: up{job="kubernetes-apiservers"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Kubernetes API server unreachable"

      - alert: KubernetesTargetsMissing
        expr: count(up{job=~"kubernetes-.*"}) < 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Few Kubernetes targets discovered"
          description: "Only {{ $value }} Kubernetes targets found"

      - alert: PodNotScraped
        expr: absent(up{job="kubernetes-pods",pod="critical-pod"})
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Critical pod not being scraped"
```
Complete Configuration Example
```yaml
# prometheus.yml with full Kubernetes SD
scrape_configs:
  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Label names match the queries and alerts above (namespace, pod)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
```
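Before rolling out a configuration like the one above, it may help to validate it and then trigger a reload. The sketch below is one possible workflow, not the only one: `validate_and_reload` is a hypothetical helper, the default file names and the `localhost:9090` address are assumptions, and the reload endpoint only responds when Prometheus is started with `--web.enable-lifecycle`.

```bash
# Hypothetical helper: validate the config and rule files, then ask
# Prometheus to reload. Stops at the first failed check.
# Usage: validate_and_reload [config-file] [rules-file]
validate_and_reload() {
  config_file="${1:-prometheus.yml}"
  rules_file="${2:-kubernetes-sd-alerts.yml}"
  promtool check config "$config_file" || return 1
  promtool check rules "$rules_file" || return 1
  # -f makes curl return a non-zero status on HTTP errors.
  curl -s -f -X POST http://localhost:9090/-/reload
}
```

If either `promtool` check fails, the function returns without reloading, so a broken file never replaces a working configuration.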