The Problem

Prometheus fails to discover Kubernetes pods, services, or endpoints, or discovered targets show errors:

```text
level=error ts=2026-04-04T20:30:15.234Z caller=kubernetes.go:234 msg="Kubernetes service discovery failed" err="failed to list pods: Unauthorized"
level=error ts=2026-04-04T20:30:16.345Z caller=kubernetes.go:235 msg="Failed to refresh kubernetes targets" err="Get \"https://kubernetes.default/api/v1/pods\": dial tcp: i/o timeout"
level=warn ts=2026-04-04T20:30:17.456Z caller=kubernetes.go:236 msg="No endpoints found" service="myapp-service"
```

Kubernetes service discovery errors prevent automatic target discovery, breaking pod and service monitoring.

Diagnosis

Check RBAC Permissions

```bash
# Verify the Prometheus service account
kubectl get serviceaccount prometheus -n monitoring

# Check the cluster role
kubectl get clusterrole prometheus

# Check the cluster role binding
kubectl get clusterrolebinding prometheus

# Test API access as the service account
kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus
kubectl auth can-i list endpoints --as=system:serviceaccount:monitoring:prometheus
kubectl auth can-i list services --as=system:serviceaccount:monitoring:prometheus
```

Check Kubernetes API Connectivity

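```bash
# Test from the Prometheus pod (requires curl in the container image).
# An unauthenticated request returns 401 or 403, which still proves the
# API server is reachable; a timeout means a network problem.
kubectl exec -it prometheus-pod -n monitoring -- curl -k https://kubernetes.default/api/v1/pods

# Check the API server
kubectl cluster-info

# Verify endpoints
kubectl get endpoints -A | grep prometheus
```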

Check Pod Annotations

```bash
# Check pod annotations for Prometheus scraping
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.prometheus\.io/scrape}{"\n"}{end}'

# Detailed check
kubectl get pod myapp-pod -o yaml | grep -A5 annotations
```

Check Prometheus SD Configuration

```bash
# View the current config. The API returns the loaded configuration as a
# single YAML string under .data.yaml, so extract it before searching.
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' | grep -B2 -A10 'kubernetes_sd_configs'

# Check discovered targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.discoveredLabels.__meta_kubernetes_namespace)'
```

Solutions

1. Fix RBAC Permissions

Missing or insufficient RBAC permissions:

```yaml
# prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```

Apply RBAC:

```bash
kubectl apply -f prometheus-rbac.yaml
```

2. Fix Kubernetes SD Configuration

Incorrect service discovery configuration:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - monitoring
            - default
            - production
    relabel_configs:
      # Keep pods with prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

      # Use annotation for metrics path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

      # Use annotation for port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
        target_label: __address__

      # Add labels from pod metadata
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)

      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
```
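The port rewrite is the rule most often misread. Prometheus joins the `source_labels` values with `;` and matches the fully anchored regex against the result. The sketch below emulates that step outside Prometheus; `rewrite_address` is a hypothetical helper, and because `sed -E` supports neither `(?:...)` nor `\d`, the regex is translated to POSIX ERE, which shifts the annotation port from group 2 to group 3.

```bash
# Emulate the __address__ relabel rule for one target.
#   $1: current __address__ (host or host:port)
#   $2: value of the prometheus.io/port annotation (may be empty)
rewrite_address() {
  # Join the two source labels with ";" as Prometheus does, then apply the
  # ERE translation of ([^:]+)(?::\d+)?;(\d+) with replacement ${1}:${2}.
  rewritten="$(printf '%s;%s\n' "$1" "$2" |
    sed -nE 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/p')"
  # When the regex does not match (for example, no port annotation),
  # a replace action leaves the target label unchanged.
  printf '%s\n' "${rewritten:-$1}"
}

# Usage:
#   rewrite_address 10.0.0.5:8080 9090
#   rewrite_address 10.0.0.5:8080 ""
```

A pod without the port annotation keeps its discovered address, so Prometheus scrapes the declared container port rather than the metrics port you intended.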

3. Fix Pod Annotations

Pods not properly annotated for scraping:

```yaml
# Pod/deployment spec with annotations
# (excerpt: selector, replicas, and pod labels omitted)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 9090
              name: metrics
```
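For a Deployment that is already running, the same annotations can be added with a merge patch instead of editing the manifest. This is a minimal sketch: `annotate_for_scraping` is a hypothetical helper, and the deployment name, namespace, port, and path are placeholders.

```bash
# Add the scrape annotations to the pod template of an existing Deployment.
# Changing spec.template triggers a rollout, so the replacement pods carry
# the annotations.
annotate_for_scraping() {
  deployment="$1"
  namespace="$2"
  port="${3:-9090}"
  path="${4:-/metrics}"
  patch="$(printf '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/port":"%s","prometheus.io/path":"%s"}}}}}' "$port" "$path")"
  kubectl patch deployment "$deployment" -n "$namespace" --type merge -p "$patch"
}

# Usage:
#   annotate_for_scraping myapp production 9090 /metrics
```

The annotations must sit on the pod template, not on the Deployment's own metadata, because the `pod` role reads them from each pod.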

4. Fix Service Discovery Role

Incorrect role type for Kubernetes SD:

```yaml
scrape_configs:
  # Pod discovery
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod

  # Service discovery (for service-level monitoring)
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service

  # Endpoints discovery
  - job_name: 'kubernetes-endpoints'
    kubernetes_sd_configs:
      - role: endpoints

  # Node discovery
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: __address__
        replacement: '${1}:9100'
```

5. Fix Namespace Filtering

Services/pods not in discovered namespaces:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          own_namespace: false
          names:
            - monitoring
            - production
            - staging
        # To discover all namespaces, omit the `namespaces` block entirely
```
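To find out which namespaces are being missed, compare the configured `names` list with the namespaces that actually contain annotated pods. A rough sketch, assuming tab-separated `namespace<TAB>annotation` lines on stdin; `uncovered_namespaces` is a hypothetical helper.

```bash
# Print namespaces that contain pods annotated prometheus.io/scrape=true but
# are absent from the configured list.
#   $1:    space-separated namespaces from the `names` list
#   stdin: one "<namespace>\t<scrape annotation>" line per pod
uncovered_namespaces() {
  configured=" $1 "
  awk -F '\t' '$2 == "true" { print $1 }' | sort -u | while read -r ns; do
    case "$configured" in
      *" $ns "*) ;;          # namespace is already listed
      *) echo "$ns" ;;       # annotated pods here will never be discovered
    esac
  done
}

# Usage:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.annotations.prometheus\.io/scrape}{"\n"}{end}' \
#     | uncovered_namespaces "monitoring production staging"
```

Every namespace it prints needs to be added to `names`, or the `namespaces` block dropped.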

6. Handle API Server Connectivity

Cannot reach Kubernetes API:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        # Use in-cluster config (default)
        # Or specify API server
        # api_server: https://kubernetes.default
```
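When `api_server` is left unset, Prometheus relies on the in-cluster configuration: the service account token and CA certificate mounted into the pod, plus the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables. The sketch below checks that those pieces exist; `check_in_cluster_config` is a hypothetical helper, and its directory argument is there only so it can be tried outside a pod.

```bash
# Report whether the inputs of in-cluster configuration are present.
# Intended to run inside the Prometheus container.
check_in_cluster_config() {
  sa_dir="${1:-/var/run/secrets/kubernetes.io/serviceaccount}"
  missing=0
  for f in token ca.crt; do
    if [ -s "$sa_dir/$f" ]; then
      echo "ok       $sa_dir/$f"
    else
      echo "MISSING  $sa_dir/$f"
      missing=1
    fi
  done
  if [ -n "${KUBERNETES_SERVICE_HOST:-}" ] && [ -n "${KUBERNETES_SERVICE_PORT:-}" ]; then
    echo "ok       API server at ${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"
  else
    echo "MISSING  KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT"
    missing=1
  fi
  return "$missing"
}
```

A missing token usually means the pod spec sets `automountServiceAccountToken: false` or references a service account that does not exist.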

From outside cluster:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        api_server: https://kubernetes.example.com
        # Authenticate with client certificates
        tls_config:
          ca_file: /etc/prometheus/k8s/ca.crt
          cert_file: /etc/prometheus/k8s/client.crt
          key_file: /etc/prometheus/k8s/client.key
        # Or replace api_server and tls_config with a kubeconfig file;
        # api_server and kubeconfig_file are mutually exclusive
        # kubeconfig_file: /etc/prometheus/kubeconfig
```

7. Fix Endpoint Discovery

Endpoints not properly discovered:

```yaml
scrape_configs:
  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints with prometheus.io/scrape annotation on service
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true

      # Use annotation for path
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

      # Use annotation for port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
        target_label: __address__
```

Annotate services:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
spec:
  ports:
    - port: 9090
      name: metrics
```

Verification

Check Discovered Targets

```bash
# List Kubernetes discovered targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.discoveredLabels.__meta_kubernetes_pod_name) | {pod: .discoveredLabels.__meta_kubernetes_pod_name, namespace: .discoveredLabels.__meta_kubernetes_namespace}'
```

Verify RBAC Works

```bash
# Test access from the Prometheus pod (requires curl in the container image).
# jq runs locally on the command's output, so it is not needed inside the pod.
kubectl exec prometheus-pod -n monitoring -- sh -c 'curl -s -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default/api/v1/namespaces/monitoring/pods' | jq -r '.items[].metadata.name'
```

Check Pods Are Scraped

```promql
# Count scraped pods by namespace
count by (namespace) (up{job="kubernetes-pods"})

# Pods discovered but failing to scrape
count(up{job="kubernetes-pods"} == 0)
```

Prevention

Add monitoring for Kubernetes SD:

```yaml
groups:
  - name: kubernetes_sd_alerts
    rules:
      - alert: KubernetesSDFailed
        # Counts failed WATCH/LIST requests; confirm the metric is exposed
        # on /metrics for your Prometheus version
        expr: increase(prometheus_sd_kubernetes_failures_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Kubernetes service discovery failed"
          description: "{{ $value }} SD failures in the last 5 minutes"

      - alert: KubernetesAPIServerDown
        expr: up{job="kubernetes-apiservers"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Kubernetes API server unreachable"

      - alert: KubernetesTargetsMissing
        # Adjust the threshold to the size of your cluster
        expr: count(up{job=~"kubernetes-.*"}) < 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Few Kubernetes targets discovered"
          description: "Only {{ $value }} Kubernetes targets found"

      - alert: PodNotScraped
        expr: absent(up{job="kubernetes-pods",pod="critical-pod"})
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Critical pod not being scraped"
```
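A rules file with a syntax error is rejected on reload, so it is worth validating before applying. The sketch below assumes the rules are saved as `kubernetes-sd-alerts.yml` (a hypothetical file name), that `promtool` is on the PATH, and that Prometheus was started with `--web.enable-lifecycle`, which the `/-/reload` endpoint requires.

```bash
# Validate a rules file and ask Prometheus to reload only if it passes.
#   $1: rules file (default: kubernetes-sd-alerts.yml, a placeholder name)
#   $2: Prometheus base URL (default: http://localhost:9090)
validate_and_reload() {
  rules_file="${1:-kubernetes-sd-alerts.yml}"
  prom_url="${2:-http://localhost:9090}"
  # Stop here on invalid rules so a broken file is never loaded
  promtool check rules "$rules_file" || return 1
  curl -fsS -X POST "$prom_url/-/reload"
}

# Usage:
#   validate_and_reload kubernetes-sd-alerts.yml http://localhost:9090
```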

Complete Configuration Example

```yaml
# prometheus.yml with full Kubernetes SD
scrape_configs:
  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Label names match the `namespace` and `pod` labels used by the
      # verification queries and alerts in this guide
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
```