What's Actually Happening
Kubernetes pods cannot resolve DNS names. Services cannot be discovered by name, breaking cluster-internal communication and external connectivity.
The Error You'll See
DNS resolution failure:
```bash
$ kubectl exec mypod -- nslookup kubernetes
;; connection timed out; no servers could be reached
```
Service discovery failed:
```bash
$ kubectl exec mypod -- curl http://api-service:8080
curl: (6) Could not resolve host: api-service
```
CoreDNS error:
```bash
$ kubectl logs -n kube-system coredns-xxx
[ERROR] plugin/errors: 2 kubernetes.default.svc.cluster.local. A: dns: no such host
```
Why This Happens
1. CoreDNS down: the CoreDNS pods are not running.
2. ConfigMap misconfigured: the CoreDNS configuration contains errors.
3. Network policies: a policy is blocking DNS traffic.
4. resolv.conf wrong: pods have an incorrect DNS configuration.
5. Node DNS issues: DNS on the host is not working.
6. Cluster IP conflict: the DNS service IP conflicts with another address.
Step 1: Check CoreDNS Status
```bash
# Check CoreDNS pods:
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Should show Running state

# Check CoreDNS service:
kubectl get svc -n kube-system kube-dns

# Check endpoints (one address per CoreDNS pod):
kubectl get endpoints -n kube-system kube-dns

# Check CoreDNS logs:
kubectl logs -n kube-system -l k8s-app=kube-dns

# Check CoreDNS metrics. The stock CoreDNS image ships without curl or a shell,
# so read port 9153 through the API server proxy instead of kubectl exec:
kubectl get --raw "/api/v1/namespaces/kube-system/pods/coredns-xxx:9153/proxy/metrics"
```
Step 2: Test DNS Resolution
```bash
# Test DNS from pod:
kubectl exec mypod -- nslookup kubernetes
kubectl exec mypod -- nslookup kubernetes.default.svc.cluster.local

# Test external DNS:
kubectl exec mypod -- nslookup google.com

# Test with dig, querying the DNS service IP directly:
kubectl exec mypod -- dig @10.96.0.10 kubernetes.default.svc.cluster.local

# Test with curl. The API server listens on HTTPS, and any HTTP response
# (even 401 or 403) proves that the name resolved:
kubectl exec mypod -- curl -vk https://kubernetes/api/v1

# Create test pod:
kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup kubernetes

# Check resolv.conf in pod:
kubectl exec mypod -- cat /etc/resolv.conf

# Expected:
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
```
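The `search` and `ndots:5` lines are what let a short name such as `api-service` resolve at all. As a rough illustration, the sketch below lists the fully qualified names a resolver would try, in order. `expand_candidates` is a hypothetical helper written for this article, and it assumes the behaviour documented for glibc in `resolv.conf(5)` (names with fewer than `ndots` dots go through the search list first); other resolvers, such as musl in Alpine-based images, may behave differently.

```bash
# expand_candidates NAME NDOTS SEARCH_DOMAIN...
# Hypothetical helper: prints the names a glibc-style resolver would query
# for NAME, one per line, in the order it would try them.
expand_candidates() {
  local name ndots dots domain
  name=$1
  ndots=$2
  shift 2

  # A trailing dot marks the name as fully qualified: the search list is skipped.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac

  # Count the dots in the name.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))

  # Enough dots: the name is tried as given before the search list.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi

  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done

  # Too few dots: the name as given is only tried last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# Values taken from the expected resolv.conf above.
expand_candidates api-service 5 default.svc.cluster.local svc.cluster.local cluster.local
```

For `api-service` the first candidate is `api-service.default.svc.cluster.local.`, which only matches a Service in the `default` namespace. If the `search` line in a pod is missing or names the wrong namespace, short names fail even though CoreDNS itself is healthy.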
Step 3: Check CoreDNS Configuration
```bash
# Get CoreDNS ConfigMap:
kubectl get configmap coredns -n kube-system -o yaml

# Corefile should have:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

# Check for syntax errors. A broken Corefile shows up as parse errors in the
# CoreDNS logs and, after a restart, as pods stuck in CrashLoopBackOff:
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
```
Step 4: Check DNS Service IP
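```bash
# Check DNS service ClusterIP:
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
# Usually 10.96.0.10 or similar

# Verify no IP conflict (only kube-dns should be listed):
kubectl get svc -A -o wide | grep -wF 10.96.0.10

# Check DNS service port:
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.ports}'
# Should be 53/UDP and 53/TCP

# Test connectivity to DNS service by sending it a query directly:
kubectl run test --rm -it --restart=Never --image=busybox -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10

# Check kubelet DNS configuration (run on the node); it must match the ClusterIP above:
ps aux | grep kubelet | grep cluster-dns
# If the flag is not set, check the kubelet config file (kubeadm default path):
grep -A1 clusterDNS /var/lib/kubelet/config.yaml
```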
Step 5: Check Network Policies
```bash
# List network policies:
kubectl get networkpolicy -A

# Check if policy blocks DNS:
kubectl describe networkpolicy -n myapp deny-all

# Allow DNS traffic:
cat << 'EOF' > allow-dns.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF

# Apply policy in the namespace of the blocking policy:
kubectl apply -n myapp -f allow-dns.yaml
```
Step 6: Check Node DNS
```bash
# Run these commands on the node itself.

# Check node resolv.conf:
cat /etc/resolv.conf

# Test DNS on node:
nslookup google.com
dig google.com

# Check if node can reach external DNS:
ping -c 3 8.8.8.8

# Check node DNS service:
systemctl status systemd-resolved

# Check if DNS ports blocked:
iptables -L -n | grep -w 53

# Allow DNS:
iptables -I INPUT -p udp --dport 53 -j ACCEPT
iptables -I INPUT -p tcp --dport 53 -j ACCEPT

# Check for DNS cache issues:
systemctl restart systemd-resolved
```
Step 7: Check Pod Networking
```bash
# Check pod network:
kubectl exec mypod -- ip route
kubectl exec mypod -- ip addr

# Check pod can reach DNS service. A ClusterIP is virtual and often does not
# answer ping, so send a DNS query to it instead:
kubectl exec mypod -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10

# Check CNI plugin (run on the node):
ls /etc/cni/net.d/

# For Calico:
kubectl get pods -n kube-system -l k8s-app=calico-node

# For Flannel:
kubectl get pods -n kube-system -l app=flannel

# For Weave:
kubectl get pods -n kube-system -l name=weave-net

# Check kube-proxy:
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy
```
Step 8: Check DNS Cache
```bash
# CoreDNS cache statistics. The stock CoreDNS image has no curl,
# so read the metrics through the API server proxy:
kubectl get --raw "/api/v1/namespaces/kube-system/pods/coredns-xxx:9153/proxy/metrics" | grep coredns_cache

# Check cache hits/misses:
# coredns_cache_hits_total
# coredns_cache_misses_total

# Clear DNS cache (restart CoreDNS):
kubectl rollout restart deployment coredns -n kube-system

# Check cache size (coredns_cache_entries on CoreDNS 1.7.0 and later, coredns_cache_size before):
kubectl get --raw "/api/v1/namespaces/kube-system/pods/coredns-xxx:9153/proxy/metrics" | grep -E 'coredns_cache_(entries|size)'

# Adjust cache in Corefile:
cache 30 {
    success 9984 30
    denial 9984 5
}
```
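The hit and miss counters are easier to judge as a single ratio. The sketch below sums both counters from Prometheus text on standard input and prints the hit percentage. `cache_hit_ratio` is a hypothetical helper, and the sample lines are illustrative only; the exact label sets differ between CoreDNS versions.

```bash
# cache_hit_ratio: reads Prometheus text format on stdin and prints the
# combined CoreDNS cache hit ratio. Hypothetical helper, not part of CoreDNS.
cache_hit_ratio() {
  awk '
    # Sample lines start with the metric name; HELP and TYPE lines start with "#".
    /^coredns_cache_hits_total/   { hits   += $NF }
    /^coredns_cache_misses_total/ { misses += $NF }
    END {
      total = hits + misses
      if (total == 0) { print "no cache samples found"; exit 1 }
      printf "hits=%d misses=%d hit_ratio=%.1f%%\n", hits, misses, 100 * hits / total
    }
  '
}

# Illustrative sample input; real label sets vary between CoreDNS versions.
cache_hit_ratio <<'EOF'
# TYPE coredns_cache_hits_total counter
coredns_cache_hits_total{server="dns://:53",type="success"} 120
coredns_cache_hits_total{server="dns://:53",type="denial"} 30
coredns_cache_misses_total{server="dns://:53"} 50
EOF
```

Against a live cluster, the metrics output from the commands above can be piped straight into `cache_hit_ratio`. A ratio that stays near zero under steady traffic suggests the cache TTL is too short for the workload or that clients mostly query unique names.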
Step 9: Fix Specific DNS Issues
```bash
# Edit the Corefile to apply any of the fixes below:
kubectl edit configmap coredns -n kube-system

# Issue: External DNS not resolving
# Fix: Check forward plugin in Corefile
forward . /etc/resolv.conf {
    max_concurrent 1000
}

# Change to specific DNS servers:
forward . 8.8.8.8 8.8.4.4 {
    max_concurrent 1000
}

# Issue: DNS loop detected
# Fix: Check for loop in resolv.conf
# Remove 127.0.0.1 (or any other loopback nameserver) from the node resolv.conf that CoreDNS forwards to

# Issue: DNS timeout
# Fix: Try upstreams in order and health-check them, or check the network
forward . /etc/resolv.conf {
    max_concurrent 1000
    policy sequential
    health_check 10s
}

# Apply changes:
kubectl rollout restart deployment coredns -n kube-system
```
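The loop issue arises when the resolv.conf that CoreDNS forwards to lists a loopback nameserver, for example the `127.0.0.53` stub that systemd-resolved installs, so queries come straight back to CoreDNS. A minimal sketch for spotting that condition follows; `loopback_nameservers` is a hypothetical helper that reads resolv.conf content on standard input, and the sample input is made up for illustration.

```bash
# loopback_nameservers: prints every nameserver in 127.0.0.0/8 or ::1 found
# in resolv.conf content on stdin. Returns 0 if at least one was found and
# 1 otherwise, mirroring grep. Hypothetical helper for illustration.
loopback_nameservers() {
  awk '
    $1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Made-up sample resembling a node managed by systemd-resolved.
printf 'nameserver 127.0.0.53\noptions edns0\nsearch example.internal\n' | loopback_nameservers
```

On a node, run it as `loopback_nameservers < /etc/resolv.conf`. If it reports an address, either point the `forward` plugin at real upstream servers or have the kubelet hand pods a resolv.conf without the stub entry; on systemd-resolved hosts that file is usually `/run/systemd/resolve/resolv.conf`.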
Step 10: Monitor DNS Health
```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-k8s-dns.sh
#!/bin/bash

echo "=== CoreDNS Pods ==="
kubectl get pods -n kube-system -l k8s-app=kube-dns

echo ""
echo "=== DNS Service ==="
kubectl get svc -n kube-system kube-dns

echo ""
echo "=== DNS Endpoints ==="
kubectl get endpoints -n kube-system kube-dns

echo ""
echo "=== Test Internal DNS ==="
kubectl run dns-test --rm -i --image=busybox --restart=Never -- nslookup kubernetes 2>/dev/null

echo ""
echo "=== Test External DNS ==="
kubectl run dns-test-ext --rm -i --image=busybox --restart=Never -- nslookup google.com 2>/dev/null

echo ""
echo "=== CoreDNS Metrics ==="
# The stock CoreDNS image has no curl, so read metrics through the API server proxy.
POD=$(kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[0].metadata.name}')
kubectl get --raw "/api/v1/namespaces/kube-system/pods/${POD}:9153/proxy/metrics" | grep -E "coredns_(cache|dns|forward)"
EOF

chmod +x /usr/local/bin/monitor-k8s-dns.sh

# Prometheus metrics:
# coredns_dns_request_duration_seconds
# coredns_dns_requests_total
# coredns_cache_hits_total
# coredns_forward_request_duration_seconds

# Alerts:
- alert: CoreDNSDown
  expr: kube_deployment_status_replicas_available{deployment="coredns", namespace="kube-system"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "CoreDNS deployment has no available replicas"

- alert: DNSResolutionFailing
  # Metric name as of CoreDNS 1.7.0; older releases expose coredns_dns_response_rcode_count_total.
  expr: rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "CoreDNS returning SERVFAIL errors"
```
Kubernetes DNS Checklist
| Check | Command | Expected |
|---|---|---|
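| CoreDNS pods | `kubectl get pods -n kube-system -l k8s-app=kube-dns` | All pods Running |
| DNS service | `kubectl get svc -n kube-system kube-dns` | ClusterIP assigned |
| Endpoints | `kubectl get endpoints -n kube-system kube-dns` | Addresses listed |
| Internal DNS | `kubectl exec mypod -- nslookup kubernetes` | Resolved |
| External DNS | `kubectl exec mypod -- nslookup google.com` | Resolved |
| Network policy | `kubectl get networkpolicy -A` | Egress to port 53 (UDP and TCP) allowed |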
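
The first row of the checklist is easy to automate. As a sketch, the hypothetical helper `check_pods_ready` below reads the output of `kubectl get pods --no-headers` on standard input and reports every pod that is not `Running` with all containers ready. It assumes the default column order (NAME, READY, STATUS), and the pod names in the sample are made up.

```bash
# check_pods_ready: reads `kubectl get pods --no-headers` output on stdin.
# Prints one line per pod that is not Running or not fully ready and
# returns 1 in that case (or when no pods are listed); returns 0 otherwise.
check_pods_ready() {
  awk '
    NF >= 3 {
      seen++
      split($2, ready, "/")          # READY column, e.g. 1/1
      if ($3 != "Running" || ready[1] != ready[2]) {
        printf "%s: %s (%s)\n", $1, $3, $2
        bad = 1
      }
    }
    END {
      if (seen == 0) { print "no pods found"; exit 1 }
      exit (bad ? 1 : 0)
    }
  '
}

# Made-up sample output with one unhealthy pod.
if check_pods_ready <<'EOF'
coredns-aaaa   1/1   Running            0             12d
coredns-bbbb   0/1   CrashLoopBackOff   7 (2m ago)    12d
EOF
then
  echo "all CoreDNS pods ready"
else
  echo "CoreDNS needs attention"
fi
```

Against a live cluster: `kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | check_pods_ready`.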
Verify the Fix
```bash
# After fixing DNS issues

# 1. Check CoreDNS running
kubectl get pods -n kube-system -l k8s-app=kube-dns
# All Running

# 2. Test internal resolution
kubectl exec mypod -- nslookup kubernetes
# Server:  10.96.0.10
# Address: 10.96.0.10#53
# Name:    kubernetes.default.svc.cluster.local

# 3. Test external resolution
kubectl exec mypod -- nslookup google.com
# Resolved successfully

# 4. Test service discovery
kubectl exec mypod -- curl http://api-service:8080/health
# Connected

# 5. Check CoreDNS metrics (through the API server proxy, since the stock CoreDNS image has no curl)
kubectl get --raw "/api/v1/namespaces/kube-system/pods/coredns-xxx:9153/proxy/metrics" | grep coredns_dns_requests
# Requests processed

# 6. Monitor ongoing
/usr/local/bin/monitor-k8s-dns.sh
# All tests pass
```
Related Issues
- [Fix DNS Resolution Timeout Randomly](/articles/fix-dns-resolution-timeout-randomly)
- [Fix Kubernetes Pod Not Ready](/articles/fix-kubernetes-pod-not-ready)
- [Fix Kubernetes Service Not Accessible](/articles/fix-kubernetes-service-not-accessible)