Introduction

DNS resolution failures occur when a domain name cannot be translated to an IP address, which prevents applications from connecting to services. DNS (Domain Name System) is a hierarchical, distributed database that resolves human-readable domain names to machine-readable IP addresses.

Common causes include nameserver misconfiguration, propagation delays after changes, expired or suspended domains, DNSSEC validation failures, local resolver cache poisoning, firewalls blocking DNS traffic (port 53), incorrect /etc/resolv.conf configuration, split-horizon DNS misconfiguration, CNAME chain resolution failures, PTR record issues for reverse lookups, and corporate DNS filtering or interception.

Fixing these failures requires an understanding of DNS architecture, record types, the resolution flow, caching behavior, and debugging tools. This guide covers troubleshooting for DNS issues across local systems, corporate networks, and cloud deployments.

Symptoms

  • nslookup: can't resolve 'example.com'
  • dig: connection timed out; no servers could be reached
  • getaddrinfo: Temporary failure in name resolution
  • curl: (6) Could not resolve host: example.com
  • ping: cannot resolve example.com: Unknown host
  • Application errors: UnknownHostException, NameResolutionError
  • Intermittent resolution: works sometimes, fails other times
  • Slow DNS resolution (> 5 seconds)
  • Wrong IP returned for domain (stale cache)
  • SERVFAIL or NXDOMAIN from dig/nslookup
  • DNS works on WiFi but not cellular (or vice versa)
  • Internal domains not resolving from external network

Common Causes

  • Upstream DNS server unreachable or not responding
  • Local DNS cache contains stale/incorrect records
  • Domain registration expired or suspended
  • Nameserver records (NS) pointing to wrong servers
  • DNS zone not loaded or zone file syntax error
  • DNSSEC signatures expired or validation fails
  • Firewall blocking UDP/TCP port 53
  • Split DNS: internal domain not resolvable externally
  • CNAME chain too long or circular CNAME references
  • TTL too high causing slow propagation after changes
  • IPv6 DNS (AAAA) failing while IPv4 works (or vice versa)
  • Corporate DNS proxy intercepting and blocking queries
  • /etc/resolv.conf has wrong/missing nameserver entries
  • systemd-resolved or dnsmasq service not running

Step-by-Step Fix

### 1. Diagnose DNS resolution issues

Test basic resolution:

```bash
# Quick resolution test
nslookup example.com

# Output:
# Server:   8.8.8.8
# Address:  8.8.8.8#53
#
# Non-authoritative answer:
# Name:     example.com
# Address:  93.184.216.34

# More detailed query with dig
dig example.com

# Output sections:
# ;; QUESTION SECTION:
# ;example.com.               IN  A
#
# ;; ANSWER SECTION:
# example.com.          3600  IN  A   93.184.216.34
#
# ;; AUTHORITY SECTION:
# example.com.          3600  IN  NS  a.iana-servers.net.
# example.com.          3600  IN  NS  b.iana-servers.net.
#
# ;; ADDITIONAL SECTION:
# a.iana-servers.net.   3600  IN  A   192.0.34.43
#
# ;; Query time: 25 msec
# ;; SERVER: 8.8.8.8#53(8.8.8.8)
# ;; WHEN: Wed Apr 01 12:00:00 UTC 2026
# ;; MSG SIZE rcvd: 123

# Check specific record type
dig example.com MX      # Mail exchange
dig example.com TXT     # Text records (SPF, DKIM, verification)
dig example.com AAAA    # IPv6 address
dig example.com NS      # Nameservers
dig example.com SOA     # Start of authority
dig example.com CNAME   # Canonical name

# Trace DNS resolution path
dig +trace example.com

# Shows full resolution chain:
# Root servers -> TLD servers -> Authoritative nameservers
```

Check DNS server connectivity:

```bash
# Test multiple DNS servers
for dns in 8.8.8.8 1.1.1.1 9.9.9.9 208.67.222.222; do
    echo "Testing $dns:"
    dig @"$dns" example.com +short
done

# Check local resolver configuration
cat /etc/resolv.conf

# Typical output:
# # Generated by NetworkManager
# nameserver 192.168.1.1
# nameserver 8.8.8.8
# search localdomain

# Check systemd-resolved status (systemd systems)
systemctl status systemd-resolved
resolvectl status

# Output shows:
# Global
#   Protocols: +LLMNR +mDNS +DNSOverTLS DNSSEC=no/unsupported
#   Current DNS Server: 8.8.8.8
#   DNS Servers: 8.8.8.8 8.8.4.4
```

Check DNS cache:

```bash
# Flush DNS cache
# Linux (systemd-resolved)
resolvectl flush-caches
# Or
systemctl restart systemd-resolved

# Linux (dnsmasq)
systemctl restart dnsmasq

# macOS
sudo dscacheutil -flushcache
sudo killall -HUP mDNSResponder

# Windows
ipconfig /flushdns

# Check if cache has stale entry
# dig should show non-cached query time
dig example.com | grep "Query time"
# First query: ~50ms (cache miss)
# Second query: ~0ms (cache hit)

# If always slow, cache may not be working
# If wrong IP, cache may be poisoned
```

### 2. Fix nameserver configuration

Check domain nameservers:

```bash
# Check what nameservers are registered for domain
whois example.com | grep -i -A5 "Name Server"

# Or read the full whois record
whois example.com

# Check from DNS perspective
dig example.com NS

# Verify nameservers are responding
for ns in $(dig example.com NS +short); do
    echo "Checking $ns:"
    dig @"$ns" example.com A +short
done

# Nameservers should all return consistent answers
# If one differs, zone transfer may not have completed

# Check if nameservers are properly delegated
# From root servers
dig +trace example.com | grep -E "example.com.*IN.*NS"
```
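
The consistency check above can be automated. The following is a minimal sketch, not a definitive implementation: `check_consistency` is a hypothetical helper name, and it assumes its input is one `<nameserver> <answer>` pair per line, with multiple A records for one nameserver already joined into a single field.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of dig or BIND).
# Reads "<nameserver> <answer>" lines on stdin and reports whether
# every nameserver returned the same answer.
check_consistency() {
    local distinct
    # Keep only the answer column, then count the distinct values.
    distinct=$(awk 'NF >= 2 { print $2 }' | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -eq 1 ]; then
        echo "consistent"
    else
        # Zero answers or more than one distinct answer.
        echo "inconsistent"
        return 1
    fi
}

# Example usage (requires network access, so it is left commented out):
# for ns in $(dig example.com NS +short); do
#     echo "$ns $(dig @"$ns" example.com A +short | sort | paste -sd, -)"
# done | check_consistency
```

Sorting and joining each nameserver's answers before comparison means that round-robin ordering differences do not register as a mismatch.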

Fix nameserver configuration:

```bash
# At domain registrar (GoDaddy, Namecheap, Cloudflare, etc.):
# 1. Log in to registrar control panel
# 2. Find domain management
# 3. Update nameservers to:
#    - ns1.your-provider.com
#    - ns2.your-provider.com

# Or use custom nameservers:
#    - ns1.example.com (must have glue record)
#    - ns2.example.com

# After changing nameservers, propagation can take up to 24-48 hours,
# depending on the TTL of the old NS records
# Check propagation status
# https://whatsmydns.net

# Verify zone loaded on nameserver
# On authoritative nameserver:
dig @localhost example.com SOA

# Check BIND server status
rndc status

# Force a fresh zone transfer (secondary servers only)
rndc retransfer example.com

# Check zone file syntax (BIND)
named-checkzone example.com /etc/bind/zones/example.com.zone
```

Fix /etc/resolv.conf:

```bash
# Temporary fix (run as root; overwritten by NetworkManager)
echo "nameserver 8.8.8.8" > /etc/resolv.conf
echo "nameserver 8.8.4.4" >> /etc/resolv.conf

# Permanent fix via NetworkManager
nmcli connection modify eth0 ipv4.dns "8.8.8.8 8.8.4.4"
nmcli connection up eth0

# Or edit connection file
# /etc/NetworkManager/system-connections/eth0.nmconnection
[ipv4]
dns=8.8.8.8;8.8.4.4;
dns-priority=100

# For systemd-resolved
# /etc/systemd/resolved.conf
[Resolve]
DNS=8.8.8.8 8.8.4.4
FallbackDNS=1.1.1.1 9.9.9.9

systemctl restart systemd-resolved

# For static network configuration
# /etc/network/interfaces (Debian)
# The address, netmask, and gateway lines are omitted here;
# the dns-* options are typically handled by the resolvconf package
auto eth0
iface eth0 inet static
    dns-nameservers 8.8.8.8 8.8.4.4
    dns-search example.com
```
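
Before editing any of these files, it can help to confirm what the resolver configuration currently contains. The sketch below is one way to do that; `list_nameservers` is a hypothetical helper name, and the default path is an assumption that only holds on systems using a plain `/etc/resolv.conf`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the nameserver addresses found in a
# resolv.conf-style file, one per line. Returns non-zero when the file
# is missing or has no nameserver entries.
list_nameservers() {
    local file=${1:-/etc/resolv.conf}
    local servers
    # Commented-out entries have "#" or "#nameserver" as the first field,
    # so they never match the comparison below.
    servers=$(awk '$1 == "nameserver" && NF >= 2 { print $2 }' "$file" 2>/dev/null)
    if [ -z "$servers" ]; then
        echo "no nameserver entries in $file" >&2
        return 1
    fi
    echo "$servers"
}
```

Running `list_nameservers` with no argument reads `/etc/resolv.conf`; passing a path checks a different file. On systemd-resolved hosts the file often lists only the local stub address 127.0.0.53, in which case `resolvectl status` shows the real upstream servers.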

### 3. Fix DNS propagation issues

Understand propagation delays:

```bash
# After DNS changes, propagation depends on:
# 1. TTL (Time To Live) of old records
# 2. ISP resolver cache refresh intervals
# 3. Registry propagation (NS changes)

# Check current TTL
dig example.com | grep -E "example.com.*IN"

# Output:
# example.com.    3600    IN    A    93.184.216.34
#                 ^^^^
#                 TTL in seconds (3600 = 1 hour)
# A caching resolver reports the remaining TTL, which counts down;
# query an authoritative nameserver to see the configured value

# Before making changes, reduce TTL
# Lower the TTL to 300 (5 minutes) at least one full old-TTL period
# before the change (for example, 24 hours ahead for a TTL of 86400)
# Then make the actual change
# After propagation, the TTL can be increased again

# Check propagation from different locations
# Using public DNS resolvers:
dig @8.8.8.8 example.com +short          # Google
dig @1.1.1.1 example.com +short          # Cloudflare
dig @9.9.9.9 example.com +short          # Quad9
dig @208.67.222.222 example.com +short   # OpenDNS

# Or use online tools:
# https://whatsmydns.net
# https://dnspropagation.net
```
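
Because the TTL bounds how long a resolver may keep serving the old record, the smallest TTL in an answer gives a rough upper limit on the wait at that resolver. The sketch below extracts it; `min_ttl` is a hypothetical helper name, and it assumes input in dig's answer-section format (name, TTL, class, type, data).

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads dig answer lines on stdin and prints the
# smallest TTL (in seconds). Exits non-zero if no record line is found.
min_ttl() {
    awk '
        # Skip dig comment lines (they start with ";") and any line
        # whose second column is not a number.
        $1 !~ /^;/ && $2 ~ /^[0-9]+$/ {
            if (min == "" || $2 + 0 < min) min = $2 + 0
        }
        END {
            if (min == "") exit 1
            print min
        }'
}

# Example usage (requires network access, so it is left commented out):
# dig +noall +answer example.com | min_ttl
```

Against a caching resolver the value is the time remaining on the cached copy, so repeated runs should show it counting down toward zero.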

Speed up propagation:

```bash
# For critical changes, use low TTL beforehand
# Update zone file (zone file comments start with ";"):
$TTL 300    ; 5 minutes
example.com.    IN    A    1.2.3.4

# Or specific record (the TTL goes before the class):
www.example.com.    300    IN    A    1.2.3.4

# After changes propagate, increase TTL for efficiency
$TTL 3600    ; 1 hour (or higher for stable records)

# Force refresh on local resolver
# Restart resolver service
systemctl restart systemd-resolved

# Or clear specific cache entry
# (depends on resolver implementation)

# For applications with DNS cache
# Java: -Dsun.net.inetaddr.ttl=0 (disables the JVM's lookup cache)
# Node.js: no built-in DNS cache; check caching libraries and
#          long-lived keep-alive connections instead
```

### 4. Fix DNSSEC validation issues

Check DNSSEC status:

```bash
# Check if DNSSEC is enabled for domain
dig example.com DNSKEY +dnssec

# Check DS record at parent
dig example.com DS

# Output if DNSSEC enabled:
# example.com. 3600 IN DS 12345 13 2 ABCDEF...

# Check validation status
dig +dnssec example.com

# Look for:
# ;; flags: qr rd ra ad;   (ad = authenticated data, DNSSEC validated)
# If the ad flag is missing, the zone is unsigned or the resolver does not validate
# If validation fails, a validating resolver returns SERVFAIL instead

# Test DNSSEC validation
dig @8.8.8.8 dnssec-failed.org | grep status
# Should show status: SERVFAIL if the validator is working
```
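
Reading the status and the `ad` flag by eye is error-prone in scripts, so the check can be reduced to a one-line summary. This is a sketch under stated assumptions: `dnssec_summary` is a hypothetical helper name, and it assumes the standard dig header layout (a `->>HEADER<<-` line followed by a `;; flags:` line).

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads full dig output on stdin and prints
# "status=<RCODE> ad=<yes|no>". Exits non-zero if no header is found.
dnssec_summary() {
    awk '
        /->>HEADER<<-/ {
            # The word after "status:" is the response code plus a comma.
            for (i = 1; i <= NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            # The flags sit between "flags:" and the next semicolon.
            line = $0
            sub(/^;; flags:/, "", line)
            sub(/;.*/, "", line)
            ad = ((" " line " ") ~ / ad /) ? "yes" : "no"
        }
        END {
            if (status == "") exit 1
            print "status=" status " ad=" (ad == "" ? "no" : ad)
        }'
}

# Example usage (requires network access, so it is left commented out):
# dig +dnssec example.com | dnssec_summary
```

A result of `status=NOERROR ad=yes` indicates a validated answer, while `status=SERVFAIL ad=no` from a validating resolver is consistent with a validation failure, although SERVFAIL can have other causes.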

Fix DNSSEC validation failures:

```bash
# If domain has DNSSEC but resolver can't validate:
# Option 1: Use DNSSEC-capable resolver
# Edit /etc/resolv.conf (both resolvers below validate DNSSEC)
nameserver 8.8.8.8
nameserver 1.1.1.1

# Option 2: Disable DNSSEC validation (not recommended)
# /etc/systemd/resolved.conf
[Resolve]
DNSSEC=no

systemctl restart systemd-resolved

# Option 3: Fix DNSSEC at domain
# At registrar, update DS record to match key
# Or remove DNSSEC if not needed

# Remove DNSSEC delegation:
# 1. Remove DS record at registrar
# 2. Wait for the parent zone to update and the old DS TTL to expire
# 3. Can then disable DNSSEC on domain

# Generate new DNSSEC keys (BIND): a zone-signing key and a key-signing key
dnssec-keygen -a ECDSAP256SHA256 example.com
dnssec-keygen -a ECDSAP256SHA256 -f KSK example.com

# Sign the zone (-o sets the origin, -S picks up the keys for the zone)
dnssec-signzone -S -o example.com -N increment /etc/bind/zones/example.com.zone
# Then publish the matching DS record at the registrar
```

### 5. Fix split-horizon DNS issues

Configure split DNS:

```bash
# Split DNS: Different answers for internal vs external clients

# BIND configuration:
# /etc/bind/named.conf.local

# Internal view (LAN clients)
view "internal" {
    match-clients { 192.168.0.0/16; 10.0.0.0/8; };

    zone "example.com" {
        type master;
        file "/etc/bind/zones/example.com.internal";
    };

    # Forward external queries
    zone "." {
        type forward;
        forwarders { 8.8.8.8; 8.8.4.4; };
    };
};

# External view (everyone else)
view "external" {
    match-clients { any; };

    zone "example.com" {
        type master;
        file "/etc/bind/zones/example.com.external";
    };
};

# Internal zone file has private IPs
# External zone file has public IPs
```

Test split DNS:

```bash
# From internal network
dig @internal-dns.example.com app.example.com
# Should return: 192.168.1.100

# From external network
dig @external-dns.example.com app.example.com
# Should return: 203.0.113.100

# If wrong answer, check:
# 1. match-clients ACL
# 2. View order in named.conf
# 3. Firewall rules allowing DNS
```
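
A quick way to tell which view answered is to check whether the returned address is private. The sketch below classifies an IPv4 address against the RFC 1918 ranges; `is_private_ipv4` is a hypothetical helper name, and it performs no validation beyond the prefix match, so it assumes a well-formed dotted-quad input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the IPv4 address falls in
# an RFC 1918 private range: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
is_private_ipv4() {
    case "$1" in
        10.*.*.*) return 0 ;;
        192.168.*.*) return 0 ;;
        # 172.16.x.x through 172.31.x.x
        172.1[6-9].*.*|172.2[0-9].*.*|172.3[01].*.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (requires network access, so it is left commented out):
# ip=$(dig @internal-dns.example.com app.example.com +short | head -n 1)
# if is_private_ipv4 "$ip"; then echo "internal view"; else echo "external view"; fi
```

An internal client that receives a public address is probably being matched by the external view, which points back to the `match-clients` ACL or the view order.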

### 6. Fix firewall DNS blocking

Check firewall rules:

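```bash
# Check if DNS port is blocked
# UDP 53 (standard DNS)
# TCP 53 (zone transfers, large responses)

# Test connectivity
nc -zv dns-server 53     # TCP
nc -zuv dns-server 53    # UDP (connectionless, so a success here is not conclusive)

# Check iptables rules
iptables -L -n | grep -E 'dpt:53|spt:53'

# Allow outbound DNS queries (client or resolver host)
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT

# Allow inbound queries and their replies (host running a DNS server)
iptables -A INPUT -p udp --dport 53 -j ACCEPT
iptables -A INPUT -p tcp --dport 53 -j ACCEPT
iptables -A OUTPUT -p udp --sport 53 -j ACCEPT
iptables -A OUTPUT -p tcp --sport 53 -j ACCEPT

# For UFW (outbound queries)
ufw allow out 53/udp
ufw allow out 53/tcp

# For firewalld (inbound, host running a DNS server)
firewall-cmd --add-service=dns --permanent
firewall-cmd --reload

# Cloud security groups (AWS, GCP, Azure)
# Ensure egress rule allows port 53
```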

### 7. Fix CNAME chain issues

Diagnose CNAME problems:

```bash
# Check CNAME chain (the answer section lists each hop)
dig www.example.com
dig www.example.com +trace   # shows which servers answer at each level

# Look for:
# www.example.com.  CNAME  cdn.example.com.
# cdn.example.com.  CNAME  cloudfront.net.
# cloudfront.net.   A      13.32.0.1

# Problems to look for:
# 1. Circular CNAME: A -> B -> A
# 2. CNAME to non-existent domain
# 3. CNAME chain too long (resolvers cap the number of hops;
#    the limit is implementation-specific)
# 4. CNAME coexists with other records (not allowed)

# Check for CNAME at apex (invalid)
dig example.com CNAME
# If returns CNAME, this is invalid for apex domain
# Use ALIAS or ANAME record instead (provider-specific)

# Fix CNAME chain
# In zone file, replace chain with direct A record
# Or reduce chain length
```

Fix CNAME configuration:

```bash
# Correct CNAME usage:
# www.example.com.  IN  CNAME  example.com.
# example.com.      IN  A      1.2.3.4

# Incorrect (CNAME with other records):
# example.com.  IN  CNAME  other.com.
# example.com.  IN  MX     mail.example.com.   # NOT ALLOWED!

# Fix: Use A record instead
# example.com.      IN  A      1.2.3.4
# www.example.com.  IN  CNAME  example.com.
# example.com.      IN  MX     10 mail.example.com.

# For apex domain aliasing, use provider-specific:
# Cloudflare: CNAME flattening
# AWS Route53: Alias records
# DNSimple: ANAME records
```
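
Circular and overlong chains can be detected mechanically from a list of CNAME records. The sketch below follows a chain from a starting name; `follow_cname` is a hypothetical helper name, its input format (one `<name> <target>` pair per line) is an assumption, and the default limit of 8 hops is an illustrative parameter rather than a value any particular resolver is known to use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: follows a CNAME chain.
#   $1 = starting name, $2 = maximum hops (illustrative default: 8)
#   stdin = "<name> <target>" pairs, one CNAME record per line
# Prints the final name and hop count, or reports a loop or an
# overlong chain and exits non-zero.
follow_cname() {
    awk -v start="$1" -v max="${2:-8}" '
        NF >= 2 { target[$1] = $2 }
        END {
            name = start
            hops = 0
            while (name in target) {
                # Visiting a name twice means the chain is circular.
                if (seen[name]++) { print "loop at " name; exit 1 }
                name = target[name]
                hops++
                if (hops > max) { print "chain longer than " max " hops"; exit 1 }
            }
            print name " after " hops " hop(s)"
        }'
}

# Example usage (requires network access, so it is left commented out):
# dig +noall +answer www.example.com \
#     | awk '$4 == "CNAME" { print $1, $5 }' \
#     | follow_cname www.example.com.
```

Names must be written the same way in the pairs and in the starting argument, including the trailing dot that dig prints.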

### 8. Monitor DNS health

DNS monitoring queries:

```bash
# Check response time
dig example.com | grep "Query time"

# Check all nameservers respond
for ns in $(dig example.com NS +short); do
    time dig @"$ns" example.com +short
done

# Check for consistent answers
dig @ns1.example.com example.com +short
dig @ns2.example.com example.com +short
# Should return same IP
```

Prometheus monitoring:

```yaml
# Blackbox exporter for DNS monitoring (prometheus.yml)
scrape_configs:
  - job_name: 'dns-check'
    metrics_path: /probe
    params:
      module: [dns_tcp]   # must match a DNS prober module defined in blackbox.yml
    static_configs:
      - targets:
          - 8.8.8.8
          - 1.1.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed resolver as the instance label so alerts can name it
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

# Key metrics:
# probe_dns_lookup_time_seconds
# probe_success
# probe_ip_protocol

# Alerting rules (Prometheus rule file, loaded through rule_files)
groups:
  - name: dns_health
    rules:
      - alert: DNSResolutionFailed
        expr: probe_success{job="dns-check"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "DNS resolution failing"
          description: "Cannot resolve DNS target {{ $labels.instance }}"

      - alert: DNSResponseSlow
        expr: probe_dns_lookup_time_seconds > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "DNS resolution slow"
          description: "DNS lookup taking {{ $value }}s"

      - alert: DNSNameserverDown
        expr: probe_success{job="dns-check", instance="8.8.8.8"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Primary DNS nameserver down"
```

Health check script:

```bash
#!/bin/bash
# DNS health check script

check_dns() {
    local domain=$1
    local dns_server=$2
    local result

    # dig exits non-zero when no server replies, so check the exit status
    # as well as the output
    if ! result=$(dig @"$dns_server" "$domain" +time=2 +tries=1 +short 2>/dev/null) \
        || [ -z "$result" ]; then
        echo "FAIL: $domain via $dns_server"
        return 1
    else
        echo "OK: $domain via $dns_server -> $result"
        return 0
    fi
}

# Test multiple domains and resolvers
check_dns google.com 8.8.8.8
check_dns google.com 1.1.1.1
check_dns example.com 8.8.8.8
check_dns internal.example.com 192.168.1.1

# Check response times
for dns in 8.8.8.8 1.1.1.1; do
    query_time=$(dig @"$dns" google.com | awk '/Query time/ {print $4}')
    echo "$dns: ${query_time}ms"
done
```

Prevention

  • Use multiple DNS resolvers for redundancy
  • Set appropriate TTL values (low before changes, high for stability)
  • Monitor DNS resolution time and failure rates
  • Document DNS architecture and record inventory
  • Test DNS changes in staging before production
  • Use DNSSEC for domain authenticity
  • Implement split DNS for internal/external resolution
  • Configure DNS failover (secondary nameservers)
  • Regular audit of DNS records for stale entries
  • Set up alerts for DNS propagation issues

DNS Response Codes

  • **NXDOMAIN**: Domain does not exist
  • **SERVFAIL**: Server failure (DNSSEC validation, zone issue)
  • **REFUSED**: Server refused query (ACL, firewall)
  • **FORMERR**: Format error (malformed query)
  • **NOTIMP**: Not implemented (query type not supported)
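
The response codes listed above can be turned into a first-pass triage hint in a monitoring script. This is a minimal sketch: `rcode_hint` is a hypothetical helper name, the hint wording is illustrative, and NOERROR is included only because it is the code a successful query returns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps a DNS response code to a short triage hint.
# Returns non-zero for codes it does not recognize.
rcode_hint() {
    case "$1" in
        NOERROR)  echo "query succeeded; an empty answer means no record of that type" ;;
        NXDOMAIN) echo "domain does not exist: check spelling, registration, and delegation" ;;
        SERVFAIL) echo "server failure: check DNSSEC validation and zone loading" ;;
        REFUSED)  echo "query refused: check server ACLs and firewall rules" ;;
        FORMERR)  echo "format error: the query was malformed" ;;
        NOTIMP)   echo "not implemented: the server does not support this query type" ;;
        *)        echo "unrecognized response code: $1"; return 1 ;;
    esac
}
```

The code itself appears after `status:` in the header of dig's output, so a call such as `rcode_hint SERVFAIL` can be driven from whatever parsing the script already does.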