What's Actually Happening

DNS resolution times out intermittently, so applications fail to connect to services. Some queries succeed while others fail with no clear pattern, which undermines system reliability.

The Error You'll See

Intermittent DNS failure:

```bash
$ curl https://example.com
curl: (6) Could not resolve host: example.com

# Try again:
$ curl https://example.com
# Works fine
```

nslookup timeout:

```bash
$ nslookup example.com
;; connection timed out; no servers could be reached

# Second attempt:
$ nslookup example.com
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:    example.com
Address: 93.184.216.34
```

Application errors:

```bash
# In application logs:
java.net.UnknownHostException: api.example.com: Name or service not known
# or
getaddrinfo failed: Temporary failure in name resolution
```

Why This Happens

  1. DNS server overload - Server cannot handle query volume
  2. Network packet loss - UDP packets dropped
  3. Firewall rules - Intermittently blocking DNS
  4. DNS server unreachable - Temporary connectivity issues
  5. Resolver cache issues - Local resolver problems
  6. Multiple DNS servers misconfigured - Failover not working

Step 1: Check Current DNS Configuration

```bash
# Check resolver configuration
cat /etc/resolv.conf

# Check for multiple nameservers
grep nameserver /etc/resolv.conf

# Check search domains
grep search /etc/resolv.conf

# Check DNS options
grep options /etc/resolv.conf

# Check systemd-resolved status
systemctl status systemd-resolved

# Check NetworkManager DNS
nmcli dev show | grep DNS

# Check if using local DNS cache
ps aux | grep -E "dnsmasq|unbound|systemd-resolve"
```
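
The individual `grep` checks can be folded into one summary. The following is a minimal sketch rather than part of the original procedure: `summarize_resolv_conf` is a hypothetical helper name, and the two warnings encode assumptions (glibc reads at most three `nameserver` lines, and a single server leaves nothing to fail over to).

```bash
# summarize_resolv_conf [FILE]
# Hypothetical helper: prints the nameservers, search list and options
# found in FILE (default /etc/resolv.conf), plus two basic warnings.
summarize_resolv_conf() {
  awk '
    $1 == "nameserver" { ns[++n] = $2 }
    $1 == "search"     { $1 = ""; sub(/^ +/, ""); search = $0 }
    $1 == "options"    { $1 = ""; sub(/^ +/, ""); opts = (opts == "" ? "" : opts " ") $0 }
    END {
      printf "nameservers=%d\n", n
      for (i = 1; i <= n; i++) printf "  %d: %s\n", i, ns[i]
      printf "search=%s\n", search
      printf "options=%s\n", opts
      if (n > 3) print "warning: glibc uses only the first 3 nameservers"
      if (n < 2) print "warning: fewer than 2 nameservers, no failover"
    }
  ' "${1:-/etc/resolv.conf}"
}
```

Calling `summarize_resolv_conf` with no argument reads the live `/etc/resolv.conf`; passing a path makes it easy to check a candidate file before installing it.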

Step 2: Test DNS Resolution

```bash
# Test with dig (shows timing)
dig example.com

# Test specific DNS server
dig @8.8.8.8 example.com

# Test with timing info
dig +stats example.com

# Query statistics:
# ;; Query time: 50 msec
# ;; SERVER: 8.8.8.8#53(8.8.8.8)
# ;; WHEN: Thu Apr 16 00:59:00 UTC 2026
# ;; MSG SIZE rcvd: 56

# Test with nslookup
nslookup example.com 8.8.8.8

# Test with host
host example.com 8.8.8.8

# Test UDP connectivity (UDP is connectionless, so nc can report success
# even when no reply comes back; treat this as a rough check only)
nc -uzv 8.8.8.8 53

# Test TCP DNS (for large responses)
dig +tcp @8.8.8.8 example.com
```
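
To compare timings across runs, it helps to pull the number out of dig's statistics footer. This is a small sketch under assumptions: `dig_query_ms` and `classify_latency` are hypothetical helper names, the parser relies on the `;; Query time: N msec` line shown above, and the 100 ms default threshold is an illustrative value, not a standard.

```bash
# dig_query_ms: reads dig output on stdin and prints the query time in
# milliseconds; returns non-zero if no ";; Query time:" line is present.
dig_query_ms() {
  awk '/^;; Query time:/ { print $4; found = 1 } END { exit !found }'
}

# classify_latency MS [LIMIT]: prints "ok" when MS is below LIMIT
# (default 100, an assumed threshold), otherwise "slow".
classify_latency() {
  if [ "$1" -lt "${2:-100}" ]; then
    echo "ok"
  else
    echo "slow"
  fi
}
```

A typical use would be `ms=$(dig @8.8.8.8 example.com | dig_query_ms) && classify_latency "$ms"`. When the query times out there is no statistics line, so `dig_query_ms` fails and nothing is classified.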

Step 3: Run Continuous DNS Tests

```bash
# Continuous DNS monitoring
while true; do
  echo "$(date): $(dig +short @8.8.8.8 example.com 2>&1)"
  sleep 1
done

# Track success/failure rate
cat << 'EOF' > dns_test.sh
#!/bin/bash
SERVER=8.8.8.8
DOMAIN=example.com
COUNT=100
SUCCESS=0
FAIL=0

for i in $(seq 1 $COUNT); do
  # dig's timeout message can end up in the captured output, so check
  # the exit status as well as the answer text
  if RESULT=$(dig +short +time=2 "@$SERVER" "$DOMAIN" 2>/dev/null) && [ -n "$RESULT" ]; then
    ((SUCCESS++))
    echo -n "."
  else
    ((FAIL++))
    echo -n "X"
  fi
done

echo ""
echo "Success: $SUCCESS/$COUNT"
echo "Failed: $FAIL/$COUNT"
echo "Success rate: $(echo "scale=2; $SUCCESS*100/$COUNT" | bc)%"
EOF

chmod +x dns_test.sh
./dns_test.sh

# Test all configured nameservers
for ns in $(grep nameserver /etc/resolv.conf | awk '{print $2}'); do
  echo "Testing $ns..."
  for i in {1..10}; do
    dig +short +time=2 @$ns google.com >/dev/null && echo -n "." || echo -n "X"
  done
  echo ""
done
```
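
The counting logic can be separated from the probe itself, which makes it reusable for `dig`, `getent`, or `curl`. The sketch below assumes a hypothetical helper named `run_probes`; it relies only on the exit status of whatever command it is given.

```bash
# run_probes COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND COUNT times, discards its output,
# and reports how many runs exited with status 0.
run_probes() {
  local count="$1" ok=0 fail=0 i=0
  shift
  if [ "$count" -le 0 ]; then
    echo "count must be positive" >&2
    return 2
  fi
  while [ "$i" -lt "$count" ]; do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  # Integer percentage, rounded down
  echo "ok=$ok fail=$fail rate=$((ok * 100 / count))%"
}
```

For example, `run_probes 100 dig +short +time=2 +tries=1 @8.8.8.8 example.com` counts probes that got no reply. Note that dig exits with status 0 for any DNS response, including NXDOMAIN, so this measures timeouts and unreachable servers rather than wrong answers.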

Step 4: Check DNS Server Health

```bash
# Check if DNS servers are responding
for ns in 8.8.8.8 8.8.4.4; do
  echo "Testing $ns:"
  ping -c 5 $ns
  echo ""
done

# Check DNS server latency
for ns in 8.8.8.8 8.8.4.4; do
  echo "Latency to $ns:"
  for i in {1..10}; do
    time dig +short @$ns example.com >/dev/null 2>&1
  done
done

# Check if using IPv6 DNS (may have issues)
dig @2001:4860:4860::8888 example.com

# Test public vs private DNS
time dig @8.8.8.8 example.com
time dig @192.168.1.1 example.com  # Local DNS
```
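
Ten separate `time` readings are hard to compare by eye. One option is to collect the per-query times and reduce them to min/avg/max; the sketch below assumes a hypothetical helper `latency_stats` that reads one millisecond value per line on stdin.

```bash
# latency_stats: reads one latency value (ms) per line on stdin and
# prints the sample count, minimum, average and maximum.
# Blank lines are skipped; returns non-zero when there are no samples.
latency_stats() {
  awk '
    NF {
      v = $1 + 0; sum += v; n++
      if (n == 1 || v < min) min = v
      if (n == 1 || v > max) max = v
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "n=%d min=%d avg=%.1f max=%d\n", n, min, sum / n, max
    }
  '
}
```

One way to feed it is `for i in 1 2 3 4 5; do dig @8.8.8.8 example.com | awk '/^;; Query time:/ {print $4}'; done | latency_stats`. A large gap between the average and the maximum tends to point at occasional retries rather than a uniformly slow server.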

Step 5: Check Network for Packet Loss

```bash
# Check for packet loss to DNS servers
mtr -c 100 -r 8.8.8.8

# Check interface errors
ip -s link show eth0

# Look for:
# RX errors, dropped, overruns
# TX errors, carrier

# Check for UDP packet drops
netstat -su | grep -A 5 Udp

# Check for dropped packets
cat /proc/net/snmp | grep Udp

# Check firewall drops
iptables -L -n -v | grep -i drop

# Monitor packet loss in real-time
watch -n 1 'netstat -su | grep -E "packet receive errors|buffer errors"'
```
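
`/proc/net/snmp` prints the UDP counters as a header line followed by a value line, which is awkward to read. The sketch below pairs them up; `udp_counters` is a hypothetical helper name, and it assumes the usual two-line `Udp:` layout of that file.

```bash
# udp_counters [FILE]
# Hypothetical helper: pairs the "Udp:" header line with the "Udp:"
# value line of FILE (default /proc/net/snmp) and prints NAME=VALUE,
# one counter per line. "UdpLite:" lines are ignored.
udp_counters() {
  awk '
    $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1; next }
    $1 == "Udp:"          { for (i = 2; i <= NF; i++) printf "%s=%s\n", name[i], $i }
  ' "${1:-/proc/net/snmp}"
}
```

Running `udp_counters | grep -E 'InErrors|RcvbufErrors'` twice, a minute apart, shows whether the local host itself is dropping UDP datagrams. Counters that stay flat suggest the loss is happening elsewhere on the path.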

Step 6: Configure DNS Timeouts and Retries

```bash
# Edit /etc/resolv.conf to add options:

# Increase timeout (default 5 seconds per attempt)
options timeout:10

# Increase attempts (default 2)
options attempts:5

# Combine options
options timeout:10 attempts:5

# Rotate between nameservers
options rotate

# Use EDNS0 for larger responses
options edns0

# Single-request (avoid parallel queries)
options single-request

# Single-request-reopen (new socket per query)
options single-request-reopen

# Complete example:
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 1.1.1.1
options timeout:5 attempts:3 rotate

# For systemd-resolved, edit:
# /etc/systemd/resolved.conf
[Resolve]
DNS=8.8.8.8 8.8.4.4 1.1.1.1
FallbackDNS=1.0.0.1
DNSStubListener=yes
ReadEtcHosts=yes

# Restart systemd-resolved
systemctl restart systemd-resolved
```
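
Raising `timeout` and `attempts` also raises how long an application can hang when every server is down. The sketch below gives a rough estimate only: it assumes each attempt waits the full timeout on each server, which the resolver does not guarantee, and it applies the limits glibc documents (timeout capped at 30, attempts capped at 5, at most 3 nameservers). `resolver_wait_estimate` is a hypothetical helper name.

```bash
# resolver_wait_estimate TIMEOUT ATTEMPTS SERVERS
# Rough estimate (an approximation, not the resolver's exact algorithm)
# of the seconds a lookup may block when no server answers:
# timeout x attempts x servers, after applying the documented glibc caps.
resolver_wait_estimate() {
  local timeout="$1" attempts="$2" servers="$3"
  if [ "$timeout" -gt 30 ]; then timeout=30; fi
  if [ "$attempts" -gt 5 ]; then attempts=5; fi
  if [ "$servers" -gt 3 ]; then servers=3; fi
  echo $((timeout * attempts * servers))
}
```

With `timeout:10 attempts:5` and three nameservers, `resolver_wait_estimate 10 5 3` prints `150`, so a total outage could stall each lookup for minutes. The `timeout:5 attempts:3` example with three servers comes to 45 seconds by the same arithmetic.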

Step 7: Add Backup DNS Servers

```bash
# Add multiple DNS servers for redundancy

# In /etc/resolv.conf (glibc uses at most the first three nameserver
# entries, so the fourth one below is ignored by the glibc resolver):
nameserver 8.8.8.8  # Google primary
nameserver 8.8.4.4  # Google secondary
nameserver 1.1.1.1  # Cloudflare primary
nameserver 1.0.0.1  # Cloudflare secondary

# With rotation for load balancing:
options rotate timeout:2 attempts:2

# For NetworkManager ("eth0" here is the connection name; list yours
# with: nmcli con show):
nmcli con mod eth0 ipv4.dns "8.8.8.8 8.8.4.4 1.1.1.1"
nmcli con up eth0

# Verify changes
cat /etc/resolv.conf
```
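
When the server list is generated by a script, it can be trimmed to what the resolver will actually read. This is an illustrative sketch: `render_resolv_conf` is a hypothetical helper, and the three-server cutoff reflects the glibc limit, which other resolvers may not share.

```bash
# render_resolv_conf OPTIONS SERVER...
# Hypothetical helper: prints resolv.conf content for the given servers.
# Servers beyond the third are written as comments because glibc reads
# at most three nameserver lines. Pass "" as OPTIONS to omit the line.
render_resolv_conf() {
  local opts="$1" n=0 server
  shift
  for server in "$@"; do
    n=$((n + 1))
    if [ "$n" -le 3 ]; then
      echo "nameserver $server"
    else
      echo "# dropped $server (beyond the 3 nameservers glibc reads)"
    fi
  done
  if [ -n "$opts" ]; then
    echo "options $opts"
  fi
}
```

Writing the result to a scratch file first, for example `render_resolv_conf "rotate timeout:2 attempts:2" 8.8.8.8 1.1.1.1 > /tmp/resolv.conf.new`, leaves room to review it before replacing the live file.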

Step 8: Set Up Local DNS Cache

```bash
# Install dnsmasq for local caching
apt-get install dnsmasq  # Ubuntu/Debian
yum install dnsmasq      # CentOS/RHEL

# Configure dnsmasq
cat << 'EOF' > /etc/dnsmasq.conf
# Use upstream DNS servers
server=8.8.8.8
server=8.8.4.4
server=1.1.1.1

# Cache size
cache-size=1000

# Cache TTL
min-cache-ttl=60
max-cache-ttl=3600

# Listen on localhost
listen-address=127.0.0.1
bind-interfaces

# Log queries (for debugging)
# log-queries
EOF

# Start dnsmasq
systemctl enable dnsmasq
systemctl start dnsmasq

# Update resolv.conf to use local cache
echo "nameserver 127.0.0.1" > /etc/resolv.conf
echo "options timeout:2 attempts:2" >> /etc/resolv.conf

# Test cache
dig @127.0.0.1 example.com  # First query: ~50ms
dig @127.0.0.1 example.com  # Second query: ~1ms (cached)
```

Step 9: Check Application DNS Settings

```bash
# Java applications
# Add to JVM options:
-Dsun.net.inetaddr.ttl=30
-Dsun.net.inetaddr.negative.ttl=10

# Or in java.security:
networkaddress.cache.ttl=30
networkaddress.cache.negative.ttl=10

# Python applications
# Use dnspython for explicit DNS:
import dns.resolver
resolver = dns.resolver.Resolver()
resolver.nameservers = ['8.8.8.8', '1.1.1.1']
resolver.timeout = 5
resolver.lifetime = 10
answer = resolver.resolve('example.com', 'A')

# Node.js applications
# Set DNS servers:
const dns = require('dns');
dns.setServers(['8.8.8.8', '8.8.4.4']);

# glibc resolver options
# In /etc/resolv.conf:
options timeout:5 attempts:3 rotate
```
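
`dig` talks to a DNS server directly, while most applications resolve names through the C library (`getaddrinfo`), which also consults `/etc/hosts` and `/etc/nsswitch.conf`. To test the path applications take, `getent` can be used instead. The sketch below is an assumption-laden helper, not an established tool: `libc_lookup` is a hypothetical name and the timing has whole-second resolution only.

```bash
# libc_lookup NAME
# Hypothetical helper: resolves NAME through the system resolver path
# (getent ahosts, which calls getaddrinfo) and prints
# "NAME ok|fail ELAPSEDs" with whole-second timing.
libc_lookup() {
  local name="$1" status start end
  start=$(date +%s)
  if getent ahosts "$name" >/dev/null 2>&1; then
    status=ok
  else
    status=fail
  fi
  end=$(date +%s)
  echo "$name $status $((end - start))s"
}
```

If `dig api.example.com` succeeds consistently but `libc_lookup api.example.com` fails or takes several seconds, the problem is more likely in the local resolver configuration than on the DNS server.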

Step 10: Monitor DNS Resolution

```bash
# Create monitoring script
cat << 'EOF' > monitor_dns.sh
#!/bin/bash

echo "=== DNS Configuration ==="
cat /etc/resolv.conf

echo ""
echo "=== DNS Server Latency ==="
for ns in $(grep nameserver /etc/resolv.conf | awk '{print $2}'); do
  echo -n "$ns: "
  # Use dig's own exit status; piping through head would always succeed
  ok=$(for i in {1..5}; do
    dig +short @$ns google.com >/dev/null 2>&1 && echo "ok" || echo "fail"
  done | grep -cx ok)
  echo "$ok/5 successful"
done

echo ""
echo "=== Recent DNS Failures ==="
# -E is needed so that | is treated as alternation
journalctl -u systemd-resolved --since "1 hour ago" | grep -iE "fail|error|timeout" | tail -10

echo ""
echo "=== DNS Cache Stats (dnsmasq) ==="
if pgrep dnsmasq >/dev/null; then
  # SIGUSR1 makes dnsmasq write its cache statistics to the system log
  sudo kill -USR1 $(cat /var/run/dnsmasq/dnsmasq.pid)
  echo "Cache statistics written to the dnsmasq log"
fi

echo ""
echo "=== UDP Statistics ==="
cat /proc/net/snmp | grep Udp
EOF

chmod +x monitor_dns.sh

# Set up continuous monitoring
watch -n 10 ./monitor_dns.sh

# Prometheus DNS monitoring (probe_success is a blackbox_exporter metric)
# Add to prometheus alerts:
- alert: DNSResolutionFailure
  expr: probe_success{job="dns_probe"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "DNS resolution failing"
```
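
When filtering logs for several keywords at once, the `|` alternation only works with extended regular expressions; plain `grep` treats it as a literal character. A minimal sketch of a reusable counter follows, with `count_dns_failures` as a hypothetical helper name and the three keywords taken from the monitoring script.

```bash
# count_dns_failures: reads log lines on stdin and prints how many
# contain "fail", "error" or "timeout" (case-insensitive).
# -E enables the | alternation; without it grep matches the literal text.
count_dns_failures() {
  grep -icE 'fail|error|timeout'
}
```

For instance, `journalctl -u systemd-resolved --since "1 hour ago" | count_dns_failures` gives a single number that can be logged or compared against a threshold.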

DNS Resolution Timeout Checklist

| Check | Command | Expected |
|---|---|---|
| Resolver config | `cat /etc/resolv.conf` | Valid nameservers |
| DNS response | `dig @server domain` | Returns IP |
| Latency | `dig +stats` | < 100ms typical |
| Packet loss | `mtr dns-server` | < 1% |
| Multiple servers | `grep nameserver` | 2+ servers |
| Timeout options | `grep options` | timeout >= 5 |
| Local cache | `ps aux \| grep dnsmasq` | Running |

Verify the Fix

```bash
# After adjusting DNS configuration

# 1. Test consistent resolution
for i in {1..20}; do dig +short google.com; done
# Expected: every query returns an address

# 2. Measure response times
for i in {1..20}; do time dig +short google.com; done 2>&1 | grep real
# Expected: consistent low latency

# 3. Test all configured servers
for ns in $(grep nameserver /etc/resolv.conf | awk '{print $2}'); do
  echo "Testing $ns:"
  for i in {1..5}; do dig +short @$ns google.com || echo "FAIL"; done
done
# Expected: all servers respond

# 4. Monitor for failures
timeout 60 bash -c 'while true; do dig +short google.com || echo "FAIL at $(date)"; sleep 1; done'
# Expected: no failures for 60 seconds

# 5. Check application connectivity
curl -I https://google.com
# Expected: connects successfully
```

  • [Fix DNS Resolution Failed](/articles/fix-dns-resolution-failed)
  • [Fix DNS Server Not Responding](/articles/fix-dns-server-not-responding)
  • [Fix DNS Propagation Delay](/articles/fix-dns-propagation-delay)