Introduction DNS round robin load balancing relies on DNS records being updated when backends change. If DNS records are stale, clients are directed to non-existent servers.

Symptoms - Some clients connecting to decommissioned servers - DNS returning IPs that no longer respond - Intermittent connection failures based on DNS cache - Server replacement causing temporary outage - Clients not respecting DNS TTL

Common Causes - DNS TTL too long (clients cache stale records) - DNS record not updated when server is removed - Recursive DNS servers ignoring TTL - Health check not integrated with DNS updates - Multiple DNS providers with inconsistent records

Step-by-Step Fix 1. **Check current DNS TTL': ```bash dig +noall +answer example.com # Check TTL value (second number) ```

  1. 1.**Reduce TTL before server changes':
  2. 2.```bash
  3. 3.# Reduce TTL to 60 seconds at least 24 hours before changes
  4. 4.aws route53 change-resource-record-sets \
  5. 5.--hosted-zone-id <zone-id> \
  6. 6.--change-batch file://update-ttl.json
  7. 7.`
  8. 8.**Implement health-check-based DNS failover':
  9. 9.```bash
  10. 10.aws route53 create-health-check \
  11. 11.--caller-reference "check-$(date +%s)" \
  12. 12.--health-check-config Type=HTTP,Port=80,FullyQualifiedDomainName=example.com
  13. 13.`

Prevention - Use low TTL (60-300s) for load balanced records - Implement DNS health checks for automatic failover - Use Route 53/GCP Cloud DNS with health check integration - Test DNS failover before production deployment - Monitor DNS resolution from multiple locations