Introduction DNS round robin load balancing relies on DNS records being updated when backends change. If DNS records are stale, clients are directed to non-existent servers.
Symptoms - Some clients connecting to decommissioned servers - DNS returning IPs that no longer respond - Intermittent connection failures based on DNS cache - Server replacement causing temporary outage - Clients not respecting DNS TTL
Common Causes - DNS TTL too long (clients cache stale records) - DNS record not updated when server is removed - Recursive DNS servers ignoring TTL - Health check not integrated with DNS updates - Multiple DNS providers with inconsistent records
Step-by-Step Fix 1. **Check current DNS TTL': ```bash dig +noall +answer example.com # Check TTL value (second number) ```
- 1.**Reduce TTL before server changes':
- 2.```bash
- 3.# Reduce TTL to 60 seconds at least 24 hours before changes
- 4.aws route53 change-resource-record-sets \
- 5.--hosted-zone-id <zone-id> \
- 6.--change-batch file://update-ttl.json
- 7.
` - 8.**Implement health-check-based DNS failover':
- 9.```bash
- 10.aws route53 create-health-check \
- 11.--caller-reference "check-$(date +%s)" \
- 12.--health-check-config Type=HTTP,Port=80,FullyQualifiedDomainName=example.com
- 13.
`