Fix DNS TTL Too Long Causing Delayed Failover After IP Change

Introduction

DNS TTL (Time To Live) controls how long resolvers cache a DNS record. A high TTL (e.g., 86400 seconds / 24 hours) reduces DNS query load but means that when you change an IP address, clients continue using the old cached IP for up to the TTL duration. During a failover event, this can extend downtime from minutes to hours because a significant portion of your users are still directed to the failed server.

Symptoms

After changing an A record, some users still reach the old IP
dig example.com shows the new IP but users report the site is still down
Failover completed but DNS propagation takes hours
dig example.com @8.8.8.8 shows old IP while @1.1.1.1 shows new IP
Monitoring shows the new server is healthy but user complaints continue
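
The resolver-by-resolver comparison in the symptoms above can be scripted. The following is a minimal sketch, assuming `dig` is installed; the function name `compare_resolvers` is hypothetical, and the resolver addresses you pass in are whatever public or ISP resolvers your users are likely to hit.

```shell
# Print the A records each resolver currently returns for a name.
# Usage: compare_resolvers NAME RESOLVER [RESOLVER ...]
compare_resolvers() {
  local name resolver answer
  name="$1"
  shift
  for resolver in "$@"; do
    # +short prints only the record data (the IP addresses)
    answer=$(dig +short "$name" A "@$resolver" | sort | tr '\n' ' ')
    # Strip the trailing space left by tr before printing
    printf '%s\t%s\n' "$resolver" "${answer% }"
  done
}
```

For example, `compare_resolvers example.com 8.8.8.8 1.1.1.1` prints one line per resolver; differing IPs across lines suggest that some caches still hold the old record.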

Common Causes

TTL set to 24 hours (86400) or more for A/CNAME records
TTL not reduced before a planned maintenance or migration
ISP resolvers ignoring the published TTL and caching for longer than specified
DNS provider not honoring low TTL settings for the record
Recursive resolvers implementing minimum TTL policies

Step-by-Step Fix

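1. Check the current TTL for the record:
```bash
dig example.com A +noall +ttlid +answer
# Output: example.com. 86400 IN A 1.2.3.4
# The 86400 is the TTL in seconds (24 hours)
# Note: a recursive resolver reports the time remaining in its cache, which
# counts down; query an authoritative nameserver to see the configured TTL
```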
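
To turn this check into something a script can act on, the TTL field can be pulled out of the `dig` answer and compared with a limit. This is a sketch only: the helper names `extract_ttl` and `ttl_within_limit` are hypothetical, and the 300-second limit in the usage note is simply the value this guide recommends.

```shell
# Read dig answer lines on stdin and print the TTL field of each A record.
# Answer lines look like: example.com. 86400 IN A 1.2.3.4
extract_ttl() {
  awk '$3 == "IN" && $4 == "A" { print $2 }'
}

# Read TTL values (one per line) on stdin.
# Exit 0 when every value is at or below the limit, 1 otherwise.
ttl_within_limit() {
  local limit
  limit="$1"
  awk -v limit="$limit" '$1 + 0 > limit + 0 { bad = 1 } END { exit (bad ? 1 : 0) }'
}
```

A possible use: `dig example.com A +noall +ttlid +answer | extract_ttl | ttl_within_limit 300 || echo "TTL above 300 seconds"`.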
2. Reduce the TTL before making IP changes:
```bash
# Set TTL to 300 seconds (5 minutes) BEFORE the planned change
# This must be done at least one TTL period (old TTL) before the change
# In your DNS management console, change TTL from 86400 to 300
# Wait 24 hours (the old TTL) for all caches to expire
```
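
The waiting period is simple arithmetic: the earliest safe cutover is the moment the TTL was lowered plus the old TTL. The sketch below assumes times are expressed as Unix epoch seconds and that resolvers honor the TTL they cached; the function names are hypothetical.

```shell
# Earliest safe time (epoch seconds) to change the IP.
# Arguments: epoch seconds when the TTL was lowered, old TTL in seconds.
earliest_cutover() {
  echo $(( $1 + $2 ))
}

# Seconds still to wait before the cutover is safe (0 if already safe).
# Arguments: current epoch seconds, epoch seconds when the TTL was lowered, old TTL.
remaining_wait() {
  local left
  left=$(( $2 + $3 - $1 ))
  if [ "$left" -gt 0 ]; then
    echo "$left"
  else
    echo 0
  fi
}
```

For instance, `remaining_wait "$(date +%s)" 1700000000 86400` reports how long to keep waiting if the TTL was lowered at epoch 1700000000 from an old value of 86400.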
3. For emergency failover, lower the TTL and update the IP at the same time:
```bash
# Change both TTL and IP in your DNS provider
# Even with a high TTL, resolvers cached the record at different times,
# so some will pick up the change sooner than others
# Use a DNS provider that supports low TTLs (< 60 seconds)
```
4. Force clients to use updated DNS:
```bash
# On affected client machines:
# Windows:
ipconfig /flushdns
# macOS:
sudo dscacheutil -flushcache
sudo killall -HUP mDNSResponder
# Linux (systemd-resolved):
sudo resolvectl flush-caches
# Older systemd releases use: sudo systemd-resolve --flush-caches
```
5. Use multiple A records for faster failover:
```bash
# Add multiple A records - resolvers typically rotate the order they return them in
example.com. 300 IN A 1.2.3.4
example.com. 300 IN A 5.6.7.8
# If the first IP fails, some clients will try the second
```
6. Implement health-check-based DNS with your provider:
Many DNS providers (Cloudflare, Route53, DNSMadeEasy) offer health checks that automatically update DNS records when a server fails:
```bash
# AWS Route53 example with health check
aws route53 change-resource-record-sets \
  --hosted-zone-id ZONEID \
  --change-batch file://failover-config.json
```
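
The guide does not show the contents of `failover-config.json`. As a rough sketch of what a Route 53 failover change batch might look like, the snippet below writes a primary/secondary pair. Everything specific in it is an assumption for illustration: the IPs reuse the example addresses from step 5, the 60-second TTL is arbitrary, and `HEALTH-CHECK-ID` is a placeholder for the ID of a health check you have already created.

```shell
# Write an illustrative change batch with PRIMARY and SECONDARY failover records.
# HEALTH-CHECK-ID is a placeholder, not a real health check ID.
cat > failover-config.json <<'EOF'
{
  "Comment": "Failover pair for example.com (illustrative values)",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com.",
        "Type": "A",
        "SetIdentifier": "primary",
        "Failover": "PRIMARY",
        "TTL": 60,
        "HealthCheckId": "HEALTH-CHECK-ID",
        "ResourceRecords": [{ "Value": "1.2.3.4" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com.",
        "Type": "A",
        "SetIdentifier": "secondary",
        "Failover": "SECONDARY",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "5.6.7.8" }]
      }
    }
  ]
}
EOF
```

With records like these, Route 53 answers with the primary IP while its health check passes and switches to the secondary when it fails, so the low TTL bounds how long resolvers keep serving the failed address.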

Prevention

Set TTL to 300 seconds (5 minutes) for production A/CNAME records
Lower TTL to 60 seconds at least one full old-TTL period (24 hours if the old TTL was 86400) before planned maintenance
Use DNS providers that support low TTLs and health-check-based failover
Implement global server load balancing (GSLB) for automatic geographic failover
Document the TTL change procedure as part of your incident response plan
