Introduction
Amazon Route 53 is AWS's highly available DNS service. It adds features that plain DNS lacks: Alias records that point to AWS resources, health checks for failover routing, and latency-based routing. These features create troubleshooting scenarios that standard DNS diagnosis alone doesn't cover. Route 53's global infrastructure typically propagates changes within about 60 seconds, but resolver caching, certain configurations, and edge cases can cause delays and failures.
Symptoms
- Route 53 changes not visible through resolvers
- Alias records returning unexpected IPs
- Health check-based routing showing wrong records
- Latency routing directing users to wrong region
- Private hosted zones not resolving for VPC instances
- Route 53 query logging shows unexpected patterns
- Domain registration DNS changes not taking effect
Common Causes
- Not waiting for Route 53's propagation window
- Alias records pointing to changed AWS resources
- Health check failing causing routing failover
- Private hosted zone not associated with VPC
- Routing policy configuration issues
- Reusable delegation sets not applied
- Domain registration DNS settings separate from Route 53 zone
Step-by-Step Fix
1. Check Route 53 authoritative servers directly.
```bash
# Get Route 53 nameservers for your zone
# AWS Console -> Route 53 -> Hosted Zones -> your zone -> NS record

# Or via CLI:
aws route53 get-hosted-zone --id YOUR_ZONE_ID --query 'DelegationSet.NameServers' --output text

# Query Route 53 nameservers directly
for ns in ns-123.awsdns-12.com ns-456.awsdns-45.net; do
  echo "=== $ns ==="
  dig @"$ns" example.com A +short
done

# Route 53 uses 4 nameservers in different TLD domains (.com, .net, .org, .co.uk)
# This provides resilience against TLD server failures

# Compare public resolver results:
echo "Public resolver:"
dig @8.8.8.8 example.com A +short
```
2. Understand Route 53's propagation behavior.
```bash
# Route 53 propagation:
# - Most changes propagate within 60 seconds
# - Changes to NS records take longer (up to a few hours for TLD caching)
# - Alias records update when underlying AWS resource changes

# Check change status via API:
aws route53 get-change --id CHANGE_ID --query 'ChangeInfo.Status' --output text

# Status: PENDING -> INSYNC

# Route 53 normally changes to INSYNC within 60 seconds
# But resolver caching adds latency

# Calculate total propagation time:
# Route 53 sync: ~60 seconds
# Resolver TTL cache: the TTL the record had before the change
# Total max wait: 60 seconds + TTL

# Check record TTL:
aws route53 list-resource-record-sets --hosted-zone-id YOUR_ZONE_ID \
  --query "ResourceRecordSets[?Name=='example.com.'].TTL" --output text
```
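The wait-time arithmetic above can be written as a small helper. This is a minimal sketch: the function name `max_propagation_wait` is made up for illustration, and the 60-second default is the typical sync time quoted above, not a guarantee.

```bash
# Worst-case seconds before every resolver returns the new answer:
# Route 53 sync time plus the TTL of the record as it was before the change
# (resolvers may hold the old answer until that TTL expires).
max_propagation_wait() {
  ttl="$1"          # TTL in seconds of the record before the change
  sync="${2:-60}"   # assumed Route 53 sync time in seconds
  echo $(( sync + ttl ))
}

max_propagation_wait 300   # prints 360
```

Lowering a record's TTL well before a planned change shrinks this window, because only the old TTL counts toward the wait.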
3. Debug Alias records pointing to AWS resources.
```bash
# Alias records are a special Route 53 feature
# They point to AWS resources (ELB, S3, CloudFront) by DNS name, not IP
# Route 53 resolves the target and returns IP directly

# Check Alias record configuration:
aws route53 list-resource-record-sets --hosted-zone-id YOUR_ZONE_ID \
  --query "ResourceRecordSets[?Type=='A' && AliasTarget]" --output json

# Alias target should be AWS resource DNS name:
# Example: dualstack.my-elb.us-east-1.elb.amazonaws.com

# Test ELB/ALB target resolution:
dig dualstack.my-elb.us-east-1.elb.amazonaws.com A +short

# If Alias record returns wrong IP:
# 1. AWS resource may have changed IP
# 2. Health check may have failed
# 3. Routing policy may be active

# Test Alias directly:
dig @ns-123.awsdns-12.com example.com A
# Should return IPs from target AWS resource

# Common Alias targets:
# ELB/ALB: dualstack.name.region.elb.amazonaws.com
# CloudFront: d12345.cloudfront.net
# S3 website: s3-website.region.amazonaws.com
# API Gateway: d12345.execute-api.region.amazonaws.com
```
4. Check health check status affecting routing.
```bash
# Route 53 health checks determine which records are returned
# for failover, latency, and weighted routing policies

# List health checks:
aws route53 list-health-checks --query 'HealthChecks[*].Id' --output text

# Get health check status:
aws route53 get-health-check-status --health-check-id HEALTH_CHECK_ID

# Output shows one HealthCheckObservations entry per Route 53 checker:
# - Region (location of the Route 53 checker)
# - IPAddress (address of the checker, not of your endpoint)
# - StatusReport.Status (free text such as "Success: HTTP Status Code 200, OK")
# - StatusReport.CheckedTime

# If health check is unhealthy:
# - Failover routing switches to backup endpoint
# - Weighted routing excludes this endpoint from responses
# - Check why endpoint is unhealthy

# Test health check endpoint directly (assumes an HTTP check configured by IP address):
health_check_ip=$(aws route53 get-health-check --health-check-id ID \
  --query 'HealthCheck.HealthCheckConfig.IPAddress' --output text)
health_check_port=$(aws route53 get-health-check --health-check-id ID \
  --query 'HealthCheck.HealthCheckConfig.Port' --output text)

curl -I "http://$health_check_ip:$health_check_port"
# Should return expected response
```
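Because each Route 53 checker reports its own result as free text, it can help to summarize how many checkers currently see the endpoint as healthy. The sketch below assumes that passing reports start with "Success" and failing ones with "Failure". The helper names and the failure message in the example are illustrative, not taken from AWS output.

```bash
# Classify one status string reported by a Route 53 health checker.
classify_status() {
  case "$1" in
    Success*) echo healthy ;;
    Failure*) echo unhealthy ;;
    *)        echo unknown ;;
  esac
}

# Read one status string per line on stdin and print "healthy/total".
count_healthy() {
  healthy=0
  total=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    total=$((total + 1))
    if [ "$(classify_status "$line")" = "healthy" ]; then
      healthy=$((healthy + 1))
    fi
  done
  echo "$healthy/$total"
}

# Example with two made-up checker reports:
printf '%s\n' \
  'Success: HTTP Status Code 200, OK' \
  'Failure: connection timed out' | count_healthy   # prints 1/2
```

To feed it real data, the status strings could be pulled with `--query 'HealthCheckObservations[*].StatusReport.Status' --output text` and split onto separate lines with `tr '\t' '\n'`, since the text output format separates list items with tabs.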
5. Resolve Private Hosted Zone issues.
```bash
# Private hosted zones only resolve for associated VPCs

# Check VPC associations:
aws route53 get-hosted-zone --id YOUR_ZONE_ID --query 'VPCs'

# List VPCs in other accounts that are authorized for association
# (authorized is not the same as associated):
aws route53 list-vpc-association-authorizations --hosted-zone-id YOUR_ZONE_ID

# If VPC not listed by get-hosted-zone, zone won't resolve for that VPC's instances

# Associate VPC with private zone:
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id YOUR_ZONE_ID \
  --vpc VPCRegion=us-east-1,VPCId=vpc-12345

# Test resolution from VPC instance:
# SSH into EC2 instance in VPC
dig internal.example.com A +short

# Should return records from private zone

# Common issues:
# - VPC not associated
# - VPC in different account (needs authorization)
# - Using public resolver instead of AmazonProvidedDNS
```
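To check the last issue, it helps to know which address AmazonProvidedDNS uses inside a VPC: the base of the VPC's primary CIDR block plus two (it is also reachable at 169.254.169.253). The following is a minimal sketch of that calculation. The function name `vpc_dns_address` and the example CIDR are assumptions for illustration.

```bash
# Print the AmazonProvidedDNS address for a VPC CIDR (network base + 2).
vpc_dns_address() {
  base="${1%/*}"        # strip the prefix length, e.g. 10.0.0.0
  last="${base##*.}"    # last octet of the network address
  prefix="${base%.*}"   # first three octets
  echo "$prefix.$((last + 2))"
}

vpc_dns_address 10.0.0.0/16   # prints 10.0.0.2
```

On the instance, compare this value with the `nameserver` entries in `/etc/resolv.conf`. An instance pointed at a public resolver such as 8.8.8.8 will not see records from the private zone.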
6. Verify routing policy configuration.
```bash
# Route 53 routing policies:
# - Simple: Single record
# - Weighted: Multiple records with weights
# - Latency: Region-based routing
# - Failover: Primary/backup based on health
# - Geolocation: Location-based routing
# - Geoproximity: Distance-based routing (Traffic Flow)

# Check routing policy:
aws route53 list-resource-record-sets --hosted-zone-id YOUR_ZONE_ID \
  --query "ResourceRecordSets[?Name=='example.com.']" --output json

# For weighted routing, check weights:
# Each record needs a unique SetIdentifier and its own Weight value

# Test weighted routing distribution:
for i in {1..20}; do
  dig @ns-123.awsdns-12.com example.com A +short | head -1
done | sort | uniq -c

# Latency routing - Route 53 returns the endpoint in the region with the
# lowest latency for the resolver that sends the query
# Test from different resolver locations:
# Use online DNS checker to see results from different regions

# Geolocation routing - based on resolver's geographic location
# Check geolocation record configuration:
aws route53 list-resource-record-sets --hosted-zone-id YOUR_ZONE_ID \
  --query "ResourceRecordSets[?GeoLocation]" --output json
```
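When reading the output of the weighted distribution test, compare it against the expected share of each record, which is its weight divided by the sum of all weights in the group. A minimal sketch of that calculation follows. The function name `expected_shares` and the example weights are made up for illustration.

```bash
# Print the expected share of responses for each weight, in the order given.
expected_shares() {
  total=0
  for w in "$@"; do
    total=$((total + w))
  done
  for w in "$@"; do
    awk -v w="$w" -v t="$total" \
      'BEGIN { printf "%.1f%%\n", (t > 0 ? 100 * w / t : 0) }'
  done
}

expected_shares 70 20 10
```

With only 20 queries the observed counts will deviate noticeably from these percentages, so a larger sample gives a fairer comparison.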
7. Check reusable delegation set configuration.
```bash
# Reusable delegation sets allow same nameservers for multiple zones

# Check delegation set:
aws route53 list-reusable-delegation-sets

# Get zone's delegation set:
aws route53 get-hosted-zone --id YOUR_ZONE_ID --query 'DelegationSet'

# If using custom delegation set:
# Nameservers should be consistent across all zones using that set

# Test nameservers match:
zone1_ns=$(aws route53 get-hosted-zone --id ZONE1 --query 'DelegationSet.NameServers' --output text)
zone2_ns=$(aws route53 get-hosted-zone --id ZONE2 --query 'DelegationSet.NameServers' --output text)

if [ "$zone1_ns" = "$zone2_ns" ]; then
  echo "Delegation sets match"
else
  echo "Different delegation sets"
fi
```
8. Fix domain registration DNS settings.
```bash
# Route 53 domain registration is separate from Route 53 hosted zones
# Domain registration has its own nameserver settings

# Check domain's registered nameservers (the route53domains API is served from us-east-1):
aws route53domains get-domain-detail --region us-east-1 --domain-name example.com \
  --query 'Nameservers' --output json

# These should match your Route 53 hosted zone's NS records

# Get hosted zone NS:
aws route53 list-resource-record-sets --hosted-zone-id YOUR_ZONE_ID \
  --query "ResourceRecordSets[?Type=='NS']" --output json

# If mismatch:
# Update domain registration nameservers to match hosted zone

# Update nameservers (list every nameserver of the hosted zone; two shown here):
aws route53domains update-domain-nameservers --region us-east-1 \
  --domain-name example.com \
  --nameservers Name=ns-123.awsdns-12.com Name=ns-456.awsdns-45.net

# This change takes time (can be 24-48 hours)
# Check TLD servers for propagation (a.gtld-servers.net serves .com and .net).
# TLD servers answer with a referral, so the NS records are in the authority
# section and +short would print nothing:
dig @a.gtld-servers.net example.com NS +norecurse +noall +authority
```
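Comparing the registrar's nameservers with the hosted zone's NS records by eye is error-prone, because the two lists can differ in order, letter case, and trailing dots while still naming the same servers. The sketch below normalizes both lists before comparing them. The helper names and the sample values are illustrative assumptions.

```bash
# Normalize a whitespace-separated nameserver list:
# one name per line, lowercase, no trailing dot, sorted.
normalize_ns() {
  printf '%s\n' "$1" | tr ' \t' '\n\n' | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/\.$//' -e '/^$/d' | sort
}

# Succeed when both lists contain the same set of nameservers.
same_ns_set() {
  [ "$(normalize_ns "$1")" = "$(normalize_ns "$2")" ]
}

# Example with made-up values: same servers, different order and trailing dots.
registrar_ns="ns-123.awsdns-12.com ns-456.awsdns-45.net"
zone_ns="ns-456.awsdns-45.net. ns-123.awsdns-12.com."

if same_ns_set "$registrar_ns" "$zone_ns"; then
  echo "match"
else
  echo "mismatch"
fi
```

The same comparison works for checking that two hosted zones share a reusable delegation set, since it does not depend on the order in which the CLI returns the names.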
9. Debug Route 53 query logging.
```bash
# Enable query logging to see DNS queries
# Route 53 -> Hosted Zones -> Query logging

# Query logs go to CloudWatch Logs (for public hosted zones, a log group in us-east-1)
# Check logs:
aws logs describe-log-groups --region us-east-1 \
  --log-group-name-prefix /aws/route53/YOUR_ZONE

# Get recent queries (requires GNU date; --start-time is in milliseconds):
aws logs filter-log-events --region us-east-1 \
  --log-group-name /aws/route53/YOUR_ZONE/example.com \
  --start-time "$(date -d '1 hour ago' +%s)000" \
  --query 'events[*].message' --output text

# Look for:
# - Query type and domain
# - Resolver IP (who's asking)
# - Edge location (which Route 53 server answered)
# - Response type

# Common findings:
# - Unexpected query patterns (security issue)
# - Repeated NXDOMAIN (typo or attack)
# - High query volume (need to optimize)
```
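Repeated NXDOMAIN responses are easier to spot when the log lines are aggregated by query name. The sketch below assumes the space-separated public query log layout (version, timestamp, hosted zone ID, query name, query type, response code, protocol, edge location, resolver IP, EDNS client subnet). The function name `top_nxdomain` and the sample lines are made up for illustration.

```bash
# Count NXDOMAIN responses per query name, most frequent first.
# Field 4 is the query name and field 6 the response code.
top_nxdomain() {
  awk '$6 == "NXDOMAIN" { count[$4]++ }
       END { for (name in count) print count[name], name }' | sort -rn
}

# Example with three made-up log lines:
printf '%s\n' \
  '1.0 2024-01-01T00:00:00Z Z123 typo.example.com A NXDOMAIN UDP IAD12 192.0.2.10 -' \
  '1.0 2024-01-01T00:00:01Z Z123 example.com A NOERROR UDP IAD12 192.0.2.10 -' \
  '1.0 2024-01-01T00:00:02Z Z123 typo.example.com A NXDOMAIN UDP IAD12 192.0.2.11 -' \
  | top_nxdomain   # prints: 2 typo.example.com
```

When piping output from `aws logs filter-log-events ... --output text`, splitting on tabs first with `tr '\t' '\n'` may be needed, since the text output format can place several messages on one line.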
10. Monitor Route 53 with AWS tools.
```bash
# Use CloudWatch metrics for Route 53 health checks (published in us-east-1):
# - HealthCheckStatus (1 = healthy, 0 = unhealthy)
# - HealthCheckPercentageHealthy

aws cloudwatch get-metric-statistics \
  --region us-east-1 \
  --namespace AWS/Route53 \
  --metric-name HealthCheckStatus \
  --dimensions Name=HealthCheckId,Value=YOUR_HC_ID \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average

# Minimal health check monitoring script:
hc_id="YOUR_HEALTH_CHECK_ID"
status=$(aws route53 get-health-check-status --health-check-id "$hc_id" \
  --query 'HealthCheckObservations[0].StatusReport.Status' --output text)

# StatusReport.Status is free text; a passing check starts with "Success"
case "$status" in
  Success*) ;;
  *)
    echo "ALERT: Health check $hc_id reports: $status"
    # Send notification
    ;;
esac
```
Verification
Complete Route 53 verification:
```bash
# 1. Check all Route 53 nameservers
echo "=== Route 53 Nameservers ==="
for ns in $(aws route53 get-hosted-zone --id YOUR_ZONE_ID \
  --query 'DelegationSet.NameServers' --output text); do
  echo -n "$ns: "
  dig @"$ns" example.com A +short | head -1
done

# 2. Verify public resolver
echo -e "\n=== Public Resolver ==="
dig @8.8.8.8 example.com A +short

# 3. Check Alias target
echo -e "\n=== Alias Target ==="
aws route53 list-resource-record-sets --hosted-zone-id YOUR_ZONE_ID \
  --query "ResourceRecordSets[?AliasTarget]" --output text

# 4. Health check status (as reported by the first checker)
echo -e "\n=== Health Checks ==="
for hc_id in $(aws route53 list-health-checks --query 'HealthChecks[*].Id' --output text); do
  echo -n "$hc_id: "
  aws route53 get-health-check-status --health-check-id "$hc_id" \
    --query 'HealthCheckObservations[0].StatusReport.Status' --output text
done

# 5. Check TTL (second column of the first answer record)
echo -e "\n=== Record TTL ==="
dig @ns-123.awsdns-12.com example.com A +noall +answer | awk 'NR==1 {print $2}'

# 6. Verify routing policy results
# A public resolver caches answers for the TTL, so repeated queries may all
# return the same record; query an authoritative nameserver for uncached results
echo -e "\n=== Routing Test (20 queries) ==="
for i in {1..20}; do
  dig @8.8.8.8 example.com A +short
done | sort | uniq -c
```
Route 53 CLI Quick Reference
```bash
# List hosted zones:
aws route53 list-hosted-zones

# Get zone details:
aws route53 get-hosted-zone --id ZONE_ID

# List records:
aws route53 list-resource-record-sets --hosted-zone-id ZONE_ID

# Create record:
aws route53 change-resource-record-sets --hosted-zone-id ZONE_ID \
  --change-batch file://change.json

# change.json example (create this file before running the command above):
cat > change.json <<'EOF'
{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "example.com.",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [{"Value": "192.0.2.1"}]
    }
  }]
}
EOF

# Alias record. HostedZoneId is the zone ID of the target resource
# (here the ELB zone ID for us-east-1), not the ID of your own hosted zone.
# JSON does not allow comments, so keep notes like this outside the file:
cat > alias-change.json <<'EOF'
{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "example.com.",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "Z35SX7RQ3TL6XA",
        "DNSName": "dualstack.my-elb.us-east-1.elb.amazonaws.com.",
        "EvaluateTargetHealth": true
      }
    }
  }]
}
EOF

# Common AWS resource HostedZoneId for Alias:
# ELB/ALB in us-east-1: Z35SX7RQ3TL6XA
# CloudFront: Z2FDTNDATAQYW2
# S3 website: per-region IDs (check AWS docs)
```
Route 53's unique features like Alias records and health check routing require special diagnostic approaches. Always test both through Route 53's authoritative servers and public resolvers, and account for routing policy behavior.