Introduction

Cloud DNS resolution failures cause service discovery issues, broken internal communication, and failed deployments. When records in a managed zone cannot be resolved, applications cannot find services by name.

Symptoms

- `nslookup my-service.internal` returns NXDOMAIN or SERVFAIL
- Applications fail with "could not resolve host" errors
- DNS queries time out after several seconds
- Only specific VPCs are affected while others resolve correctly

Common Causes

- Private zone not bound to the querying VPC network
- DNS peering zone not configured correctly
- Inbound/outbound forwarding rules misconfigured
- Record set propagation delay after creation/update
- Conflicting DNS policies (for example, alternative name servers on the VPC that bypass private zones)

Step-by-Step Fix

1. **Check whether the private zone is bound to the querying VPC**. `visibility` alone does not show the bindings; project the network list as well:
   ```bash
   gcloud dns managed-zones describe <zone-name> --format="value(dnsName,visibility,privateVisibilityConfig.networks[].networkUrl)"
   ```
   If the querying VPC is not among the network URLs, add it. `--networks` replaces the whole binding list, so repeat every network that should keep visibility of the zone:
   ```bash
   gcloud dns managed-zones update <zone-name> --networks default,vpc-a
   ```

2. **Check the DNS policy attached to the VPC**:
   ```bash
   gcloud dns policies list --format="yaml(name,enableInboundForwarding,networks,alternativeNameServerConfig)"
   ```
   A policy with alternative name servers sends every query from that VPC to those servers, so private zones bound to the VPC are no longer consulted.

3. **Verify the record sets exist**:
   ```bash
   gcloud dns record-sets list --zone <zone-name> --format="table(name,type,ttl,rrdatas)"
   ```
   Record names are fully qualified and end with a dot; a record that is missing or misspelled returns NXDOMAIN even when the zone binding is correct.

4. **Test DNS resolution from a VM in the VPC** (`--zone` here is the Compute Engine zone of the VM, not the DNS zone):
   ```bash
   gcloud compute ssh <vm-name> --zone <compute-zone> --command "nslookup my-service.internal"
   ```
   NXDOMAIN from the VPC resolver means no zone visible to that VPC holds the record; SERVFAIL or a timeout usually points at a forwarding or peering target that cannot be reached.
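The checks in steps 1 and 2 are easy to get wrong by eye, especially the `--networks` replacement rule. The script below is an added example, not part of the original page: it parses the JSON that `gcloud dns managed-zones describe ZONE --format=json` and `gcloud dns policies list --format=json` return, reports whether the zone is visible to a given VPC, builds a `--networks` value that keeps the existing bindings, and flags a policy whose alternative name servers bypass private zones. The embedded sample JSON is constructed; `my-project`, `my-zone`, `vpc-a` and the policy names are placeholders.

```python
"""Diagnose why a VPC cannot resolve names in a Cloud DNS private zone.

Run with no arguments to use the constructed sample data, or as
`python check_dns_zone.py ZONE NETWORK` to query a live project through gcloud.
"""
import json
import subprocess
import sys

NET = "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/"

# Constructed sample, shaped like `gcloud dns managed-zones describe my-zone --format=json`.
SAMPLE_ZONE = {
    "name": "my-zone",
    "dnsName": "internal.",
    "visibility": "private",
    "privateVisibilityConfig": {"networks": [{"networkUrl": NET + "default"}]},
}

# Constructed sample, shaped like `gcloud dns policies list --format=json`.
SAMPLE_POLICIES = [
    {
        "name": "vpc-a-policy",
        "enableInboundForwarding": False,
        "networks": [{"networkUrl": NET + "vpc-a"}],
        "alternativeNameServerConfig": {"targetNameServers": [{"ipv4Address": "10.0.0.53"}]},
    },
    {
        "name": "default-policy",
        "enableInboundForwarding": True,
        "networks": [{"networkUrl": NET + "default"}],
    },
]


def network_name(url: str) -> str:
    """Last path segment of a networkUrl: '.../global/networks/vpc-a' -> 'vpc-a'."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def zone_networks(zone: dict) -> list[str]:
    """VPC networks a private zone is bound to (empty for a public zone)."""
    cfg = zone.get("privateVisibilityConfig") or {}
    return [network_name(n["networkUrl"]) for n in cfg.get("networks", [])]


def networks_flag_value(zone: dict, network: str) -> str:
    """Value for `gcloud dns managed-zones update --networks` that adds `network`.

    The flag replaces the whole binding list, so the networks already bound
    must be repeated or they silently lose visibility of the zone.
    """
    existing = zone_networks(zone)
    if network not in existing:
        existing = existing + [network]
    return ",".join(existing)


def diagnose_zone(zone: dict, network: str) -> list[str]:
    """Findings that explain NXDOMAIN for names in `zone` when queried from `network`."""
    if zone.get("visibility") != "private":
        return [f"zone {zone['name']} is {zone.get('visibility')}; VPC binding does not apply"]
    if network in zone_networks(zone):
        return []
    return [
        f"zone {zone['name']} ({zone['dnsName']}) is not visible to VPC {network}; fix with: "
        f"gcloud dns managed-zones update {zone['name']} --networks {networks_flag_value(zone, network)}"
    ]


def diagnose_policies(policies: list[dict], network: str) -> list[str]:
    """Findings for DNS server policies that divert `network`'s queries away from private zones."""
    findings = []
    for policy in policies:
        nets = [network_name(n["networkUrl"]) for n in policy.get("networks", [])]
        if network not in nets:
            continue
        alt = (policy.get("alternativeNameServerConfig") or {}).get("targetNameServers", [])
        if alt:
            targets = ", ".join(t.get("ipv4Address", "?") for t in alt)
            findings.append(
                f"policy {policy['name']} sends every query from VPC {network} to {targets}; "
                "private zones bound to that VPC are not consulted"
            )
    return findings


def gcloud_json(*args: str):
    """Run a gcloud command with --format=json and parse it (needs an authenticated gcloud)."""
    out = subprocess.run(["gcloud", *args, "--format=json"], check=True, capture_output=True, text=True)
    return json.loads(out.stdout)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        zone_name, network = sys.argv[1], sys.argv[2]
        zone = gcloud_json("dns", "managed-zones", "describe", zone_name)
        policies = gcloud_json("dns", "policies", "list")
    else:
        zone, policies, network = SAMPLE_ZONE, SAMPLE_POLICIES, "vpc-a"
    findings = diagnose_zone(zone, network) + diagnose_policies(policies, network)
    print("\n".join(findings) if findings else f"no binding or policy problem found for VPC {network}")
```

With the sample data the script reports two problems for `vpc-a`: the zone is bound only to `default`, and `vpc-a-policy` forwards all of that VPC's queries to 10.0.0.53. The suggested update command it prints is `--networks default,vpc-a`, not `--networks vpc-a`, which would unbind `default`.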

Prevention

- Verify private zone bindings after VPC changes
- Enable Cloud DNS logging on the VPC's DNS policy for query analysis
- Set appropriate TTL values (300s for internal services)
- Test DNS resolution from each VPC after zone changes
- Monitor DNS query error rates with Cloud Monitoring