Introduction
SRV records are the invisible backbone of service discovery. When they are missing or incorrect, applications cannot locate domain controllers, SIP servers, LDAP services, or any service that relies on DNS-based discovery. The failure is often silent - applications simply fail to connect with vague timeout or "server not found" errors.
Symptoms
Clients cannot locate service endpoints even though the servers are running and accessible by IP address:
```
nslookup -type=SRV _ldap._tcp.example.com
*** Unknown has no SRV records

dig _kerberos._tcp.example.com SRV
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 12345
;; QUESTION SECTION:
;_kerberos._tcp.example.com. IN SRV

kinit user@EXAMPLE.COM
kinit: Cannot resolve network address for KDC in realm EXAMPLE.COM

Outlook displays: "The connection to Microsoft Exchange is unavailable."
```
Active Directory domain joins fail:
The following error occurred when DNS was queried for the service location (SRV) resource record used to locate an Active Directory Domain Controller:
The error was: "DNS name does not exist."
Common Causes
1. SRV record never created - Service installed but auto-registration failed or was disabled
2. SRV record deleted accidentally - Cleanup scripts, zone transfers, or manual edits
3. Incorrect priority or weight - Clients selecting wrong server first
4. Wrong port number - Service listening on non-standard port but SRV advertises standard port
5. Zone transfer issues - SRV records exist on primary but not secondary nameservers
6. Dynamic update disabled - Active Directory environments with secure dynamic updates blocked
Step-by-Step Fix
Step 1: Diagnose the Missing Record
Query for SRV records explicitly using the service and protocol prefix:
```bash
# Query for LDAP service over TCP
dig _ldap._tcp.example.com SRV +short

# Query for Kerberos
dig _kerberos._tcp.example.com SRV +short
dig _kerberos._udp.example.com SRV +short

# Query for SIP services
dig _sip._tcp.example.com SRV +short
dig _sips._tcp.example.com SRV +short

# Using nslookup (Windows)
nslookup -type=SRV _ldap._tcp.example.com
nslookup -type=SRV _kerberos._tcp.dc._msdcs.example.com
```
Check the authoritative nameserver directly:
```bash
# Find the authoritative nameserver
dig example.com NS +short

# Query the authoritative server directly
dig @ns1.example.com _ldap._tcp.example.com SRV
```
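With `+short`, a nonexistent name and a name that exists but carries no SRV data both print nothing. The response header tells them apart: `NXDOMAIN` means the name itself does not exist, while `NOERROR` with zero answers means the name exists but has no SRV records. The following is a minimal sketch, assuming the full (non-`+short`) `dig` output is piped in; `srv_diagnosis` is a hypothetical helper name, not part of `dig`.

```bash
# srv_diagnosis: read full dig output on stdin and print a one-word verdict.
# Hypothetical helper for illustration only.
srv_diagnosis() {
    awk '
        # Header line, e.g. ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 12345"
        /->>HEADER<<-/ {
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        # Flags line, e.g. ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1"
        /ANSWER:/ {
            for (i = 1; i < NF; i++)
                if ($i == "ANSWER:") { answers = $(i + 1); sub(/,$/, "", answers) }
        }
        END {
            if (status == "NXDOMAIN")      print "name-missing"
            else if (status != "NOERROR")  print "error"
            else if (answers + 0 == 0)     print "no-srv-data"
            else                           print "ok"
        }
    '
}
```

A typical call would be `dig @ns1.example.com _ldap._tcp.example.com SRV | srv_diagnosis`. A `name-missing` result points to a record that was never created or was deleted; `no-srv-data` suggests the name exists but holds other record types or only child names.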
Step 2: Verify Service Is Running
Confirm the backend service is actually running and listening:
```bash
# Check if LDAP is running on the server
netstat -tlnp | grep :389
netstat -tlnp | grep :636

# Check Kerberos ports
netstat -tlnp | grep :88
netstat -ulnp | grep :88

# For SIP servers
netstat -tlnp | grep :5060
netstat -tlnp | grep :5061
```
Test the service directly by IP:
```bash
# Test LDAP connectivity
ldapsearch -x -H ldap://192.168.1.10 -b "dc=example,dc=com"

# Test Kerberos
kinit -S kserver/admin@EXAMPLE.COM user@EXAMPLE.COM
```
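When an SRV record does resolve but clients still cannot connect, it helps to test exactly the host and port that the record advertises rather than the port the service is assumed to use. A minimal sketch, assuming input in `dig +short` SRV format (`priority weight port target`); `srv_endpoints` is an illustrative name:

```bash
# srv_endpoints: convert `dig +short` SRV lines (priority weight port target)
# into host:port pairs, dropping the trailing dot from the target.
srv_endpoints() {
    awk 'NF == 4 { host = $4; sub(/\.$/, "", host); print host ":" $3 }'
}

# Example with sample data; replace the printf with a real dig query.
printf '%s\n' '0 100 389 dc1.example.com.' '10 50 389 dc2.example.com.' | srv_endpoints
# prints:
#   dc1.example.com:389
#   dc2.example.com:389
```

Each resulting pair can then be probed with whatever port checker is available (for example a netcat build that supports `-z`), which surfaces a mismatch between the advertised port and the listening port.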
Step 3: Create the Missing SRV Record
For BIND/named zone files:
```bash
# Edit the zone file
sudo vi /etc/bind/db.example.com

# Add SRV records with this format
# (inside the zone file itself, comments start with ';', not '#'):
# _service._proto.name. TTL class SRV priority weight port target.

_ldap._tcp.example.com.     3600 IN SRV 0  100 389 dc1.example.com.
_ldap._tcp.example.com.     3600 IN SRV 10 50  389 dc2.example.com.
_kerberos._tcp.example.com. 3600 IN SRV 0  100 88  dc1.example.com.
_kerberos._udp.example.com. 3600 IN SRV 0  100 88  dc1.example.com.

# For SIP services
_sip._tcp.example.com.      3600 IN SRV 10 60 5060 sip1.example.com.
_sip._tcp.example.com.      3600 IN SRV 20 40 5060 sip2.example.com.
```
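Check zone syntax and reload (increment the SOA serial first so that secondaries transfer the change):
```bash
named-checkzone example.com /etc/bind/db.example.com
sudo rndc reload example.com
# Alternatively, reload the whole server (the unit name varies by distribution)
sudo systemctl reload named
```
For Active Directory DNS: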
```powershell
# Create SRV record using PowerShell
Add-DnsServerResourceRecord -Srv -Name "_ldap._tcp" -ZoneName "example.com" `
    -Priority 0 -Weight 100 -Port 389 -DomainName "dc1.example.com"

# For domain controller locator records
Add-DnsServerResourceRecord -Srv -Name "_kerberos._tcp.dc._msdcs" -ZoneName "example.com" `
    -Priority 0 -Weight 100 -Port 88 -DomainName "dc1.example.com"
```
Using nsupdate for dynamic DNS:
```bash
nsupdate << EOF
server ns1.example.com
zone example.com
update add _ldap._tcp.example.com. 3600 IN SRV 0 100 389 dc1.example.com.
send
EOF
```
Step 4: Handle Active Directory SRV Records
For Active Directory, ensure the Netlogon service registers SRV records:
```powershell
# Re-register this domain controller's DNS records
nltest /dsregdns

# Force re-registration on a specific domain controller
nltest /server:dc1.example.com /dsregdns

# Restart Netlogon service
Restart-Service netlogon

# Verify registration
Get-DnsServerResourceRecord -ZoneName example.com -RRType SRV
```
Check for secure dynamic update issues:
```powershell
# Verify DNS zone allows secure updates
Get-DnsServerZone -Name example.com | Select-Object ZoneName, DynamicUpdate

# Should be "Secure" for AD-integrated zones
Set-DnsServerPrimaryZone -Name example.com -DynamicUpdate Secure
```
Step 5: Configure Priority and Weight Correctly
SRV records use priority (lower is preferred) and weight (for load balancing among same priority):
```
; Priority 0 = primary servers, Priority 10 = backup servers
; Weight distributes traffic among same-priority servers

_ldap._tcp.example.com. 3600 IN SRV 0  70  389 dc1.example.com.
_ldap._tcp.example.com. 3600 IN SRV 0  30  389 dc2.example.com.
_ldap._tcp.example.com. 3600 IN SRV 10 100 389 dc3.example.com.

; Roughly 70% of requests go to dc1, 30% to dc2
; dc3 is only used if both dc1 and dc2 are unavailable
```
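The percentages in the example follow from dividing each weight by the total weight of the lowest-numbered priority group. A small sketch of that arithmetic, assuming input in `dig +short` SRV format (`priority weight port target`); `srv_shares` is a hypothetical helper, and treating an all-zero-weight group as an even split is a simplifying assumption rather than something the SRV specification mandates:

```bash
# srv_shares: read SRV lines (priority weight port target) on stdin and print
# the expected long-run share of traffic for each target in the
# lowest-numbered (most preferred) priority group.
srv_shares() {
    awk '
        NF == 4 {
            n++
            prio[n] = $1 + 0; weight[n] = $2 + 0; target[n] = $4
            if (n == 1 || prio[n] < min) min = prio[n]
        }
        END {
            # Sum the weights of the preferred group only.
            for (i = 1; i <= n; i++)
                if (prio[i] == min) { total += weight[i]; count++ }
            for (i = 1; i <= n; i++) {
                if (prio[i] != min) continue
                # Assumption: an all-zero-weight group is shown as an even split.
                share = (total > 0) ? 100 * weight[i] / total : 100 / count
                printf "%s %.0f%%\n", target[i], share
            }
        }
    '
}

printf '%s\n' '0 70 389 dc1.example.com.' \
              '0 30 389 dc2.example.com.' \
              '10 100 389 dc3.example.com.' | srv_shares
# prints:
#   dc1.example.com. 70%
#   dc2.example.com. 30%
```

These are expected proportions over many lookups; individual clients may implement weighted selection differently or not at all, so observed traffic can deviate.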
Step 6: Verify Propagation
Test from multiple resolvers:
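```bash
# Query authoritative server
dig @ns1.example.com _ldap._tcp.example.com SRV +short

# Query Google DNS (only meaningful for publicly delegated zones)
dig @8.8.8.8 _ldap._tcp.example.com SRV +short

# Query Cloudflare DNS (only meaningful for publicly delegated zones)
dig @1.1.1.1 _ldap._tcp.example.com SRV +short

# Test the service by domain name
# (this URL resolves the address record for example.com; it does not itself perform an SRV lookup)
ldapsearch -x -H ldap://example.com -b "" -s base "(objectclass=*)" currentDomain
```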
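Resolvers may return the same record set in a different order, so comparing raw output by eye is error-prone. A minimal sketch that compares two answers after sorting them; `srv_same` is an illustrative name, and the `dig` calls in the comments reuse the servers from the commands above:

```bash
# srv_same: succeed (exit 0) when two SRV answers contain the same records,
# ignoring the order in which each resolver returned them.
srv_same() {
    a=$(printf '%s\n' "$1" | sort)
    b=$(printf '%s\n' "$2" | sort)
    [ "$a" = "$b" ]
}

# Intended use (requires network access, so left as comments here):
#   auth=$(dig @ns1.example.com _ldap._tcp.example.com SRV +short)
#   pub=$(dig @8.8.8.8 _ldap._tcp.example.com SRV +short)
#   srv_same "$auth" "$pub" || echo "resolvers disagree"
```

A mismatch shortly after a change usually means a cached answer has not expired yet; a mismatch that persists past the record's TTL points toward a zone transfer or secondary-server problem.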
Common Pitfalls
- Forgetting the trailing dot - In zone files, the target hostname must end with a dot if it is an FQDN
- Wrong underscore placement - Must be `_service._proto`, not `_service_proto` or `service._proto`
- Target record missing - The SRV target hostname must have an A or AAAA record
- Port mismatch - SRV port must match actual service port
- TTL too high - Long TTL delays recovery from bad records
- Firewall blocking - Service is running but firewall blocks the advertised port
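Several of these pitfalls can be caught mechanically before a zone is reloaded. The sketch below checks zone-file SRV lines for underscore placement, numeric ranges, and the trailing dot. It assumes every record spells out the TTL and class (`owner TTL IN SRV priority weight port target`), as the examples in this article do; `srv_lint` is a hypothetical helper and not a substitute for `named-checkzone`.

```bash
# srv_lint: check zone-file SRV lines of the form
#   owner TTL IN SRV priority weight port target
# and print one message per problem found. Prints nothing for clean input.
srv_lint() {
    awk '
        $3 == "IN" && $4 == "SRV" {
            # Owner must begin with _service._proto (absolute or relative name).
            if ($1 !~ /^_[A-Za-z0-9-]+\._[A-Za-z0-9-]+(\.|$)/)
                print "line " NR ": owner must start with _service._proto"
            # Priority, weight and port are 16-bit unsigned integers.
            for (i = 5; i <= 7; i++)
                if ($i !~ /^[0-9]+$/ || $i + 0 > 65535)
                    print "line " NR ": field " i " must be an integer 0-65535"
            # Without a trailing dot the target is relative to the zone origin.
            if ($8 !~ /\.$/)
                print "line " NR ": target has no trailing dot"
        }
    '
}
```

Running `srv_lint < /etc/bind/db.example.com` prints nothing when every SRV line passes. The missing-dot message is a warning rather than an error, since a relative target is legal when it is intentional.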
Best Practices
- Lower SRV record TTL to 300-600 seconds for services that may need quick failover
- Monitor SRV record existence and validity with automated checks
- Test failover by removing a server and verifying clients switch to alternatives
- Document all SRV records with their purpose and expected behavior
- Keep priority and weight documented for quick reference during incidents
- Use health checks to dynamically remove SRV records for failed servers
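As one example of an automated check, the TTL guidance above can be enforced with a small filter over `dig +noall +answer` output, whose lines have the form `name TTL IN SRV priority weight port target`. This is a sketch under that assumption; `srv_ttl_over` is a hypothetical name and the 600-second default simply mirrors the upper bound suggested above.

```bash
# srv_ttl_over [LIMIT]: read answer lines (name TTL IN SRV ...) on stdin and
# print the records whose TTL exceeds LIMIT seconds (default 600).
srv_ttl_over() {
    awk -v limit="${1:-600}" '
        $3 == "IN" && $4 == "SRV" && $2 + 0 > limit + 0 {
            print $1, $8, "ttl=" $2
        }
    '
}

# Intended use (requires network access, so left as a comment here):
#   dig @ns1.example.com _ldap._tcp.example.com SRV +noall +answer | srv_ttl_over 600
```

Querying the authoritative server matters here: a caching resolver reports the remaining TTL, which counts down and can hide a configured value that is too high.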
Related Issues
- DNS A Record Missing
- Active Directory Domain Controller Not Found
- Kerberos Authentication Failure
- SIP Registration Failed
- DNS Zone Transfer Issues