What's Actually Happening
Consul service discovery fails to locate registered services. Clients cannot resolve service addresses via Consul DNS or API queries.
The Error You'll See
```bash
$ dig @127.0.0.1 -p 8600 myservice.service.consul
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 1234
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600
;; NXDOMAIN - Domain does not exist
```
Consul catalog empty:
```bash
$ curl http://localhost:8500/v1/catalog/services
{}
```
Agent error:
```bash
$ consul members
Node      Address   Status  Type    Build   Protocol  DC   Segment
consul-1  10.0.0.1  failed  server  1.15.0  2         dc1  <all>
```
Why This Happens
1. Consul agent not running: the server or client agent has stopped
2. Service not registered: the service definition is missing
3. DNS port not configured: Consul DNS is not listening
4. Network connectivity: agents cannot communicate
5. Datacenter mismatch: the wrong datacenter is queried
6. ACL blocking: access control is denying queries
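The causes above are listed roughly in the order worth checking them. As a rough sketch of that idea, the hypothetical helper below runs labelled checks in sequence and stops at the first failure. The function name `run_checks` and the `label::command` argument format are assumptions made for this example, not part of Consul.

```bash
#!/usr/bin/env bash
# Run labelled checks in order and stop at the first one that fails.
# Each argument has the form "label::command"; the "::" separator is a
# convention invented for this sketch.
run_checks() {
  local entry label cmd
  for entry in "$@"; do
    label=${entry%%::*}   # text before the first "::"
    cmd=${entry#*::}      # text after the first "::"
    if ! bash -c "$cmd" >/dev/null 2>&1; then
      echo "FAIL: $label"
      return 1
    fi
    echo "ok: $label"
  done
}

# Possible usage against a local agent (commands taken from the steps below):
#   run_checks \
#     "agent running::consul members" \
#     "leader elected::curl -sf http://localhost:8500/v1/status/leader" \
#     "DNS listening::ss -tuln | grep -q ':8600 '"
```

Stopping at the first failure keeps the output short and points at the earliest broken layer, since later checks usually fail as a consequence of earlier ones.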
Step 1: Check Consul Agent Status
```bash
# Check the Consul service:
systemctl status consul

# Or look for a running process:
ps aux | grep consul

# Check Consul members:
consul members

# Output should show all nodes as alive:
# Node      Address   Status  Type
# consul-1  10.0.0.1  alive   server
# consul-2  10.0.0.2  alive   server

# Check agent health:
curl http://localhost:8500/v1/agent/self

# Check leader:
curl http://localhost:8500/v1/status/leader

# Check peers:
curl http://localhost:8500/v1/status/peers

# Check Raft peer status:
consul operator raft list-peers
```
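On a large cluster it can be easier to filter the member list than to read it. The sketch below prints only the members that are not `alive`. It assumes the column order shown in the sample output above (status in the third column); the function name `non_alive_members` is made up for this example.

```bash
#!/usr/bin/env bash
# Print "<node> <status>" for every member whose Status column is not "alive".
# Reads `consul members` output on stdin and skips the header row.
# Assumes Status is the third whitespace-separated column.
non_alive_members() {
  awk 'NR > 1 && $3 != "alive" { print $1, $3 }'
}

# Possible usage:
#   consul members | non_alive_members
```

Empty output means every listed node reports as alive.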
Step 2: Check Service Registration
```bash
# List all services:
curl http://localhost:8500/v1/catalog/services

# Check a specific service:
curl http://localhost:8500/v1/catalog/service/myservice

# List services via the CLI:
consul catalog services

# List the nodes that provide the service:
consul catalog nodes -service=myservice

# Check services registered with the local agent:
curl http://localhost:8500/v1/agent/services

# Register the service manually:
curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{"Name":"myservice","Port":8080}'

# Or via a config file:
cat << 'EOF' > /etc/consul.d/myservice.json
{
  "service": {
    "name": "myservice",
    "port": 8080,
    "tags": ["web", "api"],
    "check": {
      "id": "myservice-health",
      "name": "HTTP Health Check",
      "http": "http://localhost:8080/health",
      "interval": "10s"
    }
  }
}
EOF

# Restart Consul (or run `consul reload`) to pick up the new definition:
systemctl restart consul
```
Step 3: Check Consul DNS
```bash
# Check that the DNS port is listening (Consul DNS uses both UDP and TCP):
ss -tulnp | grep 8600

# Test a DNS query:
dig @127.0.0.1 -p 8600 myservice.service.consul
# Should return IP addresses

# Cross-check with the consul CLI:
consul catalog nodes -service=myservice -detailed

# Check DNS configuration:
grep dns /etc/consul/consul.json

# Example DNS config:
# { "dns_config": { "enable_truncate": true, "max_stale": "5s", "only_passing": true } }

# Test different DNS formats (standard and RFC 2782 SRV lookups):
dig @127.0.0.1 -p 8600 myservice.service.consul SRV
dig @127.0.0.1 -p 8600 _myservice._tcp.service.consul SRV

# Check the DNS recursor (only works if "recursors" is set in the agent config):
dig @127.0.0.1 -p 8600 google.com
# Should recurse to external DNS
```
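When scripting these DNS tests, the response code in the `dig` header is the quickest signal. The following sketch pulls that code out of `dig` output and prints a one-line interpretation. Both function names are hypothetical, and the interpretations are a simplified reading of the causes listed earlier in this article.

```bash
#!/usr/bin/env bash
# Extract the response code (NOERROR, NXDOMAIN, SERVFAIL, ...) from dig
# output supplied on stdin. Prints nothing if no header line is present.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1
}

# Map a response code to a short hint.
explain_dig_status() {
  case "$1" in
    NOERROR)  echo "Consul answered; check the ANSWER section for addresses" ;;
    NXDOMAIN) echo "name not found: service unregistered, unhealthy, or wrong datacenter" ;;
    "")       echo "no response header: Consul DNS may not be reachable on this port" ;;
    *)        echo "unexpected status: $1" ;;
  esac
}

# Possible usage:
#   status=$(dig @127.0.0.1 -p 8600 myservice.service.consul | dig_status)
#   explain_dig_status "$status"
```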
Step 4: Check Network Connectivity
```bash
# Consul agent ports:
# 8300 - Server RPC
# 8301 - LAN Serf
# 8302 - WAN Serf
# 8500 - HTTP API
# 8600 - DNS

ss -tulnp | grep consul

# Test connectivity between agents:
nc -zv consul-1 8300
nc -zv consul-1 8301

# Check firewall:
iptables -L -n | grep -E "8300|8301|8500|8600"

# Allow Consul ports (Serf on 8301 and DNS on 8600 use both TCP and UDP):
iptables -I INPUT -p tcp --dport 8300 -j ACCEPT
iptables -I INPUT -p tcp --dport 8301 -j ACCEPT
iptables -I INPUT -p udp --dport 8301 -j ACCEPT
iptables -I INPUT -p tcp --dport 8500 -j ACCEPT
iptables -I INPUT -p tcp --dport 8600 -j ACCEPT
iptables -I INPUT -p udp --dport 8600 -j ACCEPT

# Check Serf gossip:
consul members -detailed

# Check WAN connectivity:
consul members -wan

# Ping agents:
ping consul-1
ping consul-2
```
Step 5: Check Datacenter Configuration
```bash
# Check the agent's datacenter:
curl -s http://localhost:8500/v1/agent/self | jq .Config.Datacenter

# Or via the consul CLI (see the DC column):
consul members

# List all datacenters:
curl http://localhost:8500/v1/catalog/datacenters

# Query a specific datacenter (quote the URL so the shell leaves "?" alone):
curl 'http://localhost:8500/v1/catalog/service/myservice?dc=dc1'

# DNS query for a specific DC:
dig @127.0.0.1 -p 8600 myservice.service.dc1.consul

# Cross-datacenter query:
dig @127.0.0.1 -p 8600 myservice.service.dc2.consul

# Check the agent config for the DC:
grep datacenter /etc/consul/consul.json

# Set the datacenter in the agent config:
# { "datacenter": "dc1", "data_dir": "/var/lib/consul" }
```
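Datacenter mismatches often come down to a mistyped DNS name. Consul's standard service lookup has the form `[tag.]<service>.service[.<datacenter>].<domain>`, and the sketch below builds such a name from its parts. The function name `consul_dns_name` is hypothetical, and the domain is assumed to be the default `consul`.

```bash
#!/usr/bin/env bash
# Build a Consul service lookup name.
#   $1 service name (required)
#   $2 datacenter   (optional; omitted means the local datacenter)
#   $3 tag          (optional)
# Assumes the default "consul" DNS domain.
consul_dns_name() {
  local service=$1 dc=${2:-} tag=${3:-}
  local name="${service}.service"
  if [ -n "$tag" ]; then
    name="${tag}.${name}"
  fi
  if [ -n "$dc" ]; then
    name="${name}.${dc}"
  fi
  echo "${name}.consul"
}

# Possible usage:
#   dig @127.0.0.1 -p 8600 "$(consul_dns_name myservice dc2)"
```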
Step 6: Check ACL Configuration
```bash
# Check whether ACLs are enabled (the field's location in the response
# varies by Consul version, so search the whole document for it):
curl -s http://localhost:8500/v1/agent/self | jq '.. | .ACLsEnabled? | select(. != null)'

# List ACL policies:
consul acl policy list

# List ACL roles:
consul acl role list

# Check the current token:
consul acl token read -self

# If ACLs are enabled, queries need a token:
curl -H "X-Consul-Token: my-token" http://localhost:8500/v1/catalog/services

# Create a token for service reads:
consul acl token create -policy-name=service-read

# Check the ACL policy used for DNS.
# The policy must allow service discovery:
#   service:read
#   node:read

# Default token in the agent config:
# { "acl": { "tokens": { "default": "anonymous-token", "agent": "agent-token" } } }
```
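The `service-read` policy referenced above has to exist before a token can use it. As an illustration, the sketch below writes a read-only discovery policy in Consul's ACL rule syntax to a file. The function name `write_discovery_policy` and the file name are assumptions, and granting read on every service and node prefix may be broader than your environment should allow.

```bash
#!/usr/bin/env bash
# Write a read-only discovery policy to the path given as $1.
# The empty prefix "" matches every service and every node.
write_discovery_policy() {
  cat > "$1" <<'EOF'
service_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
EOF
}

# Possible usage:
#   write_discovery_policy service-read.hcl
#   consul acl policy create -name service-read -rules @service-read.hcl
```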
Step 7: Check Health Checks
```bash
# Check health status:
curl http://localhost:8500/v1/health/service/myservice

# Check passing instances only:
curl 'http://localhost:8500/v1/health/service/myservice?passing'

# Check node health:
curl http://localhost:8500/v1/health/node/consul-1

# Check the checks registered with the local agent:
curl http://localhost:8500/v1/agent/checks

# List checks in the critical state:
curl http://localhost:8500/v1/health/state/critical

# Check the checks for a specific service:
curl http://localhost:8500/v1/health/checks/myservice

# DNS always omits instances with a critical check. With only_passing set,
# instances in the warning state are omitted as well:
# { "dns_config": { "only_passing": true } }

# If no instance qualifies, DNS returns NXDOMAIN

# Summarize check status and output for the service:
curl -s http://localhost:8500/v1/health/checks/myservice | jq '.[] | {Name, Status, Output}'
```
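The interaction between check status and `only_passing` is easy to get wrong, so here it is as a small decision function. This is a sketch of the filtering behavior as described in Consul's DNS documentation, not Consul's own code, and the name `dns_includes` is hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 if an instance with the given aggregated check status would be
# included in a DNS answer, 1 otherwise.
#   $1 status:       passing | warning | critical
#   $2 only_passing: true | false (defaults to false)
dns_includes() {
  local status=$1 only_passing=${2:-false}
  case "$status" in
    passing) return 0 ;;
    warning) [ "$only_passing" != "true" ] ;;
    *)       return 1 ;;
  esac
}

# Possible usage:
#   if dns_includes warning true; then echo "returned"; else echo "filtered out"; fi
```

If every instance of a service is filtered out this way, the lookup fails even though the service is registered, which is why a failing health check can look like a registration problem.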
Step 8: Check Consul Logs
```bash
# Check systemd logs:
journalctl -u consul -f

# Check the Consul log file (if file logging is configured):
tail -f /var/log/consul.log

# Check for errors:
grep -i error /var/log/consul.log
grep -i warning /var/log/consul.log

# Check for service registration:
grep -i "register" /var/log/consul.log

# Check for DNS errors:
grep -i dns /var/log/consul.log

# Enable debug logging in consul.json:
# { "log_level": "DEBUG" }

# Restart:
systemctl restart consul

# Stream debug output from the running agent:
consul monitor -log-level=debug
```
Step 9: Restart and Reload
```bash
# Reload configuration:
consul reload

# Or via the API:
curl -X PUT http://localhost:8500/v1/agent/reload

# Restart the service:
systemctl restart consul

# Check Raft status:
consul operator raft list-peers

# Remove a stale server from the Raft peer set (last resort; this removes
# the peer, it does not make the leader step down):
consul operator raft remove-peer -id=<server-id>

# Rejoin the cluster:
consul join consul-1 consul-2

# Leave the cluster gracefully (only when decommissioning this agent):
consul leave

# Re-register services:
curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{"Name":"myservice","Port":8080}'

# Deregister and re-register:
curl -X PUT http://localhost:8500/v1/agent/service/deregister/myservice-id
curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{"ID":"myservice-id","Name":"myservice","Port":8080}'
```
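When re-registering several services, typing the JSON by hand invites quoting mistakes. The sketch below builds the same minimal payload used above from three arguments. The function name `registration_payload` is hypothetical, and it assumes the ID and name contain no characters that need JSON escaping.

```bash
#!/usr/bin/env bash
# Print a minimal service registration payload.
#   $1 service ID, $2 service name, $3 port (integer)
# Assumes ID and name contain no quotes or backslashes.
registration_payload() {
  local id=$1 name=$2 port=$3
  printf '{"ID":"%s","Name":"%s","Port":%d}\n' "$id" "$name" "$port"
}

# Possible usage:
#   registration_payload myservice-id myservice 8080 |
#     curl -s -X PUT --data @- http://localhost:8500/v1/agent/service/register
```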
Step 10: Consul Verification Script
```bash
# Create the verification script:
cat << 'EOF' > /usr/local/bin/check-consul.sh
#!/bin/bash

echo "=== Consul Members ==="
consul members

echo ""
echo "=== Consul Leader ==="
curl -s http://localhost:8500/v1/status/leader

echo ""
echo "=== Catalog Services ==="
curl -s http://localhost:8500/v1/catalog/services | jq .

echo ""
echo "=== Service Instances ==="
consul catalog nodes -service=myservice

echo ""
echo "=== DNS Test ==="
dig @127.0.0.1 -p 8600 myservice.service.consul +short

echo ""
echo "=== Health Status ==="
curl -s http://localhost:8500/v1/health/state/any | jq 'group_by(.Status) | map({Status: .[0].Status, Count: length})'

echo ""
echo "=== Agent Self ==="
# Datacenter and NodeName live under .Config in the response
curl -s http://localhost:8500/v1/agent/self | jq '.Config | {Datacenter, NodeName}'

echo ""
echo "=== Checks ==="
curl -s http://localhost:8500/v1/agent/checks | jq .
EOF

chmod +x /usr/local/bin/check-consul.sh

# Run:
/usr/local/bin/check-consul.sh

# Monitor services:
watch -n 5 'consul catalog services'
```
Consul Service Discovery Checklist
| Check | Command | Expected |
|---|---|---|
| Agent status | `consul members` | All alive |
| Service registered | `curl http://localhost:8500/v1/catalog/service/myservice` | Service listed |
| DNS listening | `ss -tulnp` | Port 8600 |
| DNS query | `dig @127.0.0.1 -p 8600 myservice.service.consul` | Returns IPs |
| Health passing | `curl 'http://localhost:8500/v1/health/service/myservice?passing'` | Services healthy |
| Network | `nc -zv consul-1 8300` | Ports open |
Verify the Fix
```bash
# After fixing service discovery

# 1. Check members
consul members
# All nodes alive

# 2. Query the service
curl http://localhost:8500/v1/catalog/service/myservice
# Service instances listed

# 3. Test DNS
dig @127.0.0.1 -p 8600 myservice.service.consul
# Returns IP addresses

# 4. Check health
curl 'http://localhost:8500/v1/health/service/myservice?passing'
# Instances passing

# 5. Test from a client (the system resolver must forward .consul to Consul DNS)
curl http://myservice.service.consul:8080
# Connection successful

# 6. Monitor stability
consul monitor -log-level=info
# No errors
```
Related Issues
- [Fix Consul Service Not Discovered](/articles/fix-consul-service-not-discovered)
- [Fix DNS Resolution Failed](/articles/fix-dns-resolution-failed)
- [Fix Consul Agent Join Failed](/articles/fix-consul-agent-join-failed)