What's Actually Happening

Consul service discovery fails to locate registered services. Clients cannot resolve service addresses via Consul DNS or API queries.

The Error You'll See

```bash $ dig @127.0.0.1 -p 8600 myservice.service.consul

;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 1234 ;; Query time: 0 msec ;; SERVER: 127.0.0.1#8600 ;; NXDOMAIN - Domain does not exist ```

Consul catalog empty:

```bash $ curl http://localhost:8500/v1/catalog/services

{} ```

Agent error:

```bash $ consul members

Node Address Status Type Build Protocol DC Segment consul-1 10.0.0.1 failed server 1.15.0 2 dc1 <all> ```

Why This Happens

  1. 1.Consul agent not running - Server or client agent stopped
  2. 2.Service not registered - Service definition missing
  3. 3.DNS port not configured - Consul DNS not listening
  4. 4.Network connectivity - Agents cannot communicate
  5. 5.Datacenter mismatch - Wrong datacenter queried
  6. 6.ACL blocking - Access control denying queries

Step 1: Check Consul Agent Status

```bash # Check Consul service: systemctl status consul

# Or for running process: ps aux | grep consul

# Check Consul members: consul members

# Output should show all nodes as alive: # Node Address Status Type # consul-1 10.0.0.1 alive server # consul-2 10.0.0.2 alive server

# Check agent health: curl http://localhost:8500/v1/agent/self

# Check leader: curl http://localhost:8500/v1/status/leader

# Check peers: curl http://localhost:8500/v1/status/peers

# Check node status: consul operator raft list-peers ```

Step 2: Check Service Registration

```bash # List all services: curl http://localhost:8500/v1/catalog/services

# Check specific service: curl http://localhost:8500/v1/catalog/service/myservice

# List service nodes: consul catalog services

# Check service instances: consul catalog nodes -service myservice

# Check local services: curl http://localhost:8500/v1/agent/services

# Register service manually: curl -X PUT http://localhost:8500/v1/agent/service/register -d '{"Name":"myservice","Port":8080}'

# Or via config file: cat << 'EOF' > /etc/consul.d/myservice.json { "service": { "name": "myservice", "port": 8080, "tags": ["web", "api"], "check": { "id": "myservice-health", "name": "HTTP Health Check", "http": "http://localhost:8080/health", "interval": "10s" } } } EOF

# Restart Consul: systemctl restart consul ```

Step 3: Check Consul DNS

```bash # Check DNS port: ss -tlnp | grep 8600

# Test DNS query: dig @127.0.0.1 -p 8600 myservice.service.consul

# Should return IP addresses

# Test with consul CLI: consul catalog nodes -service myservice -detailed

# Check DNS configuration: cat /etc/consul/consul.json | grep dns

# DNS config: { "dns_config": { "enable_truncate": true, "max_stale": "5s", "only_passing": true } }

# Test different DNS formats: dig @127.0.0.1 -p 8600 myservice.service.consul SRV dig @127.0.0.1 -p 8600 myservice.tcp.service.consul

# Check DNS recursor: dig @127.0.0.1 -p 8600 google.com # Should recurse to external DNS ```

Step 4: Check Network Connectivity

```bash # Check agent ports: # 8300 - Server RPC # 8301 - LAN Serf # 8302 - WAN Serf # 8500 - HTTP API # 8600 - DNS

ss -tlnp | grep consul

# Test connectivity between agents: nc -zv consul-1 8300 nc -zv consul-1 8301

# Check firewall: iptables -L -n | grep -E "8300|8301|8500|8600"

# Allow Consul ports: iptables -I INPUT -p tcp --dport 8300 -j ACCEPT iptables -I INPUT -p tcp --dport 8500 -j ACCEPT iptables -I INPUT -p udp --dport 8600 -j ACCEPT

# Check Serf gossip: consul members -detailed

# Check WAN connectivity: consul members -wan

# Ping agents: ping consul-1 ping consul-2 ```

Step 5: Check Datacenter Configuration

```bash # Check datacenter: curl http://localhost:8500/v1/agent/self | jq .Config.Datacenter

# Or via consul CLI: consul info | grep datacenter

# List all datacenters: curl http://localhost:8500/v1/catalog/datacenters

# Query specific datacenter: curl http://localhost:8500/v1/catalog/service/myservice?dc=dc1

# DNS query for specific DC: dig @127.0.0.1 -p 8600 myservice.service.dc1.consul

# Cross-datacenter query: dig @127.0.0.1 -p 8600 myservice.service.dc2.consul

# Check agent config for DC: cat /etc/consul/consul.json | grep datacenter

# Set datacenter: { "datacenter": "dc1", "data_dir": "/var/lib/consul" } ```

Step 6: Check ACL Configuration

```bash # Check if ACLs enabled: curl http://localhost:8500/v1/agent/self | jq .Config.ACLsEnabled

# List ACL policies: consul acl policy list

# List ACL roles: consul acl role list

# Check token: consul acl token read -self

# If ACLs enabled, need token for queries: curl -H "X-Consul-Token: my-token" http://localhost:8500/v1/catalog/services

# Create token for service read: consul acl token create -policy-name=service-read

# Check ACL policy for DNS: # Policy must allow service discovery: # service:read # node:read

# Default token in config: { "acl": { "tokens": { "default": "anonymous-token", "agent": "agent-token" } } } ```

Step 7: Check Health Checks

```bash # Check health status: curl http://localhost:8500/v1/health/service/myservice

# Check passing services only: curl http://localhost:8500/v1/health/service/myservice?passing

# Check node health: curl http://localhost:8500/v1/health/node/consul-1

# Check checks: curl http://localhost:8500/v1/agent/checks

# List failing services: curl http://localhost:8500/v1/health/state/critical

# Check specific check: curl http://localhost:8500/v1/health/check/myservice

# DNS only returns passing by default: { "dns_config": { "only_passing": true } }

# If all critical, DNS returns NXDOMAIN

# Check service check status: consul kv get -recurse health/ ```

Step 8: Check Consul Logs

```bash # Check systemd logs: journalctl -u consul -f

# Check Consul log file: tail -f /var/log/consul.log

# Check for errors: grep -i error /var/log/consul.log grep -i warning /var/log/consul.log

# Check for service registration: grep -i "register" /var/log/consul.log

# Check for DNS errors: grep -i dns /var/log/consul.log

# Enable debug logging: # In consul.json: { "log_level": "DEBUG" }

# Restart: systemctl restart consul

# Debug output: consul monitor -log-level=debug ```

Step 9: Restart and Reload

```bash # Reload configuration: consul reload

# Or via API: curl -X PUT http://localhost:8500/v1/agent/reload

# Restart service: systemctl restart consul

# Force leader stepdown: consul operator raft remove-peer -peer-id=leader-id

# Check Raft status: consul operator raft list-peers

# Rejoin cluster: consul join consul-1 consul-2

# Leave cluster gracefully: consul leave

# Re-register services: curl -X PUT http://localhost:8500/v1/agent/service/register -d '{"Name":"myservice","Port":8080}'

# Deregister and re-register: curl -X PUT http://localhost:8500/v1/agent/service/deregister/myservice-id curl -X PUT http://localhost:8500/v1/agent/service/register -d '{"ID":"myservice-id","Name":"myservice","Port":8080}' ```

Step 10: Consul Verification Script

```bash # Create verification script: cat << 'EOF' > /usr/local/bin/check-consul.sh #!/bin/bash

echo "=== Consul Members ===" consul members

echo "" echo "=== Consul Leader ===" curl -s http://localhost:8500/v1/status/leader

echo "" echo "=== Catalog Services ===" curl -s http://localhost:8500/v1/catalog/services | jq .

echo "" echo "=== Service Instances ===" consul catalog services

echo "" echo "=== DNS Test ===" dig @127.0.0.1 -p 8600 myservice.service.consul +short

echo "" echo "=== Health Status ===" curl -s http://localhost:8500/v1/health/state/any | jq 'group_by(.Status) | map({Status: .[0].Status, Count: length})'

echo "" echo "=== Agent Self ===" curl -s http://localhost:8500/v1/agent/self | jq '{Datacenter, NodeName, ACLsEnabled}'

echo "" echo "=== Checks ===" curl -s http://localhost:8500/v1/agent/checks | jq . EOF

chmod +x /usr/local/bin/check-consul.sh

# Run: /usr/local/bin/check-consul.sh

# Monitor service: watch -n 5 'consul catalog services' ```

Consul Service Discovery Checklist

CheckCommandExpected
Agent statusconsul membersAll alive
Service registeredcurl catalog/serviceService listed
DNS listeningss -tlnpPort 8600
DNS querydig service.consulReturns IPs
Health passinghealth/service?passingServices healthy
Networknc consul portsPorts open

Verify the Fix

```bash # After fixing service discovery

# 1. Check members consul members // All nodes alive

# 2. Query service curl http://localhost:8500/v1/catalog/service/myservice // Service instances listed

# 3. Test DNS dig @127.0.0.1 -p 8600 myservice.service.consul // Returns IP addresses

# 4. Check health curl http://localhost:8500/v1/health/service/myservice?passing // Instances passing

# 5. Test from client curl http://myservice.service.consul:8080 // Connection successful

# 6. Monitor stability consul monitor -log-level=info // No errors ```

  • [Fix Consul Service Not Discovered](/articles/fix-consul-service-not-discovered)
  • [Fix DNS Resolution Failed](/articles/fix-dns-resolution-failed)
  • [Fix Consul Agent Join Failed](/articles/fix-consul-agent-join-failed)