Your Zabbix monitoring dashboard shows hosts with "ZBX_DISCONNECT" status, or you're getting alerts about unreachable agents. Without agent connectivity, you have no visibility into those systems. Let's systematically diagnose and fix agent connectivity issues.
Understanding the Error
Zabbix agent connectivity failures appear as:
- Host status shows "ZBX_DISCONNECT" or gray icon
- "Get value from agent failed" errors in Zabbix logs
- "Connection refused" or "Timeout" messages
- Active checks not being processed
Error patterns in Zabbix server logs:
- `item [hostname:agent.ping] became not supported: Value "ZBX_DISCONNECT" of type "string" is not suitable for value type "Numeric (unsigned)"`
- `cannot connect to [[hostname]:10050] [Connection refused]`
- `network error: timeout while connecting to [[hostname]:10050]`

Initial Diagnosis
Start with basic connectivity checks:
```bash
# Check Zabbix agent status on the monitored host
systemctl status zabbix-agent

# Test direct connection to the agent (run on the Zabbix server)
zabbix_get -s hostname -k agent.ping
# Should return "1" if the agent is responding

# Check that the agent port is listening (on the monitored host)
netstat -tlnp | grep 10050

# Check from the Zabbix server
nc -zv hostname 10050

# Check Zabbix server logs (-E is needed for the "|" alternation)
tail -n 100 /var/log/zabbix/zabbix_server.log | grep -iE "cannot connect|timeout|failed"

# Check agent logs on the monitored host
tail -n 100 /var/log/zabbix/zabbix_agentd.log | grep -iE "error|fail|refused"
```
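The error text usually narrows down which cause to look at first. The following is a minimal sketch, assuming the message formats quoted in this article; the function name `classify_zbx_error` and its labels are illustrative and not part of Zabbix.

```bash
# Hypothetical helper (not a Zabbix tool): map an error message to the
# cause to investigate first. Patterns come from the messages in this article.
classify_zbx_error() {
    case "$1" in
        *"Connection refused"*)      echo "agent-down" ;;          # nothing listening on 10050
        *"getaddrinfo() failed"*)    echo "dns" ;;                 # name resolution failed
        *"TLS"*)                     echo "tls" ;;                 # encryption mismatch
        *"no active checks"*)        echo "active-checks" ;;       # Hostname/ServerActive issue
        *[Tt]imeout*|*"timed out"*)  echo "firewall-or-timeout" ;; # filtered port or slow check
        *)                           echo "unknown" ;;
    esac
}

# Example: classify the most recent matching line in the server log
# classify_zbx_error "$(grep -i 'hostname' /var/log/zabbix/zabbix_server.log | tail -n 1)"
```

The order of the patterns matters: a TLS failure is also reported as "cannot connect", so the more specific patterns are tested before the generic timeout match.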
Common Cause 1: Zabbix Agent Service Down
The agent process itself is not running.
Error pattern:
`cannot connect to [[hostname]:10050] [Connection refused]`
Diagnosis:
```bash
# Check agent status on the monitored host
ssh hostname "systemctl status zabbix-agent"

# Check if the agent process exists
ssh hostname "ps aux | grep zabbix_agentd"

# Check the agent log for startup errors
ssh hostname "tail -50 /var/log/zabbix/zabbix_agentd.log"

# Check the systemd journal for agent issues
ssh hostname "journalctl -u zabbix-agent -n 50"
```
Solution:
```bash
# Start the agent
ssh hostname "systemctl start zabbix-agent"

# Check if it starts successfully
ssh hostname "systemctl status zabbix-agent"

# If it fails to start, test a single item against the config file
# (--test requires an item key; the config file is parsed before the item runs)
ssh hostname "zabbix_agentd --config /etc/zabbix/zabbix_agentd.conf --test agent.ping"

# For persistent issues, check the log for errors
ssh hostname "grep -i error /var/log/zabbix/zabbix_agentd.log"
```
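If the agent fails to start, check the configuration:
```bash
# Verify configuration file syntax (--test-config needs Zabbix 7.0 or later;
# on older agents use --test agent.ping instead)
zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf --test-config

# List active (uncommented) parameters to review them for typos
grep -E "^[^#]" /etc/zabbix/zabbix_agentd.conf | grep -v "^$"
```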
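Listing the active parameters still leaves you to spot typos by eye. As a rough sketch, the hypothetical helper below (the name `find_malformed_params` is an assumption, not a Zabbix tool) prints active lines that are not in `Key=Value` form. It does not know which parameter names are valid, so it only catches structural mistakes.

```bash
# Hypothetical helper: print uncommented, non-blank lines that do not look
# like "Key=Value". Prints nothing when every active line is well-formed.
find_malformed_params() {
    # $1: path to the agent configuration file
    grep -vE '^[[:space:]]*(#|$)' "$1" |
        grep -vE '^[[:space:]]*[A-Za-z][A-Za-z0-9_.]*[[:space:]]*='
}

# Example:
# find_malformed_params /etc/zabbix/zabbix_agentd.conf
```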
Common Cause 2: Firewall Blocking Agent Port
Firewall rules prevent the Zabbix server from reaching the agent on TCP port 10050.
Error pattern:
`network error: timeout while connecting to [[hostname]:10050]`
Diagnosis:
```bash
# Test port connectivity from the Zabbix server
nc -zv hostname 10050
# If this times out, a firewall is likely blocking the port

# Check the firewall on the monitored host
ssh hostname "iptables -L -n | grep 10050"
ssh hostname "firewall-cmd --list-all"

# Check if the port is accessible locally
ssh hostname "nc -zv localhost 10050"

# Test with telnet
telnet hostname 10050
```
Solution:
Open firewall for Zabbix agent:
```bash
# For iptables (-I inserts the rule ahead of any existing DROP/REJECT rule)
iptables -I INPUT -p tcp --dport 10050 -s zabbix-server-ip -j ACCEPT
# Persist the rule (path used by iptables-persistent on Debian/Ubuntu)
iptables-save > /etc/iptables/rules.v4

# For firewalld
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="zabbix-server-ip" port protocol="tcp" port="10050" accept'
firewall-cmd --reload

# For ufw
ufw allow from zabbix-server-ip to any port 10050 proto tcp

# Verify the rule is applied
firewall-cmd --list-all
iptables -L -n | grep 10050
```
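If you check many hosts, the iptables verification can be scripted. The sketch below assumes rules in the `iptables -S` format; `has_agent_accept_rule` is a hypothetical name. It only looks for a single-port `--dport` match with an ACCEPT target, so it ignores multiport rules and does not check whether an earlier rule drops the traffic first.

```bash
# Hypothetical helper: read `iptables -S INPUT` output on stdin and succeed
# if some rule accepts traffic to the agent port (default 10050).
has_agent_accept_rule() {
    port="${1:-10050}"
    grep -E -- "--dport ${port}( |\$)" | grep -q -- '-j ACCEPT'
}

# Example:
# ssh hostname "iptables -S INPUT" | has_agent_accept_rule && echo "rule present"
```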
Common Cause 3: Agent Configuration - Wrong Server IP
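The agent is configured to accept connections from the wrong server IP.
Error pattern (in the agent log):
`failed to accept an incoming connection: connection from "zabbix-server-ip" rejected, allowed hosts: "127.0.0.1"`
This happens when the agent rejects connections from sources not listed in Server=. The agent accepts the TCP connection and then closes it, so the server typically reports a reset or closed connection rather than "Connection refused", which indicates that nothing is listening on the port.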
Diagnosis:
```bash
# Check the agent's Server setting
ssh hostname "grep Server= /etc/zabbix/zabbix_agentd.conf"

# Check allowed server IPs
ssh hostname "grep -E '^Server|^ServerActive' /etc/zabbix/zabbix_agentd.conf"

# Check ListenIP configuration
ssh hostname "grep ListenIP /etc/zabbix/zabbix_agentd.conf"
```
Solution:
Update agent configuration with correct server IP:
```bash
# Edit agent configuration (-t allocates a terminal for the editor)
ssh -t hostname "vi /etc/zabbix/zabbix_agentd.conf"

# In the config file, update Server to include the Zabbix server IP
Server=127.0.0.1,zabbix-server-ip

# For active checks, also update ServerActive
ServerActive=zabbix-server-ip:10051

# Ensure ListenIP is correct (default 0.0.0.0 for all interfaces)
ListenIP=0.0.0.0

# Restart agent
ssh hostname "systemctl restart zabbix-agent"

# Test connectivity from the Zabbix server
zabbix_get -s hostname -k agent.ping
```
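Before restarting, you can check whether the server address is actually in the allow list. This is a minimal sketch with a hypothetical helper, `server_allows`. It assumes a single Server= line (if there are several, it looks at the last one) and compares strings literally, so CIDR ranges and DNS names in Server= are not expanded.

```bash
# Hypothetical helper: succeed if address $2 appears literally in the
# Server= list of config file $1. ServerActive= and commented lines are ignored.
server_allows() {
    conf="$1"; addr="$2"
    # Extract the Server= value, split it on commas, trim surrounding spaces.
    sed -n 's/^[[:space:]]*Server[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1 |
        tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -Fxq -- "$addr"
}

# Example (192.0.2.10 stands in for zabbix-server-ip):
# server_allows /etc/zabbix/zabbix_agentd.conf 192.0.2.10 || echo "server not allowed"
```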
Common Cause 4: DNS or Network Issues
Hostname resolution fails or network path is broken.
Error pattern:
`network error: getaddrinfo() failed for "hostname"`
Diagnosis:
```bash
# Test DNS resolution from the Zabbix server
nslookup hostname
dig hostname

# Check if the IP is correct (-c 3 stops after three packets)
ping -c 3 hostname

# Try connecting with the IP directly
zabbix_get -s hostname-ip -k agent.ping

# Check the agent's network interfaces and routes
ssh hostname "ip addr show"
ssh hostname "ip route show"
```
Solution:
Fix DNS or use IP address:
```bash
# Option 1: Fix DNS resolution
# Add a proper DNS entry or update /etc/hosts

# Option 2: Use the IP in the Zabbix host configuration
# Update host in Zabbix UI: Configuration > Hosts > [host] > DNS name or IP address

# Option 3: Add to the hosts file on the Zabbix server
echo "hostname-ip hostname" >> /etc/hosts

# Test connectivity after the fix
zabbix_get -s hostname -k agent.ping
```
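Appending to the hosts file with `echo >>` adds a duplicate line every time it runs. As a sketch of a safer variant, the hypothetical helper `ensure_hosts_entry` below appends the entry only when the name is not already present. It takes the file path as an argument so you can try it on a copy first.

```bash
# Hypothetical helper: append "IP NAME" to a hosts-format file only if NAME
# is not already listed as a host name on an uncommented line.
ensure_hosts_entry() {
    file="$1"; ip="$2"; name="$3"
    if awk -v n="$name" '
            /^[ \t]*#/ { next }                                  # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } # fields after the IP
            END { exit (found ? 0 : 1) }' "$file"; then
        echo "exists"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added"
    fi
}

# Example (run as root on the Zabbix server):
# ensure_hosts_entry /etc/hosts hostname-ip hostname
```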
Common Cause 5: Active Check Configuration Issues
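The agent cannot retrieve its list of active checks, either because it cannot reach the Zabbix server on port 10051 or because the server does not recognize the host name the agent reports.
Error pattern:
`no active checks on server [zabbix-server:10051]: host [hostname] not monitored`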
Diagnosis:
```bash
# Check the agent's active check configuration
ssh hostname "grep ServerActive /etc/zabbix/zabbix_agentd.conf"

# Check if the server's active check port is accessible
ssh hostname "nc -zv zabbix-server 10051"

# Check agent logs for active check errors
ssh hostname "tail -50 /var/log/zabbix/zabbix_agentd.log | grep 'active check'"
```
Solution:
Configure active checks properly:
```bash
# Update ServerActive in the agent config (/etc/zabbix/zabbix_agentd.conf)
ServerActive=zabbix-server-ip:10051

# Ensure Hostname matches the Zabbix configuration
Hostname=exact-zabbix-hostname

# Check that the host name in Zabbix matches
# Configuration > Hosts > [host] > Host name must match exactly

# Restart agent
ssh hostname "systemctl restart zabbix-agent"
```
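Because the host name comparison is exact, including case, it helps to compare the two values mechanically rather than by eye. The sketch below uses two hypothetical helpers, `conf_value` and `hostname_matches`. If Hostname= is not set at all (the agent then derives the name from HostnameItem), the check reports a mismatch.

```bash
# Hypothetical helper: print the value of the last uncommented "$2=" line
# in config file $1, with trailing whitespace removed.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
        tail -n 1 | sed 's/[[:space:]]*$//'
}

# Hypothetical helper: succeed if Hostname= in config file $1 equals $2,
# the host name exactly as shown in the Zabbix UI.
hostname_matches() {
    [ "$(conf_value "$1" Hostname)" = "$2" ]
}

# Example:
# hostname_matches /etc/zabbix/zabbix_agentd.conf exact-zabbix-hostname || echo "mismatch"
```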
Common Cause 6: Agent Timeout Settings
Timeout values are too low for slow networks or complex checks.
Error pattern:
`Timeout while waiting for response from [[hostname]:10050]`
Diagnosis:
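```bash
# Check agent timeout setting
ssh hostname "grep Timeout /etc/zabbix/zabbix_agentd.conf"

# Check server timeout setting
grep Timeout /etc/zabbix/zabbix_server.conf

# Test a slow check with a longer client timeout
# (quote the key so the shell leaves the brackets alone;
# system.run must be allowed on the agent for this test)
zabbix_get -s hostname -k 'system.run[sleep 5]' --timeout 10
```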
Solution:
Increase timeout values:
```bash
# On agent side
# /etc/zabbix/zabbix_agentd.conf
Timeout=10

# On server side
# /etc/zabbix/zabbix_server.conf
Timeout=10

# Restart both services
systemctl restart zabbix-server
ssh hostname "systemctl restart zabbix-agent"
```
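An out-of-range Timeout value can stop the service from starting, so it is worth validating the number before you restart. The sketch below assumes the 1 to 30 second range documented for the Timeout parameter; check the documentation for your version, since the range is an assumption here. `valid_timeout` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if $1 is a whole number of seconds in the
# assumed valid range for Timeout (1-30).
valid_timeout() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit such as "10s"
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 30 ]
}

# Example:
# valid_timeout 10 && echo "Timeout=10 is within range"
```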
Common Cause 7: TLS/Encryption Configuration
TLS settings mismatch between server and agent.
Error pattern:
`cannot connect to [[hostname]:10050] [TLS connection error]`
Diagnosis:
```bash
# Check TLS configuration on the agent
ssh hostname "grep -E 'TLS|PSK|Certificate' /etc/zabbix/zabbix_agentd.conf"

# Check TLS configuration in the server's host config
# In Zabbix UI: Configuration > Hosts > [host] > Encryption

# Test connection with TLS options (zabbix_get reads the key from a file)
zabbix_get -s hostname -k agent.ping --tls-connect psk --tls-psk-identity "identity" --tls-psk-file /path/to/key.psk
```
Solution:
Configure TLS properly:
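```bash
# Generate PSK for the agent (run on the monitored host)
openssl rand -hex 32 > /etc/zabbix/zabbix_agentd.psk
# Restrict access to the key (assumes the agent runs as user zabbix)
chown zabbix:zabbix /etc/zabbix/zabbix_agentd.psk
chmod 600 /etc/zabbix/zabbix_agentd.psk

# Configure agent TLS
# /etc/zabbix/zabbix_agentd.conf
TLSConnect=psk
TLSAccept=psk
TLSPSKIdentity=hostname-psk
TLSPSKFile=/etc/zabbix/zabbix_agentd.psk

# In Zabbix UI, update host encryption settings:
# Configuration > Hosts > [host] > Encryption
# Connections to agent: PSK
# Connections from host: PSK (matches TLSConnect=psk for active checks)
# PSK identity: hostname-psk
# PSK: (paste the key from /etc/zabbix/zabbix_agentd.psk)

# Restart agent
systemctl restart zabbix-agent
```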
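A PSK file that was edited by hand, for example one with stray characters or Windows line endings, is a common source of TLS errors. The sketch below checks the file's contents, assuming Zabbix's documented PSK limits of 32 to 512 hexadecimal digits. `psk_file_ok` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if the file contains only hexadecimal digits
# (newlines ignored) and the key length is within the assumed 32-512 range.
psk_file_ok() {
    key=$(tr -d '\n' < "$1") || return 1
    case "$key" in
        ''|*[!0-9A-Fa-f]*) return 1 ;;   # empty, or a non-hex character (including CR)
    esac
    len=${#key}
    [ "$len" -ge 32 ] && [ "$len" -le 512 ]
}

# Example:
# psk_file_ok /etc/zabbix/zabbix_agentd.psk || echo "PSK file looks invalid"
```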
Common Cause 8: Resource Limits
Agent process is killed by resource constraints.
Error pattern:
`agent process is not running (killed by OOM or signal)`
Diagnosis:
```bash
# Check system memory
ssh hostname "free -h"

# Check OOM events
ssh hostname "dmesg | grep -i oom | tail -20"

# Check systemd resource limits
ssh hostname "systemctl show zabbix-agent | grep -i limit"

# Check cgroup limits if applicable (cgroup v1 path; on cgroup v2 read
# /sys/fs/cgroup/system.slice/zabbix-agent.service/memory.max instead)
ssh hostname "cat /sys/fs/cgroup/memory/system.slice/zabbix-agent.service/memory.limit_in_bytes"
```
Solution:
Adjust resource limits:
```bash
# Edit systemd override
systemctl edit zabbix-agent

# Add to the override file
# (on cgroup v2 systems, use MemoryMax=256M instead of MemoryLimit=):
[Service]
MemoryLimit=256M
LimitNOFILE=8192

# Or in agent configuration
# /etc/zabbix/zabbix_agentd.conf
# Reduce buffer sizes if memory constrained
BufferSize=100
MaxLinesPerSecond=100

# Restart agent
systemctl restart zabbix-agent
```
Verification
After fixing, verify agent connectivity:
```bash
# Test direct agent query
zabbix_get -s hostname -k agent.ping
# Should return "1"

# Test multiple items (quote keys that contain brackets)
zabbix_get -s hostname -k system.hostname
zabbix_get -s hostname -k 'system.cpu.load[all,avg1]'
zabbix_get -s hostname -k 'vfs.fs.size[/,pused]'

# Check Zabbix host status in UI
# Configuration > Hosts - should show green ZBX icon

# Check server logs for messages about the host
tail -f /var/log/zabbix/zabbix_server.log | grep hostname

# Monitor agent performance (Zabbix agent 2 only, and only if StatusPort is set
# in zabbix_agent2.conf; the page is plain text and is not served on port 10050)
curl -s http://hostname:STATUS_PORT/status
```
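Running the same few `zabbix_get` queries after every fix gets repetitive, so a small loop with a pass/fail summary can help. The sketch below is one way to do it. `verify_agent` and the `ZBX_GET` variable are assumptions of this example, with `ZBX_GET` included only so that a different command can be substituted for `zabbix_get`.

```bash
# ZBX_GET is an assumption of this sketch: the command used to query the agent.
ZBX_GET="${ZBX_GET:-zabbix_get}"

# Hypothetical helper: query each item key on a host and report OK/FAIL.
# Returns the number of failed keys as its exit status.
verify_agent() {
    host="$1"; shift
    failed=0
    for key in "$@"; do
        if value=$("$ZBX_GET" -s "$host" -k "$key" 2>/dev/null) && [ -n "$value" ]; then
            case "$value" in
                ZBX_NOTSUPPORTED*) printf 'FAIL %s\n' "$key"; failed=$((failed + 1)) ;;
                *)                 printf 'OK   %s = %s\n' "$key" "$value" ;;
            esac
        else
            printf 'FAIL %s\n' "$key"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example:
# verify_agent hostname agent.ping system.hostname 'system.cpu.load[all,avg1]'
```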
Prevention
Set up Zabbix internal monitoring:
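```bash
# Create internal checks in Zabbix
# Add items to Zabbix server template:

# Agent ping check (returns 1 when the agent answers, no data otherwise)
agent.ping

# Agent availability (0 = not available, 1 = available, 2 = unknown)
zabbix[host,agent,available]

# Availability of other interface types on the host
# (the second parameter is an interface type: agent, snmp, ipmi or jmx)
zabbix[host,snmp,available]

# Create triggers
# Agent unreachable for 5 minutes: agent.ping returns no data when the agent
# is down, so use nodata() rather than max()
# Zabbix 5.4 and later:
nodata(/host/agent.ping,5m)=1
# Zabbix 5.2 and earlier:
{host:agent.ping.nodata(5m)}=1
```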
Create notification rules:
In the Zabbix UI, go to Configuration > Actions and create an action for unreachable agents:
- Conditions: Trigger = "Zabbix agent is unreachable"
- Operations: Send notification, attempt restart via SSH

Agent connectivity issues usually stem from service status, firewall rules, or configuration mismatches. Start with direct connection tests, then check firewall and configuration on both ends.