Introduction
F5 BIG-IP load balancers monitor pool members through health monitors to determine availability. When pool members are marked down, traffic stops flowing to those nodes. Pool member failures can result from health monitor issues, network connectivity problems, application failures, or configuration errors. BIG-IP's extensive configuration options require systematic troubleshooting.
Symptoms
Error indicators in F5 logs and TMUI:
Pool member 10.0.0.1:80 is down
Monitor /Common/http_monitor: node down
No enabled pool members available
Pool /Common/web_pool has no available members

Observable indicators:
- Pool members show "Down" status in TMUI
- Pool status shows "Offline" or "Red"
- HTTP 502/503 responses from virtual server
- SNMP objects in F5-BIGIP-LOCAL-MIB (the ltmPoolMbrStatus table) report the member as unavailable
- Connection queue building on virtual server
Common Causes
1. Health monitor mismatch - wrong path, port, or expected response string
2. Monitor timeout too short - application responds slower than the timeout allows
3. Pool member disabled - member administratively disabled
4. Network connectivity failure - firewall or routing issues between BIG-IP and the member
5. Application crash - backend service not responding
6. Connection limit reached - member's configured connection limit exceeded
7. Priority group issues - lower-priority members not receiving traffic when they should
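Cause 2 is the most common tuning mistake. F5's guidance is timeout = (3 × interval) + 1 second, which lets a member miss two consecutive probes before being marked down. A quick shell check of the values used in the monitor examples later in this article:

```shell
# F5 guidance: timeout = (3 * interval) + 1, so a member is only
# marked down after missing several consecutive probes.
interval=5
timeout=$(( 3 * interval + 1 ))
echo "interval=${interval}s timeout=${timeout}s"
```

With a 5-second interval this yields the 16-second timeout configured in Step 4 below.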
Step-by-Step Fix
Step 1: Check Pool Member Status
```bash
# Via TMUI (web interface):
# Navigate to: Local Traffic > Pools > Pool List > pool_name > Members

# Via CLI (tmsh), list pool members
tmsh list ltm pool web_pool members

# Show member status
tmsh show ltm pool web_pool members field-fmt

# Get detailed pool status, including per-member monitor state
tmsh show ltm pool web_pool detail

# Check all pool statuses
tmsh show ltm pool
```
Step 2: Check Health Monitor Status
```bash
# List the monitor(s) attached to the pool
tmsh list ltm pool web_pool monitor

# Show the monitor's configuration
tmsh list ltm monitor http http_monitor

# See the monitor's verdict per member (availability and state reason)
tmsh show ltm pool web_pool detail
```
Step 3: Test Backend Connectivity
```bash
# From the BIG-IP shell, test the member the same way the monitor does
curl -v http://10.0.0.1:80/health

# Test raw TCP connectivity to the member port
nc -zv 10.0.0.1 80

# Basic network reachability
ping -c 3 10.0.0.1

# Check routing toward the member
tmsh show net route
```
Step 4: Check Monitor Configuration
```bash
# Show the full monitor configuration
tmsh list ltm monitor http my_http_monitor all-properties

# Check just the interval and timeout
tmsh list ltm monitor http my_http_monitor interval timeout

# Verify the send and receive strings
tmsh list ltm monitor http my_http_monitor send recv
```
```bash
# Update HTTP monitor settings
# (timeout follows F5's (3 x interval) + 1 guidance)
tmsh modify ltm monitor http my_http_monitor \
    destination *:80 \
    interval 5 \
    timeout 16 \
    recv "HTTP/1.1 200" \
    send "GET /health HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"

# Save configuration
tmsh save sys config
```
Step 5: Enable Pool Member
```bash
# Check whether the member is administratively disabled
tmsh list ltm pool web_pool members

# Enable the member
tmsh modify ltm pool web_pool members modify { 10.0.0.1:80 { session user-enabled } }

# Force the member online (overrides the monitor - use with care)
tmsh modify ltm pool web_pool members modify { 10.0.0.1:80 { state user-up } }

# Check member status after the change
tmsh show ltm pool web_pool members field-fmt
```
Step 6: Check Connection Limits
```bash
# Check the connection limit configured on the member
tmsh list ltm pool web_pool members

# Check current connection statistics
tmsh show ltm pool web_pool members

# Raise the connection limit
tmsh modify ltm pool web_pool members modify { 10.0.0.1:80 { connection-limit 1000 } }
```
Step 7: Check Priority Groups
```bash
# List member priority-group settings
tmsh list ltm pool web_pool members

# Adjust priority groups (higher number = higher priority)
tmsh modify ltm pool web_pool members modify { 10.0.0.1:80 { priority-group 10 } }
tmsh modify ltm pool web_pool members modify { 10.0.0.2:80 { priority-group 5 } }

# Priority groups only take effect when min-active-members is set
tmsh modify ltm pool web_pool min-active-members 1
```
Step 8: Verify the Fix
```bash
# Check pool status
tmsh show ltm pool web_pool

# Monitor member status
tmsh show ltm pool web_pool members field-fmt

# Test through the virtual server
curl -v http://virtual-server-ip/

# Check virtual server connection stats
tmsh show ltm virtual web_vs
```
Advanced Diagnosis
Debug Monitor Failures
```bash
# There is no tmsh command to fire a monitor on demand; replicate
# the probe by hand using the monitor's send string and timeout
curl -v --max-time 16 http://10.0.0.1:80/health

# Monitor up/down events are logged to /var/log/ltm
tail -f /var/log/ltm

# View recent monitor state changes
grep -Ei "monitor status (down|up)" /var/log/ltm
```
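The log scan above can be sanity-checked offline against sample lines. The entries below are fabricated for illustration (real /var/log/ltm wording varies by version); a member that flaps repeatedly usually means the monitor timeout is too close to the application's response time:

```shell
# Count up/down monitor transitions in a log sample.
# These log lines are fabricated examples, not real BIG-IP output.
cat <<'EOF' | awk '/monitor status down/ {d++} /monitor status up/ {u++} END {printf "down=%d up=%d\n", d, u}'
Jan 10 12:00:01 bigip1 notice mcpd[5555]: Pool /Common/web_pool member /Common/10.0.0.1:80 monitor status down.
Jan 10 12:00:31 bigip1 notice mcpd[5555]: Pool /Common/web_pool member /Common/10.0.0.1:80 monitor status up.
Jan 10 12:01:02 bigip1 notice mcpd[5555]: Pool /Common/web_pool member /Common/10.0.0.1:80 monitor status down.
EOF
```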
Check SNAT Configuration
```bash
# If using SNAT, verify the configuration
tmsh list ltm snat

# Check the SNAT pool
tmsh list ltm snatpool my_snatpool

# Verify the self IPs used for translation
tmsh list net self
```
Monitor Statistics
```bash
# Detailed per-member statistics
tmsh show ltm pool web_pool detail

# Check for connection errors in pool stats
tmsh show ltm pool web_pool | grep -i err

# Check connection queue depth (if connection queuing is enabled)
tmsh show ltm pool web_pool | grep -i queue
```
Use External Monitor
```bash
# Create a script-based external monitor
tmsh create ltm monitor external my_external_monitor \
    run "/var/tmp/health_check.sh" \
    interval 10 \
    timeout 31

# External monitor script - BIG-IP passes the node IP and port as $1 and $2.
# The member is marked UP if the script prints anything to stdout,
# and DOWN if it prints nothing; the exit code is not what decides.
cat > /var/tmp/health_check.sh << 'EOF'
#!/bin/bash
NODE=$1
PORT=$2
# Strip the IPv6-mapped prefix BIG-IP may prepend (e.g. ::ffff:10.0.0.1)
NODE=${NODE#*::ffff:}
curl -sf http://${NODE}:${PORT}/health > /dev/null && echo "UP"
EOF
chmod +x /var/tmp/health_check.sh
```
Common Pitfalls
- Monitor recv string mismatch - expecting a response the application never sends
- Monitor interval vs. timeout - F5 recommends timeout = (3 x interval) + 1
- Disabled session state - member was manually disabled and never re-enabled
- SNAT not configured - backend sees the wrong source IP and return traffic bypasses BIG-IP
- Port mismatch - monitor probing a different port than the application listens on
- Connection limit reached - member cannot accept more connections
- Priority groups misconfigured - all members in the same group, or min-active-members left at 0, so no failover occurs
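The first pitfall is easy to reproduce offline: a recv string of "HTTP/1.1 200" silently fails against a backend that answers with HTTP/1.0, even though the request itself succeeded. A simplified sketch of the substring match the monitor performs (the status line below is a fabricated example; the real monitor does more than a plain grep):

```shell
# Simulate the monitor's recv-string match against a backend status line.
status_line='HTTP/1.0 200 OK'   # fabricated HTTP/1.0 backend reply
recv='HTTP/1.1 200'             # recv string from the examples above
if printf '%s\n' "$status_line" | grep -qF "$recv"; then
    echo "member marked UP"
else
    echo "member marked DOWN"
fi
```

Matching on "200" alone, or on a response body token, avoids tying the health check to the HTTP version string.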
Best Practices
```bash
# Create a robust HTTP monitor
tmsh create ltm monitor http robust_http_monitor \
    defaults-from http \
    interval 5 \
    timeout 16 \
    recv "HTTP/1.1 200" \
    recv-disable "HTTP/1.1 503" \
    send "GET /health HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n" \
    destination *:80

# Configure the pool with sensible member settings
tmsh create ltm pool web_pool \
    load-balancing-mode round-robin \
    monitor robust_http_monitor \
    min-active-members 1 \
    members add { \
        10.0.0.1:80 { connection-limit 1000 priority-group 10 } \
        10.0.0.2:80 { connection-limit 1000 priority-group 10 } \
        10.0.0.3:80 { connection-limit 500 priority-group 5 } }

# Ramp traffic up gradually to newly available members
tmsh modify ltm pool web_pool slow-ramp-time 300
```
iRule for Health Check Debugging
```tcl
# iRule to log server-side connection and response activity
when SERVER_CONNECTED {
    log local0. "Server connected: [IP::server_addr]:[TCP::server_port]"
}

when HTTP_RESPONSE {
    log local0. "Response status: [HTTP::status] from [IP::server_addr]"
}
```
Related Issues
- F5 BIG-IP SSL Profile Error
- HAProxy Backend Down
- AWS ALB Health Check Failing
- GCP Load Balancer Backend Down