Introduction

Cloudflare Load Balancing distributes traffic across origin servers grouped into pools, based on health checks, geography, and configured steering policies. When a load balancer is misconfigured, traffic may route to unhealthy servers, failover may not trigger, or all requests may go to a single origin. Troubleshooting means checking pool health status, health check (monitor) configuration, origin server accessibility, and load balancing rules. Load Balancing is a paid add-on, and the available feature level differs by plan.

Symptoms

  • Traffic routes to unhealthy origin servers
  • Failover not triggering when origin fails
  • All traffic going to a single pool even though several are configured
  • Health checks marking healthy origins as unhealthy
  • Load Balancer dashboard shows pools as "Down" or "Critical"
  • Specific regions not receiving correct pool routing
  • Session affinity causing stale routing to old servers

Common Causes

  • Health check endpoint returning wrong status
  • Health check path configuration incorrect
  • Origin server not responding to health check requests
  • Health check frequency too low to detect failures quickly
  • Pool thresholds misconfigured (too strict or too lenient)
  • Geographic routing not matching expected regions
  • Session affinity keeping users on failed origins
  • Origin firewall blocking Cloudflare health check IPs

Step-by-Step Fix

  1. Check Load Balancer status in the dashboard:

Navigate to: Traffic > Load Balancing

Review:

  • Monitor status (active/inactive)
  • Pool status (Healthy/Unhealthy/Critical)
  • Origin server status within each pool
  • Traffic distribution metrics

  2. Examine individual pool health:

```bash
# Via API - check pool health
curl -X GET "https://api.cloudflare.com/client/v4/user/load_balancers/pools" \
  -H "Authorization: Bearer API_TOKEN"

# Check a specific pool
curl -X GET "https://api.cloudflare.com/client/v4/user/load_balancers/pools/POOL_ID/health" \
  -H "Authorization: Bearer API_TOKEN"
```

  3. Verify the health check endpoint:

```bash
# Test the health check path on each origin
curl -I http://ORIGIN_IP/health -H "Host: yourdomain.com"
curl -I https://ORIGIN_IP/health -H "Host: yourdomain.com" -k

# Should return the expected status (usually 200 OK)
# Any other status causes the health check to fail
```
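
To repeat this probe across every origin in a pool, a small wrapper can compare each response code with the code the monitor expects. This is a minimal sketch: the function names `check_code` and `probe_origin` and the 5-second timeout are illustrative assumptions, not part of any Cloudflare tooling.

```shell
#!/bin/sh
# Compare an observed HTTP status with the code the monitor expects.
# Prints PASS or FAIL with details; returns non-zero on a mismatch.
check_code() {
  expected="$1"
  actual="$2"
  if [ "$actual" = "$expected" ]; then
    echo "PASS (got $actual)"
  else
    echo "FAIL (expected $expected, got $actual)"
    return 1
  fi
}

# Probe one origin directly, bypassing Cloudflare.
# Args: origin IP, Host header, path, expected status code.
probe_origin() {
  origin="$1"; host="$2"; path="$3"; expected="$4"
  # -o /dev/null discards the body; -w prints only the status code.
  # curl reports 000 when the connection fails or times out.
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' \
    -H "Host: $host" "http://$origin$path") || true
  printf '%s %s: ' "$origin" "$path"
  check_code "$expected" "$code"
}
```

For example, `probe_origin ORIGIN_IP yourdomain.com /health 200` prints one PASS or FAIL line for that origin, so a failing origin stands out when the call is looped over the whole pool.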

  4. Check health check configuration:

Navigate to: Traffic > Load Balancing > Monitors

```
# Verify monitor settings:
# - Path: health endpoint path (/health, /ping, etc.)
# - Expected code: usually 200
# - Timeout: health check timeout
# - Interval: how often to check
# - Method: GET, HEAD, etc.

# Common issues:
# - Path returns 404 (endpoint doesn't exist)
# - Expected code mismatch (expecting 200, gets 301)
# - Timeout too short for slow endpoints
```

  5. Create a proper health check endpoint:

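```nginx
# nginx: add a health check endpoint (plain text)
location /health {
    # default_type sets the Content-Type of the returned body
    default_type text/plain;
    return 200 "OK";
}

# Alternative: return JSON with a status field.
# Use only one of these two blocks; nginx rejects duplicate locations.
location /health {
    default_type application/json;
    return 200 '{"status":"healthy"}';
}
```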

  6. Verify the origin firewall allows health checks:

```bash
# Health checks come from Cloudflare IPs
# Ensure the firewall allows Cloudflare IP ranges

curl https://www.cloudflare.com/ips-v4 -o /tmp/cf-ips.txt

# Allowlist these ranges in the firewall or security groups
```
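
The downloaded range file can be turned into firewall rules mechanically. The sketch below assumes the origin uses `ufw`, which the steps above do not specify. The function name `cf_ufw_rules` is hypothetical, and the commands are only printed so they can be reviewed before anything is applied.

```shell
#!/bin/sh
# Turn a file of CIDR ranges (one per line) into ufw allow commands.
# Args: path to the ranges file, optional port (defaults to 443).
cf_ufw_rules() {
  ranges_file="$1"
  port="${2:-443}"
  # Skip blank lines and comment lines; emit one rule per CIDR.
  awk -v port="$port" '
    NF == 0 || $1 ~ /^#/ { next }
    { printf "ufw allow from %s to any port %s proto tcp\n", $1, port }
  ' "$ranges_file"
}
```

Running `cf_ufw_rules /tmp/cf-ips.txt 443` prints one rule per Cloudflare range. For other firewalls or cloud security groups, the same loop applies with a different output line.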

  7. Adjust pool threshold settings:

```bash
# Via API - update pool thresholds
# check_regions takes Cloudflare region codes (for example WEU, ENAM)
curl -X PATCH "https://api.cloudflare.com/client/v4/user/load_balancers/pools/POOL_ID" \
  -H "Authorization: Bearer API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "minimum_origins": 1,
    "notification_email": "admin@yourdomain.com",
    "check_regions": ["WEU", "ENAM"]
  }'

# minimum_origins: minimum number of healthy origins for the pool to count as "up"
# A lower threshold is more resilient (1 = pool stays up with a single healthy origin)
```

  8. Check geographic routing configuration:

Navigate to: Traffic > Load Balancing > Load Balancers

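```
# For geo-steering policies:
# - Each pool is assigned to specific regions
# - Check that the region mapping is correct

# Region codes:
# - WNAM: Western North America
# - ENAM: Eastern North America
# - WEU: Western Europe
# - EEU: Eastern Europe
# - NSAM: Northern South America
# - SSAM: Southern South America
# - OC: Oceania
# - ME: Middle East
# - NAF: Northern Africa
# - SAF: Southern Africa
# - SAS: Southern Asia
# - SEAS: Southeast Asia
# - NEAS: Northeast Asia
```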

  9. Verify failover order:

```bash
# Check pool priority order
curl -X GET "https://api.cloudflare.com/client/v4/zones/ZONE_ID/load_balancers" \
  -H "Authorization: Bearer API_TOKEN"

# The default_pools array shows priority order:
# First pool = primary, subsequent pools = fallback
# With failover ordering, traffic goes to the first healthy pool
```

  10. Reset session affinity if stale:

```bash
# Session affinity can pin users to a specific origin,
# even after that origin becomes unhealthy

# Clear session affinity:
# Dashboard > Traffic > Load Balancers > Session Affinity
# Set to None temporarily to reset sticky sessions

# Or via API:
curl -X PATCH "https://api.cloudflare.com/client/v4/zones/ZONE_ID/load_balancers/LB_ID" \
  -H "Authorization: Bearer API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"session_affinity": "none"}'
```

  11. Debug health check failures:

```bash
# Enable health check logging on the origin
# nginx: error_log /var/log/nginx/error.log debug;

# Watch for incoming health checks
tail -f /var/log/nginx/access.log | grep health

# Should see regular requests from Cloudflare IPs
```

  12. Test failover behavior manually:

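```bash
# Temporarily disable the primary origin
# Watch the Load Balancer dashboard for the pool status change

# Or via API. Origins are part of the pool object, so PATCH the pool and
# send the complete origins list with the target origin disabled
# (include every other origin in the array unchanged):
curl -X PATCH "https://api.cloudflare.com/client/v4/user/load_balancers/pools/POOL_ID" \
  -H "Authorization: Bearer API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"origins": [{"name": "ORIGIN_NAME", "address": "ORIGIN_IP", "enabled": false}]}'

# Verify traffic routes to the fallback pool
```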

  13. Check DNS for the load-balanced hostname:

```bash
# Verify load-balanced hostname resolution
dig lb.yourdomain.com +short

# Should resolve to Cloudflare IPs (proxied)
# The Load Balancer routes within the Cloudflare network
```

Verification

After applying fixes:

  1. Load Balancer dashboard shows pools as "Healthy"
  2. Health check endpoint returns expected status (200)
  3. Traffic distributes across multiple origins correctly
  4. Failover triggers when primary pool becomes unhealthy
  5. Geographic routing sends regions to correct pools
  6. Health checks visible in origin server logs