Introduction

Envoy cluster configuration fails when cluster definition has invalid endpoints or DNS. This guide provides step-by-step diagnosis and resolution with specific commands and configuration examples.

Symptoms

Typical symptoms and error messages when this issue occurs:

bash
cluster_manager: error adding cluster
backing cluster "my-service" not found
EDS host lookup failed

Observable indicators: - Service mesh proxy logs show configuration errors - Control plane reports validation failures - Traffic routing does not match expected behavior

Common Causes

  1. 1.Envoy proxy issues are commonly caused by:
  2. 2.Static configuration syntax errors
  3. 3.Dynamic configuration from control plane issues
  4. 4.Resource limits too low for traffic load
  5. 5.Network connectivity to upstream services

Step-by-Step Fix

Step 1: Check Current State

bash
curl -s localhost:15000/config_dump | jq

Step 2: Identify Root Cause

bash
curl -s localhost:15000/clusters

Step 3: Apply Primary Fix

```yaml # Check Envoy cluster configuration kubectl exec <pod-name> -c istio-proxy -- curl -s localhost:15000/clusters

# Verify cluster endpoints kubectl exec <pod-name> -c istio-proxy -- curl -s localhost:15000/config_dump | jq '.configs[1].dynamic_active_clusters[]'

# Check EDS updates istioctl proxy-config endpoint <pod-name> ```

Apply this configuration:

bash
kubectl apply -f virtualservice.yaml

Step 4: Apply Alternative Fix (If Needed)

```bash # Verify configuration istioctl analyze

# Check proxy status istioctl proxy-status

# View effective configuration istioctl proxy-config all <pod-name> ```

Step 5: Verify the Fix

After applying the fix, verify with:

bash
kubectl exec <pod> -c istio-proxy -- curl -s localhost:15000/ready && curl -v http://<gateway>/test

Expected output should show healthy proxies and correct routing.

Common Pitfalls

  • Static and dynamic configuration conflicts
  • Listener address conflicts
  • Cluster endpoint DNS resolution failures
  • Health check path mismatch

Best Practices

  • Use dynamic configuration via xDS
  • Configure appropriate health checks
  • Set resource limits for Envoy containers
  • Enable hot restart for updates
  • Envoy Listener Bind Failed
  • Envoy Health Check Failing
  • Envoy Upstream Connection Refused
  • Envoy Rate Limit Not Enforced