Introduction
Linkerd traffic splits enable gradual rollout and canary deployments by distributing traffic between service versions. Traffic split errors can result from misconfigured weights, missing appliers, service profile issues, or service discovery problems. When traffic splits fail, traffic may not be distributed correctly, or all traffic may go to one service version unexpectedly.
Symptoms
Error messages in Linkerd logs:
- Traffic split: no valid backend found
- Service profile not found for traffic split
- Traffic split applier not configured
- Invalid weight configuration
- Backend service not found

Observable indicators:
- Traffic not distributed according to weights
- Canary deployment not receiving traffic
- linkerd viz stat shows wrong distribution
- Traffic split validation errors
- All traffic going to primary service
- Service profile errors in Linkerd control plane
Common Causes
1. Weight configuration error - negative, non-integer, or all-zero weight values
2. Backend service missing - service not registered or not healthy
3. Missing applier service - traffic split's `spec.service` doesn't point at an existing service
4. Service profile conflict - profile interfering with traffic split routing
5. Service discovery issues - backend not resolvable
6. Linkerd proxy issues - sidecar not injected or crashed
7. Namespace mismatch - traffic split and services in different namespaces
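Cause 1 can be screened for before a manifest is ever applied. A minimal sketch, assuming you extract the weight values yourself (e.g. with `kubectl ... -o jsonpath` or `yq`); `valid_weight` is a hypothetical helper, not part of Linkerd or SMI tooling — it simply enforces that a weight is a non-negative integer:

```bash
# Screen backend weights before kubectl apply.
# valid_weight is a hypothetical helper, not a Linkerd command.
valid_weight() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty, negative, or non-numeric values
    *) return 0 ;;
  esac
}

# Demo: 90, 10, and 0 pass; -5 and abc are rejected.
for w in 90 10 0 -5 abc; do
  valid_weight "$w" && echo "weight $w: ok" || echo "weight $w: invalid"
done
```

Zero is deliberately accepted, since a `weight: 0` backend is a normal starting point for a canary.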
Step-by-Step Fix
Step 1: Check Traffic Split Configuration
```bash
# List traffic splits
kubectl get trafficsplits -A

# Describe traffic split
kubectl describe trafficsplit my-split -n production

# Get full configuration
kubectl get trafficsplit my-split -n production -o yaml

# Check weights
kubectl get trafficsplit my-split -n production -o jsonpath='{.spec.backends}'
```
Step 2: Verify Service Health
```bash
# Check backend services exist
kubectl get svc -n production

# Verify service endpoints
kubectl get endpoints my-service-v1 -n production
kubectl get endpoints my-service-v2 -n production

# Check pods have the Linkerd injection annotation
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.linkerd\.io/inject}{"\n"}{end}'

# Verify the proxy container is present on each pod
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.name}{" "}{end}{"\n"}{end}'
```
Step 3: Check Traffic Distribution
```bash
# View traffic stats
linkerd viz stat deploy -n production

# Check route distribution
linkerd viz routes deploy/my-service -n production

# Monitor traffic split effectiveness
linkerd viz stat trafficsplit -n production

# Tap live traffic
linkerd viz tap deploy/my-service -n production
```
Step 4: Fix Weight Configuration
```yaml
# Correct TrafficSplit with valid weights
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-service-split
  namespace: production
spec:
  service: my-service  # The applier service clients call
  backends:
  - service: my-service-v1
    weight: 90  # Must be a non-negative integer
  - service: my-service-v2
    weight: 10

# Note: the SMI spec doesn't require weights to sum to 100;
# traffic is distributed proportionally.
```
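Because distribution is proportional, weight pairs with the same ratio produce the same split. A quick sketch in plain awk (no cluster needed; `share` is a hypothetical helper) that maps two weights to the percentages Linkerd would effectively use:

```bash
# Convert a pair of backend weights into percentage shares.
# share is a hypothetical helper for illustration only.
share() {
  awk -v w1="$1" -v w2="$2" 'BEGIN {
    total = w1 + w2
    printf "v1=%.0f%% v2=%.0f%%\n", 100 * w1 / total, 100 * w2 / total
  }'
}

share 90 10   # same split as weights 9 and 1
share 3 1     # 75/25 even though the weights do not sum to 100
```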
```bash
# Apply corrected configuration
kubectl apply -f trafficsplit.yaml

# Confirm the resource was accepted
kubectl get trafficsplit my-service-split -n production
```
Step 5: Fix Missing Applier Service
```yaml
# TrafficSplit with proper applier
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-service-split
spec:
  service: my-service  # This is the "applier" - the service clients call
  backends:
  - service: my-service-v1  # Backend implementations
    weight: 50
  - service: my-service-v2
    weight: 50
```

```bash
# Verify applier service exists
kubectl get service my-service -n production

# If missing, create it (or use an existing service)
kubectl create service clusterip my-service --tcp=80:8080 -n production
```
Step 6: Check Service Profiles
```bash
# List service profiles
kubectl get serviceprofiles -n production

# Check profile for the traffic split service
kubectl get serviceprofile my-service.production.svc.cluster.local -n production -o yaml

# Check if the profile interferes with routing
linkerd viz routes deploy/my-service -n production
```
```yaml
# Service profile for traffic split service
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: my-service.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: GET /
    condition:
      method: GET
      pathRegex: /
    timeout: 30s
```

Step 7: Verify Proxy Injection
```bash
# Check if proxy is injected
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}: {.metadata.annotations.linkerd\.io/inject}{"\n"}{end}'

# Check proxy status
linkerd check --proxy -n production

# Verify proxy is healthy
kubectl logs deployment/my-service-v1 -n production -c linkerd-proxy
```
Step 8: Check Namespace Configuration
```yaml
# Ensure TrafficSplit and services are in the same namespace
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-split
  namespace: production  # Same namespace as backend services
spec:
  service: my-service
  backends:
  - service: my-service-v1
    weight: 90
  - service: my-service-v2
    weight: 10
```

```yaml
# Cross-namespace traffic splits need the fully qualified name
spec:
  service: my-service
  backends:
  - service: my-service-v1.staging.svc.cluster.local
    weight: 50
```

Step 9: Verify the Fix
```bash
# Test traffic distribution
for i in {1..100}; do
  curl -s http://my-service.production/api/version
done | sort | uniq -c

# Monitor with viz stat
linkerd viz stat trafficsplit/my-service-split -n production

# Tap to verify routing
linkerd viz tap deploy/my-service -n production --to deploy/my-service-v2

# Check for errors
linkerd viz stat deploy -n production --from deploy/my-service-v1
```
Advanced Diagnosis
Debug Linkerd Control Plane
```bash
# Check Linkerd control plane
linkerd check

# Check data plane proxies
linkerd check --proxy

# View destination controller logs
kubectl logs -n linkerd deployment/linkerd-destination

# Follow destination controller logs
kubectl logs -n linkerd deployment/linkerd-destination -f
```
Check Proxy Configuration
```bash
# Check proxy log level via the admin port
kubectl exec deployment/my-service-v1 -c linkerd-proxy -n production -- curl localhost:4191/proxy-log-level

# Check proxy metrics
kubectl exec deployment/my-service-v1 -c linkerd-proxy -n production -- curl localhost:4191/metrics | grep route

# Check proxy readiness
kubectl exec deployment/my-service-v1 -c linkerd-proxy -n production -- curl localhost:4191/ready
```
Test Traffic Split Directly
```bash
# Re-inject the deployment to ensure the proxy is present
kubectl get deploy my-service-v1 -n production -o yaml | linkerd inject - | kubectl apply -f -

# Observe live traffic to backends in the namespace
linkerd viz tap deploy/my-service -n production --to namespace/production
```
Check SMI Resources
```bash
# Verify SMI resources exist
kubectl get trafficsplits.split.smi-spec.io -A

# Check traffic targets (if using SMI access control)
kubectl get traffictargets.access.smi-spec.io -A

# Verify HTTP route groups
kubectl get httproutegroups.specs.smi-spec.io -A
```
Common Pitfalls
- Weights don't need to sum to 100 - Traffic split uses proportions
- Missing applier service - Need virtual service for clients to call
- Backend not meshed - Service without Linkerd proxy injected
- Namespace mismatch - Traffic split can't find cross-namespace services
- Service profile timeout - Profile timeout too short for backend
- Old SMI spec version - Using deprecated API version
- Proxy not ready - Sidecar still initializing
Best Practices
```yaml
# Complete canary deployment setup
# Step 1: Deploy service versions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service-v1
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
      version: v1
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: my-service
        version: v1
    spec:
      containers:
      - name: app
        image: my-service:v1
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-service-v1
  namespace: production
spec:
  selector:
    app: my-service
    version: v1
  ports:
  - port: 80
    targetPort: 8080
---
# Step 2: Create TrafficSplit
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-service-canary
  namespace: production
spec:
  service: my-service
  backends:
  - service: my-service-v1
    weight: 100
  - service: my-service-v2
    weight: 0  # Start with 0 for gradual rollout
---
# Step 3: Service profile for timeouts
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: my-service.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: health
    condition:
      method: GET
      pathRegex: /health
    timeout: 5s
    isRetryable: true
  - name: api
    condition:
      method: GET
      pathRegex: /api/.*
    timeout: 30s
```
```bash
# Gradual traffic shift
kubectl patch trafficsplit my-service-canary -n production --type='json' -p='[
  {"op": "replace", "path": "/spec/backends/0/weight", "value": 95},
  {"op": "replace", "path": "/spec/backends/1/weight", "value": 5}
]'

# Monitor during rollout
linkerd viz stat trafficsplit -n production
```
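The patch above performs one shift; a full rollout repeats it with a schedule. A small sketch that prints the primary/canary weight pairs to feed into successive patches — `weights_for` is a hypothetical helper and the step sizes are an assumption, not a Linkerd recommendation:

```bash
# Print primary/canary weight pairs for a gradual canary rollout.
# weights_for is a hypothetical helper; the step list is an assumption.
weights_for() {
  echo "primary=$((100 - $1)) canary=$1"
}

for canary in 5 10 25 50 100; do
  weights_for "$canary"
done
```

Pausing between steps to watch `linkerd viz stat trafficsplit` lets you roll back (patch the canary weight back to 0) before a bad version takes significant traffic.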
Related Issues
- Linkerd Traffic Split Not Routing
- Istio Destination Rule Error
- Envoy Cluster Unhealthy
- Consul Connect Proxy Error