Introduction
Linkerd traffic splits enable gradual rollout and canary deployments by distributing traffic between service versions. Traffic split errors can result from misconfigured weights, missing appliers, service profile issues, or service discovery problems. When traffic splits fail, traffic may not be distributed correctly, or all traffic may go to one service version unexpectedly.
Symptoms
Error messages in Linkerd logs:
- Traffic split: no valid backend found
- Service profile not found for traffic split
- Traffic split applier not configured
- Invalid weight configuration
- Backend service not found

Observable indicators:
- Traffic not distributed according to weights
- Canary deployment not receiving traffic
- linkerd viz stat shows wrong distribution
- Traffic split validation errors
- All traffic going to primary service
- Service profile errors in Linkerd control plane
Common Causes
1. Weight configuration error - negative, non-integer, or all-zero weight values
2. Backend service missing - service not registered or not healthy
3. Missing applier service - traffic split's `spec.service` doesn't point at an existing service
4. Service profile conflict - profile interfering with traffic split routing
5. Service discovery issues - backend not resolvable
6. Linkerd proxy issues - sidecar not injected or crashed
7. Namespace mismatch - traffic split and services in different namespaces
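Cause 1 can be screened for before a manifest is ever applied. A minimal sketch, assuming you extract the weight values yourself (e.g. with `kubectl ... -o jsonpath` or `yq`); `valid_weight` is a hypothetical helper, not part of Linkerd or SMI tooling — it simply enforces that a weight is a non-negative integer:

```bash
# Screen backend weights before kubectl apply.
# valid_weight is a hypothetical helper, not a Linkerd command.
valid_weight() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty, negative, or non-numeric values
    *) return 0 ;;
  esac
}

# Demo: 90, 10, and 0 pass; -5 and abc are rejected.
for w in 90 10 0 -5 abc; do
  valid_weight "$w" && echo "weight $w: ok" || echo "weight $w: invalid"
done
```

Zero is deliberately accepted, since a `weight: 0` backend is a normal starting point for a canary.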
Step-by-Step Fix
Step 1: Check Traffic Split Configuration
```bash
# List traffic splits
kubectl get trafficsplits -A

# Describe traffic split
kubectl describe trafficsplit my-split -n production

# Get full configuration
kubectl get trafficsplit my-split -n production -o yaml

# Check weights
kubectl get trafficsplit my-split -n production -o jsonpath='{.spec.backends}'
```
Step 2: Verify Service Health
```bash
# Check backend services exist
kubectl get svc -n production

# Verify service endpoints
kubectl get endpoints my-service-v1 -n production
kubectl get endpoints my-service-v2 -n production

# Check pods have the Linkerd injection annotation
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.linkerd\.io/inject}{"\n"}{end}'

# Verify the proxy container is present on each pod
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.name}{" "}{end}{"\n"}{end}'
```
Step 3: Check Traffic Distribution
```bash
# View traffic stats
linkerd viz stat deploy -n production

# Check route distribution
linkerd viz routes deploy/my-service -n production

# Monitor traffic split effectiveness
linkerd viz stat trafficsplit -n production

# Tap live traffic
linkerd viz tap deploy/my-service -n production
```
Step 4: Fix Weight Configuration
```yaml
# Correct TrafficSplit with valid weights
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-service-split
  namespace: production
spec:
  service: my-service  # The applier service clients call
  backends:
  - service: my-service-v1
    weight: 90  # Must be a non-negative integer
  - service: my-service-v2
    weight: 10

# Note: the SMI spec doesn't require weights to sum to 100;
# traffic is distributed proportionally.
```
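Because distribution is proportional, weight pairs with the same ratio produce the same split. A quick sketch in plain awk (no cluster needed; `share` is a hypothetical helper) that maps two weights to the percentages Linkerd would effectively use:

```bash
# Convert a pair of backend weights into percentage shares.
# share is a hypothetical helper for illustration only.
share() {
  awk -v w1="$1" -v w2="$2" 'BEGIN {
    total = w1 + w2
    printf "v1=%.0f%% v2=%.0f%%\n", 100 * w1 / total, 100 * w2 / total
  }'
}

share 90 10   # same split as weights 9 and 1
share 3 1     # 75/25 even though the weights do not sum to 100
```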
```bash
# Apply corrected configuration
kubectl apply -f trafficsplit.yaml

# Confirm the resource was accepted
kubectl get trafficsplit my-service-split -n production
```
Step 5: Fix Missing Applier Service
```yaml
# TrafficSplit with proper applier
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-service-split
spec:
  service: my-service  # This is the "applier" - the service clients call
  backends:
  - service: my-service-v1  # Backend implementations
    weight: 50
  - service: my-service-v2
    weight: 50
```

```bash
# Verify applier service exists
kubectl get service my-service -n production

# If missing, create it (or use an existing service)
kubectl create service clusterip my-service --tcp=80:8080 -n production
```
Step 6: Check Service Profiles
```bash
# List service profiles
kubectl get serviceprofiles -n production

# Check profile for the traffic split service
kubectl get serviceprofile my-service.production.svc.cluster.local -n production -o yaml

# Check if the profile interferes with routing
linkerd viz routes deploy/my-service -n production
```
```yaml
# Service profile for traffic split service
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: my-service.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: GET /
    condition:
      method: GET
      pathRegex: /
    timeout: 30s
```

Step 7: Verify Proxy Injection
```bash
# Check if proxy is injected
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}: {.metadata.annotations.linkerd\.io/inject}{"\n"}{end}'

# Check proxy status
linkerd check --proxy -n production

# Verify proxy is healthy
kubectl logs deployment/my-service-v1 -n production -c linkerd-proxy
```
Step 8: Check Namespace Configuration
```yaml
# Ensure TrafficSplit and services are in the same namespace
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-split
  namespace: production  # Same namespace as backend services
spec:
  service: my-service
  backends:
  - service: my-service-v1
    weight: 90
  - service: my-service-v2
    weight: 10
```

```yaml
# Cross-namespace traffic splits need the fully qualified name
spec:
  service: my-service
  backends:
  - service: my-service-v1.staging.svc.cluster.local
    weight: 50
```

Step 9: Verify the Fix
```bash
# Test traffic distribution
for i in {1..100}; do
  curl -s http://my-service.production/api/version
done | sort | uniq -c

# Monitor with viz stat
linkerd viz stat trafficsplit/my-service-split -n production

# Tap to verify routing
linkerd viz tap deploy/my-service -n production --to deploy/my-service-v2

# Check for errors
linkerd viz stat deploy -n production --from deploy/my-service-v1
```
Advanced Diagnosis
Debug Linkerd Control Plane
```bash
# Check Linkerd control plane
linkerd check

# Check data plane proxies
linkerd check --proxy

# View destination controller logs
kubectl logs -n linkerd deployment/linkerd-destination

# Follow destination controller logs
kubectl logs -n linkerd deployment/linkerd-destination -f
```
Check Proxy Configuration
```bash
# Check proxy log level via the admin port
kubectl exec deployment/my-service-v1 -c linkerd-proxy -n production -- curl localhost:4191/proxy-log-level

# Check proxy metrics
kubectl exec deployment/my-service-v1 -c linkerd-proxy -n production -- curl localhost:4191/metrics | grep route

# Check proxy readiness
kubectl exec deployment/my-service-v1 -c linkerd-proxy -n production -- curl localhost:4191/ready
```
Test Traffic Split Directly
```bash
# Re-inject the deployment to ensure the proxy is present
kubectl get deploy my-service-v1 -n production -o yaml | linkerd inject - | kubectl apply -f -

# Observe live traffic to backends in the namespace
linkerd viz tap deploy/my-service -n production --to namespace/production
```
Check SMI Resources
```bash
# Verify SMI resources exist
kubectl get trafficsplits.split.smi-spec.io -A

# Check traffic targets (if using SMI access control)
kubectl get traffictargets.access.smi-spec.io -A

# Verify HTTP route groups
kubectl get httproutegroups.specs.smi-spec.io -A
```
Common Pitfalls
- Weights don't need to sum to 100 - Traffic split uses proportions
- Missing applier service - Need virtual service for clients to call
- Backend not meshed - Service without Linkerd proxy injected
- Namespace mismatch - Traffic split can't find cross-namespace services
- Service profile timeout - Profile timeout too short for backend
- Old SMI spec version - Using deprecated API version
- Proxy not ready - Sidecar still initializing
Best Practices
```yaml
# Complete canary deployment setup
# Step 1: Deploy service versions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service-v1
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
      version: v1
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: my-service
        version: v1
    spec:
      containers:
      - name: app
        image: my-service:v1
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-service-v1
  namespace: production
spec:
  selector:
    app: my-service
    version: v1
  ports:
  - port: 80
    targetPort: 8080
---
# Step 2: Create TrafficSplit
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-service-canary
  namespace: production
spec:
  service: my-service
  backends:
  - service: my-service-v1
    weight: 100
  - service: my-service-v2
    weight: 0  # Start with 0 for gradual rollout
---
# Step 3: Service profile for timeouts
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: my-service.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: health
    condition:
      method: GET
      pathRegex: /health
    timeout: 5s
    isRetryable: true
  - name: api
    condition:
      method: GET
      pathRegex: /api/.*
    timeout: 30s
```
```bash
# Gradual traffic shift
kubectl patch trafficsplit my-service-canary -n production --type='json' -p='[
  {"op": "replace", "path": "/spec/backends/0/weight", "value": 95},
  {"op": "replace", "path": "/spec/backends/1/weight", "value": 5}
]'

# Monitor during rollout
linkerd viz stat trafficsplit -n production
```
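The patch above performs one shift; a full rollout repeats it with a schedule. A small sketch that prints the primary/canary weight pairs to feed into successive patches — `weights_for` is a hypothetical helper and the step sizes are an assumption, not a Linkerd recommendation:

```bash
# Print primary/canary weight pairs for a gradual canary rollout.
# weights_for is a hypothetical helper; the step list is an assumption.
weights_for() {
  echo "primary=$((100 - $1)) canary=$1"
}

for canary in 5 10 25 50 100; do
  weights_for "$canary"
done
```

Pausing between steps to watch `linkerd viz stat trafficsplit` lets you roll back (patch the canary weight back to 0) before a bad version takes significant traffic.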
Related Issues
- Linkerd Traffic Split Not Routing
- Istio Destination Rule Error
- Envoy Cluster Unhealthy
- Consul Connect Proxy Error