## Introduction

Canary deployments in a service mesh shift traffic to a new version gradually. When the canary version has issues, the users whose requests are routed to it experience errors.

## Symptoms

- Error rate spikes during canary traffic shift
- Users randomly seeing errors from new version
- Canary metrics showing increased latency or errors
- Traffic not shifting according to configured weights
- Rollback not working as expected

## Common Causes

- New version has bugs not caught in staging
- Canary weight increased too quickly
- Health checks not detecting canary issues
- Traffic shifting not respecting pod readiness
- Database schema changes incompatible with old version

## Step-by-Step Fix

1. **Check canary traffic configuration**:
   ```yaml
   apiVersion: networking.istio.io/v1beta1
   kind: VirtualService
   metadata:
     name: my-service
   spec:
     hosts:
       - my-service   # required field; assumed to match the destination host
     http:
       - route:
           - destination:
               host: my-service
               subset: v1   # stable version
             weight: 90
           - destination:
               host: my-service
               subset: v2   # canary version
             weight: 10
   ```

2. **Roll back the canary immediately**:
   ```bash
   # Set all traffic back to the stable version
   kubectl apply -f virtualservice-stable-only.yaml
   ```
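
The contents of `virtualservice-stable-only.yaml` are not shown in this guide. As a minimal sketch, assuming subset `v1` is the stable version and the host is `my-service`, the file could be generated as below. The helper names `write_stable_only_vs` and `sum_route_weights` are hypothetical, and the manifest body is an assumption rather than the actual file.

```bash
#!/usr/bin/env bash
# Sketch only: the manifest body below is an assumption about what
# virtualservice-stable-only.yaml contains, not the actual file.

# Write a VirtualService that routes 100% of traffic to subset v1.
# Arg 1 (optional): output path.
write_stable_only_vs() {
  local out="${1:-virtualservice-stable-only.yaml}"
  cat > "$out" <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
          weight: 100
EOF
}

# Print the sum of all "weight:" values found in a manifest file.
# Arg 1: path to the manifest.
sum_route_weights() {
  awk '$1 == "weight:" { sum += $2 } END { print sum + 0 }' "$1"
}
```

A possible use is to run `write_stable_only_vs`, confirm that `sum_route_weights virtualservice-stable-only.yaml` prints `100`, and only then apply the file with `kubectl apply -f`.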

## Prevention

- Start with 1% canary traffic and increase gradually
- Monitor canary metrics at each traffic shift step
- Set up automated canary analysis
- Implement automatic rollback on error rate threshold
- Test rollback procedures regularly
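
The error-rate threshold check behind an automatic rollback can be sketched as a small shell function. This is an illustration under stated assumptions, not a complete canary analysis: the function name `should_rollback` is hypothetical, the error and request counts are assumed to come from whatever metrics source is in use, and the threshold is given in whole percent.

```bash
#!/usr/bin/env bash
# Sketch only: the function name, inputs, and threshold are assumptions.

# Succeed (exit status 0) when the canary error rate is strictly above
# the threshold.
# Args: error count, total request count, threshold in whole percent.
should_rollback() {
  local errors="$1" total="$2" threshold_pct="$3"
  # With no canary traffic yet there is nothing to judge.
  [ "$total" -gt 0 ] || return 1
  # Compare errors/total > threshold/100 using integer math only.
  [ $(( errors * 100 )) -gt $(( threshold_pct * total )) ]
}
```

At each traffic shift step, a script could call `should_rollback "$errors" "$total" 5` and, when it succeeds, run `kubectl apply -f virtualservice-stable-only.yaml` to send all traffic back to the stable version.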