## Introduction

Canary deployments in a service mesh shift traffic gradually to a new version. When the canary version has issues, the users routed to it experience errors.
## Symptoms

- Error rate spikes during canary traffic shift
- Users randomly seeing errors from the new version
- Canary metrics showing increased latency or errors
- Traffic not shifting according to configured weights
- Rollback not working as expected
## Common Causes

- New version has bugs not caught in staging
- Canary weight increased too quickly
- Health checks not detecting canary issues
- Traffic shifting not respecting pod readiness
- Database schema changes incompatible with the old version
## Step-by-Step Fix

1. **Check canary traffic configuration:**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service   # required field; assumed to match the destination host
  http:
  - route:
    - destination:
        host: my-service
        subset: v1
      weight: 90   # stable version
    - destination:
        host: my-service
        subset: v2
      weight: 10   # canary version
```
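The `v1` and `v2` subsets referenced in step 1 only resolve if a DestinationRule defines them, and a missing or mismatched subset is one possible reason traffic does not follow the configured weights. The following is a minimal sketch of such a rule, not the definitive configuration for this service: the `version: v1` and `version: v2` pod labels are assumptions and should be replaced with whatever labels the stable and canary Deployments actually carry.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
  - name: v1
    labels:
      version: v1   # assumed pod label on the stable Deployment
  - name: v2
    labels:
      version: v2   # assumed pod label on the canary Deployment
```

If the subset labels match no running pods, requests routed to that subset are likely to fail, so it is worth comparing these labels against the actual pod labels before adjusting weights.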
2. **Roll back the canary immediately:**
```bash
# Set all traffic back to stable version
kubectl apply -f virtualservice-stable-only.yaml
```
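The contents of `virtualservice-stable-only.yaml` are not shown in the rollback step. As a rough sketch, assuming `v1` is the stable subset and the resource keeps the name `my-service` so that it overwrites the canary routing in place, the file might look like this:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service   # same name as the canary VirtualService, so apply replaces it
spec:
  hosts:
  - my-service       # assumed to match the destination host
  http:
  - route:
    - destination:
        host: my-service
        subset: v1   # assumed stable subset
      weight: 100    # all traffic to stable; no v2 route remains
```

Keeping this file in version control next to the canary configuration means the rollback is a single `kubectl apply` rather than a hand edit made under pressure.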