## Introduction
Istio mTLS STRICT mode connection failures occur when the sidecar proxy cannot complete a mutual TLS handshake with the destination service. In STRICT mode, all traffic must be encrypted with mTLS, and connections without a valid client certificate are rejected. This manifests as connection refused errors, TLS handshake errors, or HTTP 503 responses with the UF (upstream connection failure) or URX (upstream retry limit exceeded) response flags.
## Symptoms
- HTTP 503 responses with `original_ip_discovered` or `upstream_connect_failure`
- Envoy logs show `TLS handshake failed` or `peer certificate validation error`
- Connection succeeds in PERMISSIVE mode but fails in STRICT mode
- `istiod` logs show certificate NACK (negative acknowledgment)
- Issue appears after enabling STRICT mTLS, rotating root certificates, or deploying new services
## Common Causes
- Sidecar not receiving or mounting Istio-managed certificates
- PeerAuthentication policy conflict between namespace and mesh-level policies
- AuthorizationPolicy explicitly denying traffic from specific principals
- Certificate chain expired or not trusted by destination sidecar
- Service account mismatch between source and destination policies
- Control plane (istiod) not issuing certificates due to RBAC issues
## Step-by-Step Fix
### 1. Verify mTLS mode and policy scope
Check the effective mTLS policy for the destination service:
```bash
# Check mesh-wide mTLS policy
kubectl get peerauthentication -n istio-system default -o yaml

# Check namespace-level policy
kubectl get peerauthentication -n <namespace> default -o yaml

# Check service-level policy (if exists)
kubectl get peerauthentication -n <namespace> <service-name> -o yaml
```
Policy precedence (most specific wins):
1. Service-level PeerAuthentication (labels select specific workload)
2. Namespace-level PeerAuthentication (in same namespace as workload)
3. Mesh-level PeerAuthentication (in istio-system namespace)
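
The precedence rule above can be sketched as a small shell function. This is an illustration only, not how istiod resolves policy internally: the function name `effective_mtls_mode` and its argument order are assumptions made for this example. It returns the first mode that is set, walking from the most specific scope to the least specific, and falls back to PERMISSIVE, which is what Istio applies when no policy sets a mode.

```bash
# Resolve the effective mTLS mode from the three policy scopes.
# Arguments: workload-level mode, namespace-level mode, mesh-level mode.
# Pass an empty string (or UNSET) for a scope that has no policy.
# Hypothetical helper for illustration only.
effective_mtls_mode() {
  for mode in "$1" "$2" "$3"; do
    case "$mode" in
      ""|UNSET) continue ;;            # no opinion at this scope; inherit from the next
      *) echo "$mode"; return 0 ;;     # first explicit mode wins
    esac
  done
  echo "PERMISSIVE"                    # default when no scope sets a mode
}
```

For example, `effective_mtls_mode "" PERMISSIVE STRICT` prints `PERMISSIVE`, which is the namespace-overrides-mesh case described in this step.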
Expected STRICT mode configuration:
```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # Mesh-wide
spec:
  mtls:
    mode: STRICT
```
If the namespace-level policy sets PERMISSIVE or DISABLE, it overrides the mesh-wide STRICT policy for workloads in that namespace:
```yaml
# WRONG: Namespace policy overrides mesh-wide STRICT
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: <namespace>  # Namespace-level
spec:
  mtls:
    mode: PERMISSIVE  # Overrides mesh-wide policy!
```
### 2. Verify sidecar has valid certificates
Check if the sidecar proxy has received and mounted certificates:
```bash
# Check certificate files in sidecar
kubectl exec -n <namespace> <pod-name> -c istio-proxy -- ls -la /etc/certs/

# Expected files:
# - cert-chain.pem (workload certificate)
# - root-cert.pem (Istio CA root)
# - key.pem (workload private key)

# If /etc/certs/ is empty (certificates delivered over SDS are held in memory),
# inspect the certificates loaded by the proxy instead
istioctl proxy-config secret <pod-name>.<namespace>

# Verify certificate validity
kubectl exec -n <namespace> <pod-name> -c istio-proxy -- openssl x509 -in /etc/certs/cert-chain.pem -text -noout | grep -E "Not Before|Not After|Subject:"

# Check certificate trust chain
kubectl exec -n <namespace> <pod-name> -c istio-proxy -- openssl verify -CAfile /etc/certs/root-cert.pem /etc/certs/cert-chain.pem
```
Expected output: `OK` from certificate verification, and a `Not After` date that is still in the future.
If certificates are missing, the sidecar did not complete the CSR (Certificate Signing Request) exchange with istiod.
### 3. Check istiod certificate issuance logs
Verify istiod is issuing certificates successfully:
```bash
# Check istiod logs for CSR errors
kubectl logs -n istio-system -l app=istiod --tail=200 | grep -E "CSR|certificate|NACK"

# Check certificate rotation
kubectl logs -n istio-system -l app=istiod --tail=200 | grep "cert-manager"

# Verify istiod can reach Kubernetes API for SA token validation
kubectl logs -n istio-system -l app=istiod --tail=200 | grep -E "RBAC|permission denied"
```
Common CSR failures:
- Service account does not exist or has insufficient RBAC permissions
- Cluster trust bundle not configured for custom CA
- Certificate API rate limit exceeded
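
To turn the log checks above into a single number you can watch, a minimal sketch follows. The function name `count_csr_errors` and the match patterns are assumptions chosen for illustration; real istiod log wording varies between versions, so adjust the patterns to what your logs actually contain.

```bash
# Count log lines that look like certificate-signing problems.
# Reads log text on stdin and prints the number of matching lines.
# The patterns are illustrative assumptions, not an exhaustive list.
count_csr_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so "|| true" keeps the function's exit status clean.
  grep -c -i -E 'csr.*(fail|error|denied)|NACK|permission denied' || true
}
```

A possible use is `kubectl logs -n istio-system -l app=istiod --tail=200 | count_csr_errors`; a non-zero count is a prompt to read the matching lines, not a diagnosis in itself.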
### 4. Verify AuthorizationPolicy is not denying traffic
Authorization policies can explicitly deny traffic even with valid mTLS:
```bash
# List all authorization policies in namespace
kubectl get authorizationpolicy -n <namespace>

# Check for DENY policies that might affect traffic
kubectl get authorizationpolicy -n <namespace> -o yaml | grep -A20 "action: DENY"

# Check specific policy details
kubectl get authorizationpolicy -n <namespace> <policy-name> -o yaml
```
Example DENY policy that blocks traffic:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-external
  namespace: <namespace>
spec:
  action: DENY
  rules:
  - from:
    - source:
        notNamespaces: ["<namespace>", "istio-system"]
```
An ALLOW policy cannot override a matching DENY policy, because Istio evaluates DENY policies before ALLOW policies. To allow traffic, either:
- Add the source namespace to the `notNamespaces` list
- Remove or modify the DENY policy
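
The `notNamespaces` condition is easy to misread, so here is a minimal sketch of its logic for a DENY rule like the example above. The function name `denied_by_not_namespaces` is hypothetical, and the sketch assumes plain exact-match namespace names with no wildcards and no other conditions in the rule.

```bash
# Return 0 (true) if a request from SOURCE_NS would match a DENY rule whose
# only condition is notNamespaces: [LISTED...], i.e. the source namespace
# is NOT in the list. Exact string comparison only (assumption).
# Usage: denied_by_not_namespaces SOURCE_NS LISTED_NS...
denied_by_not_namespaces() {
  source_ns=$1
  shift
  for listed in "$@"; do
    [ "$source_ns" = "$listed" ] && return 1   # listed namespace: rule does not match
  done
  return 0                                     # unlisted namespace: DENY applies
}
```

With the example policy, a call such as `denied_by_not_namespaces frontend payments istio-system` succeeds (traffic from `frontend` is denied), while a source namespace that appears in the list is not matched by the rule.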
### 5. Check service account and principal configuration
mTLS uses service accounts for identity. Verify the source has correct identity:
```bash
# Get source workload service account
kubectl get serviceaccount <source-sa> -n <namespace> -o yaml

# Check destination policy expects correct principal
kubectl get authorizationpolicy -n <namespace> <policy-name> -o yaml | grep -A5 "from:"

# Expected principal format
# principals: ["cluster.local/ns/<namespace>/sa/<service-account>"]
```
Verify source pod is using expected service account:
```yaml
# Pod spec should reference service account
apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
  namespace: <namespace>
spec:
  serviceAccountName: <service-account-name>  # Must match policy
```
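
Because a single typo in the principal string silently breaks a policy match, it can help to generate the expected value rather than type it. The sketch below builds the string in the format shown in this step. The function name `istio_principal` is hypothetical, and the `cluster.local` default is an assumption that only holds when the mesh uses that trust domain.

```bash
# Build the principal string an AuthorizationPolicy would need to list
# for a given namespace and service account.
# Usage: istio_principal NAMESPACE SERVICE_ACCOUNT [TRUST_DOMAIN]
# The trust domain defaults to cluster.local (assumption; pass your own if it differs).
istio_principal() {
  namespace=$1
  service_account=$2
  trust_domain=${3:-cluster.local}
  printf '%s/ns/%s/sa/%s\n' "$trust_domain" "$namespace" "$service_account"
}
```

The service account a running pod actually uses can be read with `kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.serviceAccountName}'` and passed to the function, then compared with the `principals` entries in the destination policy.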
### 6. Test connectivity with explicit mTLS
Use istioctl to test mTLS connectivity:
```bash
# Check mTLS status for the destination service
istioctl proxy-config cluster <pod-name>.<namespace> --fqdn <service>.<namespace>.svc.cluster.local -o json

# Verify TLS context configuration (the JSON output is an array of clusters)
istioctl proxy-config cluster <pod-name>.<namespace> --fqdn <service>.<namespace>.svc.cluster.local -o json | jq '.[] | {name, transportSocket, transportSocketMatches}'

# Test reachability from source pod
istioctl proxy-config endpoint <pod-name>.<namespace> | grep <service>
```
Expected: TLS context shows validation_context with trusted CA and certificate chain.
### 7. Check certificate expiration and rotation
Expired certificates cause immediate connection failures:
```bash
# Check sidecar workload certificate expiration dates
kubectl exec -n <namespace> <pod-name> -c istio-proxy -- sh -c 'openssl x509 -in /etc/certs/cert-chain.pem -noout -dates'

# Check root certificate expiration
kubectl exec -n <namespace> <pod-name> -c istio-proxy -- sh -c 'openssl x509 -in /etc/certs/root-cert.pem -noout -dates'

# Check istiod root cert expiration
kubectl get secret istio-ca-secret -n istio-system -o jsonpath='{.data.ca-cert\.pem}' | base64 -d | openssl x509 -noout -dates
```
Certificate lifetime expectations:
- Workload certificates: 24 hours (default, rotated automatically)
- Root CA certificate: years (manual rotation)
- If the workload certificate lifetime is under 1 hour, the sidecar may not have time to rotate it
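
The dates printed by the commands above are easier to act on as a remaining-time figure. The following sketch converts a `notAfter=` line, as printed by `openssl x509 -noout -enddate`, into seconds until expiry. The function name `seconds_until_expiry` is hypothetical, and the sketch assumes GNU `date` (for the `-d` option); it will not work unchanged with BSD or BusyBox `date`.

```bash
# Print the number of whole seconds until a certificate expires, given the
# "notAfter=..." line printed by `openssl x509 -noout -enddate`.
# A negative result means the certificate has already expired.
# Assumes GNU date (for `date -d`).
seconds_until_expiry() {
  not_after=${1#notAfter=}                              # strip the "notAfter=" prefix
  expiry_epoch=$(date -u -d "$not_after" +%s) || return 1
  now_epoch=$(date -u +%s)
  echo $((expiry_epoch - now_epoch))
}
```

Comparing the result against a threshold such as 3600 gives a simple check for the "under 1 hour" case in the list above.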
### 8. Verify DestinationRule TLS settings
DestinationRule can override mTLS behavior:
```bash
# Check DestinationRule for the service
kubectl get destinationrule -n <namespace> <rule-name> -o yaml

# Or check mesh-wide
kubectl get destinationrule -n istio-system -o yaml
```
Expected TLS configuration for STRICT mode:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: <service-name>
  namespace: <namespace>
spec:
  host: <service>.<namespace>.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL  # Uses Istio-managed certificates
```
Incorrect modes that cause failures:
- DISABLE: Sends plaintext to STRICT mode endpoint (connection rejected)
- SIMPLE: Uses regular TLS without client cert (mTLS handshake fails)
- MUTUAL: Requires external certificate files (not Istio-managed)
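
The mode-to-outcome mapping above can be written as a small lookup for use in a pre-deployment check. This is a sketch that only restates the list; the function name `dr_tls_mode_outcome` and the output wording are made up for this example.

```bash
# Map a DestinationRule TLS mode to the expected outcome against a
# STRICT-mode destination, following the list above.
# Hypothetical helper; returns non-zero for an unrecognized mode.
dr_tls_mode_outcome() {
  case "$1" in
    ISTIO_MUTUAL) echo "ok: client presents Istio-managed certificate" ;;
    DISABLE)      echo "fail: plaintext sent to STRICT endpoint" ;;
    SIMPLE)       echo "fail: no client certificate presented" ;;
    MUTUAL)       echo "check: requires externally supplied certificate files" ;;
    *)            echo "unknown mode: $1"; return 1 ;;
  esac
}
```

The mode of an existing rule can be read with `kubectl get destinationrule <rule-name> -n <namespace> -o jsonpath='{.spec.trafficPolicy.tls.mode}'` and passed to the function.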
### 9. Check for root certificate mismatch
If root CA was rotated, sidecars may have different trust bundles:
```bash
# Get root cert from source sidecar
kubectl exec -n <namespace> <source-pod> -c istio-proxy -- cat /etc/certs/root-cert.pem > /tmp/source-root.pem

# Get root cert from destination sidecar
kubectl exec -n <namespace> <dest-pod> -c istio-proxy -- cat /etc/certs/root-cert.pem > /tmp/dest-root.pem

# Compare fingerprints
openssl x509 -in /tmp/source-root.pem -noout -fingerprint
openssl x509 -in /tmp/dest-root.pem -noout -fingerprint

# Fingerprints must match
diff /tmp/source-root.pem /tmp/dest-root.pem
```
If certificates differ, restart sidecars to pick up new root cert:
```bash
# Rolling restart to refresh certificates
kubectl rollout restart deployment -n <namespace> <deployment-name>
```
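
A plain `diff` of two PEM files can report a difference that is only whitespace or line endings. As a rough alternative, the sketch below compares the files after stripping whitespace. The function name `same_pem` is hypothetical, and the approach assumes `sha256sum` is available and that both files list their certificates in the same order.

```bash
# Return 0 if two PEM files contain the same data, ignoring differences in
# whitespace and line endings; return 1 if they differ, 2 if a file is unreadable.
# Usage: same_pem FILE_A FILE_B
same_pem() {
  [ -r "$1" ] && [ -r "$2" ] || return 2      # both files must be readable
  hash_a=$(tr -d ' \t\r\n' < "$1" | sha256sum | cut -d' ' -f1)
  hash_b=$(tr -d ' \t\r\n' < "$2" | sha256sum | cut -d' ' -f1)
  [ "$hash_a" = "$hash_b" ]
}
```

For the files saved in this step, `same_pem /tmp/source-root.pem /tmp/dest-root.pem || echo "root certificates differ"` flags a mismatch without depending on exact formatting.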
### 10. Enable Envoy debug logging for TLS failures
Debug TLS handshake issues with detailed logs:
```bash
# Enable debug logging for connection issues
istioctl proxy-config log <pod-name>.<namespace> --level connection:debug

# Or set via Envoy admin API
kubectl exec -n <namespace> <pod-name> -c istio-proxy -- curl -XPOST "http://localhost:15000/logging?connection=debug"

# Watch for TLS-specific errors
kubectl logs -n <namespace> <pod-name> -c istio-proxy --tail=100 | grep -E "TLS|SSL|handshake|certificate"
```
Common TLS error messages:
- CERTIFICATE_VERIFY_FAILED: Root CA not trusted
- CERTIFICATE_HAS_EXPIRED: Certificate past validity period
- UNKNOWN_CA: Certificate signed by unknown authority
- NO_SHARED_CIPHER: TLS version or cipher suite mismatch
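
To apply the list above to a large log, a small filter can annotate each known error token with its likely cause. This is a sketch: the function name `explain_tls_errors` is hypothetical, and it only recognizes the four tokens listed here.

```bash
# Read proxy log lines on stdin and print a likely cause for each known
# TLS error token found, using the list above. Each cause is printed once.
explain_tls_errors() {
  logs=$(cat)
  for pair in \
    "CERTIFICATE_VERIFY_FAILED=root CA not trusted" \
    "CERTIFICATE_HAS_EXPIRED=certificate past validity period" \
    "UNKNOWN_CA=certificate signed by unknown authority" \
    "NO_SHARED_CIPHER=TLS version or cipher suite mismatch"
  do
    token=${pair%%=*}     # text before the first "="
    cause=${pair#*=}      # text after the first "="
    case "$logs" in
      *"$token"*) echo "$token: $cause" ;;
    esac
  done
}
```

It can be fed from the log command in this step, for example `kubectl logs -n <namespace> <pod-name> -c istio-proxy --tail=100 | explain_tls_errors`.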
## Prevention
- Deploy in PERMISSIVE mode first, verify all traffic flows, then switch to STRICT
- Monitor certificate expiration with Prometheus metrics (`istio_worker_certificate_expiry_seconds`)
- Set up alerts for CSR failure rate spikes
- Use `istioctl analyze` before deploying policy changes
- Document service account dependencies for each service
- Rotate root CA certificates during maintenance windows with sidecar restarts
## Related Errors
- **503 Service Unavailable**: No healthy endpoints or circuit breaker open
- **Connection reset by peer**: TLS handshake rejected by destination
- **Certificate unknown**: Root CA not in trust bundle