Introduction

Validating admission webhooks sit directly in the request path for Kubernetes creates and updates. That means a broken webhook can turn a local service problem into a cluster-wide control-plane incident. If the webhook is unreachable, too slow, or scoped too broadly with failurePolicy: Fail, normal deployment and recovery operations can grind to a halt.

Symptoms

  • kubectl apply or kubectl create fails with webhook call errors
  • Many unrelated resources across namespaces are rejected
  • Existing workloads continue running, but new changes cannot be applied
  • Operators see timeout or TLS errors pointing at the webhook service

Common Causes

  • The webhook Service or backing Pods are down
  • failurePolicy: Fail blocks requests while the webhook is unhealthy
  • Timeout settings are too short for normal webhook response time
  • The webhook matches more namespaces or resource types than intended

Step-by-Step Fix

  1. Inspect the ValidatingWebhookConfiguration
     Confirm failure policy, timeout, and matching scope before changing anything blindly.

       kubectl get validatingwebhookconfigurations
       kubectl describe validatingwebhookconfigurations my-webhook

  2. Check webhook service and Pod health
     If the webhook backend is unavailable, the API server cannot complete the admission call, and with failurePolicy: Fail every matching request is rejected.

       kubectl get svc -n webhook-namespace
       kubectl get pods -n webhook-namespace

  3. Temporarily reduce blast radius if needed
     In an outage, moving failurePolicy to Ignore or narrowing scope may be the fastest way to restore cluster operations while you fix the backend.
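
One way to make the temporary failurePolicy change is a JSON patch against the webhook configuration. The sketch below assumes the configuration is named my-webhook (as in the earlier commands) and that the affected webhook is the first entry in its webhooks list. build_failure_policy_patch is a hypothetical helper written for this example, not part of kubectl.

```shell
# Hypothetical helper: print a JSON patch that sets failurePolicy on one
# entry of the .webhooks list. Arguments: list index, policy (Fail or Ignore).
build_failure_policy_patch() {
  index="$1"
  policy="$2"
  case "$policy" in
    Fail|Ignore) ;;
    *) echo "policy must be Fail or Ignore" >&2; return 1 ;;
  esac
  printf '[{"op":"replace","path":"/webhooks/%s/failurePolicy","value":"%s"}]\n' \
    "$index" "$policy"
}

# Usage (needs cluster access, so it is left commented out here):
# kubectl patch validatingwebhookconfiguration my-webhook --type=json \
#   -p "$(build_failure_policy_patch 0 Ignore)"
```

If the configuration is managed by a Helm chart or an operator, the patched value may be reverted on the next reconcile, so note the original setting and restore it deliberately once the backend is healthy.
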
  4. Validate webhook certificates and response timing
     TLS errors and slow handlers are common root causes of “everything is blocked” incidents.
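
A rough sketch of the certificate and timing check follows. It assumes the configuration is named my-webhook, that the first entry in its webhooks list is the one under investigation, and that base64 and openssl are installed locally. Both function names are hypothetical. In admissionregistration.k8s.io/v1, timeoutSeconds must be between 1 and 30 and defaults to 10.

```shell
# Hypothetical helper: print the subject and expiry date of the first
# certificate in the CA bundle the API server uses to verify the webhook.
# Defined here but not invoked, because it needs cluster access.
show_webhook_ca_cert() {
  kubectl get validatingwebhookconfiguration "$1" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}' |
    base64 -d |
    openssl x509 -noout -subject -enddate
}

# Hypothetical helper: succeed only if the argument is a whole number in the
# range the API accepts for timeoutSeconds (1 to 30).
timeout_in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 30 ]
}

# Usage (needs cluster access, so it is left commented out here):
# show_webhook_ca_cert my-webhook
# t="$(kubectl get validatingwebhookconfiguration my-webhook \
#   -o jsonpath='{.webhooks[0].timeoutSeconds}')"
# timeout_in_range "$t" && echo "timeoutSeconds=$t is within 1..30"
```

This only inspects the CA bundle stored in the configuration. The serving certificate presented by the webhook Pods is a separate object and needs its own check for expiry and for a subject alternative name that matches the Service DNS name.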

Prevention

  • Keep validating webhooks narrowly scoped
  • Use cautious failure policies during development and rollout
  • Monitor webhook latency and availability as production dependencies
  • Exclude critical system namespaces unless there is a strong reason not to
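
As an illustration of excluding a system namespace, the sketch below builds a JSON patch that sets a namespaceSelector on one webhook entry. It assumes the configuration is named my-webhook and that the namespace carries the kubernetes.io/metadata.name label, which recent Kubernetes versions set automatically. build_namespace_exclusion_patch is a hypothetical helper. Note that the patch overwrites any namespaceSelector already present on that entry.

```shell
# Hypothetical helper: print a JSON patch that sets a namespaceSelector
# excluding one namespace by its kubernetes.io/metadata.name label.
# Arguments: index into the .webhooks list, namespace to exclude.
build_namespace_exclusion_patch() {
  index="$1"
  namespace="$2"
  printf '[{"op":"add","path":"/webhooks/%s/namespaceSelector","value":{"matchExpressions":[{"key":"kubernetes.io/metadata.name","operator":"NotIn","values":["%s"]}]}}]\n' \
    "$index" "$namespace"
}

# Usage (needs cluster access, so it is left commented out here):
# kubectl patch validatingwebhookconfiguration my-webhook --type=json \
#   -p "$(build_namespace_exclusion_patch 0 kube-system)"
```

For a lasting fix, the same selector belongs in the manifest or chart that owns the webhook configuration, so that it survives redeployment.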