Your pods aren't being scheduled on certain nodes, and you suspect taints are blocking them. Taints and tolerations work together to repel pods from nodes unless pods explicitly tolerate the taint. This mechanism is useful for dedicating nodes to specific workloads, but misconfiguration can prevent pods from scheduling.
Understanding Taints and Tolerations
Taints are applied to nodes and consist of a key, a value, and an effect. Tolerations are applied to pods and allow them to schedule on nodes with matching taints. Without a matching toleration, a pod won't schedule on a tainted node (depending on the taint effect).
Taint effects:
- NoSchedule: Pod won't schedule (existing pods unaffected)
- NoExecute: Pod won't schedule, and existing pods are evicted
- PreferNoSchedule: Scheduler tries to avoid the node, but it's not guaranteed
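The matching rule can be sketched in a few lines. This is a simplified model for intuition, not the Kubernetes source: the `tolerates` function and dict shapes below are illustrative, but they follow the documented semantics (an empty `effect` in a toleration matches any effect, `Exists` ignores the value, and `Exists` with no key matches every taint).

```python
# Minimal sketch of taint/toleration matching (simplified; dicts mirror
# the fields of the Kubernetes objects, but this is not the real API).

def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect in the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    op = toleration.get("operator", "Equal")
    if op == "Exists":
        # Exists with no key tolerates every taint; with a key, only that key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal requires both key and value to match exactly.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))

taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
print(tolerates({"key": "dedicated", "operator": "Equal",
                 "value": "gpu", "effect": "NoSchedule"}, taint))  # True
print(tolerates({"key": "dedicated", "operator": "Exists"}, taint))  # True
print(tolerates({"key": "dedicated", "operator": "Equal",
                 "value": "gpu", "effect": "NoExecute"}, taint))   # False
```

The third call fails because the effects differ, which is exactly the mismatch covered in Solution 4 below.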
Diagnosis Commands
Check node taints:
```bash
# List all nodes with taints
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'

# Check specific node taints
kubectl describe node node-name | grep -A 5 Taints

# Get detailed taint information
kubectl get node node-name -o jsonpath='{.spec.taints}'
```
Check pod tolerations:
```bash
# Check pod tolerations
kubectl get pod pod-name -n namespace -o yaml | grep -A 20 tolerations

# Get tolerations in JSON
kubectl get pod pod-name -n namespace -o jsonpath='{.spec.tolerations}'

# List pods with tolerations
kubectl get pods -n namespace -o custom-columns='NAME:.metadata.name,TOLERATIONS:.spec.tolerations'
```
Check scheduling events:
```bash
# Check events for taint-related scheduling failures
kubectl describe pod pending-pod -n namespace | grep -A 10 "Events:"
kubectl get events -n namespace | grep -i "taint\|node(s) had taint"
```

Common Solutions
Solution 1: Add Tolerations to Pods
Pods need tolerations to schedule on tainted nodes:
```bash
# Check node taint
kubectl describe node node-name | grep Taints
# Example: dedicated=gpu:NoSchedule
```

Add matching toleration:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: app
    image: myimage
```

Multiple tolerations:
```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300 # Stay for 5 minutes when node not ready
```

Solution 2: Remove Node Taints
Remove taints that are blocking normal pods:
```bash
# Remove specific taint
kubectl taint nodes node-name key=value:effect-

# Example: Remove GPU taint
kubectl taint nodes node-name dedicated=gpu:NoSchedule-

# Remove all taints with a key
kubectl taint nodes node-name dedicated-
```
Check taint removal:
```bash
# Verify taint is removed
kubectl describe node node-name | grep Taints
```

Solution 3: Use Exists Operator
The Exists operator matches any taint with the given key, regardless of its value:
```yaml
tolerations:
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
# This tolerates dedicated=anything:NoSchedule
```

Tolerate all taints (dangerous):
```yaml
tolerations:
- operator: "Exists"
# Matches all taints - pod can schedule anywhere
```

Solution 4: Fix Toleration Effect Mismatch
The toleration's effect must match the taint's effect (or be left empty to match all effects):
```yaml
# Node taint: dedicated=gpu:NoExecute

# Wrong toleration:
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule" # Doesn't match NoExecute

# Correct toleration:
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoExecute"
```
Tolerate multiple effects:
```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoExecute"
```

Solution 5: Handle Built-in Taints
Kubernetes adds built-in taints for node conditions:
```
# Common built-in taints:
# node.kubernetes.io/not-ready: NoExecute
# node.kubernetes.io/unreachable: NoExecute
# node.kubernetes.io/memory-pressure: NoSchedule
# node.kubernetes.io/disk-pressure: NoSchedule
# node.kubernetes.io/pid-pressure: NoSchedule
# node.kubernetes.io/network-unavailable: NoSchedule
# node.kubernetes.io/unschedulable: NoSchedule
```

Add tolerations for built-in taints:
```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300 # Stay 5 minutes when node not ready
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/memory-pressure"
  operator: "Exists"
  effect: "NoSchedule"
```

DaemonSet pods should tolerate all built-in taints:
```yaml
# DaemonSet tolerations for system components
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
- key: "node.kubernetes.io/disk-pressure"
  operator: "Exists"
- key: "node.kubernetes.io/memory-pressure"
  operator: "Exists"
- key: "node.kubernetes.io/pid-pressure"
  operator: "Exists"
- key: "node.kubernetes.io/unschedulable"
  operator: "Exists"
  effect: "NoSchedule" # Or omit effect to match all effects
```

Solution 6: Add Node Taints
Add taints to dedicate nodes:
```bash
# Add taint to node
kubectl taint nodes node-name key=value:effect

# Example: Dedicate node for GPU workloads
kubectl taint nodes gpu-node dedicated=gpu:NoSchedule

# Example: Reserve node for monitoring
kubectl taint nodes monitor-node purpose=monitoring:NoSchedule
```
Solution 7: Use TolerationSeconds
tolerationSeconds limits how long a pod stays bound to a node after a NoExecute taint is added:
```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300 # Pod evicted after 5 minutes
```

No tolerationSeconds means the pod stays forever:
```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
# Pod stays indefinitely when node not ready
```

Solution 8: Fix Control Plane Node Access
Control plane nodes have taints by default:
```bash
# Check control plane taints
kubectl describe node control-plane-node | grep Taints
# Usually: node-role.kubernetes.io/control-plane:NoSchedule
```

Add toleration for control plane:
```yaml
tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule" # Legacy taint name
```

Remove control plane taint (if needed):
```bash
# Remove control plane taint to allow all pods
kubectl taint nodes control-plane-node node-role.kubernetes.io/control-plane:NoSchedule-
kubectl taint nodes control-plane-node node-role.kubernetes.io/master:NoSchedule-
```

Solution 9: Debug Scheduling Failure
Check why pod isn't scheduling:
```bash
# Describe pending pod
kubectl describe pod pending-pod -n namespace

# Look for taint-related messages
kubectl get events -n namespace | grep -i "taint\|node(s) had taint"

# Check scheduler logs
kubectl logs -n kube-system kube-scheduler-master | grep -i taint
```
Solution 10: Check All Nodes' Taints
Comprehensive taint check:
```bash
# List all taints in cluster
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, taints: .spec.taints}'

# Or without jq
for node in $(kubectl get nodes -o name); do
  echo "$node:"
  kubectl get $node -o jsonpath='{.spec.taints}'
  echo ""
done
```
Verification
After fixing taint/toleration issues:
```bash
# Verify node taints
kubectl describe node node-name | grep Taints

# Verify pod tolerations
kubectl get pod pod-name -n namespace -o yaml | grep -A 20 tolerations

# Check pod scheduled on expected node
kubectl get pod pod-name -n namespace -o wide

# Verify pod is running
kubectl get pods -n namespace
```
Taint and Toleration Quick Reference
Common Taint Patterns
| Taint | Purpose | Typical Toleration |
|---|---|---|
| node-role.kubernetes.io/control-plane:NoSchedule | Protect control plane | System components only |
| dedicated=group:NoSchedule | Dedicated node pool | Pods for that group |
| special-hardware=gpu:NoSchedule | GPU nodes | GPU workloads |
| node.kubernetes.io/not-ready:NoExecute | Node unhealthy | Critical pods with tolerationSeconds |
| node.kubernetes.io/memory-pressure:NoSchedule | Memory pressure | Critical pods only |
Toleration Operator Patterns
| Operator | Matches | Example |
|---|---|---|
| Equal | Exact key=value:effect | key=x, value=y, effect=NoSchedule |
| Exists | Any value with key+effect | key=x, effect=NoSchedule |
| Exists (no key) | All taints | operator: Exists |
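Putting the operator rules and effects together, the scheduler's hard filter can be sketched as "a node is eligible unless it has an untolerated NoSchedule or NoExecute taint" (PreferNoSchedule only discourages placement). The sketch below is a simplified model with hypothetical node data, not the real scheduler; `tolerates` and `schedulable` are illustrative names.

```python
# Sketch: which nodes can a pod land on, given its tolerations?
# Simplified model of the scheduler's taint filter.

def tolerates(tol: dict, taint: dict) -> bool:
    """True if a single toleration matches a single taint."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return not tol.get("key") or tol["key"] == taint["key"]
    return tol.get("key") == taint["key"] and tol.get("value") == taint.get("value")

def schedulable(node_taints: list, pod_tolerations: list) -> bool:
    """A node is eligible unless an untolerated hard taint remains."""
    untolerated = [t for t in node_taints
                   if not any(tolerates(tol, t) for tol in pod_tolerations)]
    # Only NoSchedule/NoExecute block scheduling; PreferNoSchedule is soft.
    return not any(t["effect"] in ("NoSchedule", "NoExecute")
                   for t in untolerated)

nodes = {
    "gpu-node": [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}],
    "worker-1": [],
}
tols = [{"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"}]
print([n for n, taints in nodes.items() if schedulable(taints, tols)])
# ['gpu-node', 'worker-1']
```

With no tolerations at all, the same pod would only be eligible for worker-1, which is the typical "Pending pod on a dedicated node pool" symptom this guide diagnoses.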
Taint/Toleration Issues Summary
| Issue | Check | Solution |
|---|---|---|
| Pod can't schedule on node | kubectl describe pod | Add matching toleration |
| Effect mismatch | kubectl describe pod/node | Match effect exactly |
| Key mismatch | kubectl describe pod/node | Match key exactly |
| Control plane blocked | kubectl describe node | Add control-plane toleration |
| Built-in taint blocking | Check node conditions | Add built-in tolerations |
| Pod evicted (NoExecute) | kubectl get events | Check tolerationSeconds |
| All pods blocked on node | kubectl describe node | Remove node taint |
Prevention Best Practices
- Document all taints and their purpose.
- Add appropriate tolerations to pods that need specific nodes.
- Use tolerationSeconds for critical pods on unhealthy nodes.
- Be careful with operator: Exists without a key.
- Test taint changes before applying them to production.
- Monitor for pods pending due to taints.
- Keep control plane taints in place unless you explicitly need workloads there.
Taint and toleration issues are straightforward once you see the exact taint on the node - add a matching toleration to your pod, or remove the taint if it's blocking pods that should schedule there.