Your DaemonSet should place a pod on every node (or selected nodes), but when you check, some nodes are missing their pods. DaemonSets are essential for cluster-wide services like logging agents, monitoring collectors, and network plugins, but they can fail to schedule due to node taints, affinity constraints, or resource issues.
Understanding DaemonSet Scheduling
DaemonSets ensure a pod runs on each node that matches certain criteria. The DaemonSet controller creates pods for nodes, but actual scheduling is handled by the default scheduler (in older Kubernetes versions, before v1.12, the DaemonSet controller scheduled its pods itself).
Nodes might be skipped because: they have taints the pod doesn't tolerate, the pod's node selector doesn't match node labels, node affinity rules exclude the node, or there aren't enough resources on the node.
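The taint/toleration rule at the core of the most common failure can be sketched in plain shell. This is a simplified illustration only (real kubelet matching also handles the `Equal` operator and empty key/effect wildcards), using hypothetical taint values:

```shell
#!/usr/bin/env bash
# Simplified matching rule: a toleration with operator Exists covers a taint
# when its key and effect both match. (Kubernetes also supports Equal on
# values, and empty key/effect wildcards, which are omitted here.)
tolerates() {
  local taint_key=$1 taint_effect=$2 tol_key=$3 tol_op=$4 tol_effect=$5
  [[ $tol_op == "Exists" && $tol_key == "$taint_key" && $tol_effect == "$taint_effect" ]]
}

if tolerates "node-role.kubernetes.io/control-plane" "NoSchedule" \
             "node-role.kubernetes.io/control-plane" "Exists" "NoSchedule"; then
  echo "pod can schedule"   # prints: pod can schedule
else
  echo "pod is repelled"
fi
```

A pod missing any needed toleration is repelled from the node, which is exactly what the `Taints:` lines in `kubectl describe node` let you diagnose.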
Diagnosis Commands
Start by checking the DaemonSet and its pods:
```bash
# Check DaemonSet status
kubectl get daemonset daemonset-name -n namespace

# Check pods created by the DaemonSet
kubectl get pods -n namespace -l app=daemonset-label

# Compare desired vs. current scheduled counts
kubectl describe daemonset daemonset-name -n namespace | grep "Desired Number of Nodes Scheduled"
kubectl describe daemonset daemonset-name -n namespace | grep "Current Number of Nodes Scheduled"
```
Check which nodes have pods and which don't:
```bash
# List all nodes
kubectl get nodes

# Show pods per node
kubectl get pods -n namespace -o wide --sort-by=.spec.nodeName

# Check which nodes have DaemonSet pods
kubectl get pods -n namespace -l app=daemonset-label -o custom-columns='NODE:.spec.nodeName,NAME:.metadata.name'

# Find nodes without DaemonSet pods
kubectl get nodes -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | while read node; do
  if ! kubectl get pods -n namespace --field-selector spec.nodeName=$node \
      -l app=daemonset-label --no-headers 2>/dev/null | grep -q .; then
    echo "Missing pod on node: $node"
  fi
done
```
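The same comparison can be expressed as a set difference with `comm`; shown here with hard-coded sample node lists standing in for the kubectl output:

```shell
#!/usr/bin/env bash
# comm -23 prints lines unique to the first input: nodes with no DaemonSet pod.
# Replace the printf stand-ins with the kubectl queries from above.
all_nodes=$(printf '%s\n' node-a node-b node-c | sort)
nodes_with_pod=$(printf '%s\n' node-a node-c | sort)

comm -23 <(echo "$all_nodes") <(echo "$nodes_with_pod")   # prints: node-b
```

Both inputs must be sorted for `comm` to work, hence the explicit `sort`.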
Examine node details:
```bash
# Check node taints
kubectl describe node node-name | grep -A 5 Taints

# Check node labels
kubectl get node node-name --show-labels

# Check node conditions
kubectl describe node node-name | grep -A 10 Conditions

# Check node resources
kubectl describe node node-name | grep -A 10 "Allocated resources"
```
Common Solutions
Solution 1: Fix Taints and Tolerations
Nodes with taints repel pods without matching tolerations:
```bash
# Check node taints
kubectl describe nodes | grep -A 2 Taints

# Common control plane taints:
# node-role.kubernetes.io/control-plane:NoSchedule
# node-role.kubernetes.io/master:NoSchedule
```
Add tolerations to the DaemonSet:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  template:
    spec:
      tolerations:
      # Tolerate control plane nodes
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      # Tolerate other common taints
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
      - key: node.kubernetes.io/disk-pressure
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/memory-pressure
        operator: Exists
        effect: NoSchedule
```
Solution 2: Fix Node Selector
The DaemonSet might have a node selector excluding nodes:
```bash
# Check DaemonSet node selector
kubectl get daemonset daemonset-name -n namespace -o jsonpath='{.spec.template.spec.nodeSelector}'
```
Fix the node selector:
```yaml
# Remove the restrictive node selector if it is not needed
spec:
  template:
    spec:
      nodeSelector: {}

# Or update it to match your nodes
spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd  # Ensure nodes have this label
```

```bash
# Add the label to nodes
kubectl label node node-name disktype=ssd
```
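Under the hood, `nodeSelector` is a simple AND over key=value pairs: every pair in the selector must be present among the node's labels. A shell sketch of that rule, with made-up labels:

```shell
#!/usr/bin/env bash
# nodeSelector semantics: every selector key=value pair must appear
# verbatim in the node's label set for the node to be eligible.
node_labels="kubernetes.io/os=linux disktype=ssd topology.kubernetes.io/zone=us-east-1a"
selector="disktype=ssd"

matches=true
for pair in $selector; do
  case " $node_labels " in
    *" $pair "*) ;;          # this key=value is present on the node
    *) matches=false ;;      # any missing pair disqualifies the node
  esac
done
echo "matches=$matches"      # prints: matches=true
```

A single unmatched pair excludes the node, which is why a stale label (or a typo like `disktype=sdd`) silently removes nodes from the DaemonSet's reach.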
Solution 3: Fix Node Affinity
Node affinity might be excluding nodes:
```bash
# Check DaemonSet affinity rules
kubectl get daemonset daemonset-name -n namespace -o yaml | grep -A 30 affinity
```
Fix the node affinity:
```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      # Remove overly restrictive rules
```
Solution 4: Fix Resource Constraints
Nodes might not have resources for DaemonSet pods:
```bash
# Check node available resources
kubectl describe node node-name | grep -A 10 "Allocated resources"

# Check DaemonSet resource requests
kubectl get daemonset daemonset-name -n namespace -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'
```
Reduce resource requests:
```yaml
spec:
  template:
    spec:
      containers:
      - name: agent
        resources:
          requests:
            cpu: "50m"      # Reduce if too high
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "128Mi"
```
Solution 5: Check for Pod Anti-Affinity
Pod anti-affinity can prevent scheduling on a node that already runs a matching pod:

```yaml
# If the pod template has anti-affinity, it can block DaemonSet pods.
# DaemonSets shouldn't need pod anti-affinity; remove rules like this:
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: myapp
            topologyKey: kubernetes.io/hostname
```
Solution 6: Fix HostPort Conflicts
HostPort conflicts prevent pod scheduling:
```bash
# Check for hostPort in the DaemonSet
kubectl get daemonset daemonset-name -n namespace -o yaml | grep -A 5 hostPort

# Check whether the port is already in use on the node
# (SSH to the node first)
netstat -tuln | grep PORT
```
Fix hostPort configuration:
```yaml
containers:
- name: agent
  ports:
  - containerPort: 8080
    hostPort: 8080  # Remove if not needed, or change the port
```
Solution 7: Check Node Conditions
Nodes in bad condition might not get pods:
```bash
# Check node conditions
kubectl get nodes
kubectl describe node node-name | grep -A 15 Conditions

# Common issues:
# - Ready: False (node not healthy)
# - NetworkUnavailable: True
# - DiskPressure: True
# - MemoryPressure: True
```
Fix node issues:
```bash
# SSH to the problematic node
ssh node-name

# Check the kubelet
systemctl status kubelet
journalctl -u kubelet -n 50

# Check node resources
df -h    # Disk
free -m  # Memory

# Restart the kubelet if needed
systemctl restart kubelet
```
Solution 8: Check DaemonSet Update Strategy
Rolling update might be stuck:
```bash
# Check DaemonSet update strategy
kubectl get daemonset daemonset-name -n namespace -o yaml | grep -A 10 updateStrategy

# Check rollout status
kubectl rollout status daemonset/daemonset-name -n namespace
```
Fix update strategy:
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # How many pods can be unavailable during an update
    # Or use OnDelete for manual updates:
    # type: OnDelete
```
Solution 9: Verify PriorityClass
DaemonSet pods might have the wrong priority:

```yaml
spec:
  template:
    spec:
      priorityClassName: system-node-critical  # High priority for system DaemonSets
      # Or: system-cluster-critical
```
Verification
After applying fixes:
```bash
# Check the DaemonSet reports pods on all expected nodes
kubectl get daemonset daemonset-name -n namespace

# Verify pod placement
kubectl get pods -n namespace -l app=daemonset-label -o wide

# Check node coverage (--no-headers so the header row isn't counted)
kubectl get nodes --no-headers | wc -l
kubectl get pods -n namespace -l app=daemonset-label --no-headers | wc -l

# Monitor pod scheduling
kubectl get pods -n namespace -l app=daemonset-label -w
```
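A quick way to flag a coverage gap is to compare the two counts directly; the sketch below uses sample numbers in place of the kubectl output:

```shell
#!/usr/bin/env bash
# Sample counts; in practice populate these from:
#   expected=$(kubectl get nodes --no-headers | wc -l)
#   running=$(kubectl get pods -n namespace -l app=daemonset-label --no-headers | wc -l)
expected=5
running=4

if [ "$running" -lt "$expected" ]; then
  echo "coverage gap: $((expected - running)) node(s) without a pod"
else
  echo "full coverage"
fi
# prints: coverage gap: 1 node(s) without a pod
```

Note that if some nodes are intentionally excluded (for example tainted nodes the DaemonSet does not tolerate), `expected` should be the count of eligible nodes, not all nodes.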
DaemonSet Nodes Checklist
```bash
# Quick diagnostic script
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo "=== Node: $node ==="
  echo "Taints:"
  kubectl get node $node -o jsonpath='{.spec.taints}'
  echo ""
  echo "Labels (relevant):"
  kubectl get node $node --show-labels | grep -E "disktype|zone|role"
  echo ""
  echo "Has DaemonSet pod:"
  kubectl get pods -n namespace --field-selector spec.nodeName=$node -l app=daemonset-label
  echo ""
done
```
DaemonSet Not Scheduling Causes Summary
| Cause | Check | Solution |
|---|---|---|
| Node taints | `kubectl describe node \| grep Taints` | Add tolerations |
| Node selector mismatch | `kubectl get ds -o yaml` | Fix selector or label nodes |
| Node affinity too strict | `kubectl get ds -o yaml` | Relax affinity rules |
| Insufficient resources | `kubectl describe node` | Reduce requests |
| HostPort conflict | `netstat -tuln` on the node | Change or remove hostPort |
| Node NotReady | `kubectl get nodes` | Fix node issues |
| Update stuck | `kubectl rollout status` | Force rollout or pause |
Prevention Best Practices
Add tolerations for control plane and other common taints to system DaemonSets. Use an appropriate priorityClassName for critical DaemonSets. Set resource requests small enough to fit on every node. Avoid hostPort unless absolutely necessary. Monitor DaemonSet pod coverage with alerts. Use node affinity to target specific node pools, not to exclude individual nodes.
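Combined, these practices look roughly like the spec fragment below; the name, image, and resource values are illustrative, not prescriptive:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                    # illustrative name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      priorityClassName: system-node-critical   # protects against eviction
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: agent
        image: registry.example.com/node-agent:1.0   # illustrative image
        resources:
          requests:
            cpu: "50m"      # small enough to fit on every node
            memory: "64Mi"
        # no hostPort: avoids per-node port conflicts
```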
DaemonSet scheduling issues are almost always about node compatibility: the pod's requirements don't match what the nodes offer. Taints and tolerations are the most common culprit, especially on control plane nodes, which carry protective taints by default.