## Introduction
Kubernetes pod pending and scheduling errors occur when the Kubernetes scheduler cannot find a suitable node for a pod because of resource constraints, configuration mismatches, or cluster policy violations. The pod remains in the Pending state indefinitely, blocking deployments and causing application downtime. Common causes include insufficient CPU/memory on nodes, exceeded namespace resource quotas, node selectors or affinity rules that match no nodes, taints without corresponding tolerations, PersistentVolumeClaim binding failures, PodDisruptionBudgets preventing eviction, pod priority/preemption blocking lower-priority pods, and image pull failures such as ImagePullBackOff that keep containers from starting after the pod is scheduled. The fix requires understanding the Kubernetes scheduling pipeline, resource management, node constraints, and debugging tools such as kubectl describe, kubectl top, and the scheduler logs. This guide provides production-proven troubleshooting for Kubernetes scheduling issues, from single-node clusters to multi-node production deployments.
## Symptoms
- Pod stuck in `Pending` state indefinitely
- `kubectl get pods` shows `0/1` containers ready
- `kubectl describe pod` shows a `FailedScheduling` event
- `Insufficient cpu` or `Insufficient memory` in events
- `0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate`
- `0/3 nodes are available: 3 Insufficient memory`
- PersistentVolumeClaim stuck in `Pending`
- `exceeded quota` errors in events
- `node(s) didn't match PodAffinity rules`
- `node(s) had volume node affinity conflict`
- Deployment shows `0/5` replicas available
- HPA cannot scale due to resource exhaustion
## Common Causes
- Cluster CPU/memory resources exhausted
- Resource requests too high for available nodes
- Namespace ResourceQuota limits reached
- LimitRange conflicts with pod requests
- Node selector labels don't match any nodes
- Node affinity rules too restrictive
- Taints on nodes without pod tolerations
- Pod anti-affinity preventing co-location
- PVC storage class not available
- PV already bound to different PVC
- Pod priority too low, preemption not working
- Scheduler not running or misconfigured
- Network plugin (CNI) not installed
- Image pull failures keeping scheduled pods from starting
## Step-by-Step Fix
### 1. Diagnose pod scheduling failure
Check pod status and events:
```bash
# Get pod details
kubectl get pod <pod-name> -n <namespace>

# Output:
# NAME         READY   STATUS    RESTARTS   AGE
# my-app-xyz   0/1     Pending   0          5m

# Describe pod for detailed events
kubectl describe pod <pod-name> -n <namespace>

# Key sections to check:
# Status:       Pending
# Conditions:
#   Type           Status
#   PodScheduled   False
#
# Events:
#   Type     Reason            Message
#   ----     ------            -------
#   Warning  FailedScheduling  0/3 nodes are available: 3 Insufficient memory.
```
Check scheduler events:
```bash
# Get recent scheduling events
kubectl get events -n <namespace> --field-selector reason=FailedScheduling

# Watch events in real-time
kubectl get events -n <namespace> -w --field-selector reason=FailedScheduling

# Check scheduler logs (control plane)
kubectl logs -n kube-system -l component=kube-scheduler --tail=100

# For managed clusters (EKS/GKE/AKS):
# Check cloud provider console for scheduler logs
```
Analyze scheduling failure reasons:
```bash
# Get detailed scheduling failure reason
kubectl describe pod <pod-name> | grep -A5 "Events:"

# Common messages and meanings:
# "Insufficient cpu" - Not enough CPU on any node
# "Insufficient memory" - Not enough memory on any node
# "node(s) had taints that the pod didn't tolerate" - Missing tolerations
# "node(s) didn't match PodAffinity rules" - Affinity rules not satisfied
# "node(s) didn't match PodAntiAffinity rules" - Anti-affinity blocking
# "node(s) had volume node affinity conflict" - Volume not accessible from nodes
# "node(s) were unschedulable" - Nodes cordoned or not ready
```
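The `0/N nodes are available: ...` text packs per-reason node counts into one string. When scripting dashboards or triage tooling, a small helper can split it apart. This is a minimal sketch assuming the common message shape; the event text is not a stable API, so treat the parsing as best-effort:

```python
import re

def parse_failed_scheduling(message: str):
    """Split a FailedScheduling event message into the total node count
    and per-reason node counts.

    Example input:
    '0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had taints...'
    """
    head = re.match(r"0/(\d+) nodes are available: (.*)", message)
    if not head:
        return None
    total = int(head.group(1))
    reasons = {}
    # Each reason is "<count> <text>", comma-separated
    for part in head.group(2).rstrip(".").split(", "):
        m = re.match(r"(\d+) (.*)", part)
        if m:
            reasons[m.group(2)] = int(m.group(1))
    return {"total_nodes": total, "reasons": reasons}

msg = ("0/3 nodes are available: 1 Insufficient cpu, "
       "2 node(s) had taints that the pod didn't tolerate.")
print(parse_failed_scheduling(msg))
```

The per-reason counts can sum to more than the node total, since one node can fail multiple scheduling predicates.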
### 2. Fix resource insufficiency
Check cluster resources:
```bash
# Check node capacity and allocatable
kubectl describe nodes | grep -A10 "Capacity:"

# Output example:
# Capacity:
#   cpu:     4
#   memory:  16Gi
#   pods:    110
# Allocatable:
#   cpu:     3800m   # Accounts for kubelet/system reservations
#   memory:  15Gi
#   pods:    110

# Check current resource usage (requires metrics-server)
kubectl top nodes

# Output:
# NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
# node-1   2500m        65%    12Gi            80%
# node-2   1800m        47%    8Gi             53%
# node-3   3200m        84%    14Gi            93%

# Check pod resource requests
kubectl get pods -n <namespace> -o custom-columns=\
'NAME:.metadata.name,CPU-REQ:.spec.containers[*].resources.requests.cpu,MEM-REQ:.spec.containers[*].resources.requests.memory'

# Check actual per-pod usage in the namespace (usage, not requests)
kubectl top pods -n <namespace>
```
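Comparing requests against allocatable means normalizing Kubernetes quantity strings (`500m`, `256Mi`) into common units. A small sketch of the conversions for the common suffixes only, not the full Kubernetes quantity grammar (the helper names are illustrative):

```python
def cpu_to_millicores(q: str) -> int:
    """Convert a CPU quantity: '500m' -> 500, '2' -> 2000."""
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

def mem_to_bytes(q: str) -> int:
    """Convert a memory quantity with a binary suffix (Ki/Mi/Gi/Ti),
    or plain bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(float(q[:-2]) * factor)
    return int(q)

def total_cpu_requests(pods) -> int:
    """Sum CPU requests (millicores) across pods shaped like
    the items list from `kubectl get pods -o json`."""
    return sum(
        cpu_to_millicores(c["resources"]["requests"].get("cpu", "0m"))
        for pod in pods
        for c in pod["spec"]["containers"]
    )

print(cpu_to_millicores("500m"), mem_to_bytes("256Mi"))
```

The scheduler compares the sum of requests (not limits) on each node against that node's allocatable, which is why a node at low actual usage can still report `Insufficient cpu`.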
Fix resource issues:
```yaml
# Option 1: Reduce pod resource requests
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        memory: "256Mi"  # Reduced from 1Gi
        cpu: "100m"      # Reduced from 500m
      limits:
        memory: "512Mi"
        cpu: "200m"

# Option 2: Add more nodes (cluster autoscaler)
# For managed clusters, update the node pool size:
# EKS: update desired capacity
# GKE: enable cluster autoscaler
# AKS: scale the node pool

# Option 3: Scale down other workloads
# kubectl scale deployment low-priority-app --replicas=0 -n <namespace>

# Option 4: Use pod priority and preemption
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority pods can preempt lower priority pods"

# Apply to the pod spec:
#   priorityClassName: high-priority
```
Check and fix ResourceQuota:
```bash
# List quotas in namespace
kubectl get resourcequota -n <namespace>

# Output:
# NAME      AGE   HARD                      USED
# compute   30d   cpu: 10, memory: 20Gi     cpu: 9500m, memory: 18Gi

# Check quota details
kubectl describe resourcequota compute -n <namespace>

# Shows:
# Resource  Used   Hard
# --------  ----   ----
# cpu       9500m  10
# memory    18Gi   20Gi

# If quota is limiting:
# Option 1: Increase quota
kubectl edit resourcequota compute -n <namespace>
# Update spec.hard values

# Option 2: Delete unused pods to free quota
kubectl delete pod old-pod -n <namespace>

# Option 3: Request a quota increase from the cluster admin
```
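The admission check behind `exceeded quota` errors is a simple comparison: for every resource the quota tracks, used plus requested must stay within hard. A hedged sketch of that check, with values pre-converted to millicores/bytes (the function name and dict shape are illustrative, not a Kubernetes API):

```python
def fits_quota(used: dict, hard: dict, requested: dict) -> bool:
    """True if used + requested stays within hard for every
    resource the quota tracks."""
    return all(
        used.get(r, 0) + requested.get(r, 0) <= limit
        for r, limit in hard.items()
    )

# Numbers from the describe output above: cpu 9500m used of 10 (=10000m)
used = {"cpu": 9500, "memory": 18 * 1024**3}
hard = {"cpu": 10000, "memory": 20 * 1024**3}
print(fits_quota(used, hard, {"cpu": 400}))   # fits: 9900m <= 10000m
print(fits_quota(used, hard, {"cpu": 600}))   # rejected: 10100m > 10000m
```

Note that quota rejection happens at admission time, so the pod is rejected outright rather than left Pending; Pending pods with quota errors usually come from a controller (e.g. a ReplicaSet) retrying creation.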
Check LimitRange:
```bash
# List LimitRanges
kubectl get limitrange -n <namespace>

# Check LimitRange details
kubectl describe limitrange limits -n <namespace>

# Output shows:
# Type       Resource  Min   Max  Default  DefaultRequest
# ----       --------  ---   ---  -------  --------------
# Pod        cpu       50m   2    -        -
# Pod        memory    64Mi  4Gi  -        -
# Container  cpu       25m   1    100m     50m
# Container  memory    32Mi  2Gi  128Mi    64Mi

# If pod requests are below the minimum, update the pod spec:
# spec:
#   containers:
#   - resources:
#       requests:
#         cpu: "50m"      # At least Min
#         memory: "64Mi"
```
### 3. Fix taint and toleration issues
Check node taints:
```bash
# List all node taints
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,TAINTS:.spec.taints'

# Or detailed view
kubectl describe node <node-name> | grep -A5 "Taints:"

# Common taints:
# node.kubernetes.io/not-ready:NoSchedule - Node not ready
# node.kubernetes.io/unreachable:NoSchedule - Node unreachable
# node.cloudprovider.kubernetes.io/uninitialized:NoSchedule - Cloud init pending
# Dedicated taints (custom):
# dedicated=database:NoSchedule - Database workloads only
# special-gpu=true:NoSchedule - GPU workloads only

# Remove taint from node (trailing "-" removes it)
kubectl taint nodes <node-name> <taint-key>-

# Example: Remove dedicated taint
kubectl taint nodes node-1 dedicated-

# Alternatively, add a toleration to the pod
```
Add tolerations to pod:
```yaml
# Pod with tolerations
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "database"
    effect: "NoSchedule"

  # Toleration for any value of a taint key
  - key: "special-gpu"
    operator: "Exists"
    effect: "NoSchedule"

  # Toleration for all taints with this effect (bare toleration)
  - operator: "Exists"
    effect: "NoExecute"  # Also tolerates not-ready/unreachable

  containers:
  - name: app
    image: my-app:latest

# Toleration effects:
# NoSchedule - Pods without a matching toleration won't be scheduled
# PreferNoSchedule - Scheduler tries to avoid, but may schedule
# NoExecute - Existing pods without a matching toleration are evicted
```
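The matching rule is mechanical: a toleration matches a taint when the key, operator/value, and effect line up, and a pod schedules only if every `NoSchedule` taint on the node is tolerated. A sketch mirroring the documented semantics (not the scheduler's actual code):

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Does one toleration match one taint?"""
    # An empty effect on the toleration matches any taint effect
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key with operator Exists tolerates every taint
    if not toleration.get("key"):
        return toleration.get("operator") == "Exists"
    if toleration["key"] != taint["key"]:
        return False
    # Exists ignores the value; Equal (the default) compares it
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint.get("value")

def schedulable(tolerations: list, taints: list) -> bool:
    """A pod schedules only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint["effect"] == "NoSchedule"
    )

taint = {"key": "dedicated", "value": "database", "effect": "NoSchedule"}
tol = {"key": "dedicated", "operator": "Equal",
       "value": "database", "effect": "NoSchedule"}
print(schedulable([tol], [taint]))  # True
print(schedulable([], [taint]))     # False
```

This is why a bare `operator: "Exists"` toleration (no key, no effect) matches everything: both guard clauses pass for any taint.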
### 4. Fix node selector and affinity issues
Check node labels:
```bash
# List all node labels
kubectl get nodes --show-labels

# Or detailed
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,LABELS:.metadata.labels'

# Check for specific label
kubectl get nodes -l disktype=ssd

# Add label to node
kubectl label nodes <node-name> disktype=ssd

# Remove label (trailing "-" removes it)
kubectl label nodes <node-name> disktype-
```
Fix node selector:
```yaml
# Pod with nodeSelector
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  nodeSelector:
    disktype: ssd       # Pod only schedules on nodes with these labels
    zone: us-east-1a
  containers:
  - name: app
    image: my-app:latest

# If no nodes have the required labels, either:
# 1. Add labels to nodes
# 2. Remove nodeSelector from the pod
# 3. Use more flexible affinity rules
```
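Under the hood, `nodeSelector` is just a subset test: every key/value pair in the selector must appear exactly in the node's labels. A one-function sketch of that rule:

```python
def matches_node_selector(node_labels: dict, selector: dict) -> bool:
    """Every selector key must exist on the node with exactly that value."""
    return all(node_labels.get(k) == v for k, v in selector.items())

node = {"disktype": "ssd", "zone": "us-east-1a", "kubernetes.io/os": "linux"}
print(matches_node_selector(node, {"disktype": "ssd"}))                 # True
print(matches_node_selector(node, {"disktype": "ssd", "gpu": "a100"}))  # False
```

Because the match is exact and ANDed across keys, one typo in a label value (e.g. `us-east1a` vs `us-east-1a`) silently excludes every node, which is the most common cause of "no nodes match" with nodeSelector.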
Fix node affinity:
```yaml
# Pod with node affinity (more flexible than nodeSelector)
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  affinity:
    nodeAffinity:
      # Required - must match
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
            - us-east-1b
          - key: disktype
            operator: In
            values:
            - ssd
      # Preferred - scheduler tries to match
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: instance-type
            operator: In
            values:
            - m5.large
            - m5.xlarge
  containers:
  - name: app
    image: my-app:latest

# Operators: In, NotIn, Exists, DoesNotExist, Gt, Lt
```
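Each `matchExpressions` entry is evaluated independently against the node's labels; expressions within one `nodeSelectorTerm` are ANDed, and multiple terms are ORed. A sketch of the per-expression evaluation for the operators listed above (illustrative code, not the scheduler's implementation):

```python
def match_expression(labels: dict, expr: dict) -> bool:
    """Evaluate one matchExpressions entry against node labels.
    Gt/Lt compare the label value and the single expected value as integers."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op == "Gt":
        return present and int(labels[key]) > int(values[0])
    if op == "Lt":
        return present and int(labels[key]) < int(values[0])
    return False

labels = {"zone": "us-east-1a", "disktype": "ssd"}
print(match_expression(labels, {"key": "zone", "operator": "In",
                                "values": ["us-east-1a", "us-east-1b"]}))  # True
print(match_expression(labels, {"key": "gpu", "operator": "Exists"}))      # False
```

The OR across `nodeSelectorTerms` is what makes affinity more flexible than `nodeSelector`, which can only express a single ANDed term.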
Fix pod affinity/anti-affinity:
```yaml
# Pod affinity - schedule near other pods
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - database
        topologyKey: kubernetes.io/hostname  # Same node
        # Or: topology.kubernetes.io/zone for same zone

    # Pod anti-affinity - avoid co-location
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-app
        topologyKey: kubernetes.io/hostname  # Not on same node

      # Preferred anti-affinity (soft)
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - my-app
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: app
    image: my-app:latest
```
### 5. Fix PVC binding issues
Check PVC status:
```bash
# List PVCs
kubectl get pvc -n <namespace>

# Output:
# NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# data-pvc   Pending                                      fast           5m

# Describe PVC
kubectl describe pvc data-pvc -n <namespace>

# Check events:
# Events:
#   Type     Reason              Message
#   ----     ------              -------
#   Warning  ProvisioningFailed  storageclass "fast" not found
# Or: waiting for first consumer to be created before binding
# Or: no persistent volumes available for this claim

# List available storage classes
kubectl get storageclass

# List available PVs
kubectl get pv
```
Fix storage class issues:
```yaml
# Option 1: Create missing storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/aws-ebs  # Or gce-pd, azure-disk, etc.
parameters:
  type: gp3
  fsType: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer  # Wait for pod before binding
---
# Option 2: Use an existing storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard  # Use existing class
  resources:
    requests:
      storage: 10Gi
---
# Option 3: Create PV manually (static provisioning)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /mnt/data  # For local testing
  # Or for cloud:
  # awsElasticBlockStore:
  #   volumeID: vol-xxxx
  #   fsType: ext4
```
Fix volume node affinity:
```bash
# For local volumes or zonal storage, the PV may have node affinity
kubectl describe pv <pv-name> | grep -A10 "Node Affinity"

# If the pod can't schedule on nodes where the PV is accessible:
# Option 1: Use regional storage instead of zonal
# Option 2: Schedule the pod in the same zone as the PV
# Option 3: Use ReadWriteMany storage for multi-node access
```
### 6. Fix image pull issues
Check image pull errors:
```bash
# Describe pod for image errors
kubectl describe pod <pod-name> -n <namespace>

# Common errors:
# ErrImagePull - Generic pull failure
# ImagePullBackOff - Pull failed, backing off before retrying
# ErrImageNeverPull - Image not present and pull policy is Never

# Check container status
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*]}'

# Check events
kubectl get events -n <namespace> --field-selector reason=Failed
```
Fix image pull issues:
```bash
# Common fixes:

# 1. Verify the image exists and the tag is correct
docker pull myregistry/my-image:latest

# 2. Fix imagePullPolicy in the pod spec:
# spec:
#   containers:
#   - name: app
#     image: myregistry/my-image:latest
#     imagePullPolicy: IfNotPresent  # Or Always, Never

# 3. Add image pull secrets
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.io \
  --docker-username=user \
  --docker-password=password \
  --docker-email=user@example.com \
  -n <namespace>

# Reference the secret in the pod spec:
# spec:
#   imagePullSecrets:
#   - name: regcred
#   containers:
#   - name: app
#     image: myregistry.io/my-image:latest

# 4. For private registries with service accounts
kubectl patch serviceaccount default \
  -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'

# 5. Check imagePullSecrets in the namespace
kubectl get serviceaccount default -n <namespace> -o yaml
```
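`kubectl create secret docker-registry` stores the credentials as a base64-encoded `.dockerconfigjson`. When debugging why a pull secret isn't working, it helps to know the payload shape; a sketch that builds and decodes one (the function name is illustrative, the JSON structure is the documented docker config format):

```python
import base64
import json

def make_dockerconfigjson(server, username, password, email):
    """Build the .dockerconfigjson payload a docker-registry secret
    carries (itself base64-encoded in the Secret's data field)."""
    # "auth" is base64("username:password"), which registries check
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {server: {
        "username": username,
        "password": password,
        "email": email,
        "auth": auth,
    }}}
    return base64.b64encode(json.dumps(config).encode()).decode()

# Decoding an existing secret's payload shows which registries it covers;
# a server mismatch with the image name is a common ImagePullBackOff cause
encoded = make_dockerconfigjson("myregistry.io", "user",
                                "password", "user@example.com")
decoded = json.loads(base64.b64decode(encoded))
print(list(decoded["auths"].keys()))  # ['myregistry.io']
```

To inspect a live secret the same way: `kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d`.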
### 7. Fix network plugin (CNI) issues
Check CNI status:
```bash
# Check if a CNI is installed
kubectl get pods -n kube-system | grep -E "calico|flannel|weave|cilium|canal"

# If no CNI pods are running, nodes stay NotReady and pods can't schedule

# Check CNI pod status
kubectl describe pod -n kube-system -l k8s-app=calico-node

# Check CNI logs
kubectl logs -n kube-system -l k8s-app=calico-node --tail=50

# For managed clusters, the CNI should be installed automatically
# If issues persist, check the cloud provider documentation
```
Fix CNI issues:
```bash
# Install Calico
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# Install Flannel
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Install Cilium
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/master/install/kubernetes/quick-install.yaml

# After CNI installation, pending pods should schedule
kubectl get pods -A | grep Pending
```
### 8. Monitor and alert on scheduling issues
Prometheus metrics:
```
# Key metrics for scheduling monitoring (PromQL, via kube-state-metrics)

# Pending pod count
kube_pod_status_phase{phase="Pending"}

# Pods waiting on container creation
kube_pod_container_status_waiting_reason{reason="ContainerCreating"}

# Node resource pressure
kube_node_status_condition{condition="MemoryPressure",status="true"}
kube_node_status_condition{condition="DiskPressure",status="true"}

# Node capacity and allocatable
kube_node_status_capacity_cpu_cores
kube_node_status_capacity_memory_bytes
kube_node_status_allocatable_cpu_cores
kube_node_status_allocatable_memory_bytes

# Resource quota usage
kube_resourcequota{type="used"}
kube_resourcequota{type="hard"}
```
Prometheus alert rules:
```yaml
groups:
- name: kubernetes_scheduling
  rules:
  - alert: KubernetesPendingPods
    expr: kube_pod_status_phase{phase="Pending"} > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Pods pending for more than 10 minutes"
      description: "{{ $value }} pods are in Pending state"

  - alert: KubernetesNodeMemoryPressure
    expr: kube_node_status_condition{condition="MemoryPressure",status="true"} > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.node }} under memory pressure"

  - alert: KubernetesResourceQuotaNearlyExhausted
    expr: |
      (kube_resourcequota{type="used"}
        / ignoring(type) kube_resourcequota{type="hard"})
      > 0.9
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "Resource quota nearly exhausted"
      description: "{{ $labels.resource }} at {{ $value | humanizePercentage }}"

  - alert: KubernetesNoSchedulableNodes
    expr: count(kube_node_spec_unschedulable == 1) == count(kube_node_info)
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "No schedulable nodes available"
```
## Prevention
- Set appropriate resource requests based on actual usage (not limits)
- Use LimitRange to enforce minimum/maximum resource requests
- Monitor resource quotas and increase before exhaustion
- Use pod priority classes for critical workloads
- Implement cluster autoscaling for dynamic capacity
- Test affinity/anti-affinity rules in staging
- Document taints and required tolerations
- Use topology spread constraints for even distribution
- Set up alerts for pending pods and resource pressure
- Regular capacity planning reviews
## Related Errors
- **CrashLoopBackOff**: Container starts but crashes repeatedly
- **OOMKilled**: Container exceeded memory limit
- **Evicted**: Pod removed due to node pressure
- **NodeNotReady**: Node unavailable for scheduling
- **ContainerCreating**: Container stuck in creation phase