What's Actually Happening
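Your Azure Kubernetes Service (AKS) node shows NotReady status. The node is still registered in the cluster, but the control plane marks it unhealthy. The scheduler stops placing new pods on it, and pods already running there are evicted after a grace period.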
The Error You'll See
```bash
$ kubectl get nodes
NAME                                STATUS     ROLES   AGE   VERSION
aks-agentpool-12345678-vmss000000   NotReady   agent   5d    v1.28.0
```
Node conditions:
```bash
$ kubectl describe node aks-agentpool-12345678-vmss000000
Conditions:
  Ready   Unknown   NodeStatusUnknown   Kubelet stopped posting node status.
```
Why This Happens
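1. Kubelet crash: the kubelet process stopped, so the node no longer reports its status.
2. Network partition: the node cannot reach the API server.
3. VM extension failure: the Azure VM scale set extension that bootstraps the node failed.
4. Resource exhaustion: the node ran out of memory or disk space.
5. Outbound connectivity: the node cannot reach required Azure endpoints (for example, because a firewall or route table blocks egress).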
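Several of these causes show up directly in the node's conditions. The following is a minimal triage sketch. It reads `Type=Status` lines, which the kubectl JSONPath query in the comment is one way to produce, and prints a hint for each unhealthy condition. The function name `node_condition_hints` and the hint wording are hypothetical. Only the condition types come from Kubernetes.

```bash
# Hypothetical triage helper: reads "Type=Status" lines on stdin.
# One way to produce them:
#   kubectl get node aks-agentpool-12345678-vmss000000 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
node_condition_hints() {
  local line
  # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      Ready=Unknown)
        echo "Ready=Unknown: kubelet stopped reporting (crash or network partition)" ;;
      Ready=False)
        echo "Ready=False: kubelet reports the node unhealthy (runtime or network plugin not ready)" ;;
      MemoryPressure=True)
        echo "MemoryPressure=True: node is low on memory" ;;
      DiskPressure=True)
        echo "DiskPressure=True: node is low on disk space" ;;
      PIDPressure=True)
        echo "PIDPressure=True: node is running out of process IDs" ;;
      NetworkUnavailable=True)
        echo "NetworkUnavailable=True: node network is not configured correctly" ;;
    esac
  done
}
```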
Step 1: Diagnose Node Status
```bash
# Check node details:
kubectl describe node aks-agentpool-12345678-vmss000000

# Check node conditions:
kubectl get node aks-agentpool-12345678-vmss000000 -o jsonpath='{.status.conditions}' | jq

# Check events (node events are not tied to your current namespace):
kubectl get events --all-namespaces --field-selector involvedObject.name=aks-agentpool-12345678-vmss000000

# Check pods on node:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=aks-agentpool-12345678-vmss000000
```
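On a larger pool, it helps to list every unhealthy node before you investigate a single one. This sketch assumes the default `kubectl get nodes --no-headers` column layout (NAME, STATUS, ...). `list_notready_nodes` is a hypothetical helper name.

```bash
# Hypothetical helper: print names of nodes whose STATUS is not Ready.
# Expects `kubectl get nodes --no-headers` output on stdin.
list_notready_nodes() {
  # STATUS may carry suffixes such as ",SchedulingDisabled", so compare
  # only the first comma-separated token.
  awk '{ split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage:
#   kubectl get nodes --no-headers | list_notready_nodes
```

The helper compares only the first token of STATUS. That keeps cordoned but healthy nodes (Ready,SchedulingDisabled) out of the list, while NotReady,SchedulingDisabled still appears.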
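Step 2: Access the Node
```bash
# Run a command on the node without SSH, using VMSS run command. The scale set
# lives in the node resource group (MC_...); instance ID 0 is node ...vmss000000:
az vmss run-command invoke \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-agentpool-12345678-vmss --instance-id 0 \
  --command-id RunShellScript \
  --scripts "systemctl status kubelet"

# SSH to node (node IPs are private: connect from a host inside the cluster's
# virtual network; the IP is in the INTERNAL-IP column of kubectl get nodes -o wide):
ssh -i ~/.ssh/id_rsa azureuser@<node-ip>

# Check kubelet:
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100
```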
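`az vmss run-command invoke` needs the scale set name and the instance ID, and both can usually be read off the node name. The sketch below assumes the common AKS naming pattern, where the last six characters are the instance ID in base 36 (vmss000000 is instance 0, vmss00000a is instance 10). If your names differ, confirm with `az vmss list-instances`. `NODE_RG` (the MC_ node resource group) and all three helper names are hypothetical. Run command also needs a working VM agent on the instance.

```bash
# Assumption: node name = <scale set name> + 6-character base-36 instance ID.
vmss_name_for_node() {
  local node=$1
  printf '%s\n' "${node%??????}"     # strip the last 6 characters
}

instance_id_for_node() {
  local node=$1
  local suffix=${node: -6}           # e.g. "00000a"
  # Bash arithmetic accepts base#digits; for base 36, a-z mean 10-35.
  echo $((36#$suffix))
}

# Hypothetical wrapper: run a shell command on the node's VMSS instance.
# NODE_RG must hold the node resource group, e.g. MC_myResourceGroup_myAKSCluster_eastus.
run_on_node() {
  local node=$1; shift
  az vmss run-command invoke \
    --resource-group "$NODE_RG" \
    --name "$(vmss_name_for_node "$node")" \
    --instance-id "$(instance_id_for_node "$node")" \
    --command-id RunShellScript \
    --scripts "$*"
}

# Usage:
#   export NODE_RG=MC_myResourceGroup_myAKSCluster_eastus
#   run_on_node aks-agentpool-12345678-vmss000000 journalctl -u kubelet -n 100
```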
Step 3: Fix Kubelet Issues
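```bash
# Restart kubelet:
sudo systemctl restart kubelet

# Check the container runtime, and restart it if it is not active:
sudo systemctl status containerd
sudo systemctl restart containerd

# Check disk space:
df -h

# Check API server connectivity. Cluster DNS names such as kubernetes.default
# do not resolve from the node itself, so use the API server FQDN from
# az aks show --resource-group myResourceGroup --name myAKSCluster --query fqdn -o tsv
# (privateFqdn for private clusters). Any HTTP response, including 401 or 403,
# means the node can reach the API server:
curl -k https://<api-server-fqdn>/healthz
```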
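The disk check can be made mechanical with a sketch that flags filesystems above the 85% usage limit used in the troubleshooting checklist. It reads POSIX `df -P` output, which prints one line per filesystem. `disk_over_threshold` is a hypothetical name, and 85% is a rule of thumb, not a kubelet setting.

```bash
# Hypothetical helper: flag filesystems above a usage limit (default 85%).
# Expects POSIX `df -P` output: Filesystem, blocks, Used, Available, Capacity, Mounted on.
disk_over_threshold() {
  local limit=${1:-85}
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # "91%" -> "91"
    if (use + 0 > limit) print $6, $5
  }'
}

# Usage on the node:
#   df -P | disk_over_threshold 85
```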
Step 4: Check Azure VM Extension
```bash
# Check extension status:
az vmss extension list \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --vmss-name aks-agentpool-12345678-vmss \
  --query "[].{name:name, state:provisioningState}" -o table

# Reimage node if needed (cordon and drain it first; instance ID 0 is node ...vmss000000):
az vmss reimage \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-agentpool-12345678-vmss --instance-ids 0
```
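When the scale set has several extensions, filtering for any state other than `Succeeded` is quicker than reading the whole table. This sketch assumes tab-separated name/state pairs, as produced by `-o tsv` with the query in the comment. The states are scale-set level, so per-instance detail still comes from `az vmss get-instance-view`. `failed_extensions` is a hypothetical name.

```bash
# Hypothetical helper: print extensions whose provisioning state is not Succeeded.
# Expects tab-separated "name<TAB>state" lines, for example from:
#   az vmss extension list \
#     --resource-group MC_myResourceGroup_myAKSCluster_eastus \
#     --vmss-name aks-agentpool-12345678-vmss \
#     --query "[].[name, provisioningState]" -o tsv
failed_extensions() {
  # NF skips blank lines; $2 is the provisioning state.
  awk -F '\t' 'NF && $2 != "Succeeded" { print $1 ": " $2 }'
}
```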
Step 5: Fix Network Connectivity
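```bash
# Check outbound connectivity from the node:
az vmss run-command invoke \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-agentpool-12345678-vmss --instance-id 0 \
  --command-id RunShellScript \
  --scripts "curl -sI https://management.azure.com"

# Check DNS from the node (the node resolves through VNet DNS, not CoreDNS,
# so test a public name such as the API server FQDN):
az vmss run-command invoke \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-agentpool-12345678-vmss --instance-id 0 \
  --command-id RunShellScript \
  --scripts "nslookup <api-server-fqdn>"
```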
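The outbound checks can be bundled into one script that runs on the node, either over SSH or pasted into a run command `--scripts` argument. In this sketch, `probe_endpoints` is a hypothetical helper. The example URLs are a few endpoints AKS nodes commonly need, not the complete required list, which depends on your cluster's configuration. Any HTTP status counts as reachable; curl reports `000` when no response arrives (DNS, TCP or TLS failure).

```bash
# Hypothetical helper: report whether each URL answers within 5 seconds.
probe_endpoints() {
  local url code
  for url in "$@"; do
    # %{http_code} is 000 when no HTTP response was received.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    if [ -z "$code" ] || [ "$code" = "000" ]; then
      echo "FAIL $url"
    else
      echo "OK   $url (HTTP $code)"
    fi
  done
}

# Usage on the node:
#   probe_endpoints https://management.azure.com https://login.microsoftonline.com https://mcr.microsoft.com
```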
Step 6: Drain and Recreate Node
```bash
# Cordon node:
kubectl cordon aks-agentpool-12345678-vmss000000

# Drain node:
kubectl drain aks-agentpool-12345678-vmss000000 --ignore-daemonsets --delete-emptydir-data --force

# Delete node:
kubectl delete node aks-agentpool-12345678-vmss000000

# Delete VM instance (instance ID 0 is node ...vmss000000):
az vmss delete-instances \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-agentpool-12345678-vmss --instance-ids 0

# Watch for a replacement node to register. A replacement is not guaranteed;
# if none appears, scale the pool back to its original node count (Step 7):
kubectl get nodes -w
```
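A replacement instance normally registers under a new name, so waiting on a specific node name can stall. The following sketch instead polls until a target number of nodes report Ready, or until a timeout expires. `wait_for_ready_nodes` and its default timeout and interval are hypothetical.

```bash
# Hypothetical helper: wait until at least $1 nodes are Ready.
# Args: wanted count, timeout in seconds (default 900), poll interval (default 15).
wait_for_ready_nodes() {
  local want=$1 timeout=${2:-900} interval=${3:-15}
  local waited=0 ready
  while :; do
    # Count nodes whose STATUS starts with "Ready" (cordoned Ready nodes included).
    ready=$(kubectl get nodes --no-headers 2>/dev/null |
      awk '{ split($2, s, ","); if (s[1] == "Ready") n++ } END { print n + 0 }')
    if [ "$ready" -ge "$want" ]; then
      echo "$ready nodes Ready"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "only $ready of $want nodes Ready after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage (pool originally had 3 nodes):
#   wait_for_ready_nodes 3
```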
Step 7: Scale Node Pool
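```bash
# Scale up (this fails if the cluster autoscaler is enabled on the pool;
# adjust the autoscaler's min/max counts instead):
az aks nodepool scale --resource-group myResourceGroup --cluster-name myAKSCluster --name agentpool --node-count 5
```
Step 8: Monitor Node Health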
```bash
# Check node status:
kubectl get nodes -o wide

# Check resource usage (requires metrics-server, which AKS installs by default):
kubectl top nodes

# Check node events:
kubectl get events --all-namespaces --field-selector involvedObject.kind=Node
```
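`kubectl top nodes` prints percentages that are easy to compare against a limit. This sketch flags nodes whose CPU% or MEMORY% exceeds that limit (80% is an arbitrary example). It assumes the default column order NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%. `busy_nodes` is a hypothetical name.

```bash
# Hypothetical helper: flag busy nodes from `kubectl top nodes --no-headers`.
busy_nodes() {
  local limit=${1:-80}
  awk -v limit="$limit" '{
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)
    # NotReady nodes show <unknown>, which awk treats as 0, so they are not flagged.
    if (cpu + 0 > limit || mem + 0 > limit) print $1, "cpu=" $3, "mem=" $5
  }'
}

# Usage:
#   kubectl top nodes --no-headers | busy_nodes 80
```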
AKS Node Troubleshooting Checklist
| Check | Command | Expected |
|---|---|---|
| Node status | `kubectl get nodes` | Ready |
| Kubelet | `systemctl status kubelet` | active (running) |
| Disk space | `df -h` | Use% below 85% |
| Network | `curl -k https://<api-server-fqdn>/healthz` | Any HTTP response (no timeout) |
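The on-node rows of the checklist can be scripted to report PASS or FAIL in one run. This minimal sketch assumes a systemd-based node image with curl available. `node_checklist` is hypothetical, and you supply the API server FQDN yourself, for example from `az aks show ... --query fqdn -o tsv`. It checks only the root filesystem.

```bash
# Hypothetical helper: run the kubelet, disk and API server checks on the node.
# Arg: API server FQDN. Returns non-zero if any check fails.
node_checklist() {
  local fqdn=$1 rc=0 state use code

  state=$(systemctl is-active kubelet 2>/dev/null)
  if [ "$state" = "active" ]; then echo "PASS kubelet active"
  else echo "FAIL kubelet $state"; rc=1; fi

  # Root filesystem usage in percent; treat a parse failure as 100.
  use=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ "${use:-100}" -lt 85 ]; then echo "PASS disk / ${use}%"
  else echo "FAIL disk / ${use:-?}%"; rc=1; fi

  # Any HTTP status means the API server is reachable; 000 means no response.
  code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "https://$fqdn/healthz")
  if [ -n "$code" ] && [ "$code" != "000" ]; then echo "PASS api server HTTP $code"
  else echo "FAIL api server unreachable"; rc=1; fi

  return "$rc"
}
```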
Verify the Fix
```bash
# Check node Ready:
kubectl get nodes
# Output: STATUS Ready

# Verify pods running:
kubectl get pods -o wide
# Output: Pods scheduled
```
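To confirm that nothing on the recovered node is still failing, filter its pods by status. This sketch assumes the `--all-namespaces` column layout, where STATUS is the fourth column. `unhealthy_pods` is a hypothetical helper. It does not flag Running pods whose containers are not ready (READY 0/1).

```bash
# Hypothetical helper: list pods that are neither Running nor Completed.
# Expects output such as:
#   kubectl get pods --all-namespaces -o wide --no-headers \
#     --field-selector spec.nodeName=aks-agentpool-12345678-vmss000000
unhealthy_pods() {
  # $1 = NAMESPACE, $2 = NAME, $4 = STATUS
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}
```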
Related Issues
- [Fix Azure AKS Authentication Failed](/articles/fix-azure-aks-cluster-authentication-failed)
- [Fix Kubernetes Node Not Ready](/articles/fix-kubernetes-node-not-ready)