Fix Azure AKS Node Not Ready

What's Actually Happening

Your Azure Kubernetes Service (AKS) node reports a NotReady status, so the scheduler cannot place new pods on it. The node is still registered in the cluster, but the control plane has marked it unhealthy.

The Error You'll See

```bash
$ kubectl get nodes

NAME                                STATUS     ROLES   AGE   VERSION
aks-agentpool-12345678-vmss000000   NotReady   agent   5d    v1.28.0
```

Node conditions:

```bash
$ kubectl describe node aks-agentpool-12345678-vmss000000

Conditions:
  Ready   Unknown   NodeStatusUnknown   Kubelet stopped posting node status.
```

Why This Happens

1. Kubelet crash - The kubelet process on the node has stopped
2. Network partition - The node cannot reach the API server
3. VM extension failure - An Azure VM extension on the node failed
4. Resource exhaustion - The node has run out of memory or disk space
5. Outbound connectivity - The node cannot reach Azure APIs

Step 1: Diagnose Node Status

```bash
# Check node details:
kubectl describe node aks-agentpool-12345678-vmss000000

# Check node conditions:
kubectl get node aks-agentpool-12345678-vmss000000 -o jsonpath='{.status.conditions}' | jq

# Check events:
kubectl get events --field-selector involvedObject.name=aks-agentpool-12345678-vmss000000

# Check pods on node:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=aks-agentpool-12345678-vmss000000
```
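When the cluster has many nodes, it helps to list only the unhealthy ones before describing each. The following is a minimal sketch: `not_ready_nodes` is a hypothetical helper name, and it assumes the default column layout of `kubectl get nodes --no-headers` (name first, status second).

```bash
# Print the names of nodes whose STATUS is not Ready.
# Reads `kubectl get nodes --no-headers` output on stdin.
# A status such as "Ready,SchedulingDisabled" still counts as Ready here,
# because only the part before the first comma is compared.
not_ready_nodes() {
  awk '{ split($2, s, ","); if (s[1] != "Ready") print $1 }'
}
```

Usage: `kubectl get nodes --no-headers | not_ready_nodes`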

Step 2: Access Node via SSH

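```bash
# Run a command on the node without SSH (VMSS run command against the node resource group):
az vmss run-command invoke --resource-group MC_myResourceGroup_myAKSCluster_eastus --name aks-agentpool-12345678-vmss --instance-id 0 --command-id RunShellScript --scripts "systemctl status kubelet --no-pager"

# SSH to node (requires network access to the node's private IP, for example from a jump host in the VNet):
ssh -i ~/.ssh/id_rsa azureuser@<node-ip>

# Check kubelet:
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100
```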

Step 3: Fix Kubelet Issues

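```bash
# Restart kubelet:
sudo systemctl restart kubelet

# Check container runtime:
sudo systemctl status containerd
sudo systemctl restart containerd

# Check disk space:
df -h

# Check API server connectivity.
# kubernetes.default resolves only inside pods; from the node, use the cluster FQDN
# (az aks show --resource-group myResourceGroup --name myAKSCluster --query fqdn -o tsv):
curl -k https://<api-server-fqdn>/healthz
```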

Step 4: Check Azure VM Extension

```bash
# Check extension status:
az vmss extension list --resource-group MC_myResourceGroup_myAKSCluster_eastus --vmss-name aks-agentpool-12345678-vmss

# Reimage node if needed (az vmss reimage takes --name; older CLI versions use --instance-id instead of --instance-ids):
az vmss reimage --resource-group MC_myResourceGroup_myAKSCluster_eastus --name aks-agentpool-12345678-vmss --instance-ids 0
```

Step 5: Fix Network Connectivity

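```bash
# Check outbound connectivity from the node:
az vmss run-command invoke --resource-group MC_myResourceGroup_myAKSCluster_eastus --name aks-agentpool-12345678-vmss --instance-id 0 --command-id RunShellScript --scripts "curl -I https://management.azure.com"

# Check DNS from the node (kubernetes.default resolves only inside pods, so resolve the API server FQDN instead):
az vmss run-command invoke --resource-group MC_myResourceGroup_myAKSCluster_eastus --name aks-agentpool-12345678-vmss --instance-id 0 --command-id RunShellScript --scripts "nslookup <api-server-fqdn>"
```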

Step 6: Drain and Recreate Node

```bash
# Cordon node:
kubectl cordon aks-agentpool-12345678-vmss000000

# Drain node:
kubectl drain aks-agentpool-12345678-vmss000000 --ignore-daemonsets --delete-emptydir-data --force

# Delete node:
kubectl delete node aks-agentpool-12345678-vmss000000

# Delete VM instance (az vmss delete-instances takes --name):
az vmss delete-instances --resource-group MC_myResourceGroup_myAKSCluster_eastus --name aks-agentpool-12345678-vmss --instance-ids 0

# Watch for a replacement node (one appears automatically only if the cluster autoscaler is enabled; otherwise scale the pool in Step 7):
kubectl get nodes -w
```
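Watching `kubectl get nodes -w` works interactively, but a script needs a bounded wait. The following is a minimal polling sketch, not an AKS feature: `wait_for_ready_nodes` is a hypothetical helper, and it assumes `kubectl` is on the PATH and already pointed at the cluster.

```bash
# Poll until at least $1 nodes report Ready, or give up.
# $1: required number of Ready nodes
# $2: maximum number of attempts (default 60)
# $3: seconds to sleep between attempts (default 10)
# Returns 0 on success, 1 if the attempts run out.
wait_for_ready_nodes() {
  want=$1
  tries=${2:-60}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    ready=$(kubectl get nodes --no-headers |
      awk '{ split($2, s, ","); if (s[1] == "Ready") n++ } END { print n + 0 }')
    if [ "$ready" -ge "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Usage: `wait_for_ready_nodes 3 || echo "replacement node did not become Ready"`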

Step 7: Scale Node Pool

```bash
# Scale up:
az aks nodepool scale --resource-group myResourceGroup --cluster-name myAKSCluster --name agentpool --node-count 5
```

Step 8: Monitor Node Health

```bash
# Check node status:
kubectl get nodes -o wide

# Check resource usage:
kubectl top nodes

# Check events:
kubectl get events --field-selector involvedObject.kind=Node
```

AKS Node Troubleshooting Checklist

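| Check | Command | Expected |
|---|---|---|
| Node status | `kubectl get nodes` | Ready |
| Kubelet | `systemctl status kubelet` | active (running) |
| Disk space | `df -h` | Below 85% used |
| Network | `curl -k https://<api-server-fqdn>/healthz` | Connected |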

Verify the Fix

```bash
# Check that the node is Ready:
kubectl get nodes
# Expected: STATUS shows Ready

# Verify pods are running:
kubectl get pods -o wide
# Expected: pods are scheduled
```

Related Issues

- [Fix Azure AKS Authentication Failed](/articles/fix-azure-aks-cluster-authentication-failed)
- [Fix Kubernetes Node Not Ready](/articles/fix-kubernetes-node-not-ready)
