# Fix AWS EKS Cluster Unreachable
You run `kubectl get nodes` and get "Unable to connect to the server" or "connection refused." Or the nodes exist but show `NotReady` status. EKS cluster connectivity issues can stem from several places: API server access, IAM authentication, VPC networking, or node IAM roles.
This guide walks through diagnosing and fixing each potential cause.
## Diagnosis Commands
First, verify the cluster exists and is active:
```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.[status,endpoint,certificateAuthority.data,version]' \
  --output table
```

Check your kubectl configuration:
```bash
kubectl config current-context
kubectl cluster-info
```

If `cluster-info` fails, update your kubeconfig:
```bash
aws eks update-kubeconfig \
  --name my-cluster \
  --region us-east-1
```

Verify the cluster endpoint is accessible:
```bash
curl -v https://$(aws eks describe-cluster --name my-cluster --query 'cluster.endpoint' --output text)/healthz
```

Check whether you can authenticate with AWS:
```bash
aws sts get-caller-identity
```

Verify your IAM user or role has EKS access:
```bash
aws eks list-clusters
```

Check the cluster's IAM authentication mode:
```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.accessConfig.authenticationMode'
```

List the cluster's access entries:
```bash
aws eks list-access-entries \
  --cluster-name my-cluster
```

If using the AWS IAM Authenticator (older method), check the aws-auth ConfigMap:
```bash
kubectl get configmap aws-auth -n kube-system -o yaml
```

Check node status:
```bash
kubectl get nodes -o wide
```

Describe a problematic node:
```bash
kubectl describe node ip-10-0-1-100.us-east-1.compute.internal
```

Check node logs:
```bash
# SSH into the node (if using managed node groups with SSH enabled)
ssh -i my-key.pem ec2-user@node-ip

# Check kubelet logs
sudo journalctl -u kubelet -f
```
## Common Causes and Solutions
### IAM Authentication Issues
The most common cause is IAM permissions. Your IAM user/role needs permission to interact with the cluster.
For EKS Access Entry (newer method):
Add your IAM user as a cluster admin:
```bash
aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:user/my-user \
  --type STANDARD

aws eks associate-access-policy \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:user/my-user \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster
```
For ConfigMap-based auth (older method):
Edit the aws-auth ConfigMap:
```bash
kubectl edit configmap aws-auth -n kube-system
```

Add your IAM user:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapUsers: |
    - userarn: arn:aws:iam::123456789012:user/my-user
      username: my-user
      groups:
        - system:masters
```

Or for a role:
```yaml
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/my-role
      username: my-role
      groups:
        - system:masters
```

If you can't use kubectl, create the mapping with the eksctl tool instead:

```bash
eksctl create iamidentitymapping \
  --cluster my-cluster \
  --arn arn:aws:iam::123456789012:user/my-user \
  --group system:masters \
  --username my-user
```
### Cluster Endpoint Not Accessible
If your cluster has a private-only endpoint, you must access it from within the VPC.
Check the endpoint configuration:
```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.resourcesVpcConfig.[endpointPublicAccess,endpointPrivateAccess,publicAccessCidrs]'
```

Enable public access:
```bash
# Restrict publicAccessCidrs to your own IP range where possible;
# 0.0.0.0/0 exposes the API endpoint to the entire internet
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="0.0.0.0/0"
```

Or for private access only, ensure you're connected to the VPC:
```bash
# Option 1: Use a VPN connection to your VPC
# Option 2: Use a bastion host in the VPC
# Option 3: Use AWS CloudShell or an EC2 instance in the VPC
```

Create a bastion host for access:
```bash
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.micro \
  --key-name my-key \
  --subnet-id subnet-12345 \
  --security-group-ids sg-12345 \
  --user-data '#!/bin/bash
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin/
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install'
```

### Node IAM Role Issues
Nodes need specific IAM permissions to join the cluster. Check the node IAM role:
```bash
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --query 'Reservations[*].Instances[*].[InstanceId,IamInstanceProfile.Arn]' \
  --output table
```

Get the role name and check its policies:
```bash
aws iam list-attached-role-policies \
  --role-name my-node-role

aws iam list-instance-profiles-for-role \
  --role-name my-node-role
```
The node role needs these policies:
- `AmazonEKSWorkerNodePolicy`
- `AmazonEC2ContainerRegistryReadOnly`
- `AmazonEKS_CNI_Policy`
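You can script this check and have it report exactly which required policies are absent. A minimal sketch, where `missing_policies` is a hypothetical helper and the hard-coded `attached` sample stands in for real output (the commented-out command shows how you'd feed it live data):

```bash
#!/bin/bash
# Compare a node role's attached managed policies against the three
# policies EKS worker nodes require, printing any that are missing.
REQUIRED=(
  AmazonEKSWorkerNodePolicy
  AmazonEC2ContainerRegistryReadOnly
  AmazonEKS_CNI_Policy
)

missing_policies() {
  # $1: newline-separated list of attached policy names
  local missing=()
  for p in "${REQUIRED[@]}"; do
    grep -Fxq "$p" <<<"$1" || missing+=("$p")
  done
  (( ${#missing[@]} )) && printf '%s\n' "${missing[@]}"
}

# With live data you would feed it the real attached-policy list:
# attached=$(aws iam list-attached-role-policies --role-name my-node-role \
#   --query 'AttachedPolicies[*].PolicyName' --output text | tr '\t' '\n')
attached=$'AmazonEKSWorkerNodePolicy\nAmazonEC2ContainerRegistryReadOnly'
missing_policies "$attached"   # prints AmazonEKS_CNI_Policy
```

Matching on exact policy names (`grep -Fxq`) avoids false positives from customer-managed policies with similar prefixes.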
Attach missing policies:
```bash
aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
```
The node role must also be in the aws-auth ConfigMap:
```bash
eksctl create iamidentitymapping \
  --cluster my-cluster \
  --arn arn:aws:iam::123456789012:role/my-node-role \
  --group system:bootstrappers \
  --group system:nodes \
  --username 'system:node:{{EC2PrivateDNSName}}'
```

### Security Group Issues
The cluster and nodes need proper security group rules.
Check the cluster security group:
```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId'
```

Check node security groups:
```bash
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --query 'Reservations[*].Instances[*].SecurityGroups[*].GroupId'
```

Ensure the control plane can communicate with nodes on these ports:

- 443 (TCP) - API server
- 10250 (TCP) - Kubelet
- 1025-65535 (TCP) - Node ports
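Before adding new rules, you can check offline whether the existing ingress ranges already cover those ports. A rough sketch, where `port_covered` is a hypothetical helper and the sample `ranges` data is illustrative (the commented-out command shows how you'd pull real TCP ranges):

```bash
#!/bin/bash
# Check whether a port falls inside any "FromPort ToPort" range, as
# returned by describe-security-groups for TCP ingress rules.
port_covered() {
  local port=$1 from to
  while read -r from to; do
    [ -n "$from" ] && [ "$port" -ge "$from" ] && [ "$port" -le "$to" ] && return 0
  done <<<"$2"
  return 1
}

# With live data you would pull the ranges from the cluster security group:
# ranges=$(aws ec2 describe-security-groups --group-ids sg-12345 \
#   --query 'SecurityGroups[0].IpPermissions[?IpProtocol==`tcp`].[FromPort,ToPort]' \
#   --output text)
ranges=$'443\t443\n1025\t65535'
for port in 443 10250; do
  if port_covered "$port" "$ranges"; then
    echo "port $port: covered"
  else
    echo "port $port: MISSING"
  fi
done
```

This only inspects port ranges; it doesn't validate the rule's source security group, which also has to match for control-plane-to-node traffic.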
Add necessary rules:
```bash
# Get cluster security group
CLUSTER_SG=$(aws eks describe-cluster --name my-cluster --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)

# Allow control plane to communicate with nodes
aws ec2 authorize-security-group-ingress \
  --group-id $CLUSTER_SG \
  --protocol tcp \
  --port 1025-65535 \
  --source-group $CLUSTER_SG

# Allow kubelet communication
aws ec2 authorize-security-group-ingress \
  --group-id $CLUSTER_SG \
  --protocol tcp \
  --port 10250 \
  --source-group $CLUSTER_SG
```
### Subnet Issues
Nodes must be in subnets that allow outbound internet access for pulling images.
Check whether the subnets have a route to the internet:
```bash
# Get node subnets
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --query 'Reservations[*].Instances[*].SubnetId' \
  --output text | tr '\t' '\n' | sort -u

# Check route tables
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-12345" \
  --query 'RouteTables[*].Routes[*].[DestinationCidrBlock,GatewayId]'
```
Look for a `0.0.0.0/0` route to an internet gateway (`igw-xxxxx`).
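That inspection can be turned into a quick classifier. A sketch under stated assumptions: `route_type` is an illustrative helper fed "destination gateway" pairs, and note that NAT routes live under `NatGatewayId` rather than `GatewayId`, so a live query would need to select both fields:

```bash
#!/bin/bash
# Classify a subnet's egress from its "DestinationCidrBlock GatewayId" routes:
# igw- on 0.0.0.0/0 => public, nat- => private with NAT egress, else no egress.
route_type() {
  local dest gw
  while read -r dest gw; do
    if [ "$dest" = "0.0.0.0/0" ]; then
      case $gw in
        igw-*) echo public;         return ;;
        nat-*) echo private-egress; return ;;
      esac
    fi
  done <<<"$1"
  echo no-egress
}

# With live data: routes=$(aws ec2 describe-route-tables ... --output text)
route_type $'10.0.0.0/16 local\n0.0.0.0/0 igw-12345'   # prints "public"
route_type $'10.0.0.0/16 local'                        # prints "no-egress"
```

A `no-egress` result on a node subnet usually explains image-pull failures and nodes stuck in `NotReady`.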
If missing, add a route:
```bash
aws ec2 create-route \
  --route-table-id rtb-12345 \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-12345
```

For private subnets, ensure there's a NAT gateway:
```bash
aws ec2 describe-nat-gateways \
  --filter "Name=vpc-id,Values=vpc-12345" \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId]'
```

### Node Not Ready
If nodes exist but show NotReady status:
Check node conditions:
```bash
kubectl describe node <node-name> | grep -A 10 Conditions
```

Common issues:
CNI not running:
```bash
kubectl get pods -n kube-system -l k8s-app=aws-node

kubectl logs -n kube-system -l k8s-app=aws-node
```
If CNI pods are failing, check IAM permissions:
```bash
aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
```

Kube-proxy issues:
```bash
kubectl get pods -n kube-system -l k8s-app=kube-proxy

kubectl logs -n kube-system -l k8s-app=kube-proxy
```
Disk pressure:
```bash
kubectl describe node <node-name> | grep -A 5 "DiskPressure"
```

Clean up disk space on the node:
```bash
# SSH into the node
ssh ec2-user@<node-ip>

# Check disk usage
df -h

# Clean up container resources (Docker-based AMIs)
sudo docker system prune -af
# On containerd-based AMIs (the default on recent EKS versions), use crictl instead:
# sudo crictl rmi --prune
```
### Cluster Certificate Issues
If kubectl shows certificate errors:
```bash
# Check if the certificate in the kubeconfig matches the cluster
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.certificateAuthority.data' \
  --output text | base64 -d | openssl x509 -text -noout

# Regenerate the kubeconfig entry
aws eks update-kubeconfig --name my-cluster
```
### Cluster Upgrade Issues
After upgrading, nodes might not match the control plane version:
```bash
# Check versions
kubectl version -o yaml

# Upgrade node groups
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --kubernetes-version 1.28
```
## Verification Steps
After making changes, verify cluster connectivity:
```bash
# Check cluster access
kubectl get nodes
kubectl get pods -A

# Verify API server health
curl -k https://$(aws eks describe-cluster --name my-cluster --query 'cluster.endpoint' --output text)/healthz

# Check component status (deprecated since Kubernetes 1.19; may report nothing useful)
kubectl get componentstatuses

# Verify all system pods are running
kubectl get pods -n kube-system
```
Set up monitoring for future issues:
```bash
# Create a CloudWatch alarm for API server errors
# (assumes an APIServerErrors metric is being published -- EKS does not emit
# one by default; export it via Container Insights/Prometheus or a custom metric)
aws cloudwatch put-metric-alarm \
  --alarm-name eks-api-server-errors \
  --namespace AWS/EKS \
  --metric-name APIServerErrors \
  --dimensions Name=ClusterName,Value=my-cluster \
  --statistic Sum \
  --period 60 \
  --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3

# Enable control plane logging
aws eks update-cluster-config \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
```
Create a connectivity test script:
```bash
#!/bin/bash
echo "Testing EKS cluster connectivity..."

echo "1. Checking AWS authentication..."
aws sts get-caller-identity || exit 1

echo "2. Checking cluster exists..."
aws eks describe-cluster --name my-cluster --query 'cluster.status' || exit 1

echo "3. Updating kubeconfig..."
aws eks update-kubeconfig --name my-cluster || exit 1

echo "4. Testing kubectl connection..."
kubectl get nodes || exit 1

echo "5. Checking system pods..."
kubectl get pods -n kube-system || exit 1

echo "All connectivity tests passed!"
```