# Fix AWS EKS Cluster Unreachable

You run `kubectl get nodes` and get "Unable to connect to the server" or "connection refused." Or the nodes exist but report `NotReady`. EKS cluster connectivity issues can stem from several places: API server access, IAM authentication, VPC networking, or node IAM roles.

This guide walks through diagnosing and fixing each potential cause.

## Diagnosis Commands

First, verify the cluster exists and is active:

```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.[status,endpoint,certificateAuthority.data,version]' \
  --output table
```

Check your kubectl configuration:

```bash
kubectl config current-context
kubectl cluster-info
```

If cluster-info fails, update your kubeconfig:

```bash
aws eks update-kubeconfig \
  --name my-cluster \
  --region us-east-1
```
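If kubectl authenticates as a different principal than the one that has cluster access (for example, you normally assume a role), you can bake that role into the generated kubeconfig. A sketch, where the role name `eks-admin` is an example placeholder:

```shell
# Generate a kubeconfig entry that assumes a role before calling the cluster
# (the role "eks-admin" is hypothetical; use a role that has cluster access)
aws eks update-kubeconfig \
  --name my-cluster \
  --region us-east-1 \
  --role-arn arn:aws:iam::123456789012:role/eks-admin
```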

Verify the cluster endpoint is accessible:

```bash
# -k skips TLS verification (the API server uses the cluster's private CA)
curl -vk https://$(aws eks describe-cluster --name my-cluster --query 'cluster.endpoint' --output text)/healthz
```

Check if you can authenticate with AWS:

```bash
aws sts get-caller-identity
```

Verify your IAM user/role has EKS access:

```bash
aws eks list-clusters
```

Check the cluster's IAM authentication mode:

```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.accessConfig.authenticationMode'
```
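If the mode comes back `CONFIG_MAP` and you want to manage access with access entries instead, you can enable both methods side by side. Note this is a one-way change; a cluster cannot be switched back to ConfigMap-only:

```shell
# Enable access entries alongside the legacy aws-auth ConfigMap
aws eks update-cluster-config \
  --name my-cluster \
  --access-config authenticationMode=API_AND_CONFIG_MAP
```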

List the cluster's access entries:

```bash
aws eks list-access-entries \
  --cluster-name my-cluster
```

If using the AWS IAM Authenticator (older method), check the aws-auth ConfigMap:

```bash
kubectl get configmap aws-auth -n kube-system -o yaml
```
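A quicker end-to-end check is to ask the API server what your mapped identity is actually allowed to do, rather than reading the ConfigMap by eye:

```shell
# Returns "yes" if your mapped identity has cluster-admin-level access
kubectl auth can-i '*' '*'

# List everything your identity can do in the current namespace
kubectl auth can-i --list
```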

Check node status:

```bash
kubectl get nodes -o wide
```

Describe a problematic node:

```bash
kubectl describe node ip-10-0-1-100.us-east-1.compute.internal
```

Check node logs:

```bash
# SSH into the node (if using managed node groups with SSH enabled)
ssh -i my-key.pem ec2-user@node-ip

# Check kubelet logs
sudo journalctl -u kubelet -f
```

## Common Causes and Solutions

### IAM Authentication Issues

The most common cause is IAM permissions. Your IAM user/role needs permission to interact with the cluster.
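kubectl authenticates to EKS with a token minted from your current AWS credentials, so first confirm which identity that token represents. `aws eks get-token` is the same call your kubeconfig runs under the hood:

```shell
# The identity kubectl presents is whatever these credentials resolve to
aws sts get-caller-identity

# Inspect the token kubectl would send (check it isn't expired)
aws eks get-token --cluster-name my-cluster \
  --query 'status.expirationTimestamp'
```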

**For EKS Access Entry (newer method):**

Add your IAM user as a cluster admin:

```bash
aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:user/my-user \
  --type STANDARD

aws eks associate-access-policy \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:user/my-user \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster
```

**For ConfigMap-based auth (older method):**

Edit the aws-auth ConfigMap:

```bash
kubectl edit configmap aws-auth -n kube-system
```

Add your IAM user:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapUsers: |
    - userarn: arn:aws:iam::123456789012:user/my-user
      username: my-user
      groups:
        - system:masters
```

Or for a role:

```yaml
mapRoles: |
  - rolearn: arn:aws:iam::123456789012:role/my-role
    username: my-role
    groups:
      - system:masters
```

Alternatively, use `eksctl` to manage the mapping. It edits aws-auth for you, but still requires an identity that already has cluster access (such as the cluster creator):

```bash
eksctl create iamidentitymapping \
  --cluster my-cluster \
  --arn arn:aws:iam::123456789012:user/my-user \
  --group system:masters \
  --username my-user
```

### Cluster Endpoint Not Accessible

If your cluster's endpoint is private-only, you must connect from inside the VPC.

Check the endpoint configuration:

```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.resourcesVpcConfig.[endpointPublicAccess,endpointPrivateAccess,publicAccessCidrs]'
```
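If public access is enabled but `publicAccessCidrs` doesn't include your IP, the API server will refuse your connection. A sketch that restricts access to your current public IP (checkip.amazonaws.com echoes your IP; any equivalent service works). Note this replaces the existing CIDR list rather than appending to it:

```shell
# Allow API access from your current public IP only
MY_IP=$(curl -s https://checkip.amazonaws.com)
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config publicAccessCidrs="${MY_IP}/32"
```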

Enable public access:

```bash
# Warning: 0.0.0.0/0 opens the API server to the entire internet;
# prefer restricting to your own CIDR
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="0.0.0.0/0"
```

Or for private access only, ensure you're connected to the VPC:

```bash
# Option 1: Use a VPN connection to your VPC
# Option 2: Use a bastion host in the VPC
# Option 3: Use AWS CloudShell or an EC2 instance in the VPC
```

Create a bastion host for access:

```bash
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.micro \
  --key-name my-key \
  --subnet-id subnet-12345 \
  --security-group-ids sg-12345 \
  --user-data '#!/bin/bash
    curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
    chmod +x kubectl
    mv kubectl /usr/local/bin/
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    ./aws/install'
```

### Node IAM Role Issues

Nodes need specific IAM permissions to join the cluster. Check the node IAM role:

```bash
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --query 'Reservations[*].Instances[*].[InstanceId,IamInstanceProfile.Arn]' \
  --output table
```

Get the role name and check its policies:

```bash
aws iam list-attached-role-policies \
  --role-name my-node-role

aws iam list-instance-profiles-for-role \
  --role-name my-node-role
```

The node role needs these policies:

- AmazonEKSWorkerNodePolicy
- AmazonEC2ContainerRegistryReadOnly
- AmazonEKS_CNI_Policy

Attach missing policies:

```bash
aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
```

When using ConfigMap-based auth, the node role must also be mapped in the aws-auth ConfigMap:

```bash
eksctl create iamidentitymapping \
  --cluster my-cluster \
  --arn arn:aws:iam::123456789012:role/my-node-role \
  --group system:bootstrappers \
  --username system:node:{{EC2PrivateDNSName}}
```
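To confirm the mapping landed, you can list the identity mappings back out with eksctl:

```shell
# Show all IAM identity mappings currently in aws-auth
eksctl get iamidentitymapping --cluster my-cluster
```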

### Security Group Issues

The cluster and nodes need proper security group rules.

Check the cluster security group:

```bash
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId'
```

Check node security groups:

```bash
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --query 'Reservations[*].Instances[*].SecurityGroups[*].GroupId'
```

Ensure the control plane can communicate with nodes on these ports:

- 443 (TCP): API server
- 10250 (TCP): kubelet
- 1025-65535 (TCP): node ports

Add necessary rules:

```bash
# Get cluster security group
CLUSTER_SG=$(aws eks describe-cluster --name my-cluster --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)

# Allow control plane to communicate with nodes
aws ec2 authorize-security-group-ingress \
  --group-id $CLUSTER_SG \
  --protocol tcp \
  --port 1025-65535 \
  --source-group $CLUSTER_SG

# Allow kubelet communication
aws ec2 authorize-security-group-ingress \
  --group-id $CLUSTER_SG \
  --protocol tcp \
  --port 10250 \
  --source-group $CLUSTER_SG
```
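After adding rules, list what is actually on the group to confirm; `describe-security-group-rules` shows each rule with its port range and source:

```shell
# List the current inbound/outbound rules on the cluster security group
CLUSTER_SG=$(aws eks describe-cluster --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)
aws ec2 describe-security-group-rules \
  --filters "Name=group-id,Values=$CLUSTER_SG" \
  --output table
```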

### Subnet Issues

Nodes must be in subnets that allow outbound internet access for pulling images.

Check if the subnets have a route to the internet:

```bash
# Get node subnets
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --query 'Reservations[*].Instances[*].SubnetId' \
  --output text | tr '\t' '\n' | sort -u

# Check route tables
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-12345" \
  --query 'RouteTables[*].Routes[*].[DestinationCidrBlock,GatewayId]'
```

Look for routes to an internet gateway (igw-xxxxx) for 0.0.0.0/0.

If missing, add a route:

```bash
aws ec2 create-route \
  --route-table-id rtb-12345 \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-12345
```

For private subnets, ensure there's a NAT gateway:

```bash
aws ec2 describe-nat-gateways \
  --filter "Name=vpc-id,Values=vpc-12345" \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId]'
```

### Node Not Ready

If nodes exist but show NotReady status:

Check node conditions:

```bash
kubectl describe node <node-name> | grep -A 10 Conditions
```
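To scan Ready status across all nodes at once, rather than describing them one by one, a jsonpath one-liner works:

```shell
# Print each node's name and its Ready condition status
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
```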

Common issues:

**CNI not running:**

```bash
kubectl get pods -n kube-system -l k8s-app=aws-node

kubectl logs -n kube-system -l k8s-app=aws-node
```

If CNI pods are failing, check IAM permissions:

```bash
aws iam attach-role-policy \
  --role-name my-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
```

**Kube-proxy issues:**

```bash
kubectl get pods -n kube-system -l k8s-app=kube-proxy

kubectl logs -n kube-system -l k8s-app=kube-proxy
```

**Disk pressure:**

```bash
kubectl describe node <node-name> | grep -A 5 "DiskPressure"
```

Clean up disk space on the node:

```bash
# SSH into the node
ssh ec2-user@<node-ip>

# Check disk usage
df -h

# Clean up Docker resources
sudo docker system prune -af
```
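Newer EKS AMIs (Kubernetes 1.24 and later) use containerd rather than Docker, so `docker` commands won't exist on the node. A sketch of the containerd equivalent, assuming a `crictl` version recent enough to support `--prune`:

```shell
# containerd-based nodes: remove unused images via crictl
sudo crictl rmi --prune

# Check kubelet's image garbage collection activity in the node logs
sudo journalctl -u kubelet | grep -i "garbage collection" | tail -5
```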

### Cluster Certificate Issues

If kubectl shows certificate errors:

```bash
# Check that the certificate in kubeconfig matches the cluster
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.certificateAuthority.data' \
  --output text | base64 -d | openssl x509 -text -noout

# Regenerate kubeconfig (overwrites the existing entry for this cluster)
aws eks update-kubeconfig --name my-cluster
```

### Cluster Upgrade Issues

After upgrading, nodes might not match the control plane version:

```bash
# Check versions
kubectl version -o yaml

# Upgrade node groups
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --kubernetes-version 1.28
```
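Node group upgrades are asynchronous, so you can track progress with the update APIs or simply block until the node group is healthy again:

```shell
# List recent updates for the node group
aws eks list-updates \
  --name my-cluster \
  --nodegroup-name my-nodegroup

# Block until the node group finishes updating
aws eks wait nodegroup-active \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup
```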

## Verification Steps

After making changes, verify cluster connectivity:

```bash
# Check cluster access
kubectl get nodes
kubectl get pods -A

# Verify API server health
curl -k https://$(aws eks describe-cluster --name my-cluster --query 'cluster.endpoint' --output text)/healthz

# Check component status (deprecated since Kubernetes 1.19; may return no data)
kubectl get componentstatuses

# Verify all system pods are running
kubectl get pods -n kube-system
```

Set up monitoring for future issues:

```bash
# Create a CloudWatch alarm for API server errors
# (assumes an APIServerErrors metric is being published, e.g. via Container
# Insights or a custom exporter -- EKS does not emit this metric by default)
aws cloudwatch put-metric-alarm \
  --alarm-name eks-api-server-errors \
  --namespace AWS/EKS \
  --metric-name APIServerErrors \
  --dimensions Name=ClusterName,Value=my-cluster \
  --statistic Sum \
  --period 60 \
  --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3

# Enable control plane logging
aws eks update-cluster-config \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
```

Create a connectivity test script:

```bash
#!/bin/bash
echo "Testing EKS cluster connectivity..."

echo "1. Checking AWS authentication..."
aws sts get-caller-identity || exit 1

echo "2. Checking cluster exists..."
aws eks describe-cluster --name my-cluster --query 'cluster.status' || exit 1

echo "3. Updating kubeconfig..."
aws eks update-kubeconfig --name my-cluster || exit 1

echo "4. Testing kubectl connection..."
kubectl get nodes || exit 1

echo "5. Checking system pods..."
kubectl get pods -n kube-system || exit 1

echo "All connectivity tests passed!"
```