What's Actually Happening

You can't connect to a Google Kubernetes Engine (GKE) cluster with kubectl: commands time out or fail with connection errors.

The Error You'll See

```bash
$ kubectl get pods
The connection to the server gke-cluster-ip was refused - did you specify the right host or port?
```

Timeout error:

```bash
$ kubectl get nodes
Unable to connect to the server: dial tcp x.x.x.x:443: i/o timeout
```

Authentication error:

```bash
$ kubectl get pods
Unable to connect to the server: x509: certificate signed by unknown authority
```

Cluster not found:

```bash
$ gcloud container clusters describe my-cluster
ERROR: (gcloud.container.clusters.describe) NOT_FOUND
```

Why This Happens

  1. Cluster not running - the cluster is stopped or mid-upgrade
  2. Wrong credentials - kubectl is pointed at the wrong kubeconfig context
  3. Network blocked - a firewall or network path blocks access
  4. API server issues - the control plane is not responding
  5. Private cluster - the endpoint is not reachable from outside the VPC
  6. Credentials expired - the gcloud auth token has expired
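The error text usually points straight at one of these causes. A rough triage helper (a sketch; the patterns are heuristics, not exhaustive) maps a kubectl or gcloud error message to the most likely cause:

```bash
# Map a kubectl/gcloud error message to the most likely cause above.
diagnose() {
  case "$1" in
    *"i/o timeout"*)  echo "network blocked or private endpoint" ;;
    *refused*)        echo "wrong endpoint in kubeconfig or API server down" ;;
    *x509*)           echo "stale or wrong cluster credentials" ;;
    *NOT_FOUND*)      echo "wrong cluster name, region, or project" ;;
    *)                echo "unknown - check cluster status first" ;;
  esac
}

# Example:
diagnose "Unable to connect to the server: dial tcp 1.2.3.4:443: i/o timeout"
# -> network blocked or private endpoint
```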

Step 1: Check Cluster Status

```bash
# List clusters and their status
gcloud container clusters list

# Describe a cluster
gcloud container clusters describe my-cluster --region us-central1

# Look for the status field in the output:
# status: RUNNING

# Check from the Console:
# Cloud Console -> Kubernetes Engine -> Clusters

# Check if the cluster exists
gcloud container clusters list --filter="name=my-cluster"

# Check cluster version and status
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(status, currentMasterVersion, currentNodeCount)"

# Check cluster operations
gcloud container operations list --region us-central1

# Check if an upgrade is in progress
gcloud container operations list --region us-central1 --filter="status:RUNNING"
```
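If the status is anything other than RUNNING, the right reaction depends on the state. A small helper (a sketch; the status values come from the GKE cluster status enum) makes the decision explicit:

```bash
# Map a GKE cluster status to the action to take.
cluster_ready() {
  case "$1" in
    RUNNING)                  echo "ready" ;;    # kubectl should work
    PROVISIONING|RECONCILING) echo "wait" ;;     # creating or upgrading; retry later
    STOPPING|ERROR|DEGRADED)  echo "broken" ;;   # needs intervention
    *)                        echo "unknown" ;;
  esac
}

# Feed it the live status:
# cluster_ready "$(gcloud container clusters describe my-cluster \
#     --region us-central1 --format='value(status)')"
```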

Step 2: Get Cluster Credentials

```bash
# Get credentials for kubectl
gcloud container clusters get-credentials my-cluster --region us-central1

# With an explicit project
gcloud container clusters get-credentials my-cluster --region us-central1 --project my-project

# Check the active context
kubectl config current-context

# View the kubeconfig
kubectl config view

# List contexts
kubectl config get-contexts

# Switch context
kubectl config use-context gke_my-project_us-central1_my-cluster

# List clusters in the kubeconfig
kubectl config get-clusters

# Delete and re-fetch credentials
kubectl config delete-cluster gke_my-project_us-central1_my-cluster
gcloud container clusters get-credentials my-cluster --region us-central1
```
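get-credentials names the context `gke_<project>_<location>_<cluster>`, so you can verify the active context mechanically instead of eyeballing it (a sketch with placeholder names):

```bash
# Build the context name get-credentials would have written.
gke_context_name() {
  echo "gke_${1}_${2}_${3}"   # project, location, cluster
}

# Re-fetch credentials only when the active context is wrong:
# EXPECTED=$(gke_context_name my-project us-central1 my-cluster)
# [ "$(kubectl config current-context)" = "$EXPECTED" ] \
#   || gcloud container clusters get-credentials my-cluster --region us-central1
```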

Step 3: Check API Server Status

```bash
# Get the API server endpoint
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(endpoint)"

# Test the API server connection
curl -k https://$(gcloud container clusters describe my-cluster --region us-central1 --format="value(endpoint)")/healthz

# Check that port 443 is reachable
nc -zv $(gcloud container clusters describe my-cluster --region us-central1 --format="value(endpoint)") 443

# Check master authorized networks
gcloud container clusters describe my-cluster --region us-central1 \
  --format="yaml(masterAuthorizedNetworksConfig)"

# Add your IP to the authorized networks (replace 1.2.3.4 with your public IP)
gcloud container clusters update my-cluster --region us-central1 \
  --enable-master-authorized-networks \
  --master-authorized-networks 1.2.3.4/32

# Check whether this is a private cluster
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(privateClusterConfig.enablePrivateEndpoint)"
```
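The probes above re-run `describe` each time. A small wrapper (a sketch; cluster names are placeholders) caches the endpoint and interprets the `/healthz` body, which is the literal string `ok` when the control plane is healthy:

```bash
# Interpret a /healthz response body: the API server returns the
# literal string "ok" when healthy.
healthz_verdict() {
  if [ "$1" = "ok" ]; then echo "healthy"; else echo "unhealthy"; fi
}

# Cache the endpoint, then probe it once:
# ENDPOINT=$(gcloud container clusters describe my-cluster \
#     --region us-central1 --format="value(endpoint)")
# healthz_verdict "$(curl -sk --max-time 5 "https://${ENDPOINT}/healthz")"
```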

Step 4: Check Firewall Rules

```bash
# List firewall rules
gcloud compute firewall-rules list

# Check GKE-created firewall rules
gcloud compute firewall-rules list --filter="name:gke"

# Check a specific rule
gcloud compute firewall-rules describe gke-my-cluster-ssh

# If egress is restricted, allow outbound tcp:443 to the control plane
# (tighten destination-ranges to the control plane CIDR in production)
gcloud compute firewall-rules create allow-gke-api \
  --network my-network \
  --direction EGRESS \
  --allow tcp:443 \
  --destination-ranges 0.0.0.0/0

# For a private cluster, check the control plane access rule
gcloud compute firewall-rules describe gke-my-cluster-master

# Check the VPC network
gcloud compute networks describe my-network

# Check if using Shared VPC
gcloud compute shared-vpc get-host-project my-service-project
```

Step 5: Check Private Cluster Access

```bash
# Check the private cluster config
gcloud container clusters describe my-cluster --region us-central1 \
  --format="yaml(privateClusterConfig)"

# Access to a private control plane requires one of:
# 1. VPN or Cloud Interconnect into the VPC
# 2. An authorized network with routing to the VPC
# 3. Cloud Shell or a GCE VM in the same VPC

# Connect via Cloud Shell:
# Cloud Console -> Activate Cloud Shell
# (Cloud Shell can reach clusters with a public endpoint; for a
# private-endpoint-only cluster, use a VM inside the VPC)

# Create a bastion VM
gcloud compute instances create bastion \
  --network my-network \
  --subnet my-subnet \
  --zone us-central1-a

# SSH to the bastion and access the cluster from there
gcloud compute ssh bastion --zone us-central1-a
gcloud container clusters get-credentials my-cluster --region us-central1

# Check the private endpoint
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(privateClusterConfig.privateEndpoint)"

# Re-enable the public endpoint on a private cluster
gcloud container clusters update my-cluster --region us-central1 \
  --no-enable-private-endpoint
```
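You can branch on the `enablePrivateEndpoint` field to decide which access path applies. A sketch (note that gcloud's `value()` format prints booleans as `True`/`False`; cluster names are placeholders):

```bash
# Decide how to reach the cluster from enablePrivateEndpoint.
access_path() {
  if [ "$1" = "True" ]; then
    echo "private endpoint only - use a VM, VPN, or proxy inside the VPC"
  else
    echo "public endpoint available - check authorized networks"
  fi
}

# Feed it the live value:
# access_path "$(gcloud container clusters describe my-cluster \
#     --region us-central1 \
#     --format='value(privateClusterConfig.enablePrivateEndpoint)')"
```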

Step 6: Check gcloud Authentication

```bash
# Check gcloud auth status
gcloud auth list

# Log in
gcloud auth login

# Log in with a service account
gcloud auth activate-service-account --key-file key.json

# Check the current project
gcloud config get-value project

# Set the project
gcloud config set project my-project

# Verify a token can still be issued
gcloud auth print-access-token

# Refresh application-default credentials
gcloud auth application-default login

# Check IAM bindings for your user
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:myemail@example.com"

# Required permissions (granted by roles such as roles/container.viewer):
# - container.clusters.get
# - container.pods.list
# - container.nodes.list
```
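When auth problems show up only in CI, the culprit is often which kind of account is active. A small classifier (a sketch) lets scripts assert the expected identity:

```bash
# Classify a gcloud account by its email suffix.
account_type() {
  case "$1" in
    *.iam.gserviceaccount.com) echo "service-account" ;;
    *@*)                       echo "user" ;;
    *)                         echo "unknown" ;;
  esac
}

# Check the active account:
# ACTIVE=$(gcloud auth list --filter=status:ACTIVE --format="value(account)")
# account_type "$ACTIVE"
```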

Step 7: Check Node Status

```bash
# List node pools
gcloud container node-pools list --cluster my-cluster --region us-central1

# Check node pool details
gcloud container node-pools describe default-pool \
  --cluster my-cluster --region us-central1

# Check if nodes exist
kubectl get nodes
# Or via gcloud:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(currentNodeCount)"

# Check node status
kubectl describe nodes

# Check system component logs
kubectl logs -n kube-system <node-pod>

# List the node instance groups
gcloud compute instance-groups list

# Inspect an instance group (GKE-managed instance groups are zonal)
gcloud compute instance-groups describe gke-my-cluster-default-pool --zone us-central1-a
```

Step 8: Check Cloud Logging

```bash
# Check cluster-level logs
gcloud logging read "resource.type=gke_cluster" --limit 20

# Check control plane component logs (requires control plane logging enabled)
gcloud logging read 'resource.type="k8s_control_plane_component"' --limit 20

# Check API server logs specifically
gcloud logging read 'resource.type="k8s_control_plane_component" AND resource.labels.component_name="apiserver"' --limit 20

# Check for errors
gcloud logging read "resource.type=gke_cluster AND severity>=ERROR" --limit 20

# Stream logs
gcloud logging tail "resource.type=gke_cluster"

# Check Cloud Audit Logs for the GKE API
gcloud logging read "protoPayload.serviceName=container.googleapis.com" --limit 20
```

Step 9: Repair Cluster Issues

```bash
# Enable node auto-repair (recovers unhealthy nodes automatically)
gcloud container node-pools update default-pool \
  --cluster my-cluster --region us-central1 \
  --enable-autorepair

# Check repair operations
gcloud container operations list --region us-central1 --filter="type:REPAIR"

# Upgrade the control plane (can clear a stuck master)
gcloud container clusters upgrade my-cluster --region us-central1 \
  --master --cluster-version 1.27

# Recreate a node pool
gcloud container node-pools delete default-pool \
  --cluster my-cluster --region us-central1

gcloud container node-pools create default-pool \
  --cluster my-cluster --region us-central1 \
  --num-nodes 3

# Check cluster events
kubectl get events --all-namespaces
```

Step 10: GKE Cluster Verification Script

```bash
# Create verification script
cat << 'EOF' > /usr/local/bin/check-gke-cluster.sh
#!/bin/bash

CLUSTER=${1:-"my-cluster"}
REGION=${2:-"us-central1"}

echo "=== Cluster Status ==="
gcloud container clusters describe "$CLUSTER" --region "$REGION" \
  --format="table(name, status, currentMasterVersion, currentNodeCount)"

echo ""
echo "=== API Server Endpoint ==="
ENDPOINT=$(gcloud container clusters describe "$CLUSTER" --region "$REGION" --format="value(endpoint)")
echo "Endpoint: $ENDPOINT"
nc -zv -w 5 "$ENDPOINT" 443 2>&1 || echo "Cannot reach API server"

echo ""
echo "=== Current Context ==="
kubectl config current-context

echo ""
echo "=== Node Status ==="
kubectl get nodes -o wide 2>/dev/null || echo "Cannot get nodes"

echo ""
echo "=== System Pods ==="
kubectl get pods -n kube-system 2>/dev/null || echo "Cannot get pods"

echo ""
echo "=== Master Authorized Networks ==="
gcloud container clusters describe "$CLUSTER" --region "$REGION" \
  --format="yaml(masterAuthorizedNetworksConfig)" 2>/dev/null

echo ""
echo "=== Recent Errors ==="
gcloud logging read "resource.type=gke_cluster AND severity>=ERROR" --limit 5 \
  --format="table(timestamp, textPayload)" 2>/dev/null || echo "No logs or permission denied"

echo ""
echo "=== Firewall Rules ==="
gcloud compute firewall-rules list --filter="name:gke-$CLUSTER" \
  --format="table(name, allowed[], sourceRanges)" 2>/dev/null || echo "No firewall rules found"
EOF

chmod +x /usr/local/bin/check-gke-cluster.sh

# Usage:
/usr/local/bin/check-gke-cluster.sh my-cluster us-central1
```

GKE Cluster Checklist

| Check | Command | Expected |
| --- | --- | --- |
| Cluster status | `gcloud container clusters describe` | RUNNING |
| Credentials | `kubectl config current-context` | Correct cluster |
| API server | `nc <endpoint> 443` | Connection OK |
| Auth | `gcloud auth list` | Account active |
| Nodes | `kubectl get nodes` | Nodes Ready |
| Firewall | `gcloud compute firewall-rules list` | Port 443 allowed |

Verify the Fix

```bash
# After fixing cluster access

# 1. Check context
kubectl config current-context
# gke_project_region_cluster

# 2. Get nodes
kubectl get nodes
# All nodes Ready

# 3. Get pods
kubectl get pods -A
# Pods running

# 4. Test the API
curl -k https://<endpoint>/healthz
# ok

# 5. Check auth
gcloud auth list
# Correct account

# 6. Confirm cluster info
kubectl cluster-info
# Cluster info displayed
```
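The checks above can be chained so the first failure is called out explicitly (a sketch; pass whichever check commands apply to your setup):

```bash
# Run each check in order and stop at the first failure.
verify() {
  for cmd in "$@"; do
    if $cmd >/dev/null 2>&1; then
      echo "OK: $cmd"
    else
      echo "FAILED: $cmd"
      return 1
    fi
  done
  echo "all checks passed"
}

# Example:
# verify "kubectl config current-context" "kubectl get nodes" \
#        "kubectl get pods -A" "kubectl cluster-info"
```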

  • [Fix Kubernetes Node Not Ready](/articles/fix-kubernetes-node-not-ready)
  • [Fix Kubernetes Pod CrashLoopBackOff](/articles/fix-kubernetes-pod-crashloopbackoff)
  • [Fix Kubernetes Service Not Found](/articles/fix-kubernetes-service-not-found)