What's Actually Happening
You cannot connect to a Google Kubernetes Engine (GKE) cluster with kubectl. Commands time out or fail with connection errors.
The Error You'll See
```bash
$ kubectl get pods
The connection to the server gke-cluster-ip was refused - did you specify the right host or port?
```
Timeout error:
```bash
$ kubectl get nodes
Unable to connect to the server: dial tcp x.x.x.x:443: i/o timeout
```
Certificate error:
```bash
$ kubectl get pods
Unable to connect to the server: x509: certificate signed by unknown authority
```
Cluster not found:
```bash
$ gcloud container clusters describe my-cluster
ERROR: (gcloud.container.clusters.describe) NOT_FOUND
```
Why This Happens
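1. Cluster not running - status is PROVISIONING, RECONCILING (e.g. upgrading), STOPPING (being deleted), or ERROR
2. Wrong credentials - kubectl is using the wrong kubeconfig or context, or a stale CA certificate
3. Network blocked - your IP is not in master authorized networks, or a local firewall or proxy blocks outbound port 443
4. API server issues - control plane not responding
5. Private cluster - the private endpoint cannot be reached from outside the VPC
6. Credentials expired - gcloud auth token expired, or the kubectl auth plugin is missing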
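To triage quickly, the error strings shown above can be matched against these causes. The function below is a hypothetical helper (not part of gcloud or kubectl), and the patterns assume the exact messages quoted earlier; real messages may differ between kubectl versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a kubectl/gcloud error message to a likely cause.
# Patterns are the error strings quoted in this article; adjust as needed.
classify_kubectl_error() {
  local msg="$1"
  case "$msg" in
    *"i/o timeout"*)
      echo "network: endpoint unreachable (authorized networks, private endpoint, VPN)" ;;
    *"connection"*"refused"*)
      echo "api-server: control plane not accepting connections, or wrong host" ;;
    *"x509"*)
      echo "credentials: kubeconfig CA does not match the cluster; re-run get-credentials" ;;
    *"NOT_FOUND"*)
      echo "cluster: wrong name, location, or project" ;;
    *)
      echo "unknown: no matching pattern" ;;
  esac
}

# Example:
# classify_kubectl_error "Unable to connect to the server: dial tcp 10.0.0.2:443: i/o timeout"
```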
Step 1: Check Cluster Status
```bash
# Check cluster status:
gcloud container clusters list

# Describe cluster:
gcloud container clusters describe my-cluster --region us-central1

# Check cluster status in output:
# status: RUNNING

# Check from Console:
# Cloud Console -> Kubernetes Engine -> Clusters

# Check if cluster exists:
gcloud container clusters list --filter="name=my-cluster"

# Check cluster version and status:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(status,currentMasterVersion,currentNodeCount)"

# Check cluster operations:
gcloud container operations list --region us-central1

# Check for in-progress operations (e.g. an upgrade):
gcloud container operations list --region us-central1 --filter="status:RUNNING"
```
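The `status` field tells you whether waiting or intervening is the right move. A minimal sketch, assuming the status names from the GKE `Cluster.Status` enum; the function name and the advice strings are this article's own, not part of gcloud.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a GKE cluster status into a next step.
gke_status_hint() {
  case "$1" in
    RUNNING)        echo "ok: control plane should accept kubectl requests" ;;
    PROVISIONING)   echo "wait: cluster is still being created" ;;
    RECONCILING)    echo "wait: an upgrade or other operation is in progress" ;;
    STOPPING)       echo "gone: cluster is being deleted" ;;
    ERROR|DEGRADED) echo "broken: check operations and Cloud Logging" ;;
    *)              echo "unknown: unexpected status '$1'" ;;
  esac
}

# Usage (requires gcloud):
# gke_status_hint "$(gcloud container clusters describe my-cluster \
#   --region us-central1 --format='value(status)')"
```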
Step 2: Get Cluster Credentials
```bash
# Get credentials for kubectl:
gcloud container clusters get-credentials my-cluster --region us-central1

# With project:
gcloud container clusters get-credentials my-cluster --region us-central1 --project my-project

# Check current context:
kubectl config current-context

# View kubeconfig:
kubectl config view

# List contexts:
kubectl config get-contexts

# Switch context:
kubectl config use-context gke_my-project_us-central1_my-cluster

# List clusters in kubeconfig:
kubectl config get-clusters

# Delete and re-fetch credentials:
kubectl config delete-cluster gke_my-project_us-central1_my-cluster
gcloud container clusters get-credentials my-cluster --region us-central1
```
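GKE context names encode the project, location, and cluster (`gke_my-project_us-central1_my-cluster`), so a stale context can be turned back into the command that refreshes it. A minimal sketch with a hypothetical function name; it assumes none of the three parts contains an underscore, and that zones have two hyphens (`us-central1-a`) while regions have one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild the get-credentials command from a GKE context name.
context_to_get_credentials() {
  local ctx="$1" prefix project location cluster flag="--region"
  IFS=_ read -r prefix project location cluster <<< "$ctx"
  if [[ "$prefix" != "gke" || -z "$cluster" ]]; then
    echo "not a GKE context: $ctx" >&2
    return 1
  fi
  # Assumption: zone names look like us-central1-a, region names like us-central1.
  if [[ "$location" == *-*-* ]]; then
    flag="--zone"
  fi
  echo "gcloud container clusters get-credentials $cluster $flag $location --project $project"
}

# Usage:
# context_to_get_credentials "$(kubectl config current-context)"
```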
Step 3: Check API Server Status
```bash
# Check API server endpoint:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(endpoint)"

# Test API server connection (expect "ok"):
ENDPOINT=$(gcloud container clusters describe my-cluster --region us-central1 --format="value(endpoint)")
curl -k "https://${ENDPOINT}/healthz"

# Check the API server port is reachable:
nc -zv "$ENDPOINT" 443

# Check master authorized networks:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="yaml(masterAuthorizedNetworksConfig)"

# Set authorized networks. This REPLACES the existing list, so pass every
# CIDR that still needs access (comma-separated) together with your own IP:
gcloud container clusters update my-cluster --region us-central1 \
  --enable-master-authorized-networks \
  --master-authorized-networks 1.2.3.4/32

# Check if the private endpoint is enabled:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(privateClusterConfig.enablePrivateEndpoint)"
```
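An `i/o timeout` often means your public IP is not covered by any authorized CIDR. The sketch below checks that locally; the function names are hypothetical, it handles IPv4 only, and it expects CIDRs in `a.b.c.d/n` form as printed by the `masterAuthorizedNetworksConfig` query above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: IPv4-in-CIDR checks in pure bash (64-bit arithmetic).
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP CIDR -> exit 0 if IP is inside CIDR
ip_in_cidr() {
  local ip="$1" cidr="$2"
  local net="${cidr%/*}" bits="${cidr#*/}"
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# ip_authorized IP CIDR... -> exit 0 if IP is inside any listed CIDR
ip_authorized() {
  local ip="$1" cidr
  shift
  for cidr in "$@"; do
    ip_in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}

# Usage (203.0.113.7 stands in for your public IP):
# ip_authorized 203.0.113.7 10.0.0.0/8 203.0.113.0/24 && echo "authorized"
```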
Step 4: Check Firewall Rules
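```bash
# List firewall rules:
gcloud compute firewall-rules list

# Check GKE firewall rules:
gcloud compute firewall-rules list --filter="name:gke"

# Check a specific rule (GKE-created names include a generated suffix; copy it from the list):
gcloud compute firewall-rules describe gke-my-cluster-<hash>-ssh

# VPC firewall rules filter traffic to nodes, not to the Google-managed control
# plane, so opening 443 from 0.0.0.0/0 does not fix kubectl access. Add a rule only
# when the control plane must reach nodes on an extra port (e.g. an admission webhook).
# 172.16.0.0/28 and 8443 are placeholders for your control plane CIDR and port:
gcloud compute firewall-rules create allow-gke-master-to-nodes \
  --network my-network \
  --allow tcp:8443 \
  --source-ranges 172.16.0.0/28

# For a private cluster, check the rule that lets the control plane reach nodes:
gcloud compute firewall-rules describe gke-my-cluster-<hash>-master

# Check VPC network:
gcloud compute networks describe my-network

# Check if using Shared VPC:
gcloud compute shared-vpc get-host-project my-service-project
```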
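When reviewing rules, the question is usually whether a rule admits a given TCP port (for example kubelet port 10250 from the control plane). A minimal sketch with a hypothetical function name; it assumes the spec uses the same syntax as the `--allow` flag (comma-separated `PROTOCOL[:PORT[-PORT]]`, or `all`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: does an --allow style spec admit this TCP port?
allows_tcp_port() {
  local spec="$1" port="$2" entry proto ports lo hi
  local -a entries
  IFS=, read -ra entries <<< "$spec"
  for entry in "${entries[@]}"; do
    proto="${entry%%:*}"
    [[ "$proto" == "all" ]] && return 0     # every protocol and port
    [[ "$proto" == "tcp" ]] || continue     # ignore udp, icmp, ...
    [[ "$entry" != *:* ]] && return 0       # bare "tcp" means all TCP ports
    ports="${entry#*:}"
    lo="${ports%-*}"                        # "443" -> 443, "1-65535" -> 1
    hi="${ports#*-}"                        # "443" -> 443, "1-65535" -> 65535
    (( port >= lo && port <= hi )) && return 0
  done
  return 1
}

# Usage:
# allows_tcp_port "tcp:443,tcp:10250" 10250 && echo "kubelet port allowed"
```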
Step 5: Check Private Cluster Access
```bash
# Check if private cluster:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="yaml(privateClusterConfig)"

# A private endpoint is reachable only through one of:
# 1. Cloud VPN or Cloud Interconnect into the VPC
# 2. An authorized network inside the VPC
# 3. A Compute Engine VM in the same VPC (bastion)

# Connect via Cloud Shell:
# Cloud Console -> Activate Cloud Shell
# Cloud Shell runs outside your VPC, so it can only use the public endpoint,
# and only if its external IP is in master authorized networks.

# Create bastion VM:
gcloud compute instances create bastion \
  --network my-network \
  --subnet my-subnet \
  --zone us-central1-a

# SSH to the bastion:
gcloud compute ssh bastion --zone us-central1-a
# Then, on the bastion (needs gcloud and kubectl), fetch credentials for the private endpoint:
gcloud container clusters get-credentials my-cluster --region us-central1 --internal-ip

# Check private endpoint:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(privateClusterConfig.privateEndpoint)"

# Enable public endpoint access for a private cluster:
gcloud container clusters update my-cluster --region us-central1 \
  --no-enable-private-endpoint
```
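Whether to add `--internal-ip` depends on the private-endpoint setting and where you run kubectl. A minimal sketch with a hypothetical function name; it assumes `$3` is the `enablePrivateEndpoint` value printed by the describe command above (`True`/`true` when set, empty otherwise) and `$4` is `yes` when the client sits inside the VPC.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick get-credentials flags for a (possibly private) cluster.
# Args: CLUSTER REGION PRIVATE_ENDPOINT IN_VPC
credentials_cmd_for() {
  local cluster="$1" region="$2" private_endpoint="$3" in_vpc="$4"
  local base="gcloud container clusters get-credentials $cluster --region $region"
  if [[ "$private_endpoint" == [Tt]rue ]]; then
    if [[ "$in_vpc" == "yes" ]]; then
      echo "$base --internal-ip"
    else
      echo "unreachable: connect through a VM, VPN, or Interconnect in the VPC" >&2
      return 1
    fi
  else
    echo "$base"
  fi
}

# Usage:
# credentials_cmd_for my-cluster us-central1 True yes
```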
Step 6: Check gcloud Authentication
```bash
# Check gcloud auth status:
gcloud auth list

# Login:
gcloud auth login

# Login with service account:
gcloud auth activate-service-account --key-file key.json

# Check current project:
gcloud config get-value project

# Set project:
gcloud config set project my-project

# Check access token:
gcloud auth print-access-token

# kubectl 1.26+ authenticates to GKE through this plugin:
gcloud components install gke-gcloud-auth-plugin

# Refresh Application Default Credentials (used by client libraries; kubectl uses your gcloud login):
gcloud auth application-default login

# Check IAM permissions:
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:myemail@example.com"

# Required permissions (granted e.g. by roles/container.viewer or roles/container.developer):
# - container.clusters.get
# - container.pods.list
# - container.nodes.list
```
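A missing binary produces errors that look like expired credentials, so it is worth confirming the client tools first. A minimal sketch with a hypothetical function name; the tool list in the usage line is an assumption about a typical setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are not on PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_tools gcloud kubectl gke-gcloud-auth-plugin
```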
Step 7: Check Node Status
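```bash
# Check node pools:
gcloud container node-pools list --cluster my-cluster --region us-central1

# Check node pool details:
gcloud container node-pools describe default-pool \
  --cluster my-cluster --region us-central1

# Check if nodes exist:
kubectl get nodes
# Or via gcloud:
gcloud container clusters describe my-cluster --region us-central1 \
  --format="value(currentNodeCount)"

# Check node status:
kubectl describe nodes

# List kube-system pods on a node, then read one pod's logs:
kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=<node-name>
kubectl logs -n kube-system <pod-name>

# Check node instance groups:
gcloud compute instance-groups list

# Find a node pool's instance groups (one per zone, names include a generated suffix):
gcloud container node-pools describe default-pool \
  --cluster my-cluster --region us-central1 --format="value(instanceGroupUrls)"
# Describe one of them:
gcloud compute instance-groups managed describe gke-my-cluster-default-pool-<hash>-grp \
  --zone us-central1-a
```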
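On larger clusters it helps to reduce `kubectl get nodes` output to just the problem nodes. A minimal sketch with a hypothetical function name; it assumes the default column layout (NAME, STATUS, ...). Cordoned nodes (`Ready,SchedulingDisabled`) are also listed, since their STATUS is not exactly `Ready`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print names of nodes whose STATUS is not exactly "Ready".
# Reads `kubectl get nodes` output on stdin and skips the header line.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage:
# kubectl get nodes | not_ready_nodes
```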
Step 8: Check Cloud Logging
```bash
# Check cluster logs:
gcloud logging read "resource.type=gke_cluster" --limit 20

# Check control plane logs (present only if control plane logging is enabled):
gcloud logging read "resource.type=k8s_control_plane_component" --limit 20

# Check API server logs:
gcloud logging read "resource.type=k8s_control_plane_component AND resource.labels.component_name=apiserver" --limit 20

# Check for errors:
gcloud logging read "resource.type=gke_cluster AND severity>=ERROR" --limit 20

# Stream logs (tail is an alpha command):
gcloud alpha logging tail "resource.type=gke_cluster"

# Check Cloud Audit Logs:
gcloud logging read "protoPayload.serviceName=container.googleapis.com" --limit 20
```
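Filters get long once you scope them to one cluster, so building them in a function avoids quoting mistakes. A minimal sketch with a hypothetical function name; it assumes the `k8s_control_plane_component` resource with its `cluster_name` and `component_name` labels, which only receives entries when control plane logging is enabled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: Cloud Logging filter for one cluster's API server logs.
# Args: CLUSTER [SEVERITY, default ERROR]
apiserver_log_filter() {
  local cluster="$1" severity="${2:-ERROR}"
  printf 'resource.type="k8s_control_plane_component" AND resource.labels.cluster_name="%s" AND resource.labels.component_name="apiserver" AND severity>=%s' \
    "$cluster" "$severity"
}

# Usage:
# gcloud logging read "$(apiserver_log_filter my-cluster WARNING)" --limit 20
```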
Step 9: Repair Cluster Issues
```bash
# Enable node auto-repair on a node pool (repairs nodes, not the control plane):
gcloud container node-pools update default-pool \
  --cluster my-cluster --region us-central1 \
  --enable-autorepair

# Check node repair operations:
gcloud container operations list --region us-central1 \
  --filter="operationType=AUTO_REPAIR_NODES"

# List available versions, then upgrade the control plane
# (replace 1.27 with a currently supported version):
gcloud container get-server-config --region us-central1
gcloud container clusters upgrade my-cluster --region us-central1 \
  --master --cluster-version 1.27

# Recreate node pool (deletes its nodes and evicts their pods):
gcloud container node-pools delete default-pool \
  --cluster my-cluster --region us-central1

gcloud container node-pools create default-pool \
  --cluster my-cluster --region us-central1 \
  --num-nodes 3

# Check cluster events:
kubectl get events --all-namespaces
```
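During a repair or control plane upgrade the API server can be briefly unavailable, so a verification command may fail a few times before it succeeds. A minimal sketch with a hypothetical `retry` function: it runs a command up to N times, doubling the delay between attempts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command with exponential backoff.
# Args: ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    if (( i < attempts )); then
      echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
      delay=$(( delay * 2 ))
    fi
  done
  return 1
}

# Usage:
# retry 5 10 kubectl get nodes
```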
Step 10: GKE Cluster Verification Script
```bash
# Create verification script (writing to /usr/local/bin may require sudo):
cat << 'EOF' > /usr/local/bin/check-gke-cluster.sh
#!/bin/bash
CLUSTER=${1:-"my-cluster"}
REGION=${2:-"us-central1"}

echo "=== Cluster Status ==="
gcloud container clusters describe "$CLUSTER" --region "$REGION" \
  --format="table(name,status,currentMasterVersion,currentNodeCount)"

echo ""
echo "=== API Server Endpoint ==="
ENDPOINT=$(gcloud container clusters describe "$CLUSTER" --region "$REGION" --format="value(endpoint)")
echo "Endpoint: $ENDPOINT"
nc -zv -w 5 "$ENDPOINT" 443 2>&1 || echo "Cannot reach API server"

echo ""
echo "=== Current Context ==="
kubectl config current-context

echo ""
echo "=== Node Status ==="
kubectl get nodes -o wide 2>/dev/null || echo "Cannot get nodes"

echo ""
echo "=== System Pods ==="
kubectl get pods -n kube-system 2>/dev/null || echo "Cannot get pods"

echo ""
echo "=== Master Authorized Networks ==="
gcloud container clusters describe "$CLUSTER" --region "$REGION" \
  --format="yaml(masterAuthorizedNetworksConfig)" 2>/dev/null

echo ""
echo "=== Recent Errors ==="
gcloud logging read "resource.type=gke_cluster AND severity>=ERROR" --limit 5 \
  --format="table(timestamp,textPayload)" 2>/dev/null || echo "No logs or permission denied"

echo ""
echo "=== Firewall Rules ==="
gcloud compute firewall-rules list --filter="name:gke-$CLUSTER" \
  --format="table(name,allowed[],sourceRanges)" 2>/dev/null || echo "No firewall rules found"
EOF

chmod +x /usr/local/bin/check-gke-cluster.sh

# Usage:
/usr/local/bin/check-gke-cluster.sh my-cluster us-central1
```
GKE Cluster Checklist
| Check | Command | Expected |
|---|---|---|
| Cluster status | gcloud container clusters describe | RUNNING |
| Credentials | kubectl config current-context | Correct cluster |
| API server | nc -zv &lt;endpoint&gt; 443 | Connection succeeds |
| Auth | gcloud auth list | Correct account active |
| Nodes | kubectl get nodes | All nodes Ready |
| Firewall | gcloud compute firewall-rules list | GKE-created rules present |
Verify the Fix
```bash
# After fixing cluster access:

# 1. Check context (expect gke_PROJECT_LOCATION_CLUSTER):
kubectl config current-context

# 2. Get nodes (expect all nodes Ready):
kubectl get nodes

# 3. Get pods (expect pods Running):
kubectl get pods -A

# 4. Test the API server (expect "ok"; ENDPOINT from Step 3):
curl -k "https://${ENDPOINT}/healthz"

# 5. Check auth (expect the correct account marked active):
gcloud auth list

# 6. Show cluster info (expect the control plane URL):
kubectl cluster-info
```
Related Issues
- [Fix Kubernetes Node Not Ready](/articles/fix-kubernetes-node-not-ready)
- [Fix Kubernetes Pod CrashLoopBackOff](/articles/fix-kubernetes-pod-crashloopbackoff)
- [Fix Kubernetes Service Not Found](/articles/fix-kubernetes-service-not-found)