# Kubernetes Deployment Failed in CI

## Common Error Patterns

Kubernetes deployment failures typically show:

```text
Error from server (Forbidden): deployments.apps is forbidden
Deployment "myapp" failed: ReplicaSet "myapp-xxx" has timed out
Failed to pull image "myapp:latest": rpc error: code = Unknown
Failed to create pod: pod "myapp-xxx" is forbidden: exceeded quota
0/3 nodes are available: 3 Insufficient cpu
```

## Root Causes and Solutions

### 1. RBAC Permission Denied

CI service account lacks Kubernetes permissions.

Solution:

Create proper RBAC configuration:

```yaml
# Service account for CI
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer
  namespace: production
---
# Role with deployment permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer-role
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "secrets", "configmaps", "services"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
# Bind role to service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: ci-deployer
  namespace: production
roleRef:
  kind: Role
  name: deployer-role
  apiGroup: rbac.authorization.k8s.io
```

Apply RBAC:

```bash
kubectl apply -f rbac.yaml
```
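After applying, it is worth confirming the bindings actually grant what the pipeline needs. A sketch using `kubectl auth can-i` with service-account impersonation (the `check_perm` helper name is mine):

```shell
# check_perm: verify the CI service account can perform a given verb on a
# resource by impersonating it with `kubectl auth can-i`.
check_perm() {
  local verb="$1" resource="$2"
  if kubectl auth can-i "$verb" "$resource" -n production \
      --as=system:serviceaccount:production:ci-deployer >/dev/null 2>&1; then
    echo "ok: $verb $resource"
  else
    echo "DENIED: $verb $resource"
    return 1
  fi
}
```

For example, `check_perm create deployments` and `check_perm delete pods` should both succeed once the Role and RoleBinding above are in place.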

Get token for CI:

```bash
# Create a short-lived token (Kubernetes 1.24+)
kubectl create token ci-deployer --duration=24h -n production

# Or create a long-lived token via a service account token Secret
kubectl apply -n production -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ci-deployer-token
  annotations:
    kubernetes.io/service-account.name: ci-deployer
type: kubernetes.io/service-account-token
EOF
```
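The CI system then needs a kubeconfig that authenticates with this token. A minimal sketch; the `<...>` values are placeholders for the API server endpoint, base64-encoded CA certificate, and the token obtained above:

```yaml
apiVersion: v1
kind: Config
clusters:
- name: production-cluster
  cluster:
    server: https://<API_SERVER>            # placeholder: API server endpoint
    certificate-authority-data: <CA_DATA>   # placeholder: base64 CA cert
users:
- name: ci-deployer
  user:
    token: <TOKEN>                          # placeholder: token from the step above
contexts:
- name: ci
  context:
    cluster: production-cluster
    user: ci-deployer
    namespace: production
current-context: ci
```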

### 2. Manifest Validation Errors

Invalid Kubernetes manifest syntax or configuration.

Solution:

Validate manifests before applying:

```bash
# Validate client-side with kubectl
kubectl apply --dry-run=client -f deployment.yaml

# Validate server-side (catches admission and schema errors)
kubectl apply --dry-run=server -f deployment.yaml

# Use kubeval (no longer maintained)
kubeval deployment.yaml

# Use kubeconform (maintained successor to kubeval)
kubeconform -schema-location default deployment.yaml
```

In CI pipeline:

```yaml
# GitHub Actions
- name: Validate manifests
  run: |
    kubectl apply --dry-run=client -f k8s/

- name: Validate with kubeconform
  uses: instrumenta/kubeconform-action@v0.1.0
  with:
    manifests: 'k8s/*.yaml'
```

### 3. Image Pull Failures

Cannot pull container image from registry.

Solution:

Create image pull secret:

```bash
kubectl create secret docker-registry regcred \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email> \
  -n production
```

Reference in deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - name: myapp
        image: registry/myapp:v1
```
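Alternatively, attaching the secret to the namespace's default service account makes every pod in the namespace pick it up without editing each Deployment (sketch, reusing the `regcred` secret from above):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: production
imagePullSecrets:
- name: regcred
```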

For AWS ECR:

```bash
# Create secret with AWS ECR credentials (ECR login passwords expire after 12 hours)
kubectl create secret docker-registry ecr-cred \
  --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$(aws ecr get-login-password) \
  -n production
```
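Because the ECR login password is short-lived, CI jobs typically recreate the secret on every deploy. A sketch reusing the example registry and names above (the `refresh_ecr_secret` helper name is mine):

```shell
# Recreate the ECR pull secret with a fresh login password before deploying.
refresh_ecr_secret() {
  kubectl delete secret ecr-cred -n production --ignore-not-found
  kubectl create secret docker-registry ecr-cred \
    --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
    --docker-username=AWS \
    --docker-password="$(aws ecr get-login-password)" \
    -n production
}
```

Call `refresh_ecr_secret` as the first step of the deploy job, before `kubectl apply`.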

### 4. Resource Quota Exceeded

Namespace quota doesn't allow deployment resources.

Solution:

Check quota:

```bash
kubectl get quota -n production
kubectl describe quota production-quota -n production
```

Request appropriate resources:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
      - name: myapp
        image: myapp:v1
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
```

Or increase quota:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 10Gi
    limits.cpu: "20"
    limits.memory: 20Gi
    pods: "50"
```
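Quota accounting is simple arithmetic: the sum of requests across all pods in the namespace must stay under `hard.requests.*`. With the quota above (`requests.cpu: "10"`, i.e. 10000m) and pods requesting 100m each, 100 pods would fit on CPU, but the `pods: "50"` limit caps the namespace at 50. A sketch of that calculation (the helper name is mine):

```shell
# How many pods fit a quota: min(cpu quota / per-pod request, pod-count quota).
# All CPU values are in millicores.
pods_that_fit() {
  local quota_cpu_m="$1" per_pod_cpu_m="$2" pod_quota="$3"
  local by_cpu=$(( quota_cpu_m / per_pod_cpu_m ))
  if [ "$by_cpu" -lt "$pod_quota" ]; then echo "$by_cpu"; else echo "$pod_quota"; fi
}
```

For the values above, `pods_that_fit 10000 100 50` prints `50`; bump each request to 500m and `pods_that_fit 10000 500 50` prints `20`, so CPU becomes the binding limit.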

### 5. Insufficient Node Resources

Nodes don't have enough resources for pods.

Solution:

Check node resources:

```bash
kubectl describe nodes
kubectl top nodes
```

View available resources:

```bash
kubectl describe node node-1 | grep -A 5 "Allocated resources"
```

Options:

- Add nodes to the cluster
- Reduce resource requests
- Use node autoscaling:

```yaml
# Cluster Autoscaler configuration (abridged; cloud-provider flags omitted)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0
        command:
        - ./cluster-autoscaler
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --nodes=2:10:my-node-group   # min:max:node-group-name (placeholder group)
```

### 6. Deployment Rollout Timeout

Deployment takes too long to become ready.

Solution:

Increase deployment timeout:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  progressDeadlineSeconds: 900  # default is 600 seconds
  template:
    spec:
      containers:
      - name: myapp
        image: myapp:v1
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
```

Wait for rollout in CI:

```bash
# Wait for deployment
kubectl rollout status deployment/myapp -n production --timeout=300s

# Check rollout history
kubectl rollout history deployment/myapp -n production

# View deployment status
kubectl describe deployment myapp -n production
```
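The wait step can be wrapped so a stalled rollout triggers an automatic rollback instead of leaving the deployment half-updated. A sketch (the `deploy_or_rollback` helper name is mine):

```shell
# Wait for a rollout; on timeout or failure, roll back to the previous revision.
deploy_or_rollback() {
  local deploy="$1" ns="$2"
  if ! kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=300s; then
    echo "rollout of $deploy failed, rolling back"
    kubectl rollout undo "deployment/$deploy" -n "$ns"
    return 1
  fi
}
```

Returning non-zero keeps the CI job red even though the cluster has been restored, so the failure still gets investigated.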

### 7. Health Check Failures

Pod fails readiness/liveness probe checks.

Solution:

Configure proper probes:

```yaml
containers:
- name: myapp
  image: myapp:v1
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
```
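For services without an HTTP endpoint, exec or TCP probes work as well; a sketch (the file path and port are illustrative):

```yaml
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]   # succeeds once the app creates /tmp/ready
  initialDelaySeconds: 5
  periodSeconds: 5
livenessProbe:
  tcpSocket:
    port: 8080                       # succeeds while the port accepts connections
  initialDelaySeconds: 30
  periodSeconds: 10
```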

Check pod status:

```bash
kubectl describe pod myapp-xxx -n production
kubectl logs myapp-xxx -n production
```

### 8. ConfigMap/Secret Missing

Pod references missing configuration resources.

Solution:

Verify references exist:

```bash
kubectl get configmap myapp-config -n production
kubectl get secret myapp-secret -n production
```
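A preflight step can fail the pipeline early when a referenced resource is missing, before any pods get stuck in `CreateContainerConfigError`. A sketch (the `require` helper name is mine):

```shell
# Fail fast if a referenced ConfigMap or Secret does not exist in the namespace.
require() {
  local kind="$1" name="$2" ns="$3"
  if kubectl get "$kind" "$name" -n "$ns" >/dev/null 2>&1; then
    echo "found $kind/$name"
  else
    echo "missing $kind/$name in namespace $ns"
    return 1
  fi
}
```

For example: `require configmap myapp-config production && require secret myapp-secret production`.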

Create missing resources:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  config.yaml: |
    key: value
---
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
type: Opaque
stringData:
  password: secretvalue
```

## CI Pipeline Configuration

### GitHub Actions Deploy

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Set up kubectl
      uses: azure/setup-kubectl@v3

    - name: Configure kubeconfig
      run: |
        echo "${{ secrets.KUBECONFIG }}" | base64 -d > kubeconfig
        # export does not persist across steps; use GITHUB_ENV instead
        echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"

    - name: Validate manifests
      run: kubectl apply --dry-run=client -f k8s/

    - name: Deploy
      run: |
        kubectl apply -f k8s/ -n production
        kubectl rollout status deployment/myapp -n production --timeout=300s
```

### GitLab CI Deploy

```yaml
deploy:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config set-cluster k8s --server="$KUBE_URL"
    - kubectl config set-credentials admin --token="$KUBE_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=admin
    - kubectl config use-context default
    - kubectl apply --dry-run=client -f k8s/
    - kubectl apply -f k8s/ -n production
    - kubectl rollout status deployment/myapp -n production --timeout=300s
```

## Debugging Commands

```bash
# Check deployment status
kubectl get deployments -n production
kubectl describe deployment myapp -n production

# Check pods
kubectl get pods -n production -l app=myapp
kubectl describe pod myapp-xxx -n production

# Check events
kubectl get events -n production --sort-by=.metadata.creationTimestamp

# Check logs
kubectl logs deployment/myapp -n production --all-containers

# Debug pod
kubectl debug pod/myapp-xxx -n production -it --image=busybox

# Check rollout
kubectl rollout status deployment/myapp -n production
kubectl rollout history deployment/myapp -n production
kubectl rollout undo deployment/myapp -n production
```

## Quick Reference

| Error | Command/Solution |
| --- | --- |
| RBAC denied | Create Role and RoleBinding |
| Image pull fail | Create imagePullSecret |
| Quota exceeded | Check and adjust ResourceQuota |
| No nodes available | Add nodes or reduce requests |
| Rollout timeout | Increase progressDeadlineSeconds |
| Probe failure | Configure proper health checks |

## Prevention Tips

1. Validate manifests before applying
2. Use `--dry-run=server` to catch server-side errors
3. Set proper resource requests and limits
4. Configure readiness probes
5. Use deployment strategies (rolling update)
6. Set up proper RBAC for the CI service account
- [Docker Build Failed in CI](#)
- [AWS ECS Task Stopped](#)
- [Terraform Plan Failed in CI](#)