# Kubernetes Deployment Failed in CI

## Common Error Patterns

Kubernetes deployment failures typically show:

```text
Error from server (Forbidden): deployments.apps is forbidden
Deployment "myapp" failed: ReplicaSet "myapp-xxx" has timed out
Failed to pull image "myapp:latest": rpc error: code = Unknown
Failed to create pod: pod "myapp-xxx" is forbidden: exceeded quota
0/3 nodes are available: 3 Insufficient cpu
```

## Root Causes and Solutions

### 1. RBAC Permission Denied

CI service account lacks Kubernetes permissions.

Solution:

Create proper RBAC configuration:

```yaml
# Service account for CI
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer
  namespace: production
---
# Role with deployment permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer-role
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "secrets", "configmaps", "services"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
# Bind role to service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: ci-deployer
  namespace: production
roleRef:
  kind: Role
  name: deployer-role
  apiGroup: rbac.authorization.k8s.io
```

Apply RBAC:

```bash
kubectl apply -f rbac.yaml
```
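After applying, it is worth confirming the bindings actually grant what the pipeline needs. A sketch using `kubectl auth can-i` with service-account impersonation (the `check_perm` helper name is mine):

```shell
# check_perm: verify the CI service account can perform a given verb on a
# resource by impersonating it with `kubectl auth can-i`.
check_perm() {
  local verb="$1" resource="$2"
  if kubectl auth can-i "$verb" "$resource" -n production \
      --as=system:serviceaccount:production:ci-deployer >/dev/null 2>&1; then
    echo "ok: $verb $resource"
  else
    echo "DENIED: $verb $resource"
    return 1
  fi
}
```

For example, `check_perm create deployments` and `check_perm delete pods` should both succeed once the Role and RoleBinding above are in place.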

Get token for CI:

```bash
# Create a short-lived token (Kubernetes 1.24+)
kubectl create token ci-deployer --duration=24h -n production

# Or create a long-lived token via a service account token Secret
kubectl apply -n production -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ci-deployer-token
  annotations:
    kubernetes.io/service-account.name: ci-deployer
type: kubernetes.io/service-account-token
EOF
```
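The CI system then needs a kubeconfig that authenticates with this token. A minimal sketch; the `<...>` values are placeholders for the API server endpoint, base64-encoded CA certificate, and the token obtained above:

```yaml
apiVersion: v1
kind: Config
clusters:
- name: production-cluster
  cluster:
    server: https://<API_SERVER>            # placeholder: API server endpoint
    certificate-authority-data: <CA_DATA>   # placeholder: base64 CA cert
users:
- name: ci-deployer
  user:
    token: <TOKEN>                          # placeholder: token from the step above
contexts:
- name: ci
  context:
    cluster: production-cluster
    user: ci-deployer
    namespace: production
current-context: ci
```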

### 2. Manifest Validation Errors

Invalid Kubernetes manifest syntax or configuration.

Solution:

Validate manifests before applying:

```bash
# Validate client-side with kubectl
kubectl apply --dry-run=client -f deployment.yaml

# Validate server-side (catches admission and schema errors)
kubectl apply --dry-run=server -f deployment.yaml

# Use kubeval (no longer maintained)
kubeval deployment.yaml

# Use kubeconform (maintained successor to kubeval)
kubeconform -schema-location default deployment.yaml
```

In CI pipeline:

```yaml
# GitHub Actions
- name: Validate manifests
  run: |
    kubectl apply --dry-run=client -f k8s/

- name: Validate with kubeconform
  uses: instrumenta/kubeconform-action@v0.1.0
  with:
    manifests: 'k8s/*.yaml'
```

### 3. Image Pull Failures

Cannot pull container image from registry.

Solution:

Create image pull secret:

```bash
kubectl create secret docker-registry regcred \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email> \
  -n production
```

Reference in deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - name: myapp
        image: registry/myapp:v1
```
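Alternatively, attaching the secret to the namespace's default service account makes every pod in the namespace pick it up without editing each Deployment (sketch, reusing the `regcred` secret from above):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: production
imagePullSecrets:
- name: regcred
```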

For AWS ECR:

```bash
# Create secret with AWS ECR credentials (ECR login passwords expire after 12 hours)
kubectl create secret docker-registry ecr-cred \
  --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$(aws ecr get-login-password) \
  -n production
```
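Because the ECR login password is short-lived, CI jobs typically recreate the secret on every deploy. A sketch reusing the example registry and names above (the `refresh_ecr_secret` helper name is mine):

```shell
# Recreate the ECR pull secret with a fresh login password before deploying.
refresh_ecr_secret() {
  kubectl delete secret ecr-cred -n production --ignore-not-found
  kubectl create secret docker-registry ecr-cred \
    --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
    --docker-username=AWS \
    --docker-password="$(aws ecr get-login-password)" \
    -n production
}
```

Call `refresh_ecr_secret` as the first step of the deploy job, before `kubectl apply`.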

### 4. Resource Quota Exceeded

Namespace quota doesn't allow deployment resources.

Solution:

Check quota:

```bash
kubectl get quota -n production
kubectl describe quota production-quota -n production
```

Request appropriate resources:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
      - name: myapp
        image: myapp:v1
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
```

Or increase quota:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 10Gi
    limits.cpu: "20"
    limits.memory: 20Gi
    pods: "50"
```
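Quota accounting is simple arithmetic: the sum of requests across all pods in the namespace must stay under `hard.requests.*`. With the quota above (`requests.cpu: "10"`, i.e. 10000m) and pods requesting 100m each, 100 pods would fit on CPU, but the `pods: "50"` limit caps the namespace at 50. A sketch of that calculation (the helper name is mine):

```shell
# How many pods fit a quota: min(cpu quota / per-pod request, pod-count quota).
# All CPU values are in millicores.
pods_that_fit() {
  local quota_cpu_m="$1" per_pod_cpu_m="$2" pod_quota="$3"
  local by_cpu=$(( quota_cpu_m / per_pod_cpu_m ))
  if [ "$by_cpu" -lt "$pod_quota" ]; then echo "$by_cpu"; else echo "$pod_quota"; fi
}
```

For the values above, `pods_that_fit 10000 100 50` prints `50`; bump each request to 500m and `pods_that_fit 10000 500 50` prints `20`, so CPU becomes the binding limit.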

### 5. Insufficient Node Resources

Nodes don't have enough resources for pods.

Solution:

Check node resources:

```bash
kubectl describe nodes
kubectl top nodes
```

View available resources:

```bash
kubectl describe node node-1 | grep -A 5 "Allocated resources"
```

Options:

- Add nodes to the cluster
- Reduce resource requests
- Use node autoscaling:

```yaml
# Cluster Autoscaler configuration (abridged; cloud-provider flags omitted)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0
        command:
        - ./cluster-autoscaler
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --nodes=2:10:my-node-group   # min:max:node-group-name (placeholder group)
```

### 6. Deployment Rollout Timeout

Deployment takes too long to become ready.

Solution:

Increase deployment timeout:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  progressDeadlineSeconds: 900  # default is 600 seconds
  template:
    spec:
      containers:
      - name: myapp
        image: myapp:v1
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
```

Wait for rollout in CI:

```bash
# Wait for deployment
kubectl rollout status deployment/myapp -n production --timeout=300s

# Check rollout history
kubectl rollout history deployment/myapp -n production

# View deployment status
kubectl describe deployment myapp -n production
```
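The wait step can be wrapped so a stalled rollout triggers an automatic rollback instead of leaving the deployment half-updated. A sketch (the `deploy_or_rollback` helper name is mine):

```shell
# Wait for a rollout; on timeout or failure, roll back to the previous revision.
deploy_or_rollback() {
  local deploy="$1" ns="$2"
  if ! kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=300s; then
    echo "rollout of $deploy failed, rolling back"
    kubectl rollout undo "deployment/$deploy" -n "$ns"
    return 1
  fi
}
```

Returning non-zero keeps the CI job red even though the cluster has been restored, so the failure still gets investigated.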

### 7. Health Check Failures

Pod fails readiness/liveness probe checks.

Solution:

Configure proper probes:

```yaml
containers:
- name: myapp
  image: myapp:v1
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
```
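For services without an HTTP endpoint, exec or TCP probes work as well; a sketch (the file path and port are illustrative):

```yaml
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]   # succeeds once the app creates /tmp/ready
  initialDelaySeconds: 5
  periodSeconds: 5
livenessProbe:
  tcpSocket:
    port: 8080                       # succeeds while the port accepts connections
  initialDelaySeconds: 30
  periodSeconds: 10
```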

Check pod status:

```bash
kubectl describe pod myapp-xxx -n production
kubectl logs myapp-xxx -n production
```

### 8. ConfigMap/Secret Missing

Pod references missing configuration resources.

Solution:

Verify references exist:

```bash
kubectl get configmap myapp-config -n production
kubectl get secret myapp-secret -n production
```
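A preflight step can fail the pipeline early when a referenced resource is missing, before any pods get stuck in `CreateContainerConfigError`. A sketch (the `require` helper name is mine):

```shell
# Fail fast if a referenced ConfigMap or Secret does not exist in the namespace.
require() {
  local kind="$1" name="$2" ns="$3"
  if kubectl get "$kind" "$name" -n "$ns" >/dev/null 2>&1; then
    echo "found $kind/$name"
  else
    echo "missing $kind/$name in namespace $ns"
    return 1
  fi
}
```

For example: `require configmap myapp-config production && require secret myapp-secret production`.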

Create missing resources:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  config.yaml: |
    key: value
---
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
type: Opaque
stringData:
  password: secretvalue
```

## CI Pipeline Configuration

### GitHub Actions Deploy

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Set up kubectl
      uses: azure/setup-kubectl@v3

    - name: Configure kubeconfig
      run: |
        echo "${{ secrets.KUBECONFIG }}" | base64 -d > kubeconfig
        # export does not persist across steps; use GITHUB_ENV instead
        echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"

    - name: Validate manifests
      run: kubectl apply --dry-run=client -f k8s/

    - name: Deploy
      run: |
        kubectl apply -f k8s/ -n production
        kubectl rollout status deployment/myapp -n production --timeout=300s
```

### GitLab CI Deploy

```yaml
deploy:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config set-cluster k8s --server="$KUBE_URL"
    - kubectl config set-credentials admin --token="$KUBE_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=admin
    - kubectl config use-context default
    - kubectl apply --dry-run=client -f k8s/
    - kubectl apply -f k8s/ -n production
    - kubectl rollout status deployment/myapp -n production --timeout=300s
```

## Debugging Commands

```bash
# Check deployment status
kubectl get deployments -n production
kubectl describe deployment myapp -n production

# Check pods
kubectl get pods -n production -l app=myapp
kubectl describe pod myapp-xxx -n production

# Check events
kubectl get events -n production --sort-by=.metadata.creationTimestamp

# Check logs
kubectl logs deployment/myapp -n production --all-containers

# Debug pod
kubectl debug pod/myapp-xxx -n production -it --image=busybox

# Check rollout
kubectl rollout status deployment/myapp -n production
kubectl rollout history deployment/myapp -n production
kubectl rollout undo deployment/myapp -n production
```

## Quick Reference

| Error | Command/Solution |
| --- | --- |
| RBAC denied | Create Role and RoleBinding |
| Image pull fail | Create imagePullSecret |
| Quota exceeded | Check and adjust ResourceQuota |
| No nodes available | Add nodes or reduce requests |
| Rollout timeout | Increase progressDeadlineSeconds |
| Probe failure | Configure proper health checks |

## Prevention Tips

1. Validate manifests before applying
2. Use `--dry-run=server` to catch server-side errors
3. Set proper resource requests and limits
4. Configure readiness probes
5. Use deployment strategies (rolling update)
6. Set up proper RBAC for the CI service account
- [Docker Build Failed in CI](#)
- [AWS ECS Task Stopped](#)
- [Terraform Plan Failed in CI](#)