Introduction

Kubernetes CrashLoopBackOff occurs when a pod repeatedly crashes and restarts, with Kubernetes backing off for progressively longer intervals before each restart. The pod enters a failure cycle: the container starts, crashes (exits with a non-zero code), waits out an exponential backoff (10s, 20s, 40s, capped at 5 minutes), then restarts. This state indicates a fundamental application or configuration issue preventing successful startup. Common causes include application bugs, missing configuration, failed health checks, resource exhaustion, and dependency unavailability.

Symptoms

  • kubectl get pods shows CrashLoopBackOff or Error status
  • Pod restart count increases continuously
  • Container logs show startup errors or panics
  • Events show repeated Started container followed by Killing container
  • Issue appears after deploy, configuration change, or dependency outage
  • Different pods in the same deployment may show different states (Running, CrashLoopBackOff)

Common Causes

  • Application crash during startup (code error, unhandled exception)
  • Missing or invalid ConfigMap/Secret values
  • Environment variables not set or incorrect
  • Resource limits too low (OOMKilled)
  • Liveness probe failing before application ready
  • Port binding conflicts or address already in use
  • Database or external dependency unavailable
  • Image pull errors or missing binaries in container

Step-by-Step Fix

### 1. Check pod status and restart count

Get detailed pod status:

```bash
# Check pod status
kubectl get pods -n <namespace>

# Output:
# NAME                    READY   STATUS             RESTARTS   AGE
# myapp-5d4f6c7b8-x9y2z   0/1     CrashLoopBackOff   15         45m

# Get detailed pod information
kubectl describe pod <pod-name> -n <namespace>

# Key sections to check:
# - State: Waiting/CrashLoopBackOff
# - Last State: Terminated (shows exit code)
# - Reason: Error, OOMKilled, Completed
# - Exit Code: 0 (success), 1-255 (error codes)
# - Restart Count: Number of restarts
```

Exit code meanings:

  • 0: Clean exit (but the container may have completed unexpectedly)
  • 1: Application error (unhandled exception, panic)
  • 125: Container runtime error (image not found, permission denied)
  • 126: Command cannot execute (permission denied)
  • 127: Command not found
  • 137: OOMKilled (128 + SIGKILL=9), memory limit exceeded
  • 143: SIGTERM (128 + 15), graceful shutdown
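The 128+signal convention is easy to verify in any local shell; this sketch kills a child shell with SIGKILL and reads back the status:

```shell
# A child shell kills itself with SIGKILL (signal 9); the parent
# observes exit status 128 + 9 = 137, the same code Kubernetes
# reports for an OOMKilled container.
sh -c 'kill -KILL $$'
echo $?   # 137
```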

### 2. Check container logs

View application logs for error details:

```bash
# Get current container logs
kubectl logs <pod-name> -n <namespace>

# Get logs from previous (crashed) instance
kubectl logs <pod-name> -n <namespace> --previous

# Follow logs in real-time
kubectl logs -f <pod-name> -n <namespace>

# For multi-container pods
kubectl logs <pod-name> -n <namespace> -c <container-name>
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
```

Log analysis patterns:

```bash
# Search for errors
kubectl logs <pod-name> -n <namespace> --previous | grep -iE "error|exception|fatal|panic"

# Check startup sequence
kubectl logs <pod-name> -n <namespace> --previous | head -50

# Check database connectivity errors
kubectl logs <pod-name> -n <namespace> --previous | grep -iE "database|connection|sql|redis"
```

### 3. Check pod events

Kubernetes events reveal scheduling and lifecycle issues:

```bash
# Get events for specific pod
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>

# Get all namespace events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Or with describe
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Events:"
```

Key event types:

  • Scheduled: Pod assigned to node
  • Pulled: Container image pulled successfully
  • Created: Container created
  • Started: Container started
  • Killing: Container being killed (check reason)
  • BackOff: Restart backed off
  • Unhealthy: Probe failed

### 4. Check for OOMKilled (Out of Memory)

Memory limit exhaustion is a common cause:

```bash
# Check if container was OOMKilled
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Should output: OOMKilled

# Check memory limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.limits.memory}'

# Check actual memory usage before crash
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Last State:"

# If OOMKilled, increase memory limits
kubectl edit deployment <deployment-name> -n <namespace>
```

Then raise the limit in the container spec:

```yaml
spec:
  containers:
    - name: <container-name>
      resources:
        limits:
          memory: 512Mi   # Increase from 256Mi
        requests:
          memory: 256Mi
```

### 5. Check liveness and readiness probe configuration

Probe failures can cause restart loops:

```bash
# Check probe configuration
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].livenessProbe}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].readinessProbe}'

# Or with describe
kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Liveness:"
kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Readiness:"
```

Common probe issues:

```yaml
# WRONG: Probe fires before application ready
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5    # Too short for app startup
  periodSeconds: 10
  failureThreshold: 3       # Fails after 30 seconds total

# CORRECT: Allow time for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60   # Wait 60s before first probe
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3       # Fail after 3 consecutive failures
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```

Temporarily disable probes for debugging:

```bash
kubectl edit deployment <deployment-name> -n <namespace>
# Comment out livenessProbe and readinessProbe
# Apply and observe if pod stays running
```
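An alternative to removing the probes: a startupProbe suspends liveness checking until the application responds for the first time, which separates slow startup from a genuine hang. A sketch (path and port are assumptions, match them to your app):

```yaml
startupProbe:
  httpGet:
    path: /healthz    # assumed health endpoint
    port: 8080        # assumed app port
  periodSeconds: 5
  failureThreshold: 30   # tolerates up to ~150s of startup time
```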

### 6. Check ConfigMap and Secret references

Missing configuration causes startup failures:

```bash
# Check which ConfigMaps/Secrets pod references
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].envFrom}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].env}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].volumeMounts}'

# Check if ConfigMaps exist
kubectl get configmap -n <namespace>

# Check if Secrets exist
kubectl get secret -n <namespace>

# Validate ConfigMap content
kubectl get configmap <configmap-name> -n <namespace> -o yaml

# Check for missing environment variables
kubectl describe pod <pod-name> -n <namespace> | grep -A20 "Environment:"
```

Common issues:

  • ConfigMap/Secret doesn't exist in the namespace
  • Key names don't match what the application expects
  • ConfigMap mounted as a file while the application expects an environment variable
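A fail-fast pattern for the first two issues is to reference keys explicitly, so a missing ConfigMap or key surfaces as CreateContainerConfigError instead of an obscure crash later. A sketch with hypothetical names:

```yaml
env:
  - name: DATABASE_URL
    valueFrom:
      configMapKeyRef:
        name: app-config     # hypothetical ConfigMap; must exist in this namespace
        key: database-url    # key name must match exactly
        # optional: false is the default: the container will not start
        # without this key, surfacing the problem immediately
```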

### 7. Check application dependencies

The application may crash while waiting for dependencies. Note that kubectl exec needs a running container, so for a pod that dies immediately, run these checks from another running pod in the namespace or from a debug pod (see step 9):

```bash
# Check if database is reachable from pod
kubectl exec <pod-name> -n <namespace> -it -- nc -zv <db-host> 5432

# Or with timeout
kubectl exec <pod-name> -n <namespace> -it -- timeout 5 bash -c "cat < /dev/null > /dev/tcp/<db-host>/5432"

# Check if Redis/cache is available
kubectl exec <pod-name> -n <namespace> -it -- redis-cli -h <redis-host> ping

# Check DNS resolution
kubectl exec <pod-name> -n <namespace> -it -- nslookup <service-name>

# Test HTTP endpoints
kubectl exec <pod-name> -n <namespace> -it -- curl -v http://<dependency-service>/health
```

### 8. Check for port conflicts

Application may fail to bind to port:

```bash
# Check what port application is trying to bind
kubectl logs <pod-name> -n <namespace> --previous | grep -iE "bind|listen|port|address"

# Common error: "address already in use"
# Means another process in container has the port

# Check container for multiple processes
kubectl exec <pod-name> -n <namespace> -it -- ps aux

# Check listening ports
kubectl exec <pod-name> -n <namespace> -it -- netstat -tlnp

# Check if port is already bound
kubectl exec <pod-name> -n <namespace> -it -- ss -tlnp | grep :8080
```

Verify container port matches application:

```yaml
# Deployment should expose correct port
spec:
  containers:
    - name: app
      ports:
        - containerPort: 8080   # Must match what app binds to
```

The application's own config (app-config.yaml) should agree:

```yaml
server:
  port: 8080   # Same as containerPort
```

### 9. Debug with interactive shell

Get shell access to debug container:

```bash
# Run interactive shell (if image has shell)
kubectl run -it debug-pod -n <namespace> --image=<same-image> --rm --restart=Never -- /bin/sh

# Or override entrypoint for debugging
kubectl run -it debug-pod -n <namespace> --image=<same-image> --rm --restart=Never -- /bin/bash

# Mount same volumes
kubectl run -it debug-pod -n <namespace> \
  --image=<same-image> \
  --rm --restart=Never \
  --overrides='{"spec":{"volumes":[{"name":"config","configMap":{"name":"<configmap>"}}],"containers":[{"name":"debug","image":"<image>","volumeMounts":[{"name":"config","mountPath":"/config"}]}]}}' \
  -- /bin/sh
```

Check application binary and permissions:

```bash
# Check if binary exists
ls -la /app/myapp

# Check permissions
ls -la /app/

# Check if binary can execute
/app/myapp --version

# Check for missing libraries
ldd /app/myapp 2>&1 | grep "not found"
```

### 10. Enable debug logging

Increase application log verbosity:

```bash
# Add debug environment variable
kubectl edit deployment <deployment-name> -n <namespace>
```

Then add to the container spec:

```yaml
spec:
  containers:
    - name: app
      env:
        - name: DEBUG
          value: "true"
        - name: LOG_LEVEL
          value: "debug"
```

For Java applications:

```yaml
spec:
  containers:
    - name: app
      env:
        - name: JAVA_OPTS
          value: "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"
```

For Go applications (with zap/logrus):

```yaml
spec:
  containers:
    - name: app
      env:
        - name: LOG_FORMAT
          value: "json"
        - name: LOG_LEVEL
          value: "debug"
```

Prevention

  • Set appropriate resource requests and limits based on load testing
  • Configure liveness probes with adequate initialDelaySeconds
  • Use readiness probes to prevent traffic before fully ready
  • Implement graceful shutdown with SIGTERM handling
  • Add startup probes for slow-starting applications
  • Use PodDisruptionBudget to prevent simultaneous restarts
  • Implement proper health check endpoints (/healthz, /ready)
  • Monitor restart count as a leading indicator

```yaml
# Production-ready probe configuration
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30   # Allow up to 150 seconds for startup
```
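The graceful-shutdown bullet above can be sketched as a minimal shell entrypoint (the binary path is hypothetical) that traps SIGTERM, forwards it to the app, and exits 0 so shutdowns read as clean rather than exit 143:

```shell
#!/bin/sh
# Minimal entrypoint sketch: run the app as a child process, forward
# SIGTERM to it on shutdown, and exit 0 after it stops.
trap 'kill -TERM "$child" 2>/dev/null; wait "$child"; exit 0' TERM

/app/myapp &        # hypothetical application binary
child=$!
wait "$child"
```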

Related Statuses

  • **OOMKilled**: Container exceeded memory limit
  • **ImagePullBackOff**: Container image cannot be pulled
  • **Error**: Container exited with non-zero code
  • **Pending**: Pod cannot be scheduled to a node