## Introduction
Kubernetes CrashLoopBackOff occurs when a container in a pod repeatedly crashes, and Kubernetes waits progressively longer before each restart. The pod enters a failure cycle: the container starts, crashes (exits with a non-zero code), Kubernetes waits with exponential backoff (10s, 20s, 40s, ..., capped at 5 minutes), then restarts it. This state indicates a fundamental application or configuration issue that prevents successful startup. Common causes include application bugs, missing configuration, failed health checks, resource exhaustion, and dependency unavailability.
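The back-off schedule can be sketched numerically; the 10s base and 300s cap used here are the kubelet defaults:

```shell
# Delay before the Nth restart: doubles each crash, capped at 300s (5 min).
backoff_delay() {
  delay=$((10 * (1 << ($1 - 1))))
  [ "$delay" -gt 300 ] && delay=300
  echo "$delay"
}

for restart in 1 2 3 4 5 6 7; do
  printf 'restart %d: %ss\n' "$restart" "$(backoff_delay "$restart")"
done
```

After roughly five crashes the pod sits in the 5-minute back-off, which is why a CrashLoopBackOff pod can look "stuck" for long stretches between restart attempts.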
## Symptoms
- `kubectl get pods` shows `CrashLoopBackOff` or `Error` status
- Pod restart count increases continuously
- Container logs show startup errors or panics
- Events show repeated `Started container` followed by `Killing container`
- Issue appears after a deploy, configuration change, or dependency outage
- Different pods in the same deployment may show different states (`Running`, `CrashLoopBackOff`)
## Common Causes
- Application crash during startup (code error, unhandled exception)
- Missing or invalid ConfigMap/Secret values
- Environment variables not set or incorrect
- Resource limits too low (OOMKilled)
- Liveness probe failing before application ready
- Port binding conflicts or address already in use
- Database or external dependency unavailable
- Image pull errors or missing binaries in container
## Step-by-Step Fix
### 1. Check pod status and restart count
Get detailed pod status:
```bash
# Check pod status
kubectl get pods -n <namespace>

# Output:
# NAME                    READY   STATUS             RESTARTS   AGE
# myapp-5d4f6c7b8-x9y2z   0/1     CrashLoopBackOff   15         45m

# Get detailed pod information
kubectl describe pod <pod-name> -n <namespace>

# Key sections to check:
# - State: Waiting/CrashLoopBackOff
# - Last State: Terminated (shows exit code)
# - Reason: Error, OOMKilled, Completed
# - Exit Code: 0 (success), 1-255 (error codes)
# - Restart Count: number of restarts
```
Exit code meanings:
- 0: Clean exit (but container may have completed unexpectedly)
- 1: Application error (unhandled exception, panic)
- 137: OOMKilled (128 + SIGKILL=9) - memory limit exceeded
- 143: SIGTERM (128 + 15) - graceful shutdown
- 125: Container runtime error (image not found, permission denied)
- 126: Command cannot execute (permission denied)
- 127: Command not found
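The "128 + signal number" arithmetic can be sanity-checked with a tiny helper; this is an illustrative sketch, not a kubectl feature:

```shell
# Decode a container exit code: values above 128 mean the process was
# killed by a signal (code - 128); lower values come from the app itself.
decode_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application exit $code"
  fi
}

decode_exit 137   # signal 9 = SIGKILL, usually the OOM killer
decode_exit 143   # signal 15 = SIGTERM, a requested shutdown
decode_exit 1
```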
### 2. Check container logs
View application logs for error details:
```bash
# Get current container logs
kubectl logs <pod-name> -n <namespace>

# Get logs from the previous (crashed) instance
kubectl logs <pod-name> -n <namespace> --previous

# Follow logs in real-time
kubectl logs -f <pod-name> -n <namespace>

# For multi-container pods
kubectl logs <pod-name> -n <namespace> -c <container-name>
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
```
Log analysis patterns:
```bash
# Search for errors
kubectl logs <pod-name> -n <namespace> --previous | grep -iE "error|exception|fatal|panic"

# Check the startup sequence
kubectl logs <pod-name> -n <namespace> --previous | head -50

# Check for database connectivity errors
kubectl logs <pod-name> -n <namespace> --previous | grep -iE "database|connection|sql|redis"
```
### 3. Check pod events
Kubernetes events reveal scheduling and lifecycle issues:
```bash
# Get events for a specific pod
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>

# Get all namespace events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Or with describe
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Events:"
```
Key event types:
- Scheduled: Pod assigned to node
- Pulled: Container image pulled successfully
- Created: Container created
- Started: Container started
- Killing: Container being killed (check reason)
- BackOff: Restart backed off
- Unhealthy: Probe failed
### 4. Check for OOMKilled (Out of Memory)
Memory limit exhaustion is a common cause:
```bash
# Check if the container was OOMKilled
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Should output: OOMKilled

# Check memory limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.limits.memory}'

# Check the last recorded container state before the crash
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Last State:"

# If OOMKilled, increase memory limits
kubectl edit deployment <deployment-name> -n <namespace>
```

Update the resources:

```yaml
spec:
  containers:
    - name: <container-name>
      resources:
        limits:
          memory: 512Mi   # Increase from 256Mi
        requests:
          memory: 256Mi
```
### 5. Check liveness and readiness probe configuration
Probe failures can cause restart loops:
```bash
# Check probe configuration
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].livenessProbe}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].readinessProbe}'

# Or with describe
kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Liveness:"
kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Readiness:"
```
Common probe issues:
```yaml
# WRONG: Probe fires before application is ready
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5   # Too short for app startup
  periodSeconds: 10
  failureThreshold: 3      # Fails after ~30 seconds total

# CORRECT: Allow time for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60  # Wait 60s before first probe
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3      # Fail after 3 consecutive failures
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```
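When tuning these values, it helps to work out roughly when the kubelet can first kill the container; a quick sketch using the "CORRECT" liveness values above (the arithmetic is approximate, since probe timing is not exact):

```shell
# Earliest time the kubelet can kill the container on consecutive
# liveness failures: first probe at initial_delay, then one probe
# per period, killed after failure_threshold consecutive failures.
initial_delay=60
period=10
failure_threshold=3

kill_after=$((initial_delay + period * (failure_threshold - 1)))
echo "earliest kill after ~${kill_after}s"
```

If your application routinely takes longer than this to become healthy, the probe itself becomes the cause of the restart loop.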
Temporarily disable probes for debugging:
```bash
kubectl edit deployment <deployment-name> -n <namespace>
# Comment out livenessProbe and readinessProbe
# Apply and observe whether the pod stays running
```
### 6. Check ConfigMap and Secret references
Missing configuration causes startup failures:
```bash
# Check which ConfigMaps/Secrets the pod references
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].envFrom}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].env}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].volumeMounts}'

# Check if ConfigMaps exist
kubectl get configmap -n <namespace>

# Check if Secrets exist
kubectl get secret -n <namespace>

# Validate ConfigMap content
kubectl get configmap <configmap-name> -n <namespace> -o yaml

# Check for missing environment variables
kubectl describe pod <pod-name> -n <namespace> | grep -A20 "Environment:"
```
Common issues:
- ConfigMap/Secret doesn't exist in the namespace
- Key names don't match what the application expects
- ConfigMap is mounted as a file but the application expects an environment variable
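To surface the missing-variable case immediately instead of crashing deeper in startup, the container entrypoint can fail fast. A minimal sketch; the variable names are examples, not a convention:

```shell
# Fail fast if required env vars are missing or empty.
require_env() {
  for var in "$@"; do
    # indirect lookup of $var's value
    val=$(eval "echo \"\$$var\"")
    if [ -z "$val" ]; then
      echo "missing required env var: $var" >&2
      return 1
    fi
  done
}

DATABASE_URL="postgres://db:5432/app"   # illustrative values
REDIS_HOST="redis"
require_env DATABASE_URL REDIS_HOST && echo "all required env vars present"
```

A clear "missing required env var" line in `kubectl logs --previous` is far faster to diagnose than a stack trace from whatever code first dereferenced the empty value.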
### 7. Check application dependencies
Application may crash waiting for dependencies:
```bash
# Check if the database is reachable from the pod
kubectl exec <pod-name> -n <namespace> -it -- nc -zv <db-host> 5432

# Or with timeout
kubectl exec <pod-name> -n <namespace> -it -- timeout 5 bash -c "cat < /dev/null > /dev/tcp/<db-host>/5432"

# Check if Redis/cache is available
kubectl exec <pod-name> -n <namespace> -it -- redis-cli -h <redis-host> ping

# Check DNS resolution
kubectl exec <pod-name> -n <namespace> -it -- nslookup <service-name>

# Test HTTP endpoints
kubectl exec <pod-name> -n <namespace> -it -- curl -v http://<dependency-service>/health
```
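If the dependency is merely slow to come up, the entrypoint can wait for it rather than crash. A sketch of a simple retry loop; in a real image the probe command would be something like `nc -z <db-host> 5432` rather than the `true` placeholder used here:

```shell
# Retry a command until it succeeds or attempts run out.
wait_for() {
  attempts=$1; shift
  i=1
  until "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "dependency still unavailable after $attempts attempts" >&2
      return 1
    fi
    sleep 1
    i=$((i + 1))
  done
}

# `true` stands in for the real check, e.g. nc -z <db-host> 5432
wait_for 5 true && echo "dependency ready"
```

Note that an init container or application-level retry is usually preferable to letting the pod crash-loop on a slow dependency, since CrashLoopBackOff delays grow to five minutes between attempts.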
### 8. Check for port conflicts
Application may fail to bind to port:
```bash
# Check which port the application is trying to bind
kubectl logs <pod-name> -n <namespace> --previous | grep -iE "bind|listen|port|address"

# Common error: "address already in use"
# Means another process in the container already holds the port

# Check the container for multiple processes
kubectl exec <pod-name> -n <namespace> -it -- ps aux

# Check listening ports
kubectl exec <pod-name> -n <namespace> -it -- netstat -tlnp

# Check if the port is already bound
kubectl exec <pod-name> -n <namespace> -it -- ss -tlnp | grep :8080
```
Verify container port matches application:
```yaml
# Deployment should expose the correct port
spec:
  containers:
    - name: app
      ports:
        - containerPort: 8080   # Must match what the app binds to

# Application config should match
# app-config.yaml
server:
  port: 8080   # Same as containerPort
```
### 9. Debug with interactive shell
Get shell access to debug container:
```bash
# Run an interactive shell (if the image has one)
kubectl run -it debug-pod -n <namespace> --image=<same-image> --rm --restart=Never -- /bin/sh

# Or override the entrypoint for debugging
kubectl run -it debug-pod -n <namespace> --image=<same-image> --rm --restart=Never -- /bin/bash

# Mount the same volumes
kubectl run -it debug-pod -n <namespace> \
  --image=<same-image> \
  --rm --restart=Never \
  --overrides='{"spec":{"volumes":[{"name":"config","configMap":{"name":"<configmap>"}}],"containers":[{"name":"debug","image":"<image>","volumeMounts":[{"name":"config","mountPath":"/config"}]}]}}' \
  -- /bin/sh
```
Check application binary and permissions:
```bash
# Check if the binary exists
ls -la /app/myapp

# Check permissions
ls -la /app/

# Check if the binary can execute
/app/myapp --version

# Check for missing shared libraries
ldd /app/myapp 2>&1 | grep "not found"
```
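Exit codes 126 and 127 from the table in step 1 are easy to reproduce locally, which helps confirm what the kubelet is reporting; this demo runs in any POSIX shell:

```shell
# A file without the execute bit yields exit 126 (command cannot execute).
printf '#!/bin/sh\necho hi\n' > /tmp/not-executable
chmod 644 /tmp/not-executable
code126=$(sh -c /tmp/not-executable 2>/dev/null; echo $?)
echo "non-executable file -> exit $code126"

# A command that does not exist yields exit 127 (command not found).
code127=$(sh -c definitely-not-a-real-command 2>/dev/null; echo $?)
echo "missing command -> exit $code127"
```

Seeing 126 in `Last State` therefore points at file permissions or a wrong binary format, while 127 points at a wrong entrypoint path or a binary missing from the image.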
### 10. Enable debug logging
Increase application log verbosity:
```bash
# Add debug environment variables
kubectl edit deployment <deployment-name> -n <namespace>
```

Add to the spec:

```yaml
spec:
  containers:
    - name: app
      env:
        - name: DEBUG
          value: "true"
        - name: LOG_LEVEL
          value: "debug"
```
For Java applications:
```yaml
spec:
  containers:
    - name: app
      env:
        - name: JAVA_OPTS
          value: "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"
```
For Go applications (with zap/logrus):
```yaml
spec:
  containers:
    - name: app
      env:
        - name: LOG_FORMAT
          value: "json"
        - name: LOG_LEVEL
          value: "debug"
```
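These env vars only help if the process actually honors them. As a toy illustration of the pattern in a shell entrypoint (the `LOG_LEVEL` name matches the yaml above, but the function is purely an example):

```shell
# Emit debug lines only when LOG_LEVEL=debug (default: info).
LOG_LEVEL="${LOG_LEVEL:-info}"

log_debug() {
  [ "$LOG_LEVEL" = "debug" ] && echo "DEBUG: $*"
  return 0
}

LOG_LEVEL=debug
log_debug "connecting to database"
```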
## Prevention
- Set appropriate resource requests and limits based on load testing
- Configure liveness probes with adequate `initialDelaySeconds`
- Use readiness probes to prevent traffic before the app is fully ready
- Implement graceful shutdown with SIGTERM handling
- Add startup probes for slow-starting applications
- Use PodDisruptionBudget to prevent simultaneous restarts
- Implement proper health check endpoints (/healthz, /ready)
- Monitor restart count as leading indicator
```yaml
# Production-ready probe configuration
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30   # Allow up to 150 seconds for startup
```
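Graceful SIGTERM handling from the prevention list can be sketched as a shell entrypoint. Without a trap, SIGTERM surfaces as exit code 143 and the shutdown can look like a crash in restart metrics; with one, the container exits 0:

```shell
# Write a minimal entrypoint that traps SIGTERM and exits cleanly,
# then run it and simulate the kubelet's termination signal.
cat > /tmp/entrypoint.sh <<'EOF'
#!/bin/sh
cleanup() {
  echo "caught SIGTERM, shutting down cleanly"
  # flush buffers, close connections, etc., then:
  exit 0
}
trap cleanup TERM

echo "app running"
kill -TERM $$   # simulate the kubelet sending SIGTERM
echo "not reached"
EOF
chmod +x /tmp/entrypoint.sh

/tmp/entrypoint.sh
echo "entrypoint exit code: $?"
```

In a real pod the shutdown work must finish within `terminationGracePeriodSeconds` (30s by default), after which the kubelet follows up with SIGKILL.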
## Related Errors
- **OOMKilled**: Container exceeded memory limit
- **ImagePullBackOff**: Container image cannot be pulled
- **Error**: Container exited with non-zero code
- **Pending**: Pod cannot be scheduled to a node