## Introduction
A pod in CrashLoopBackOff state means the container starts but then crashes repeatedly. Kubernetes tries to restart it, but each attempt fails, triggering an exponential backoff delay. Unlike Pending (scheduling failure) or ImagePullBackOff (image retrieval failure), CrashLoopBackOff indicates the container is being scheduled and pulled successfully—but the application inside cannot stay running.
The root cause is almost always within the container: application errors, missing configurations, failed health checks, or resource constraints. The fix requires understanding why the process exits, which means reading logs, checking exit codes, and verifying runtime configuration.
## Symptoms
- `kubectl get pods` shows `STATUS: CrashLoopBackOff` with an increasing `RESTARTS` count
- `kubectl describe pod` shows `Last State: Terminated` with a non-zero exit code
- Container logs show error messages, panics, or connection failures before termination
- The pod cycles between `Running` and `CrashLoopBackOff` every few seconds or minutes
- The `Ready` condition stays `False` even when the container briefly starts
## Common Causes
- **Application crash on startup**: Missing environment variables, config files, or database connections cause immediate failure
- **Liveness probe failure**: The probe endpoint is unreachable, returns 5xx, or times out
- **Resource limit exceeded**: Container is OOMKilled or CPU-throttled into failure
- **Port conflicts**: Application binds to a port already in use or different from `containerPort`
- **Permission denied**: Running as non-root user without proper file/directory permissions
- **Dependency unavailable**: Database, cache, or external service connections fail at startup
- **Command/args mismatch**: Container entrypoint expects arguments that aren't provided (or vice versa)
- **ConfigMap/Secret not mounted**: Expected configuration files or environment variables are missing
## Step-by-Step Fix
### 1. Check pod logs and exit codes
Start with the container logs to see the actual error:
```bash
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
```
The `--previous` flag shows logs from the last crashed instance, which often contain the actual error before termination.
Check the exit code:
```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].lastState}'
```
Common exit codes:
| Exit Code | Meaning | Likely Cause |
|-----------|---------|--------------|
| 1 | General application error | Code exception, missing config |
| 137 | OOMKilled (128 + 9/SIGKILL) | Memory limit too low |
| 143 | SIGTERM (128 + 15) | Killed by liveness probe or shutdown |
| 126 | Command not executable | Permission denied on entrypoint |
| 127 | Command not found | Missing binary or wrong path |
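Exit codes above 128 encode the fatal signal as 128 plus the signal number, so you can decode them with simple arithmetic:

```shell
# Decode container exit codes into the signal that killed the process.
# 137 - 128 = 9 (SIGKILL, typical of OOMKilled); 143 - 128 = 15 (SIGTERM).
for code in 137 143; do
  echo "exit code $code -> signal $((code - 128))"
done
```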
### 2. Check if container is OOMKilled
```bash
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "State:"
kubectl describe pod <pod-name> -n <namespace> | grep -i "oom"
```
If OOMKilled, the container exceeded its memory limit.
**Fix:** Increase memory limit or optimize application memory usage:
```yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
```
### 3. Verify liveness and readiness probes
Misconfigured probes can kill healthy containers:
```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].livenessProbe}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].readinessProbe}'
```
Common probe issues:
- HTTP probe returns 4xx/5xx (wrong path, app not ready)
- TCP probe on wrong port
- Exec command returns non-zero
- `initialDelaySeconds` too short (app hasn't started yet)
- `timeoutSeconds` too aggressive under load
**Fix:** Adjust probe configuration:
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```
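For slow-starting applications, a `startupProbe` holds off liveness checks until the app responds for the first time. A sketch, with an illustrative path and port:

```yaml
startupProbe:
  httpGet:
    path: /healthz          # illustrative; use your app's health endpoint
    port: 8080
  failureThreshold: 30      # allows up to 30 * 10s = 300s for startup
  periodSeconds: 10
```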
### 4. Check environment variables and ConfigMaps
Missing environment variables or ConfigMaps often cause immediate crashes:
```bash
# Check pod environment
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].env}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].envFrom}'

# Verify referenced ConfigMaps exist
kubectl get configmap -n <namespace>
kubectl describe configmap <configmap-name> -n <namespace>

# Verify referenced Secrets exist
kubectl get secret -n <namespace>
```
**Fix:** Create missing ConfigMaps/Secrets or update pod spec to reference correct names.
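One way to surface missing configuration early is to reference the ConfigMap non-optionally, so the pod fails with a clear event instead of crashing at runtime. A sketch (the ConfigMap name is hypothetical):

```yaml
envFrom:
  - configMapRef:
      name: my-app-config   # hypothetical; must exist in the same namespace
      optional: false       # the default; container won't start until it exists
```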
### 5. Test application configuration inside container
Use an ephemeral debug container to inspect the crashed pod:
```bash
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>
```
Then check:
- File permissions: `ls -la /app`
- Config file content: `cat /app/config.yml`
- Environment variables: `env | grep MY_APP`
- Network connectivity: `nc -zv database-service 5432`
### 6. Check resource limits and node pressure
```bash
kubectl top pod <pod-name> -n <namespace>
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
```
If the node is under memory or disk pressure, containers may be evicted or killed unexpectedly.
### 7. Verify command and args
If the container entrypoint or command changed, the pod spec may be calling the wrong binary:
```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].command}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].args}'
```
Compare with the Dockerfile ENTRYPOINT and CMD:
```bash
docker history <image-name> --no-trunc
```
**Fix:** Update `command` and `args` in the pod spec to match the expected entrypoint.
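When overriding image defaults, `command` replaces the Dockerfile `ENTRYPOINT` and `args` replaces `CMD`. A sketch with illustrative names and paths:

```yaml
containers:
  - name: my-app                                # hypothetical container name
    image: my-app:1.2.3                         # hypothetical image
    command: ["/app/server"]                    # overrides ENTRYPOINT
    args: ["--config", "/etc/app/config.yml"]   # overrides CMD
```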
### 8. Check application dependencies
If the app requires a database, cache, or external API at startup:
```bash
# Test DNS resolution
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup <service-name>

# Test connectivity
kubectl run -it --rm net-test --image=busybox --restart=Never -- nc -zv <service-name> <port>
```
**Fix:**
- Add init containers to wait for dependencies
- Use retry logic in application startup code
- Deploy dependencies before the application
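A sketch of an init container that blocks startup until a hypothetical `database-service` accepts connections:

```yaml
initContainers:
  - name: wait-for-db
    image: busybox
    # Loop until the TCP port is reachable; the app container starts only
    # after this init container exits successfully.
    command: ["sh", "-c", "until nc -z database-service 5432; do echo waiting for db; sleep 2; done"]
```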
### 9. Check for rapid restart loops
If the container crashes too quickly, Kubernetes applies exponential backoff:
```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].restartCount}'
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>
```
If the restart count keeps increasing alongside `BackOff` events, the pod will eventually stabilize at a 5-minute restart interval.
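The backoff delay starts around 10 seconds and roughly doubles after each crash, capping at 300 seconds. A sketch of that schedule:

```shell
# Print the approximate restart delay after each consecutive crash.
delay=10
for crash in 1 2 3 4 5 6 7; do
  echo "after crash $crash: wait ${delay}s"
  delay=$((delay * 2))
  [ "$delay" -gt 300 ] && delay=300   # cap at 5 minutes
done
# Delays: 10, 20, 40, 80, 160, then capped at 300 from the sixth crash on.
```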
**Fix:** Address the root cause (logs will show why), then delete the pod to reset backoff:
```bash
kubectl delete pod <pod-name> -n <namespace>
```
## Debugging Workflow Summary
```bash
# 1. Check pod status
kubectl get pod <pod-name> -n <namespace>

# 2. Get detailed events and exit codes
kubectl describe pod <pod-name> -n <namespace>

# 3. Read container logs (current and previous)
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous

# 4. Check for OOM
kubectl describe pod <pod-name> -n <namespace> | grep -i "oom"

# 5. Verify ConfigMaps and Secrets
kubectl get configmap,secret -n <namespace>

# 6. Debug inside container
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>

# 7. Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```
## Prevention Checklist
- [ ] Set `initialDelaySeconds` on probes to allow startup time
- [ ] Use startup probes for slow-starting applications
- [ ] Test resource limits under load before production
- [ ] Validate all ConfigMaps and Secrets exist before deployment
- [ ] Use init containers to wait for dependencies
- [ ] Implement graceful shutdown with SIGTERM handling
- [ ] Add structured logging for easier debugging
- [ ] Set up alerts for CrashLoopBackOff pods older than 5 minutes
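For the graceful-shutdown item above, a minimal sketch of SIGTERM handling in a shell entrypoint; here the script signals itself to simulate the kubelet sending SIGTERM:

```shell
#!/bin/sh
# Mark shutdown on SIGTERM so the main loop can clean up and exit 0,
# instead of being SIGKILLed after terminationGracePeriodSeconds.
stopping=0
trap 'stopping=1' TERM
echo "app running"
kill -TERM $$        # simulate Kubernetes sending SIGTERM to PID 1
while [ "$stopping" -eq 0 ]; do
  sleep 1            # stands in for the app's main work loop
done
echo "received SIGTERM, shutting down cleanly"
```

In a real application, the handler would close connections and flush state before exiting.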
## Related Issues
- [Fix Kubernetes Pod Stuck in Pending](/articles/fix-kubernetes-pod-stuck-pending)
- [Fix Kubernetes ImagePullBackOff](/articles/fix-kubernetes-imagepullbackoff)
- [Fix Kubernetes OOMKilled](/articles/fix-kubernetes-oomkilled)
- [Fix Kubernetes Liveness Probe Failed](/articles/fix-kubernetes-liveness-probe-failed)