Introduction

A pod in CrashLoopBackOff state means the container starts but then crashes repeatedly. Kubernetes tries to restart it, but each attempt fails, triggering an exponential backoff delay. Unlike Pending (scheduling failure) or ImagePullBackOff (image retrieval failure), CrashLoopBackOff indicates the container is being scheduled and pulled successfully—but the application inside cannot stay running.

The root cause is almost always within the container: application errors, missing configurations, failed health checks, or resource constraints. The fix requires understanding why the process exits, which means reading logs, checking exit codes, and verifying runtime configuration.

Symptoms

  • kubectl get pods shows STATUS: CrashLoopBackOff with increasing RESTARTS count
  • kubectl describe pod shows Last State: Terminated with a non-zero exit code
  • Container logs show error messages, panics, or connection failures before termination
  • The pod cycles between Running and CrashLoopBackOff every few seconds or minutes
  • Ready condition stays False even while the container briefly runs

Common Causes

  • **Application crash on startup**: Missing environment variables, config files, or database connections cause immediate failure
  • **Liveness probe failure**: The probe endpoint is unreachable, returns 5xx, or times out
  • **Resource limit exceeded**: Container is OOMKilled or CPU-throttled into failure
  • **Port conflicts**: Application binds to a port already in use or different from containerPort
  • **Permission denied**: Running as non-root user without proper file/directory permissions
  • **Dependency unavailable**: Database, cache, or external service connections fail at startup
  • **Command/args mismatch**: Container entrypoint expects arguments that aren't provided (or vice versa)
  • **ConfigMap/Secret not mounted**: Expected configuration files or environment variables are missing

Step-by-Step Fix

### 1. Check pod logs and exit codes

Start with the container logs to see the actual error:

```bash
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
```

The --previous flag shows logs from the last crashed instance, which often contains the actual error before termination.

Check the exit code:

```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].lastState}'
```

Common exit codes:

| Exit Code | Meaning | Likely Cause |
|-----------|---------|--------------|
| 1 | General application error | Code exception, missing config |
| 137 | OOMKilled (128 + 9, SIGKILL) | Memory limit too low |
| 143 | SIGTERM (128 + 15) | Killed by liveness probe or shutdown |
| 126 | Command not executable | Permission denied on entrypoint |
| 127 | Command not found | Missing binary or wrong path |
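Exit codes above 128 encode a fatal signal (128 + signal number). A small shell helper makes the mapping explicit (`decode_exit` is our own illustrative function, not a kubectl feature):

```bash
# Decode a container exit code: values above 128 mean the
# process was terminated by a signal (exit code - 128).
decode_exit() {
  local code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application exit code $code"
  fi
}

decode_exit 137   # → killed by signal 9 (SIGKILL, typically the OOM killer)
decode_exit 143   # → killed by signal 15 (SIGTERM)
```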

### 2. Check if container is OOMKilled

```bash
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "State:"
kubectl describe pod <pod-name> -n <namespace> | grep -i "oom"
```

If OOMKilled, the container exceeded its memory limit.

**Fix:** Increase memory limit or optimize application memory usage:

```yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
```

### 3. Verify liveness and readiness probes

Misconfigured probes can kill healthy containers:

```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].livenessProbe}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].readinessProbe}'
```

Common probe issues:

  • HTTP probe returns 4xx/5xx (wrong path, app not ready)
  • TCP probe on wrong port
  • Exec command returns non-zero
  • initialDelaySeconds too short (app hasn't started yet)
  • timeoutSeconds too aggressive under load

**Fix:** Adjust probe configuration:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```
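For applications with long or variable startup times, a startupProbe holds off liveness checks until the app first responds. A sketch (the path and port are illustrative):

```yaml
# Liveness checks are suspended until the startup probe succeeds,
# giving the app up to failureThreshold * periodSeconds (here 5 min) to start.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
```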

### 4. Check environment variables and ConfigMaps

Missing environment variables or ConfigMaps often cause immediate crashes:

```bash
# Check pod environment
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].env}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].envFrom}'

# Verify referenced ConfigMaps exist
kubectl get configmap -n <namespace>
kubectl describe configmap <configmap-name> -n <namespace>

# Verify referenced Secrets exist
kubectl get secret -n <namespace>
```

**Fix:** Create missing ConfigMaps/Secrets or update pod spec to reference correct names.
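A container spec that pulls environment variables from a ConfigMap and a Secret might look like this (names are illustrative). Marking a reference `optional: true` lets the pod start even when it is missing, which can turn a startup crash into a diagnosable runtime error:

```yaml
containers:
  - name: app
    image: myapp:1.0         # illustrative image name
    envFrom:
      - configMapRef:
          name: app-config   # must exist in the same namespace
      - secretRef:
          name: app-secrets
          optional: true     # pod still starts if the Secret is absent
```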

### 5. Test application configuration inside container

Use an ephemeral debug container to inspect the crashed pod:

```bash
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>
```

Then check:

  • File permissions: ls -la /app
  • Config file content: cat /app/config.yml
  • Environment variables: env | grep MY_APP
  • Network connectivity: nc -zv database-service 5432

### 6. Check resource limits and node pressure

```bash
kubectl top pod <pod-name> -n <namespace>
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
```

If the node is under memory or disk pressure, containers may be evicted or killed unexpectedly.

### 7. Verify command and args

If the container entrypoint or command changed, the pod spec may be calling the wrong binary:

```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].command}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].args}'
```

Compare with the Dockerfile ENTRYPOINT and CMD:

```bash
docker history <image-name> --no-trunc
```

**Fix:** Update command and args in pod spec to match expected entrypoint.
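In the pod spec, `command` overrides the image's ENTRYPOINT and `args` overrides its CMD. A sketch (the binary path and flags are illustrative):

```yaml
containers:
  - name: app
    image: myapp:1.0
    command: ["/usr/local/bin/myapp"]             # replaces the image ENTRYPOINT
    args: ["--config", "/etc/myapp/config.yml"]   # replaces the image CMD
```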

### 8. Check application dependencies

If the app requires a database, cache, or external API at startup:

```bash
# Test DNS resolution
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup <service-name>

# Test connectivity
kubectl run -it --rm net-test --image=busybox --restart=Never -- nc -zv <service-name> <port>
```

**Fix:**

  • Add init containers to wait for dependencies
  • Use retry logic in application startup code
  • Deploy dependencies before the application
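An init container can block pod startup until a dependency answers. A sketch, assuming a `postgres` Service listening on port 5432 (both names are illustrative):

```yaml
initContainers:
  - name: wait-for-db
    image: busybox:1.36
    # Loop until the TCP port accepts connections; the main containers
    # only start after every init container exits 0.
    command: ["sh", "-c", "until nc -zv postgres 5432; do echo waiting for db; sleep 2; done"]
```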

### 9. Check for rapid restart loops

If the container crashes too quickly, Kubernetes applies exponential backoff:

```bash
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].restartCount}'
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>
```

If restart count keeps increasing with BackOff events, the pod will eventually stabilize at a 5-minute restart interval.
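The kubelet's restart backoff starts at roughly 10 seconds and doubles after each crash, capping at 300 seconds. This sketch approximates that schedule (exact timing is a kubelet implementation detail):

```bash
# Approximate CrashLoopBackOff delay after n prior restarts:
# 10s * 2^n, capped at 300s (5 minutes).
backoff_delay() {
  local n=$1 delay=10 i=0
  while [ "$i" -lt "$n" ]; do
    delay=$((delay * 2))
    [ "$delay" -gt 300 ] && delay=300
    i=$((i + 1))
  done
  echo "$delay"
}

for n in 0 1 2 3 4 5; do backoff_delay "$n"; done   # 10, 20, 40, 80, 160, 300 (one per line)
```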

**Fix:** Address the root cause (logs will show why), then delete the pod to reset backoff:

```bash
kubectl delete pod <pod-name> -n <namespace>
```

Debugging Workflow Summary

```bash
# 1. Check pod status
kubectl get pod <pod-name> -n <namespace>

# 2. Get detailed events and exit codes
kubectl describe pod <pod-name> -n <namespace>

# 3. Read container logs (current and previous)
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous

# 4. Check for OOM
kubectl describe pod <pod-name> -n <namespace> | grep -i "oom"

# 5. Verify ConfigMaps and Secrets
kubectl get configmap,secret -n <namespace>

# 6. Debug inside container
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>

# 7. Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```

Prevention Checklist

  • [ ] Set initialDelaySeconds on probes to allow startup time
  • [ ] Use startup probes for slow-starting applications
  • [ ] Test resource limits under load before production
  • [ ] Validate all ConfigMaps and Secrets exist before deployment
  • [ ] Use init containers to wait for dependencies
  • [ ] Implement graceful shutdown with SIGTERM handling
  • [ ] Add structured logging for easier debugging
  • [ ] Set up alerts for CrashLoopBackOff pods older than 5 minutes

Related Articles

  • [Fix Kubernetes Pod Stuck in Pending](/articles/fix-kubernetes-pod-stuck-pending)
  • [Fix Kubernetes ImagePullBackOff](/articles/fix-kubernetes-imagepullbackoff)
  • [Fix Kubernetes OOMKilled](/articles/fix-kubernetes-oomkilled)
  • [Fix Kubernetes Liveness Probe Failed](/articles/fix-kubernetes-liveness-probe-failed)