You're updating your deployment with a new image or configuration, but the rollout isn't progressing. The deployment shows partial completion with some new pods and some old pods, but it never finishes. A rolling update stuck in this state leaves your application running a mix of versions and blocks further updates.
Understanding Rolling Updates
Kubernetes deployments use rolling updates to progressively replace old pods with new ones while maintaining availability. When stuck, it means new pods aren't becoming ready, or the deployment controller is waiting for conditions that never occur. The rollout can hang due to pod failures, insufficient resources, or misconfigured update parameters.
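For reference, a minimal Deployment manifest with an explicit RollingUpdate strategy looks like this (the name, labels, and image are placeholders, not values from your cluster):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  progressDeadlineSeconds: 600   # How long the controller waits before reporting failure
  selector:
    matchLabels:
      app: app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # At most 1 extra pod during the rollout
      maxUnavailable: 0  # Never drop below 3 ready pods
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: registry.company.com/app:v1.0.0  # Placeholder image
```

Most of the knobs discussed below (`progressDeadlineSeconds`, `maxSurge`, `maxUnavailable`, probes) live in this spec.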
Diagnosis Commands
Check deployment status:
```bash
# List deployments
kubectl get deployments -n namespace

# Describe deployment
kubectl describe deployment deployment-name -n namespace

# Check rollout status
kubectl rollout status deployment/deployment-name -n namespace

# Get deployment conditions
kubectl get deployment deployment-name -n namespace -o jsonpath='{.status.conditions}'
```
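The conditions output is where a stalled rollout shows up. As a sketch of what to look for, this hypothetical example scans a saved `kubectl describe deployment` dump (the file path and its contents are invented for illustration) for the failure reason:

```bash
# Invented example of a saved `kubectl describe deployment` dump
cat > /tmp/deploy-conditions.txt <<'EOF'
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    False   ProgressDeadlineExceeded
EOF

# A Progressing condition with status False means the rollout has stalled
if grep -q 'Progressing.*False.*ProgressDeadlineExceeded' /tmp/deploy-conditions.txt; then
  echo "rollout timed out"
fi
```

A healthy rollout shows `Progressing True` with reason `NewReplicaSetAvailable`; `Progressing False` with `ProgressDeadlineExceeded` points you at Solution 1 below.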
Check replica sets:
```bash
# List replica sets for deployment
kubectl get rs -n namespace -l app=deployment-name

# Describe current and new replica sets
kubectl describe rs deployment-name-xxxx -n namespace
```
Check pods:
```bash
# List pods by revision
kubectl get pods -n namespace -l app=deployment-name --show-labels

# Check pod status
kubectl describe pod pod-name -n namespace

# Check new pod logs
kubectl logs pod-name -n namespace --prefix
```
Check events:
```bash
# Deployment events
kubectl get events -n namespace --field-selector involvedObject.name=deployment-name

# All recent events
kubectl get events -n namespace --sort-by='.lastTimestamp'
```
Common Solutions
Solution 1: Fix ProgressDeadlineExceeded
Deployment has a deadline for rollout completion:
```bash
# Check if progress deadline exceeded
kubectl describe deployment deployment-name -n namespace | grep -i "ProgressDeadlineExceeded"

# Current progress deadline (default: 600 seconds / 10 minutes)
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.progressDeadlineSeconds}'
```
If deadline exceeded due to slow pods:
```yaml
# Increase progress deadline
spec:
  progressDeadlineSeconds: 1200  # 20 minutes

# If pods take time to start
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```
Solution 2: Fix Pod Startup Issues
New pods may not become ready:
```bash
# Check new pod status
kubectl get pods -n namespace -l app=deployment-name -o wide

# Check pod readiness
kubectl describe pod new-pod-name -n namespace | grep -A 5 "Readiness"

# Check if the readiness probe is failing
kubectl logs new-pod-name -n namespace
```
Fix readiness probe:
```yaml
# Probe may be failing due to slow startup
spec:
  containers:
  - name: app
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 60  # Increase if app starts slowly
      periodSeconds: 10
      failureThreshold: 3
```

Add a startup probe for slow applications:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
  # Gives 300 seconds (30 * 10) to start
```

Solution 3: Fix Resource Constraints
Insufficient resources block new pods:
```bash
# Check pending pods
kubectl get pods -n namespace | grep Pending

# Describe pending pod
kubectl describe pod pending-pod-name -n namespace

# Check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"
```
Reduce resource requests or add capacity:
```yaml
# Current requests may be too high
resources:
  requests:
    cpu: "100m"      # Reduce if it was higher
    memory: "128Mi"
```

```bash
# Or reduce the replica count to fit existing capacity
kubectl scale deployment deployment-name --replicas=3 -n namespace
```
Solution 4: Fix ImagePullBackOff in New Pods
New pods can't pull the updated image:
```bash
# Check if pods have image pull errors
kubectl get pods -n namespace -l app=deployment-name | grep ImagePullBackOff

# Describe failing pod
kubectl describe pod failing-pod-name -n namespace | grep -A 10 Events
```
Fix image configuration:
```yaml
# Verify image name and tag
spec:
  containers:
  - name: app
    image: registry.company.com/app:v2.0.0  # Correct image

# Add image pull secret if the registry is private
spec:
  imagePullSecrets:
  - name: regcred
```
Solution 5: Fix CrashLoopBackOff in New Pods
New pods crash after starting:
```bash
# Check for crashing pods
kubectl get pods -n namespace -l app=deployment-name | grep CrashLoopBackOff

# Check crash logs from the previous container instance
kubectl logs crashing-pod-name -n namespace --previous
```
Debug application crash:
```bash
# Check previous logs for the error
kubectl logs pod-name -n namespace --previous | tail -50

# Roll out a debug image
kubectl set image deployment/deployment-name app=debug-image:v1 -n namespace
```
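When reading crash logs, the last error-level line is usually the actual cause. As an illustration, this sketch filters a saved crash log for it (the log file and its contents are invented for the example):

```bash
# Invented example of a previous-container log dump
cat > /tmp/crash.log <<'EOF'
starting app...
connecting to db...
FATAL: could not connect to database at db:5432
EOF

# The last FATAL/ERROR line is usually the real crash cause
last_error=$(grep -iE 'fatal|error' /tmp/crash.log | tail -1)
echo "$last_error"
```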
Solution 6: Fix MaxSurge/MaxUnavailable Configuration
Rollout parameters may block progress:
```bash
# Check current rollout config
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.strategy}'
```

Fix rollout parameters:
```yaml
# Current config may be blocking
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # How many extra pods allowed
      maxUnavailable: 1  # How many can be unavailable

# If maxUnavailable: 0 and maxSurge: 0, rollout blocked
# Must have at least one > 0

# For zero downtime (requires extra capacity)
spec:
  strategy:
    rollingUpdate:
      maxSurge: 2        # Create 2 new pods before removing old
      maxUnavailable: 0  # Never go below desired replicas
```
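To reason about what these parameters permit, here is a small sketch that computes the pod-count bounds during a rollout, assuming absolute values rather than percentages (the replica count is a made-up example):

```bash
replicas=4
max_surge=1
max_unavailable=1

# Upper bound on total pods during the rollout
max_pods=$((replicas + max_surge))
# Lower bound on ready pods during the rollout
min_available=$((replicas - max_unavailable))

echo "at most $max_pods pods, at least $min_available available"
```

With both values at 0, these bounds collapse to exactly `replicas` total and `replicas` available, which leaves the controller no room to swap any pod.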
Solution 7: Fix Deployment Paused
Deployment may be manually paused:
```bash
# Check if deployment is paused
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.paused}'

# Resume paused deployment
kubectl rollout resume deployment/deployment-name -n namespace
```
Solution 8: Force Rollout Completion
Sometimes you need to force completion:
```bash
# Check current rollout history
kubectl rollout history deployment/deployment-name -n namespace

# Roll back to the previous version if stuck
kubectl rollout undo deployment/deployment-name -n namespace

# Or restart the deployment fresh
kubectl rollout restart deployment/deployment-name -n namespace
```
Delete stuck replica set:
```bash
# Find old replica set blocking rollout
kubectl get rs -n namespace -l app=deployment-name

# Delete stuck replica set (use with caution)
kubectl delete rs deployment-name-oldrevision -n namespace
```
Verification
After fixing rollout:
```bash
# Watch rollout progress
kubectl rollout status deployment/deployment-name -n namespace -w

# Verify all pods run the new version
kubectl get pods -n namespace -l app=deployment-name -o jsonpath='{.items[*].spec.containers[0].image}'

# Check the deployment is complete
kubectl describe deployment deployment-name -n namespace | grep -A 5 "Replicas"

# Verify pod health
kubectl get pods -n namespace -l app=deployment-name
```
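If you capture the jsonpath image list, a quick sketch like this confirms every pod reports the same image (the list below is a stand-in for real output, not from a live cluster):

```bash
# Stand-in for the jsonpath image list above
images="registry.company.com/app:v2.0.0 registry.company.com/app:v2.0.0 registry.company.com/app:v2.0.0"

# Count distinct image strings; 1 means the rollout fully converged
unique=$(echo "$images" | tr ' ' '\n' | sort -u | wc -l)
if [ "$unique" -eq 1 ]; then
  echo "all pods on the same image"
fi
```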
Common Rollout Stuck Causes
| Cause | Symptoms | Solution |
|---|---|---|
| Progress deadline exceeded | Deployment condition shows timeout | Increase deadline or fix slow pods |
| Readiness probe failing | Pods not becoming Ready | Fix probe or application |
| Resource constraints | New pods Pending | Reduce requests or add nodes |
| Image pull failure | ImagePullBackOff | Fix image/credentials |
| Application crash | CrashLoopBackOff | Debug application |
| maxUnavailable: 0, maxSurge: 0 | No pods can change | Fix strategy config |
| Paused deployment | .spec.paused: true | Resume rollout |
Prevention Best Practices
Set appropriate progressDeadlineSeconds. Configure proper readiness probes. Test new image before rollout. Use startup probes for slow applications. Monitor rollout progress. Implement rollback automation. Use canary deployments for risky updates.
Rolling updates get stuck when new pods don't become ready. Check pod status, diagnose the specific failure (probe, resources, crash), and fix the underlying issue or adjust rollout parameters.