You updated your deployment with a new image or configuration, but nothing is happening. The rollout is stuck - no new pods are being created, or they're created but never become ready. Your application remains on the old version, and you're not sure why Kubernetes isn't rolling out the changes.
Understanding Deployment Rollouts
Kubernetes deployments use a rolling update strategy by default, creating new pods with the updated configuration while gradually terminating old pods. The rollout can get stuck for several reasons: the new pods fail to start, they fail health checks, there aren't enough resources, or the deployment configuration prevents progress.
The rollout process involves a ReplicaSet for each revision, and understanding where it's stuck helps identify the fix.
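As a quick illustration, each revision's ReplicaSet carries a `deployment.kubernetes.io/revision` annotation, so the newest one can be picked out by sorting on that number. The `kubectl` command in the comment and the ReplicaSet names below are illustrative; sample output is used so the sorting logic runs standalone:

```bash
# Against a real cluster, list ReplicaSets with their revision annotation:
#   kubectl get rs -n namespace -l app=your-app-label \
#     -o custom-columns='NAME:.metadata.name,REVISION:.metadata.annotations.deployment\.kubernetes\.io/revision'
# Sample name/revision pairs stand in for that output here:
newest_rs=$(sort -k2 -n <<'EOF' | tail -1 | awk '{print $1}'
myapp-5b8f6c7d9f 3
myapp-7d4b9c6f8a 4
myapp-6c9d8e7f2b 2
EOF
)
echo "newest ReplicaSet: $newest_rs"
```

The newest ReplicaSet is the one the stuck rollout is trying to scale up; that is where to focus the pod-level debugging below.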
Diagnosis Commands
Start by checking the deployment status:
```bash
# Check deployment status
kubectl get deployment deployment-name -n namespace

# Get detailed rollout status
kubectl rollout status deployment/deployment-name -n namespace

# See rollout history
kubectl rollout history deployment/deployment-name -n namespace

# Describe deployment for events
kubectl describe deployment deployment-name -n namespace
```
Check the replica sets:
```bash
# List all replica sets for the deployment
kubectl get rs -n namespace -l app=your-app-label

# Describe the newest replica set
kubectl describe rs deployment-name-xxxxx -n namespace
```
Examine the pods:
```bash
# Check pod status
kubectl get pods -n namespace -l app=your-app-label

# Check pod events and conditions
kubectl describe pods -n namespace -l app=your-app-label

# Check pod readiness
kubectl get pods -n namespace -l app=your-app-label -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}'
```
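The jsonpath query above prints one `True`/`False` per pod; a small loop turns that into a ready/total count. The statuses below are simulated so the counting runs standalone:

```bash
# In practice: statuses=$(kubectl get pods ... -o jsonpath='{...Ready...}')
statuses='True True False True'   # simulated jsonpath output
ready=0; total=0
for s in $statuses; do
  total=$((total + 1))
  if [ "$s" = "True" ]; then
    ready=$((ready + 1))
  fi
done
echo "$ready/$total pods ready"
```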
Look at events:
```bash
# Get recent events
kubectl get events -n namespace --sort-by='.lastTimestamp'

# Filter events for the deployment
kubectl get events -n namespace --field-selector involvedObject.name=deployment-name
```
Common Solutions
Solution 1: Fix Image Pull Issues
The most common cause - the new image can't be pulled:
```bash
# Check for ImagePullBackOff or ErrImagePull
kubectl get pods -n namespace -l app=your-app-label

# Check events for image pull errors
kubectl describe pod pod-name -n namespace | grep -A 5 "Events:"
```
Common image issues:
```bash
# Wrong image name or tag
kubectl set image deployment/deployment-name container-name=correct-image:tag -n namespace

# Fix image pull secrets
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=password \
  -n namespace

# Reference the secret from the service account the pods use
kubectl patch serviceaccount default -n namespace -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```
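Before reaching for pull secrets, it is worth double-checking the image reference itself for typos. A sketch that splits a reference into its parts, assuming the common `registry/repository:tag` layout without a registry port (the reference below is illustrative):

```bash
image='registry.example.com/team/myapp:v1.2.3'   # illustrative reference
registry=${image%%/*}    # text before the first slash
rest=${image#*/}
tag=${rest##*:}          # text after the last colon
repo=${rest%:*}
echo "registry=$registry repo=$repo tag=$tag"
```

A mistyped registry host shows up as `ErrImagePull` with a DNS or connection error, while a wrong tag shows up as a "manifest not found" style message.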
Verify image exists:
```bash
# Test pulling the image
docker pull your-registry/image:tag

# On the node
crictl pull your-registry/image:tag
```

Solution 2: Fix Resource Constraints
New pods may be pending due to insufficient resources:
```bash
# Check if pods are pending
kubectl get pods -n namespace -l app=your-app-label | grep Pending

# Describe pending pods
kubectl describe pod pending-pod-name -n namespace
```
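Pending pods almost always come with a `FailedScheduling` event naming the missing resource. The extraction below runs on a sample message; against a cluster you would pipe `kubectl describe pod` into the same grep:

```bash
# Sample FailedScheduling message so the extraction runs standalone:
msg='0/3 nodes are available: 2 Insufficient cpu, 1 Insufficient memory.'
missing=$(echo "$msg" | grep -o 'Insufficient [a-z]*' | sort -u)
printf '%s\n' "$missing"
```

Whatever resource is named there is the one to reduce in the pod spec (or add to the cluster).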
Fix resource requests:
# Reduce resource requests if too high
resources:
requests:
cpu: "100m" # Reduce if cluster is constrained
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"Solution 3: Fix Liveness and Readiness Probe Failures
Probes failing on new pods will block the rollout:
```bash
# Check probe failures in events (-E enables the | alternation)
kubectl describe pod pod-name -n namespace | grep -E -A 10 "Liveness|Readiness"

# Check if probes are timing out
kubectl logs pod-name -n namespace --previous
```
Increase probe initial delay or timeout:
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60   # Increase if app takes time to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```
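As a rough rule of thumb (assuming standard kubelet behavior), the first probe fires after `initialDelaySeconds` and the container is restarted or marked unready after `failureThreshold` consecutive failures spaced `periodSeconds` apart. The settings above therefore imply approximately:

```bash
initial_delay=60; period=10; failure_threshold=3   # liveness values from above
liveness_window=$((initial_delay + failure_threshold * period))
echo "liveness: restart no earlier than ~${liveness_window}s after container start"

initial_delay=30; period=5; failure_threshold=3    # readiness values from above
readiness_window=$((initial_delay + failure_threshold * period))
echo "readiness: marked unready by ~${readiness_window}s if never passing"
```

If your application routinely needs longer than that window to come up, raise `initialDelaySeconds` or add a `startupProbe` so the liveness probe only kicks in after startup succeeds.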
Check if the endpoint is actually working:
```bash
# Test probe endpoint from inside pod
kubectl exec -it pod-name -n namespace -- curl -v http://localhost:8080/health

# Check application logs for errors
kubectl logs pod-name -n namespace
```
Solution 4: Fix Deployment Strategy Issues
The deployment strategy might be too aggressive or misconfigured:
```bash
# Check current deployment strategy
kubectl get deployment deployment-name -n namespace -o yaml | grep -A 20 strategy
```

Fix deployment strategy:
```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Maximum pods over desired count during update
      maxUnavailable: 0   # Maximum pods that can be unavailable during update
```

If you have few replicas, maxUnavailable being too high can cause issues:
```yaml
# For small deployments
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0   # Keep at least desired count available
```

Solution 5: Fix Progress Deadline
Deployments can timeout if they take too long:
```bash
# Check for ProgressDeadlineExceeded
kubectl describe deployment deployment-name -n namespace | grep -i progress

# Get deployment conditions
kubectl get deployment deployment-name -n namespace -o jsonpath='{.status.conditions}'
```
Increase progress deadline:
```yaml
spec:
  progressDeadlineSeconds: 1200   # Default is 600; raise it for slow-starting apps
```

If the deadline was exceeded and you want to retry:
```bash
# Check rollout status
kubectl rollout status deployment/deployment-name -n namespace

# If stuck, undo and retry
kubectl rollout undo deployment/deployment-name -n namespace
kubectl rollout restart deployment/deployment-name -n namespace
```
Solution 6: Fix PodDisruptionBudget Blocking
PDB might prevent pods from being terminated:
```bash
# Check PDB
kubectl get pdb -n namespace
kubectl describe pdb pdb-name -n namespace
```

Adjust PDB:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1   # Or use maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
```

Solution 7: Fix Horizontal Pod Autoscaler Conflict
HPA might be interfering with the rollout:
```bash
# Check HPA status
kubectl get hpa -n namespace
kubectl describe hpa hpa-name -n namespace
```

Temporarily scale manually:
```bash
# Adjust the HPA replica range during the rollout (HPA cannot be paused directly)
kubectl patch hpa hpa-name -n namespace -p '{"spec":{"minReplicas":3,"maxReplicas":10}}'

# Or delete HPA temporarily
kubectl delete hpa hpa-name -n namespace
```
Solution 8: Rollback and Investigate
If the rollout is completely stuck, rollback to recover:
```bash
# Check rollout history
kubectl rollout history deployment/deployment-name -n namespace

# Rollback to previous revision
kubectl rollout undo deployment/deployment-name -n namespace

# Rollback to specific revision
kubectl rollout undo deployment/deployment-name -n namespace --to-revision=2

# Pause rollout to investigate
kubectl rollout pause deployment/deployment-name -n namespace

# Resume after investigation
kubectl rollout resume deployment/deployment-name -n namespace
```
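If you want to script the rollback target, the revision numbers from `rollout history` can be sorted to pick the second-newest. Sample output is parsed below (the column layout is the typical `kubectl rollout history` format; the change causes are illustrative):

```bash
# Sample `kubectl rollout history` output so the parsing runs standalone:
history='REVISION  CHANGE-CAUSE
1         initial deploy
2         bump to v1.1
3         bump to v1.2'
prev=$(printf '%s\n' "$history" | awk 'NR > 1 {print $1}' | sort -n | tail -2 | head -1)
echo "rollback target: revision $prev"
# Then: kubectl rollout undo deployment/deployment-name -n namespace --to-revision=$prev
```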
Solution 9: Fix Init Container Failures
Init containers that never complete will block pod startup:
```bash
# Check init container status
kubectl get pods -n namespace -l app=your-app-label -o jsonpath='{.items[*].status.initContainerStatuses[*].state}'

# Check init container logs
kubectl logs pod-name -n namespace -c init-container-name
```
Fix init container issues:
```yaml
initContainers:
  - name: init-myservice
    image: busybox
    # Bound the wait so a missing service fails the pod instead of hanging forever
    command: ['sh', '-c', 'for i in $(seq 1 30); do nslookup myservice && exit 0; echo waiting for myservice; sleep 2; done; exit 1']
```

Verification
After applying fixes:
```bash
# Watch rollout progress
kubectl rollout status deployment/deployment-name -n namespace -w

# Check deployment status
kubectl get deployment deployment-name -n namespace

# Verify all pods are ready
kubectl get pods -n namespace -l app=your-app-label

# Check events for successful rollout
kubectl get events -n namespace --sort-by='.lastTimestamp'
```
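In CI it is often better to bound the wait than to watch interactively; `kubectl rollout status` accepts `--timeout`, and a generic retry wrapper looks like the sketch below. The `check_ready` stub stands in for the real kubectl call so the loop runs standalone:

```bash
retry_until() {   # retry_until <attempts> <delay_seconds> <command...>
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Stand-in for: kubectl rollout status deployment/deployment-name -n namespace --timeout=10s
tries=0
check_ready() { tries=$((tries + 1)); [ "$tries" -ge 3 ]; }

retry_until 5 0 check_ready && echo "ready after $tries checks"
```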
Rollout Debugging Checklist
| Check | Command | Issue |
|---|---|---|
| Pod status | `kubectl get pods -l app=X` | ImagePull, Pending |
| Pod events | `kubectl describe pod X` | Scheduling, probes |
| ReplicaSets | `kubectl get rs -l app=X` | Scaling issues |
| Deployment events | `kubectl describe deploy X` | Strategy, timeout |
| Resources | `kubectl describe nodes` | Node capacity |
| HPA | `kubectl get hpa` | Auto-scaling conflict |
| PDB | `kubectl get pdb` | Eviction blocking |
Prevention Best Practices
- Use proper health checks (liveness and readiness probes) with appropriate initial delays.
- Set realistic resource requests based on actual usage.
- Configure `maxUnavailable: 0` for critical services during rollouts.
- Use a `progressDeadlineSeconds` appropriate for your application's startup time.
- Monitor rollout progress with `kubectl rollout status`.
- Test new images locally or in staging before production.
A stuck rollout usually comes down to the new pods being unable to serve traffic, whether from image issues, resource constraints, or failed health checks. The key is checking the ReplicaSet and pod events to understand where the process is blocked.