You're updating your deployment with a new image or configuration, but the rollout isn't progressing. The deployment shows partial completion with some new pods and some old pods, but it never finishes. A rolling update stuck in this state leaves your application running a mix of versions and blocks further updates.
Understanding Rolling Updates
Kubernetes deployments use rolling updates to progressively replace old pods with new ones while maintaining availability. When stuck, it means new pods aren't becoming ready, or the deployment controller is waiting for conditions that never occur. The rollout can hang due to pod failures, insufficient resources, or misconfigured update parameters.
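For reference, a minimal Deployment manifest with an explicit RollingUpdate strategy looks like this (the name, labels, and image are placeholders, not values from your cluster):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  progressDeadlineSeconds: 600   # How long the controller waits before reporting failure
  selector:
    matchLabels:
      app: app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # At most 1 extra pod during the rollout
      maxUnavailable: 0  # Never drop below 3 ready pods
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: registry.company.com/app:v1.0.0  # Placeholder image
```

Most of the knobs discussed below (`progressDeadlineSeconds`, `maxSurge`, `maxUnavailable`, probes) live in this spec.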
Diagnosis Commands
Check deployment status:
```bash
# List deployments
kubectl get deployments -n namespace

# Describe deployment
kubectl describe deployment deployment-name -n namespace

# Check rollout status
kubectl rollout status deployment/deployment-name -n namespace

# Get deployment conditions
kubectl get deployment deployment-name -n namespace -o jsonpath='{.status.conditions}'
```
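The conditions output is where a stalled rollout shows up. As a sketch of what to look for, this hypothetical example scans a saved `kubectl describe deployment` dump (the file path and its contents are invented for illustration) for the failure reason:

```bash
# Invented example of a saved `kubectl describe deployment` dump
cat > /tmp/deploy-conditions.txt <<'EOF'
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    False   ProgressDeadlineExceeded
EOF

# A Progressing condition with status False means the rollout has stalled
if grep -q 'Progressing.*False.*ProgressDeadlineExceeded' /tmp/deploy-conditions.txt; then
  echo "rollout timed out"
fi
```

A healthy rollout shows `Progressing True` with reason `NewReplicaSetAvailable`; `Progressing False` with `ProgressDeadlineExceeded` points you at Solution 1 below.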
Check replica sets:
```bash
# List replica sets for deployment
kubectl get rs -n namespace -l app=deployment-name

# Describe current and new replica sets
kubectl describe rs deployment-name-xxxx -n namespace
```
Check pods:
```bash
# List pods by revision
kubectl get pods -n namespace -l app=deployment-name --show-labels

# Check pod status
kubectl describe pod pod-name -n namespace

# Check new pod logs
kubectl logs pod-name -n namespace --prefix
```
Check events:
```bash
# Deployment events
kubectl get events -n namespace --field-selector involvedObject.name=deployment-name

# All recent events
kubectl get events -n namespace --sort-by='.lastTimestamp'
```
Common Solutions
Solution 1: Fix ProgressDeadlineExceeded
Deployment has a deadline for rollout completion:
```bash
# Check if progress deadline exceeded
kubectl describe deployment deployment-name -n namespace | grep -i "ProgressDeadlineExceeded"

# Current progress deadline (default: 600 seconds / 10 minutes)
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.progressDeadlineSeconds}'
```
If deadline exceeded due to slow pods:
```yaml
# Increase progress deadline
spec:
  progressDeadlineSeconds: 1200  # 20 minutes

# If pods take time to start
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```
Solution 2: Fix Pod Startup Issues
New pods may not become ready:
```bash
# Check new pod status
kubectl get pods -n namespace -l app=deployment-name -o wide

# Check pod readiness
kubectl describe pod new-pod-name -n namespace | grep -A 5 "Readiness"

# Check if the readiness probe is failing
kubectl logs new-pod-name -n namespace
```
Fix readiness probe:
```yaml
# Probe may be failing due to slow startup
spec:
  containers:
  - name: app
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 60  # Increase if app starts slowly
      periodSeconds: 10
      failureThreshold: 3
```

Add a startup probe for slow applications:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
  # Gives 300 seconds (30 * 10) to start
```

Solution 3: Fix Resource Constraints
Insufficient resources block new pods:
```bash
# Check pending pods
kubectl get pods -n namespace | grep Pending

# Describe pending pod
kubectl describe pod pending-pod-name -n namespace

# Check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"
```
Reduce resource requests or add capacity:
```yaml
# Current requests may be too high
resources:
  requests:
    cpu: "100m"      # Reduce if it was higher
    memory: "128Mi"
```

```bash
# Or reduce the replica count to fit existing capacity
kubectl scale deployment deployment-name --replicas=3 -n namespace
```
Solution 4: Fix ImagePullBackOff in New Pods
New pods can't pull the updated image:
```bash
# Check if pods have image pull errors
kubectl get pods -n namespace -l app=deployment-name | grep ImagePullBackOff

# Describe failing pod
kubectl describe pod failing-pod-name -n namespace | grep -A 10 Events
```
Fix image configuration:
```yaml
# Verify image name and tag
spec:
  containers:
  - name: app
    image: registry.company.com/app:v2.0.0  # Correct image

# Add image pull secret if the registry is private
spec:
  imagePullSecrets:
  - name: regcred
```
Solution 5: Fix CrashLoopBackOff in New Pods
New pods crash after starting:
```bash
# Check for crashing pods
kubectl get pods -n namespace -l app=deployment-name | grep CrashLoopBackOff

# Check crash logs from the previous container instance
kubectl logs crashing-pod-name -n namespace --previous
```
Debug application crash:
```bash
# Check previous logs for the error
kubectl logs pod-name -n namespace --previous | tail -50

# Roll out a debug image
kubectl set image deployment/deployment-name app=debug-image:v1 -n namespace
```
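When reading crash logs, the last error-level line is usually the actual cause. As an illustration, this sketch filters a saved crash log for it (the log file and its contents are invented for the example):

```bash
# Invented example of a previous-container log dump
cat > /tmp/crash.log <<'EOF'
starting app...
connecting to db...
FATAL: could not connect to database at db:5432
EOF

# The last FATAL/ERROR line is usually the real crash cause
last_error=$(grep -iE 'fatal|error' /tmp/crash.log | tail -1)
echo "$last_error"
```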
Solution 6: Fix MaxSurge/MaxUnavailable Configuration
Rollout parameters may block progress:
```bash
# Check current rollout config
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.strategy}'
```

Fix rollout parameters:
```yaml
# Current config may be blocking
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # How many extra pods allowed
      maxUnavailable: 1  # How many can be unavailable

# If maxUnavailable: 0 and maxSurge: 0, rollout blocked
# Must have at least one > 0

# For zero downtime (requires extra capacity)
spec:
  strategy:
    rollingUpdate:
      maxSurge: 2        # Create 2 new pods before removing old
      maxUnavailable: 0  # Never go below desired replicas
```
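To reason about what these parameters permit, here is a small sketch that computes the pod-count bounds during a rollout, assuming absolute values rather than percentages (the replica count is a made-up example):

```bash
replicas=4
max_surge=1
max_unavailable=1

# Upper bound on total pods during the rollout
max_pods=$((replicas + max_surge))
# Lower bound on ready pods during the rollout
min_available=$((replicas - max_unavailable))

echo "at most $max_pods pods, at least $min_available available"
```

With both values at 0, these bounds collapse to exactly `replicas` total and `replicas` available, which leaves the controller no room to swap any pod.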
Solution 7: Fix Deployment Paused
Deployment may be manually paused:
```bash
# Check if deployment is paused
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.paused}'

# Resume paused deployment
kubectl rollout resume deployment/deployment-name -n namespace
```
Solution 8: Force Rollout Completion
Sometimes you need to force completion:
```bash
# Check current rollout history
kubectl rollout history deployment/deployment-name -n namespace

# Roll back to the previous version if stuck
kubectl rollout undo deployment/deployment-name -n namespace

# Or restart the deployment fresh
kubectl rollout restart deployment/deployment-name -n namespace
```
Delete stuck replica set:
```bash
# Find old replica set blocking rollout
kubectl get rs -n namespace -l app=deployment-name

# Delete stuck replica set (use with caution)
kubectl delete rs deployment-name-oldrevision -n namespace
```
Verification
After fixing rollout:
```bash
# Watch rollout progress
kubectl rollout status deployment/deployment-name -n namespace -w

# Verify all pods run the new version
kubectl get pods -n namespace -l app=deployment-name -o jsonpath='{.items[*].spec.containers[0].image}'

# Check the deployment is complete
kubectl describe deployment deployment-name -n namespace | grep -A 5 "Replicas"

# Verify pod health
kubectl get pods -n namespace -l app=deployment-name
```
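If you capture the jsonpath image list, a quick sketch like this confirms every pod reports the same image (the list below is a stand-in for real output, not from a live cluster):

```bash
# Stand-in for the jsonpath image list above
images="registry.company.com/app:v2.0.0 registry.company.com/app:v2.0.0 registry.company.com/app:v2.0.0"

# Count distinct image strings; 1 means the rollout fully converged
unique=$(echo "$images" | tr ' ' '\n' | sort -u | wc -l)
if [ "$unique" -eq 1 ]; then
  echo "all pods on the same image"
fi
```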
Common Rollout Stuck Causes
| Cause | Symptoms | Solution |
|---|---|---|
| Progress deadline exceeded | Deployment condition shows timeout | Increase deadline or fix slow pods |
| Readiness probe failing | Pods not becoming Ready | Fix probe or application |
| Resource constraints | New pods Pending | Reduce requests or add nodes |
| Image pull failure | ImagePullBackOff | Fix image/credentials |
| Application crash | CrashLoopBackOff | Debug application |
| maxUnavailable: 0, maxSurge: 0 | No pods can change | Fix strategy config |
| Paused deployment | .spec.paused: true | Resume rollout |
Prevention Best Practices
Set appropriate progressDeadlineSeconds. Configure proper readiness probes. Test new image before rollout. Use startup probes for slow applications. Monitor rollout progress. Implement rollback automation. Use canary deployments for risky updates.
Rolling updates get stuck when new pods don't become ready. Check pod status, diagnose the specific failure (probe, resources, crash), and fix the underlying issue or adjust rollout parameters.