You're updating your deployment with a new image or configuration, but the rollout isn't progressing. The deployment shows partial completion, with some new pods and some old pods, but it never finishes. A rollout stuck in this half-finished state leaves your application running a mix of versions and blocks further updates.

Understanding Rolling Updates

Kubernetes deployments use rolling updates to progressively replace old pods with new ones while maintaining availability. When a rollout is stuck, new pods aren't becoming ready, or the deployment controller is waiting for conditions that never occur. The rollout can hang due to pod failures, insufficient resources, or misconfigured update parameters.
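The update behavior is controlled by the deployment's `strategy` block. A minimal sketch (field values are illustrative, not recommendations):

```yaml
# Where rolling-update behavior is configured (illustrative values)
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 pod above the desired count during the update
      maxUnavailable: 1  # at most 1 pod below the desired count
```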

Diagnosis Commands

Check deployment status:

```bash
# List deployments
kubectl get deployments -n namespace

# Describe deployment
kubectl describe deployment deployment-name -n namespace

# Check rollout status
kubectl rollout status deployment/deployment-name -n namespace

# Get deployment conditions
kubectl get deployment deployment-name -n namespace -o jsonpath='{.status.conditions}'
```

Check replica sets:

```bash
# List replica sets for deployment
kubectl get rs -n namespace -l app=deployment-name

# Describe current and new replica sets
kubectl describe rs deployment-name-xxxx -n namespace
```

Check pods:

```bash
# List pods by revision (pod-template-hash label identifies the revision)
kubectl get pods -n namespace -l app=deployment-name --show-labels

# Check pod status
kubectl describe pod pod-name -n namespace

# Check new pod logs
kubectl logs pod-name -n namespace --prefix
```

Check events:

```bash
# Deployment events
kubectl get events -n namespace --field-selector involvedObject.name=deployment-name

# All recent events
kubectl get events -n namespace --sort-by='.lastTimestamp'
```

Common Solutions

Solution 1: Fix ProgressDeadlineExceeded

A deployment has a deadline for rollout completion:

```bash
# Check if progress deadline exceeded
kubectl describe deployment deployment-name -n namespace | grep -i "ProgressDeadlineExceeded"

# Current progress deadline (default: 600 seconds / 10 minutes)
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.progressDeadlineSeconds}'
```

If the deadline is exceeded because pods are slow to start:

```yaml
# Increase progress deadline
spec:
  progressDeadlineSeconds: 1200  # 20 minutes

# If pods take time to start
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```

Solution 2: Fix Pod Startup Issues

New pods may not become ready:

```bash
# Check new pod status
kubectl get pods -n namespace -l app=deployment-name -o wide

# Check pod readiness
kubectl describe pod new-pod-name -n namespace | grep -A 5 "Readiness"

# Check if readiness probe is failing
kubectl logs new-pod-name -n namespace
```

Fix readiness probe:

```yaml
# Probe may be failing due to slow startup
spec:
  containers:
  - name: app
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 60  # Increase if app starts slowly
      periodSeconds: 10
      failureThreshold: 3
```

Add startup probe for slow applications:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
# Gives 300 seconds (30 * 10) to start
```

Solution 3: Fix Resource Constraints

Insufficient resources block new pods:

```bash
# Check pending pods
kubectl get pods -n namespace | grep Pending

# Describe pending pod
kubectl describe pod pending-pod-name -n namespace

# Check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"
```

Reduce resource requests or add capacity:

```yaml
# Requests may be too high for available node capacity
resources:
  requests:
    cpu: "100m"      # lower the request if it was higher
    memory: "128Mi"
```

```bash
# Or reduce the replica count to fit available capacity
kubectl scale deployment deployment-name --replicas=3 -n namespace
```

Solution 4: Fix ImagePullBackOff in New Pods

New pods can't pull the updated image:

```bash
# Check if pods have image pull errors
kubectl get pods -n namespace -l app=deployment-name | grep ImagePullBackOff

# Describe failing pod
kubectl describe pod failing-pod-name -n namespace | grep -A 10 Events
```

Fix image configuration:

```yaml
# Verify image name and tag
spec:
  containers:
  - name: app
    image: registry.company.com/app:v2.0.0  # correct image

# Add image pull secret if the registry is private
spec:
  imagePullSecrets:
  - name: regcred
```

Solution 5: Fix CrashLoopBackOff in New Pods

New pods crash after starting:

```bash
# Check for crashing pods
kubectl get pods -n namespace -l app=deployment-name | grep CrashLoopBackOff

# Check crash logs
kubectl logs crashing-pod-name -n namespace --previous
```

Debug application crash:

```bash
# Check previous logs for the error
kubectl logs pod-name -n namespace --previous | tail -50

# Roll out a debug image to investigate
kubectl set image deployment/deployment-name app=debug-image:v1 -n namespace
```

Solution 6: Fix MaxSurge/MaxUnavailable Configuration

Rollout parameters may block progress:

```bash
# Check current rollout config
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.strategy}'
```

Fix rollout parameters:

```yaml
# Current config may be blocking
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # how many extra pods allowed
      maxUnavailable: 1  # how many can be unavailable

# If maxUnavailable: 0 and maxSurge: 0, the rollout is blocked;
# at least one must be > 0

# For zero downtime (requires extra capacity)
spec:
  strategy:
    rollingUpdate:
      maxSurge: 2        # create 2 new pods before removing old
      maxUnavailable: 0  # never go below desired replicas
```
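When maxSurge and maxUnavailable are given as percentages, Kubernetes rounds maxSurge up to a whole pod and maxUnavailable down. A rough sketch of that arithmetic, assuming a hypothetical deployment with 10 replicas and 25% for both parameters:

```shell
# Sketch: how percentage rollout parameters resolve to pod counts.
# replicas and percentages here are hypothetical example values.
replicas=10
surge_pct=25
unavail_pct=25

# maxSurge percentages round UP to a whole pod...
max_surge=$(( (replicas * surge_pct + 99) / 100 ))
# ...while maxUnavailable percentages round DOWN
max_unavailable=$(( replicas * unavail_pct / 100 ))

echo "maxSurge=$max_surge maxUnavailable=$max_unavailable"
```

So with 10 replicas, the controller may run up to 13 pods during the update while keeping at least 8 available.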

Solution 7: Fix Deployment Paused

Deployment may be manually paused:

```bash
# Check if deployment is paused
kubectl get deployment deployment-name -n namespace -o jsonpath='{.spec.paused}'

# Resume paused deployment
kubectl rollout resume deployment/deployment-name -n namespace
```

Solution 8: Force Rollout Completion

Sometimes you need to force completion:

```bash
# Check current rollout history
kubectl rollout history deployment/deployment-name -n namespace

# Roll back to the previous version if stuck
kubectl rollout undo deployment/deployment-name -n namespace

# Or restart the deployment fresh
kubectl rollout restart deployment/deployment-name -n namespace
```

Delete stuck replica set:

```bash
# Find old replica set blocking rollout
kubectl get rs -n namespace -l app=deployment-name

# Delete stuck replica set (caution: this removes its pods)
kubectl delete rs deployment-name-oldrevision -n namespace
```

Verification

After fixing rollout:

```bash
# Watch rollout progress
kubectl rollout status deployment/deployment-name -n namespace -w

# Verify all pods run the new version
kubectl get pods -n namespace -l app=deployment-name -o jsonpath='{.items[*].spec.containers[0].image}'

# Check deployment is complete
kubectl describe deployment deployment-name -n namespace | grep -A 5 "Replicas"

# Verify pod health
kubectl get pods -n namespace -l app=deployment-name
```

Common Rollout Stuck Causes

| Cause | Symptoms | Solution |
|-------|----------|----------|
| Progress deadline exceeded | Deployment condition shows timeout | Increase deadline or fix slow pods |
| Readiness probe failing | Pods not becoming Ready | Fix probe or application |
| Resource constraints | New pods Pending | Reduce requests or add nodes |
| Image pull failure | ImagePullBackOff | Fix image/credentials |
| Application crash | CrashLoopBackOff | Debug application |
| maxUnavailable: 0, maxSurge: 0 | No pods can change | Fix strategy config |
| Paused deployment | `.spec.paused: true` | Resume rollout |

Prevention Best Practices

- Set an appropriate progressDeadlineSeconds.
- Configure proper readiness probes.
- Test new images before rollout.
- Use startup probes for slow applications.
- Monitor rollout progress.
- Implement rollback automation.
- Use canary deployments for risky updates.
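A deployment spec combining several of these practices might look like the following sketch; all values are illustrative and should be tuned to your application:

```yaml
# Illustrative defensive defaults (hypothetical app and probe endpoint)
spec:
  progressDeadlineSeconds: 600   # fail the rollout visibly instead of hanging forever
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0          # zero downtime; requires headroom for one extra pod
  template:
    spec:
      containers:
      - name: app
        image: registry.company.com/app:v2.0.0
        startupProbe:            # protects slow starters from premature restarts
          httpGet:
            path: /health
            port: 8080
          failureThreshold: 30
          periodSeconds: 10
        readinessProbe:          # gates traffic and rollout progress
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 10
          failureThreshold: 3
```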

Rolling updates get stuck when new pods don't become ready. Check pod status, diagnose the specific failure (probe, resources, crash), and fix the underlying issue or adjust the rollout parameters.