You updated your deployment with a new image or configuration, but nothing is happening. The rollout is stuck - no new pods are being created, or they're created but never become ready. Your application remains on the old version, and you're not sure why Kubernetes isn't rolling out the changes.
Understanding Deployment Rollouts
Kubernetes deployments use a rolling update strategy by default, creating new pods with the updated configuration while gradually terminating old pods. The rollout can get stuck for several reasons: the new pods fail to start, they fail health checks, there aren't enough resources, or the deployment configuration prevents progress.
The rollout process involves a ReplicaSet for each revision, and understanding where it's stuck helps identify the fix.
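As a quick illustration, each revision's ReplicaSet carries a `deployment.kubernetes.io/revision` annotation, so the newest one can be picked out by sorting on that number. The `kubectl` command in the comment and the ReplicaSet names below are illustrative; sample output is used so the sorting logic runs standalone:

```bash
# Against a real cluster, list ReplicaSets with their revision annotation:
#   kubectl get rs -n namespace -l app=your-app-label \
#     -o custom-columns='NAME:.metadata.name,REVISION:.metadata.annotations.deployment\.kubernetes\.io/revision'
# Sample name/revision pairs stand in for that output here:
newest_rs=$(sort -k2 -n <<'EOF' | tail -1 | awk '{print $1}'
myapp-5b8f6c7d9f 3
myapp-7d4b9c6f8a 4
myapp-6c9d8e7f2b 2
EOF
)
echo "newest ReplicaSet: $newest_rs"
```

The newest ReplicaSet is the one the stuck rollout is trying to scale up; that is where to focus the pod-level debugging below.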
Diagnosis Commands
Start by checking the deployment status:
```bash
# Check deployment status
kubectl get deployment deployment-name -n namespace

# Get detailed rollout status
kubectl rollout status deployment/deployment-name -n namespace

# See rollout history
kubectl rollout history deployment/deployment-name -n namespace

# Describe deployment for events
kubectl describe deployment deployment-name -n namespace
```
Check the replica sets:
```bash
# List all replica sets for the deployment
kubectl get rs -n namespace -l app=your-app-label

# Describe the newest replica set
kubectl describe rs deployment-name-xxxxx -n namespace
```
Examine the pods:
```bash
# Check pod status
kubectl get pods -n namespace -l app=your-app-label

# Check pod events and conditions
kubectl describe pods -n namespace -l app=your-app-label

# Check pod readiness
kubectl get pods -n namespace -l app=your-app-label -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}'
```
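The jsonpath query above prints one `True`/`False` per pod; a small loop turns that into a ready/total count. The statuses below are simulated so the counting runs standalone:

```bash
# In practice: statuses=$(kubectl get pods ... -o jsonpath='{...Ready...}')
statuses='True True False True'   # simulated jsonpath output
ready=0; total=0
for s in $statuses; do
  total=$((total + 1))
  if [ "$s" = "True" ]; then
    ready=$((ready + 1))
  fi
done
echo "$ready/$total pods ready"
```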
Look at events:
```bash
# Get recent events
kubectl get events -n namespace --sort-by='.lastTimestamp'

# Filter events for the deployment
kubectl get events -n namespace --field-selector involvedObject.name=deployment-name
```
Common Solutions
Solution 1: Fix Image Pull Issues
The most common cause - the new image can't be pulled:
```bash
# Check for ImagePullBackOff or ErrImagePull
kubectl get pods -n namespace -l app=your-app-label

# Check events for image pull errors
kubectl describe pod pod-name -n namespace | grep -A 5 "Events:"
```
Common image issues:
```bash
# Wrong image name or tag
kubectl set image deployment/deployment-name container-name=correct-image:tag -n namespace

# Fix image pull secrets
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=password \
  -n namespace

# Reference the secret from the service account the pods use
kubectl patch serviceaccount default -n namespace -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```
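Before reaching for pull secrets, it is worth double-checking the image reference itself for typos. A sketch that splits a reference into its parts, assuming the common `registry/repository:tag` layout without a registry port (the reference below is illustrative):

```bash
image='registry.example.com/team/myapp:v1.2.3'   # illustrative reference
registry=${image%%/*}    # text before the first slash
rest=${image#*/}
tag=${rest##*:}          # text after the last colon
repo=${rest%:*}
echo "registry=$registry repo=$repo tag=$tag"
```

A mistyped registry host shows up as `ErrImagePull` with a DNS or connection error, while a wrong tag shows up as a "manifest not found" style message.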
Verify image exists:
```bash
# Test pulling the image
docker pull your-registry/image:tag

# On the node
crictl pull your-registry/image:tag
```

Solution 2: Fix Resource Constraints
New pods may be pending due to insufficient resources:
```bash
# Check if pods are pending
kubectl get pods -n namespace -l app=your-app-label | grep Pending

# Describe pending pods
kubectl describe pod pending-pod-name -n namespace
```
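Pending pods almost always come with a `FailedScheduling` event naming the missing resource. The extraction below runs on a sample message; against a cluster you would pipe `kubectl describe pod` into the same grep:

```bash
# Sample FailedScheduling message so the extraction runs standalone:
msg='0/3 nodes are available: 2 Insufficient cpu, 1 Insufficient memory.'
missing=$(echo "$msg" | grep -o 'Insufficient [a-z]*' | sort -u)
printf '%s\n' "$missing"
```

Whatever resource is named there is the one to reduce in the pod spec (or add to the cluster).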
Fix resource requests:
# Reduce resource requests if too high
resources:
requests:
cpu: "100m" # Reduce if cluster is constrained
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"Solution 3: Fix Liveness and Readiness Probe Failures
Probes failing on new pods will block the rollout:
```bash
# Check probe failures in events (-E enables the | alternation)
kubectl describe pod pod-name -n namespace | grep -E -A 10 "Liveness|Readiness"

# Check if probes are timing out
kubectl logs pod-name -n namespace --previous
```
Increase probe initial delay or timeout:
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60   # Increase if app takes time to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```
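As a rough rule of thumb (assuming standard kubelet behavior), the first probe fires after `initialDelaySeconds` and the container is restarted or marked unready after `failureThreshold` consecutive failures spaced `periodSeconds` apart. The settings above therefore imply approximately:

```bash
initial_delay=60; period=10; failure_threshold=3   # liveness values from above
liveness_window=$((initial_delay + failure_threshold * period))
echo "liveness: restart no earlier than ~${liveness_window}s after container start"

initial_delay=30; period=5; failure_threshold=3    # readiness values from above
readiness_window=$((initial_delay + failure_threshold * period))
echo "readiness: marked unready by ~${readiness_window}s if never passing"
```

If your application routinely needs longer than that window to come up, raise `initialDelaySeconds` or add a `startupProbe` so the liveness probe only kicks in after startup succeeds.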
Check if the endpoint is actually working:
```bash
# Test probe endpoint from inside pod
kubectl exec -it pod-name -n namespace -- curl -v http://localhost:8080/health

# Check application logs for errors
kubectl logs pod-name -n namespace
```
Solution 4: Fix Deployment Strategy Issues
The deployment strategy might be too aggressive or misconfigured:
```bash
# Check current deployment strategy
kubectl get deployment deployment-name -n namespace -o yaml | grep -A 20 strategy
```

Fix deployment strategy:
```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Maximum pods over desired count during update
      maxUnavailable: 0   # Maximum pods that can be unavailable during update
```

If you have few replicas, maxUnavailable being too high can cause issues:
```yaml
# For small deployments
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0   # Keep at least desired count available
```

Solution 5: Fix Progress Deadline
Deployments can timeout if they take too long:
```bash
# Check for ProgressDeadlineExceeded
kubectl describe deployment deployment-name -n namespace | grep -i progress

# Get deployment conditions
kubectl get deployment deployment-name -n namespace -o jsonpath='{.status.conditions}'
```
Increase progress deadline:
```yaml
spec:
  progressDeadlineSeconds: 1200   # Default is 600; raise it for slow-starting apps
```

If the deadline was exceeded and you want to retry:
```bash
# Check rollout status
kubectl rollout status deployment/deployment-name -n namespace

# If stuck, undo and retry
kubectl rollout undo deployment/deployment-name -n namespace
kubectl rollout restart deployment/deployment-name -n namespace
```
Solution 6: Fix PodDisruptionBudget Blocking
PDB might prevent pods from being terminated:
```bash
# Check PDB
kubectl get pdb -n namespace
kubectl describe pdb pdb-name -n namespace
```

Adjust PDB:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1   # Or use maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
```

Solution 7: Fix Horizontal Pod Autoscaler Conflict
HPA might be interfering with the rollout:
```bash
# Check HPA status
kubectl get hpa -n namespace
kubectl describe hpa hpa-name -n namespace
```

Temporarily scale manually:
```bash
# Adjust the HPA replica range during the rollout (HPA cannot be paused directly)
kubectl patch hpa hpa-name -n namespace -p '{"spec":{"minReplicas":3,"maxReplicas":10}}'

# Or delete HPA temporarily
kubectl delete hpa hpa-name -n namespace
```
Solution 8: Rollback and Investigate
If the rollout is completely stuck, rollback to recover:
```bash
# Check rollout history
kubectl rollout history deployment/deployment-name -n namespace

# Rollback to previous revision
kubectl rollout undo deployment/deployment-name -n namespace

# Rollback to specific revision
kubectl rollout undo deployment/deployment-name -n namespace --to-revision=2

# Pause rollout to investigate
kubectl rollout pause deployment/deployment-name -n namespace

# Resume after investigation
kubectl rollout resume deployment/deployment-name -n namespace
```
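If you want to script the rollback target, the revision numbers from `rollout history` can be sorted to pick the second-newest. Sample output is parsed below (the column layout is the typical `kubectl rollout history` format; the change causes are illustrative):

```bash
# Sample `kubectl rollout history` output so the parsing runs standalone:
history='REVISION  CHANGE-CAUSE
1         initial deploy
2         bump to v1.1
3         bump to v1.2'
prev=$(printf '%s\n' "$history" | awk 'NR > 1 {print $1}' | sort -n | tail -2 | head -1)
echo "rollback target: revision $prev"
# Then: kubectl rollout undo deployment/deployment-name -n namespace --to-revision=$prev
```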
Solution 9: Fix Init Container Failures
Init containers that never complete will block pod startup:
```bash
# Check init container status
kubectl get pods -n namespace -l app=your-app-label -o jsonpath='{.items[*].status.initContainerStatuses[*].state}'

# Check init container logs
kubectl logs pod-name -n namespace -c init-container-name
```
Fix init container issues:
```yaml
initContainers:
  - name: init-myservice
    image: busybox
    # Bound the wait so a missing service fails the pod instead of hanging forever
    command: ['sh', '-c', 'for i in $(seq 1 30); do nslookup myservice && exit 0; echo waiting for myservice; sleep 2; done; exit 1']
```

Verification
After applying fixes:
```bash
# Watch rollout progress
kubectl rollout status deployment/deployment-name -n namespace -w

# Check deployment status
kubectl get deployment deployment-name -n namespace

# Verify all pods are ready
kubectl get pods -n namespace -l app=your-app-label

# Check events for successful rollout
kubectl get events -n namespace --sort-by='.lastTimestamp'
```
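In CI it is often better to bound the wait than to watch interactively; `kubectl rollout status` accepts `--timeout`, and a generic retry wrapper looks like the sketch below. The `check_ready` stub stands in for the real kubectl call so the loop runs standalone:

```bash
retry_until() {   # retry_until <attempts> <delay_seconds> <command...>
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Stand-in for: kubectl rollout status deployment/deployment-name -n namespace --timeout=10s
tries=0
check_ready() { tries=$((tries + 1)); [ "$tries" -ge 3 ]; }

retry_until 5 0 check_ready && echo "ready after $tries checks"
```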
Rollout Debugging Checklist
| Check | Command | Issue |
|---|---|---|
| Pod status | `kubectl get pods -l app=X` | ImagePull, Pending |
| Pod events | `kubectl describe pod X` | Scheduling, probes |
| ReplicaSets | `kubectl get rs -l app=X` | Scaling issues |
| Deployment events | `kubectl describe deploy X` | Strategy, timeout |
| Resources | `kubectl describe nodes` | Node capacity |
| HPA | `kubectl get hpa` | Auto-scaling conflict |
| PDB | `kubectl get pdb` | Eviction blocking |
Prevention Best Practices
- Use proper health checks (liveness and readiness probes) with appropriate initial delays.
- Set realistic resource requests based on actual usage.
- Configure `maxUnavailable: 0` for critical services during rollouts.
- Use a `progressDeadlineSeconds` appropriate for your application's startup time.
- Monitor rollout progress with `kubectl rollout status`.
- Test new images locally or in staging before production.
A stuck rollout usually comes down to the new pods being unable to serve traffic, whether from image issues, resource constraints, or failed health checks. The key is checking the ReplicaSet and pod events to understand where the process is blocked.