Your StatefulSet update is stuck. Some pods have the new version, others are still on the old version, and the rollout isn't progressing. StatefulSets manage ordered, stable pods with persistent identities, but this ordering can cause updates to stall when a single pod fails to update.

Understanding StatefulSet Updates

With the RollingUpdate strategy, StatefulSets update pods in reverse ordinal order (from the highest index down to 0), one at a time, waiting for each pod to become Ready before updating the next. This ordered approach ensures stability for stateful applications, but it means a single failed pod can block the entire rollout.

The update can be stuck because: a pod failed to start with the new version, the partition setting is blocking updates, readiness probes are failing, or resources are insufficient.

Diagnosis Commands

Start by checking StatefulSet status:

```bash
# Check StatefulSet status
kubectl get statefulset statefulset-name -n namespace

# Get detailed status
kubectl describe statefulset statefulset-name -n namespace

# Check update revision vs current revision
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.status.updateRevision}'
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.status.currentRevision}'
```

Check pod status:

```bash
# Check all pods and their versions
kubectl get pods -n namespace -l app=statefulset-label -o wide

# Check which pods are updated
kubectl get pods -n namespace -l app=statefulset-label -o jsonpath='{.items[*].metadata.labels.controller-revision-hash}'

# Describe the stuck pod
kubectl describe pod statefulset-name-index -n namespace
```

Check update strategy:

```bash
# Check update strategy
kubectl get statefulset statefulset-name -n namespace -o yaml | grep -A 15 updateStrategy

# Check partition setting
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.spec.updateStrategy.rollingUpdate.partition}'
```

Common Solutions

Solution 1: Fix Pod Failing to Update

The pod at the current update position might be failing:

```bash
# Identify which pod is blocking the update
kubectl describe statefulset statefulset-name -n namespace | grep -A 5 "Pods"

# Check the failing pod (for example, pod index 2)
kubectl describe pod statefulset-name-2 -n namespace

# Check pod logs
kubectl logs statefulset-name-2 -n namespace

# Check events
kubectl get events -n namespace --field-selector involvedObject.name=statefulset-name-2
```

Fix the pod failure:

```bash
# Check if the image is correct
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.spec.template.spec.containers[*].image}'

# If the image is wrong, update it
kubectl set image statefulset/statefulset-name container-name=correct-image:tag -n namespace
```

If a pod is stuck in CrashLoopBackOff:

```bash
# Check pod restart count
kubectl get pod statefulset-name-2 -n namespace -o jsonpath='{.status.containerStatuses[*].restartCount}'

# Check logs from the previous (crashed) container
kubectl logs statefulset-name-2 -n namespace --previous

# Delete the pod to force recreation
kubectl delete pod statefulset-name-2 -n namespace
```

Solution 2: Fix Partition Blocking Updates

The partition setting limits which pods get updated:

```bash
# Check partition value
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.spec.updateStrategy.rollingUpdate.partition}'

# If partition is N, only pods with ordinal >= N are updated;
# pods with ordinal < N stay on the old revision.
# Example: partition = 2 means statefulset-name-0 and statefulset-name-1 won't update
```

Remove or reduce partition:

```bash
# Set partition to 0 to update all pods
kubectl patch statefulset statefulset-name -n namespace -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":0}}}}'

# Or edit directly and set spec.updateStrategy.rollingUpdate.partition to 0
kubectl edit statefulset statefulset-name -n namespace
```

Gradual rollout using partition:

```yaml
# Gradual update: start with partition = replicas,
# then reduce it to update pods one by one
spec:
  replicas: 5
  updateStrategy:
    rollingUpdate:
      partition: 5  # Start: no pods update
      # Then: partition: 4 -> pod-4 updates
      # Then: partition: 3 -> pod-3 updates
      # Continue until partition: 0 -> all pods updated
```
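The staged decrement can be scripted. A minimal sketch, assuming a 5-replica StatefulSet and the placeholder names statefulset-name and namespace used above:

```shell
#!/usr/bin/env bash
set -euo pipefail

STS=statefulset-name   # placeholder: your StatefulSet name
NS=namespace           # placeholder: your namespace

# Step the partition down one ordinal at a time, waiting for each
# partitioned rollout to settle before continuing.
for p in 4 3 2 1 0; do
  kubectl patch statefulset "$STS" -n "$NS" \
    -p "{\"spec\":{\"updateStrategy\":{\"rollingUpdate\":{\"partition\":$p}}}}"
  # rollout status returns once all pods with ordinal >= partition are updated
  kubectl rollout status "statefulset/$STS" -n "$NS"
done
```

Pausing between iterations to verify application health turns this into a simple canary rollout.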

Solution 3: Fix Readiness Probe Failures

Pods must be ready before the next pod updates:

```bash
# Check pod readiness
kubectl get pods -n namespace -l app=statefulset-label

# Check probe failures
kubectl describe pod statefulset-name-index -n namespace | grep -A 10 Readiness
```

Fix readiness probe:

```yaml
spec:
  template:
    spec:
      containers:
      - name: app
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30  # Increase if the app takes time to start
          periodSeconds: 10
          failureThreshold: 3
```
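Probe failures also surface as Unhealthy events, which can be filtered namespace-wide (placeholder namespace as above):

```shell
# List recent probe failures in the namespace, newest last
kubectl get events -n namespace --field-selector reason=Unhealthy --sort-by=.lastTimestamp
```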

Solution 4: Fix Resource Constraints

Pods might be pending due to resource issues:

```bash
# Check for pending pods
kubectl get pods -n namespace -l app=statefulset-label | grep Pending

# Describe the pending pod
kubectl describe pod pending-pod -n namespace

# Check PVC status (StatefulSets use PVCs)
kubectl get pvc -n namespace
```

Fix resource requests:

```yaml
spec:
  template:
    spec:
      containers:
      - name: app
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
```

Solution 5: Fix PVC Issues

StatefulSet pods use persistent volumes:

```bash
# Check PVC binding
kubectl get pvc -n namespace

# Describe the PVC for issues
kubectl describe pvc pvc-name -n namespace

# Check storage classes
kubectl get storageclass
```

Fix storage issues:

```bash
# If the PVC is stuck Pending, check its events
kubectl describe pvc pvc-name -n namespace | grep -A 10 Events

# Create a missing storage class if needed
kubectl apply -f storageclass.yaml
```

Solution 6: Fix OnDelete Update Strategy

If update strategy is OnDelete, pods only update when manually deleted:

```bash
# Check update strategy type
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.spec.updateStrategy.type}'
```

Switch to RollingUpdate or delete pods manually:

```bash
# Switch to rolling update
kubectl patch statefulset statefulset-name -n namespace -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'

# Or, keeping OnDelete, manually delete pods so they are recreated with the new spec
kubectl delete pod statefulset-name-0 -n namespace
kubectl delete pod statefulset-name-1 -n namespace
# Continue for all pods
```
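With OnDelete, it is safest to mirror the RollingUpdate order: delete from the highest ordinal down and wait for each replacement to become Ready. A sketch assuming 3 replicas and the placeholder names used above:

```shell
#!/usr/bin/env bash
set -euo pipefail

STS=statefulset-name  # placeholder: your StatefulSet name
NS=namespace          # placeholder: your namespace
REPLICAS=3            # placeholder: your replica count

# Delete pods from highest ordinal to lowest
for i in $(seq $((REPLICAS - 1)) -1 0); do
  kubectl delete pod "$STS-$i" -n "$NS"
  # Wait for the controller to recreate the pod, then for it to become Ready
  until kubectl get pod "$STS-$i" -n "$NS" >/dev/null 2>&1; do sleep 2; done
  kubectl wait --for=condition=Ready "pod/$STS-$i" -n "$NS" --timeout=300s
done
```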

Solution 7: Force Update by Deleting Pods

Sometimes you need to force pod recreation:

```bash
# Delete the stuck pod; the StatefulSet controller recreates it with the new spec
kubectl delete pod statefulset-name-index -n namespace

# Watch pod recreation
kubectl get pods -n namespace -l app=statefulset-label -w
```

Solution 8: Scale Down and Up

For major changes, scale to zero and back:

```bash
# Scale to zero
kubectl scale statefulset statefulset-name -n namespace --replicas=0

# Wait for pods to terminate
kubectl get pods -n namespace -l app=statefulset-label -w

# Scale back up
kubectl scale statefulset statefulset-name -n namespace --replicas=3
```

Note: This causes full downtime and should be a last resort. PVCs are retained by default, so data survives, but quorum-based applications (databases, consensus stores) may need manual recovery after all replicas restart together.

Solution 9: Check Pod Management Policy

The podManagementPolicy controls how pods are created and deleted during scaling, not the ordering of a RollingUpdate. The default OrderedReady policy is what makes scaling sequential and can stall a scale-up when one pod never becomes Ready; Parallel launches and terminates pods simultaneously:

```yaml
spec:
  podManagementPolicy: Parallel  # Pods created/deleted in parallel during scaling
  # Default is OrderedReady (sequential, waits for each pod to be Ready)
```
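To check the current policy (note that the field is immutable, so changing it requires recreating the StatefulSet):

```shell
# Prints OrderedReady or Parallel (empty output means the default, OrderedReady)
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.spec.podManagementPolicy}'
```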

Solution 10: Fix Headless Service Issues

StatefulSets require a headless service:

```bash
# Check if a headless service exists
kubectl get service -n namespace

# Verify the service is headless (clusterIP: None)
kubectl get service service-name -n namespace -o jsonpath='{.spec.clusterIP}'
```

Create headless service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: statefulset-service
spec:
  clusterIP: None  # Headless
  selector:
    app: statefulset-app
  ports:
  - port: 8080
```
Verification

After fixing the issue:

```bash
# Check StatefulSet update status
kubectl get statefulset statefulset-name -n namespace

# Check all pods are on the new revision
kubectl get pods -n namespace -l app=statefulset-label -o jsonpath='{.items[*].metadata.labels.controller-revision-hash}'

# Monitor rollout
kubectl rollout status statefulset/statefulset-name -n namespace

# Verify pods are ready
kubectl get pods -n namespace -l app=statefulset-label
```

StatefulSet Update Debugging

```bash
# Comprehensive check
echo "=== StatefulSet Status ==="
kubectl get statefulset statefulset-name -n namespace -o yaml | grep -A 10 status

echo "=== Pod Status ==="
kubectl get pods -n namespace -l app=statefulset-label -o wide

echo "=== Update Strategy ==="
kubectl get statefulset statefulset-name -n namespace -o yaml | grep -A 15 updateStrategy

echo "=== Current vs Update Revision ==="
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.status.currentRevision}'
echo ""
kubectl get statefulset statefulset-name -n namespace -o jsonpath='{.status.updateRevision}'
```

StatefulSet Stuck Causes Summary

| Cause | Check | Solution |
| --- | --- | --- |
| Pod failing to start | `kubectl describe pod` | Fix image, config, or delete pod |
| Partition blocking | `kubectl get sts -o yaml` | Set partition to 0 |
| Readiness probe failing | Pod not ready | Fix probe or increase delay |
| PVC issues | `kubectl get pvc` | Fix storage class or PVC |
| OnDelete strategy | `kubectl get sts -o yaml` | Change to RollingUpdate |
| Resource constraints | Pod pending | Fix resource requests |
| Headless service missing | `kubectl get svc` | Create headless service |

Prevention Best Practices

- Use the appropriate update strategy for your application.
- Set proper readiness probes with realistic delays.
- Use partition for controlled canary rollouts.
- Ensure the headless service exists before creating the StatefulSet.
- Test updates with small partitions first.
- Monitor pod readiness during updates.
- Have a rollback plan ready before updating.

StatefulSet updates getting stuck is usually about one pod failing to become ready - the ordered nature means everything stops at that point. Since RollingUpdate proceeds from the highest ordinal down, check the highest-index pod that hasn't been updated yet; that's where the rollout is blocked.
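Finding that pod can be scripted. A sketch, assuming the placeholder names used throughout and the standard <name>-<ordinal> pod naming:

```shell
#!/usr/bin/env bash
set -euo pipefail

STS=statefulset-name  # placeholder: your StatefulSet name
NS=namespace          # placeholder: your namespace

# The revision all pods should reach after the update
TARGET=$(kubectl get statefulset "$STS" -n "$NS" -o jsonpath='{.status.updateRevision}')
REPLICAS=$(kubectl get statefulset "$STS" -n "$NS" -o jsonpath='{.spec.replicas}')

# Updates run from the highest ordinal down, so the first mismatch
# scanning downward is the pod the rollout is blocked on.
for i in $(seq $((REPLICAS - 1)) -1 0); do
  REV=$(kubectl get pod "$STS-$i" -n "$NS" \
    -o jsonpath='{.metadata.labels.controller-revision-hash}' 2>/dev/null || true)
  if [ "$REV" != "$TARGET" ]; then
    echo "Rollout blocked at: $STS-$i (revision: ${REV:-missing})"
    break
  fi
done
```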