Introduction

When a Helm upgrade fails partway through, Helm attempts to automatically roll back to the previous revision. However, if the previous release's Kubernetes secrets have been deleted, corrupted, or the release name was mistyped, the rollback fails with release not found. This leaves the cluster in a partially upgraded state with no automatic recovery path.

Symptoms

  • helm upgrade fails with UPGRADE FAILED followed by ROLLBACK FAILED: release not found
  • helm list shows the release in a failed or pending-upgrade state
  • Kubernetes resources are partially updated -- some at the new version, some at the old
  • helm history shows the failed upgrade revision but no valid rollback target
  • Error message: Error: UPGRADE FAILED: release: not found

Common Causes

  • Helm release secret deleted manually or by automated cleanup scripts
  • Previous release never successfully installed (first install failed)
  • Helm storage backend (secrets, configmaps) corrupted during cluster migration
  • Release name contains typo, referencing a non-existent release
  • Tiller/Helm 3 storage backend migration leaving orphaned release data

Step-by-Step Fix

  1. 1.Check the current release state: Identify the release and its revision history.
  2. 2.```bash
  3. 3.helm list --all --all-namespaces | grep my-release
  4. 4.helm history my-release -n my-namespace
  5. 5.`
  6. 6.Find the last successful revision: Look for the last deployed status.
  7. 7.```bash
  8. 8.helm history my-release -n my-namespace | grep deployed | tail -1
  9. 9.`
  10. 10.Rollback to the last known good revision: Specify the revision number explicitly.
  11. 11.```bash
  12. 12.helm rollback my-release 5 -n my-namespace
  13. 13.# Where 5 is the last successful revision number
  14. 14.`
  15. 15.If rollback fails, manually fix the release state: Use helm to fix the release status.
  16. 16.```bash
  17. 17.helm rollback my-release 5 -n my-namespace --force
  18. 18.# Or fix the release history
  19. 19.kubectl get secrets -n my-namespace -l name=my-release,owner=helm
  20. 20.`
  21. 21.Re-deploy from scratch if recovery is impossible: Clean up and reinstall.
  22. 22.```bash
  23. 23.helm uninstall my-release -n my-namespace --no-hooks
  24. 24.helm install my-release ./chart -n my-namespace --atomic --timeout 5m
  25. 25.`

Prevention

  • Always use --atomic flag with helm upgrade to ensure automatic rollback on failure
  • Keep Helm release secrets from being deleted by excluding them from automated cleanup
  • Use --timeout to prevent upgrades from hanging indefinitely in pending state
  • Test Helm upgrades in a staging namespace before production deployment
  • Monitor Helm release status after each upgrade with automated health checks
  • Back up Helm release secrets regularly for disaster recovery scenarios