Introduction

Helm hooks allow chart authors to run actions (such as database migrations or secret creation) at specific points in a release lifecycle. A pre-install hook must complete successfully before the main resources are deployed, and Helm fails the release if the hook job fails. The problem arises when the hook fails silently: the script swallows an error and still exits 0, an incorrect hook weight runs hooks in the wrong order, or the helm.sh/hook-delete-policy removes the job and its pods before they can be inspected. The main deployment then proceeds without its prerequisites being met, or the release fails and leaves no evidence behind.
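
The missing-error-handling case can be illustrated with a minimal shell sketch. The function names are hypothetical, and `false` stands in for a failing migration command. A job's outcome is decided by the container's exit code, so a script whose last command succeeds reports success even when an earlier command failed.

```bash
# Hypothetical stand-in for a migration command that fails.
failing_migration() { false; }

# Unsafe: the failure is ignored, and the final echo makes the function return 0.
migrate_unsafe() {
  failing_migration
  echo "Migrations completed"
}

# Safe: the failure is reported and propagated as a non-zero exit status.
migrate_safe() {
  failing_migration || { echo "Migrations failed" >&2; return 1; }
  echo "Migrations completed"
}
```

A hook job built on the first form would be marked as succeeded, and Helm would continue with the release.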

Symptoms

  • Helm install succeeds but the application is non-functional due to missing setup
  • Hook job pod completed with non-zero exit code but was automatically deleted
  • Database migrations did not run before the application started
  • kubectl get jobs shows no trace of the hook job after deletion
  • Error message: job failed: BackoffLimitExceeded (visible only briefly before deletion)

Common Causes

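  • helm.sh/hook-delete-policy includes hook-failed, so Helm deletes the job and its pods as soon as the hook fails, before the failure can be inspected (before-hook-creation, the default, removes the previous job only when the hook runs again)
  • Hook job backoffLimit set too low, so a transient failure (for example, a database that is not yet reachable) fails the job without enough retries
  • Hook missing the helm.sh/hook-weight annotation, so it defaults to weight 0 and may run in an unintended order relative to other hooks
  • Container image in the hook job is unavailable or cannot be pulled, leaving the pod in ImagePullBackOff until the Helm timeout expires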
  • Hook job namespace mismatch with the main release

Step-by-Step Fix

  1. Check for failed hook jobs: Look for the hook job and its logs before they are deleted.

     ```bash
     # The label selector matches only if the chart sets this label on the job
     kubectl get jobs -n my-namespace -l "app.kubernetes.io/managed-by=Helm"
     # Pods created by a job carry the job-name label
     kubectl logs -l job-name=my-pre-install-hook -n my-namespace --tail=100
     ```
  2. Modify the hook delete policy to retain failed pods: Leave hook-failed out of the policy so that a failed job and its pods stay available for inspection.

     ```yaml
     # templates/db-migration-job.yaml
     apiVersion: batch/v1
     kind: Job
     metadata:
       name: "{{ .Release.Name }}-db-migration"
       annotations:
         "helm.sh/hook": pre-install,pre-upgrade
         # Delete on success or before the next run; a failed job is kept
         "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
         "helm.sh/hook-weight": "-5" # Lower weights run first
     ```
  3. Add proper error handling to the hook job: Ensure failures are visible.

     ```yaml
     spec:
       backoffLimit: 3
       activeDeadlineSeconds: 300 # Fail the job if it runs longer than 5 minutes
       template:
         spec:
           restartPolicy: Never
           containers:
             - name: migration
               image: myapp:latest
               command: ["sh", "-c", "run-migrations.sh && echo 'Migrations completed' || { echo 'Migrations failed'; exit 1; }"]
     ```
  4. Re-run the hook manually if the deployment already proceeded: Execute the prerequisite step.

     ```bash
     # kubectl flags must come before "--"; everything after it is passed to the container
     kubectl create job manual-migration --image=myapp:latest -n my-namespace -- run-migrations.sh
     kubectl logs job/manual-migration -n my-namespace -f
     ```
  5. Verify the prerequisite is satisfied before the application starts: Confirm the hook's purpose is met.

     ```bash
     # Example for a Django app (Django 3.1+): exits non-zero if any migration is unapplied
     kubectl exec -it my-app-pod -n my-namespace -- python manage.py migrate --check
     ```
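
The inspection in the first step can be wrapped in a small helper. The following is a sketch under stated assumptions: `inspect_hook_job` is a hypothetical function name, and the job is assumed to still exist in the cluster. It reads the job's `.status.failed` count, prints recent pod logs, and returns non-zero when any pod failed.

```bash
# Report whether a hook job failed and print its recent pod logs.
# Usage: inspect_hook_job <job-name> <namespace>   (hypothetical helper)
inspect_hook_job() {
  job="$1"
  ns="$2"

  # .status.failed is the number of failed pods; it is empty when none failed.
  if ! failed=$(kubectl get job "$job" -n "$ns" -o jsonpath='{.status.failed}'); then
    echo "Job $job not found in $ns (it may already have been deleted)" >&2
    return 2
  fi

  # Pods created by a job carry the job-name label, so this covers every attempt.
  kubectl logs -n "$ns" -l "job-name=$job" --tail=50 \
    || echo "No pod logs available for $job" >&2

  if [ -n "$failed" ] && [ "$failed" -gt 0 ]; then
    echo "Job $job has $failed failed pod(s)" >&2
    return 1
  fi
  echo "Job $job has no failed pods"
}
```

For example, `inspect_hook_job my-pre-install-hook my-namespace` could be run immediately after a failed install, while the job is still present.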

Prevention

  • Leave hook-failed out of helm.sh/hook-delete-policy (for example, use before-hook-creation,hook-succeeded) so that failed hook jobs and their pods are retained for debugging
  • Use helm.sh/hook-weight to control hook execution order explicitly
  • Set activeDeadlineSeconds on hook jobs to prevent indefinite hanging
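  • Pass --timeout to helm install to bound how long Helm waits for hook jobs (it always waits for them to finish), and --wait so the command also blocks until the main resources are ready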
  • Test hook jobs independently of the Helm chart before integration
  • Monitor hook job completion status in CI/CD pipelines and fail the pipeline on hook errors
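
The pipeline check in the last item could look roughly like the sketch below. `deploy_release` is a hypothetical function name, and the 5-minute timeout is an arbitrary assumption. It relies on Helm returning a non-zero exit status when a hook job fails or times out.

```bash
# Install or upgrade a release and surface hook failures to the pipeline.
# Usage: deploy_release <release> <chart> <namespace>   (hypothetical helper)
deploy_release() {
  release="$1"
  chart="$2"
  ns="$3"

  # --timeout bounds the wait for each hook job; --wait also blocks on readiness.
  if helm upgrade --install "$release" "$chart" -n "$ns" --wait --timeout 5m; then
    echo "Release $release deployed"
    return 0
  fi

  # On failure, list jobs so the failed hook shows up in the pipeline log.
  echo "Release $release failed; jobs in $ns:" >&2
  kubectl get jobs -n "$ns" >&2 || true
  return 1
}
```

A CI step would call `deploy_release my-release ./chart my-namespace` and let the non-zero return status fail the pipeline.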