Introduction

When a site goes down right after a deploy, the deployment is the strongest lead you have. Rather than inspecting the whole stack at once, compare the new release against the last healthy state and find which startup, config, asset, or dependency change broke traffic.

Symptoms

  • The outage starts immediately after deployment or restart
  • Health checks fail even though the build or image completed successfully
  • Static assets, app routes, or admin pages break at the same time
  • Logs show startup errors, missing config, or migration failures
  • The previous release was healthy, so rolling back looks likely to restore service quickly
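
When the logs are the main symptom, a quick scan can sort startup output into the failure types listed above. The following is a minimal sketch, not a definitive classifier: the pattern names and regular expressions are assumptions chosen for illustration and need tuning to what your framework and platform actually log.

```python
import re

# Hypothetical patterns for illustration; adjust to your own log format.
PATTERNS = {
    "missing config": re.compile(r"KeyError|not set|missing (env|config|secret)", re.I),
    "migration failure": re.compile(r"migrat\w* .*(fail|error)|relation .* does not exist", re.I),
    "startup crash": re.compile(r"Traceback|fatal|panic|exited with code [1-9]", re.I),
}


def classify_log_lines(lines):
    """Group log lines under the first failure pattern each one matches."""
    hits = {label: [] for label in PATTERNS}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.rstrip("\n"))
                break  # count each line once, under its first match
    return hits
```

Feeding it the startup log of the new release and then of the last healthy release makes it easier to see which category of error is new.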

Common Causes

  • Required environment variables or secrets are missing in the new runtime
  • Database migrations were incompatible, incomplete, or applied out of order
  • The application starts with the wrong build artifact, asset manifest, or static file set
  • New code crashes during boot under production configuration
  • The deploy changed ports, health checks, or reverse proxy expectations
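
Missing environment variables are one of the easier causes to rule out mechanically. The sketch below assumes the release reads its settings from the process environment; the variable names in REQUIRED_VARS and the helper missing_env_vars are hypothetical and stand in for whatever your application actually requires.

```python
import os

# Hypothetical names; list the settings your release actually reads.
REQUIRED_VARS = ["DATABASE_URL", "SECRET_KEY", "PORT"]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping so the result does not depend on this machine.
example_env = {
    "DATABASE_URL": "postgres://db.internal/app",
    "SECRET_KEY": "",
    "PORT": "8080",
}
print(missing_env_vars(REQUIRED_VARS, example_env))  # ['SECRET_KEY']
```

Called with no second argument, the function checks the real environment, so it can run inside the new runtime before the app boots. A blank value counts as missing here, which is a design choice: an empty secret usually fails the same way as an absent one.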

Step-by-Step Fix

  1. Freeze further deploy activity until you know whether the outage comes from the new release or from the environment around it.
  2. Compare health checks, startup logs, and crash output between the new release and the last known good version.
  3. Verify runtime configuration, secrets, ports, and environment variables match what the new build expects.
  4. Check migrations, schema state, and startup hooks for failures that block the app from becoming ready.
  5. Confirm static assets, templates, and build artifacts were deployed together and not mixed between old and new releases.
  6. If rolling back is the safer and faster option, return to the last healthy release while preserving logs and change context for root-cause work.
  7. Patch the release only after you can reproduce or clearly explain the broken startup path.
  8. Re-test the critical public routes and admin flows before calling the incident resolved.
  9. After recovery, record the exact failure mode so future deploy checks can catch it earlier.
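
The re-test in step 8 can be scripted so it runs the same way after a rollback and after the eventual fix. This is a minimal sketch using only the Python standard library; the paths in CRITICAL_PATHS are hypothetical placeholders, and treating any status below 400 as healthy is an assumption you may want to tighten.

```python
import urllib.error
import urllib.request

# Hypothetical paths; replace with the public and admin routes that matter to you.
CRITICAL_PATHS = ["/", "/healthz", "/admin/login"]


def check_route(base_url, path, timeout=5.0):
    """Return (path, status, ok); status is an HTTP code or an error string."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return path, resp.status, 200 <= resp.status < 400
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx or 5xx status.
        return path, exc.code, False
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, timeout, and similar.
        return path, str(exc), False


def smoke_test(base_url, paths=CRITICAL_PATHS):
    """Check every path and report whether all of them look healthy."""
    results = [check_route(base_url, p) for p in paths]
    return results, all(ok for _, _, ok in results)
```

A typical call would be `results, healthy = smoke_test("https://example.com")`, printing each tuple in `results` when `healthy` is false. Note that `urlopen` follows redirects, so a route that redirects to a working login page is reported with the final status.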