Introduction

GitHub Actions jobs that time out usually do not fail because of an explicit error. They fail because a process never exits: a test runner waits forever, a package manager prompts for input, or a deployment script blocks on a network call until timeout-minutes kills the job. Without an explicit limit, a job can run for up to 360 minutes, the default value of timeout-minutes. The goal is to identify what is hanging and to add both process-level and workflow-level time limits.

Symptoms

  • A job sits on one step for a long time and then ends with a timeout
  • Logs stop updating before the job fails
  • The same script works locally but hangs in CI
  • Builds consume large amounts of runner time without producing errors

Common Causes

  • A script is waiting for interactive input that never arrives in CI
  • Tests or browsers deadlock and never return control to the shell
  • Network calls or dependency installs hang without an application-level timeout
  • Background processes stay alive and keep the step from exiting

Step-by-Step Fix

  1. Set an explicit job timeout
     Every long-running job should have a clear upper bound, even if the underlying tools are well behaved most of the time.
yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 30
  2. Add step-level or tool-level timeouts to the riskiest commands
     A job-level timeout is the last line of defense. The first line should be command-specific limits, so you can quickly identify the step that actually hung.
yaml
- name: Run end-to-end tests
  timeout-minutes: 15
  run: npm run test:e2e
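
A step-level timeout-minutes cancels the step but says little about which command inside it stalled. One way to add a tool-level limit is to wrap the command in GNU coreutils timeout, which is available on ubuntu-latest runners. The sketch below is illustrative only: the 10-minute budget and the 30-second grace period are assumed values.

```yaml
- name: Run end-to-end tests
  timeout-minutes: 15        # workflow-level backstop for the whole step
  run: |
    # Send SIGTERM after 10 minutes, then SIGKILL 30 seconds later if the
    # process ignores it. timeout exits with status 124 when the limit is
    # hit (137 if it had to fall back to SIGKILL).
    timeout --kill-after=30s 10m npm run test:e2e
```

Keeping the tool-level limit shorter than the step limit means the log shows the command's own timeout status rather than a generic cancellation.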
  3. Make CI commands non-interactive
     Prompts are a common reason a job appears frozen even though the process is technically still alive.
yaml
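- run: npm ci
  env:
    # GitHub Actions already sets CI=true; declaring it keeps the intent explicit.
    CI: "true"
    # Only affects apt and dpkg; it has no effect on npm itself.
    DEBIAN_FRONTEND: noninteractive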
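
The npm example above covers only one tool. As a sketch of the same idea for a system package install, the step below passes DEBIAN_FRONTEND to apt-get directly and redirects standard input from /dev/null, so an unexpected prompt reads end-of-file instead of waiting. The package name tzdata is only an illustration of a package that can prompt during configuration.

```yaml
- name: Install system packages without prompts
  run: |
    sudo apt-get update
    # sudo may reset the caller's environment, so DEBIAN_FRONTEND is set on
    # the apt-get command itself via env.
    # -y answers yes to confirmation prompts; < /dev/null makes any remaining
    # read from stdin return end-of-file instead of blocking.
    sudo env DEBIAN_FRONTEND=noninteractive apt-get install -y tzdata < /dev/null
```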
  4. Inspect logs and child processes for the step that hangs
     If the timeout always hits the same step, instrument that tool or script before simply raising the timeout.
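
One way to instrument a hanging step is to run the command in the background alongside a small watchdog that prints the process tree before the step timeout fires, so the log shows which child process was still alive. This is a minimal sketch, not a definitive recipe: the 10-minute budget, the 10-second poll interval, and the reuse of npm run test:e2e from the earlier example are all assumptions.

```yaml
- name: Run end-to-end tests with a process dump on hang
  timeout-minutes: 15
  run: |
    # Run the suspect command in the background so the shell can watch it.
    npm run test:e2e &
    pid=$!

    # Watchdog: poll every 10 seconds for up to 10 minutes (assumed budget).
    # It exits quietly as soon as the command finishes on its own.
    (
      for _ in $(seq 1 60); do
        sleep 10
        kill -0 "$pid" 2>/dev/null || exit 0
      done
      echo "::warning::test:e2e still running after 10 minutes"
      # Show every process with its parent, elapsed time, and state.
      ps -eo pid,ppid,etime,stat,args --forest
      # Terminates only the top-level process; children may need their own cleanup.
      kill -TERM "$pid" 2>/dev/null || true
    ) &
    watchdog=$!

    # Preserve the exit status of the real command.
    status=0
    wait "$pid" || status=$?
    wait "$watchdog" || true
    exit "$status"
```

The process tree in the log usually points at the stuck child (a browser, a server that was never stopped, a tool waiting on input), which is more useful than the bare timeout message.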

Prevention

  • Put timeout-minutes on every non-trivial job
  • Make package managers, test runners, and deploy scripts explicitly non-interactive
  • Add internal timeouts to network calls and browser tests
  • Watch for job duration regressions before they turn into full hangs
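
As an illustration of an internal timeout on a network call, the step below bounds a health check with curl's own limits instead of relying on the job timeout. The URL is a placeholder, and the durations and retry count are assumed values.

```yaml
- name: Check deployment health
  timeout-minutes: 5
  run: |
    # --connect-timeout bounds the connection phase, --max-time bounds each
    # transfer attempt, and --retry repeats transient failures a limited
    # number of times. --fail turns HTTP error responses into a non-zero exit.
    curl --fail --silent --show-error \
      --connect-timeout 10 \
      --max-time 30 \
      --retry 3 \
      https://example.com/health
```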