Introduction

GitHub Actions matrix strategy runs a job across multiple combinations of variables (OS, Node version, etc.). By default, fail-fast is set to true, meaning if any matrix combination fails, GitHub cancels all in-progress and queued matrix jobs. While this saves resources, it can hide issues in other combinations and prevent useful test results from being collected.

Symptoms

  • Matrix job shows Cancelled status for most combinations after one fails
  • Only one failure is visible while other combinations never run
  • Intermittent failures in one combination block results from all other combinations
  • Test results are incomplete because most matrix jobs were cancelled
  • Status check shows failure even though 9 of 12 matrix combinations would have passed

Common Causes

  • fail-fast: true (default) cancelling all other matrix jobs on first failure
  • Flaky test in one specific OS/version combination
  • Environment-specific issue (e.g., macOS-specific build failure)
  • Resource contention on the runner affecting one combination disproportionately
  • Dependency incompatibility with a specific version in the matrix

Step-by-Step Fix

  1. 1.Disable fail-fast to let all matrix combinations complete: See all results.
  2. 2.```yaml
  3. 3.jobs:
  4. 4.test:
  5. 5.runs-on: ${{ matrix.os }}
  6. 6.strategy:
  7. 7.fail-fast: false # Let all combinations run even if one fails
  8. 8.matrix:
  9. 9.os: [ubuntu-latest, windows-latest, macos-latest]
  10. 10.node: [16, 18, 20]
  11. 11.`
  12. 12.Identify the specific failing combination: Check which matrix variables caused the failure.
  13. 13.`
  14. 14.# In the workflow run, click on each matrix combination
  15. 15.# Check the matrix variables in the job name:
  16. 16.# "test (ubuntu-latest, 18)" failed
  17. 17.`
  18. 18.Fix the issue in the failing combination: Address the specific failure.
  19. 19.```yaml
  20. 20.# If the issue is OS-specific, add conditional steps
  21. 21.- name: Install dependencies
  22. 22.run: |
  23. 23.if [ "${{ runner.os }}" == "macOS" ]; then
  24. 24.brew install specific-package
  25. 25.else
  26. 26.npm install
  27. 27.fi
  28. 28.`
  29. 29.Use continue-on-error for known flaky combinations: Allow the workflow to proceed.
  30. 30.```yaml
  31. 31.jobs:
  32. 32.test:
  33. 33.strategy:
  34. 34.fail-fast: false
  35. 35.matrix:
  36. 36.os: [ubuntu-latest, macos-latest]
  37. 37.continue-on-error: ${{ matrix.os == 'macos-latest' }}
  38. 38.`
  39. 39.Set up required status checks for only critical combinations: Don't block PRs on non-critical matrix jobs.
  40. 40.`
  41. 41.# In branch protection rules, only require:
  42. 42.# - test (ubuntu-latest, 18)
  43. 43.# - test (ubuntu-latest, 20)
  44. 44.# Not every single matrix combination
  45. 45.`

Prevention

  • Set fail-fast: false for test matrices to collect complete results
  • Use continue-on-error for experimental or known-flaky matrix combinations
  • Separate critical matrix combinations (required for PR merge) from exploratory ones
  • Monitor matrix failure patterns to identify environment-specific issues
  • Implement retry logic for flaky tests in specific matrix combinations
  • Document which matrix combinations are required vs. informational for PR checks