Fix GitHub Actions Workflow Failures - Complete Debugging Guide

# GitHub Actions Workflow Failed: Complete Troubleshooting Guide

You pushed code to GitHub, the Actions workflow started, and then it failed. Now you're looking at a sea of red X's and need to figure out what went wrong.

GitHub Actions failures can happen at any stage—from checkout to deployment. Let me walk through the most common failure patterns and how to fix them.

## Reading the Failure Logs

Before anything else, understand how to read GitHub Actions logs:

1. Go to your repository's Actions tab
2. Click on the failed workflow run
3. Click on the failed job (marked with a red X)
4. Click on the failed step to expand its logs

Pro tip: Use Ctrl+F to search for "Error", "FAILED", "exception", or "fatal". The actual error is often buried in verbose output.
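
The same keyword search can be run outside the browser once the log is saved to a file. The sketch below assumes a local log file (the name `run.log` is hypothetical; one way to get it, assuming the GitHub CLI is installed, is `gh run view <run-id> --log-failed > run.log`). It also adds `ERR!` to the keyword list, since npm prefixes its errors that way and the word "error" alone would miss them.

```shell
# find_errors LOGFILE
# Prints every line of LOGFILE (with its line number) that contains one of
# the suggested keywords, ignoring case. "ERR!" is an extra keyword added
# here so that npm's "npm ERR!" lines are caught as well.
find_errors() {
  grep -n -i -E 'error|failed|exception|fatal|ERR!' "$1"
}

# Example usage with a hypothetical saved log:
#   find_errors run.log
```

The line numbers in the output make it easier to jump to the surrounding context in the full log.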

## Fix 1: Dependency Installation Failures

The most common failure happens during dependency installation:

bash

npm ERR! 404 Not Found - GET https://registry.npmjs.org/@scope/package/-/package-1.0.0.tgz
npm ERR! 404  '@scope/package@1.0.0' is not in the npm registry.

Or for Python:

bash

ERROR: Could not find a version that satisfies the requirement package==1.0.0

Diagnosis:

Check that the dependency exists and that your package.json or requirements.txt names it correctly:

yaml

# Add a debug step before your install step
- name: Debug dependencies
  run: |
    cat package.json
    npm ls --depth=0 || true

Solution for private packages:

If you're using private npm packages:

```yaml
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version: '20'
    registry-url: 'https://npm.pkg.github.com'
    scope: '@your-org'

- name: Install dependencies
  run: npm ci
  env:
    NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

For Python private packages:

yaml

- name: Install dependencies
  run: pip install -r requirements.txt
  env:
    PIP_EXTRA_INDEX_URL: https://${{ secrets.PYPI_TOKEN }}@pypi.example.com/simple/

Solution for lockfile issues:

bash

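npm ERR! `npm ci` can only install packages when your package.json and package-lock.json or npm-shrinkwrap.json are in sync. Please update your lock file with `npm install` before continuing.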

Regenerate your lockfile:

bash

# Local
rm package-lock.json
npm install
git add package-lock.json
git commit -m "Regenerate package-lock.json"
git push

## Fix 2: Permission Denied Errors

You might see errors like:

bash

Error: EACCES: permission denied, open '/home/runner/.npm/_logs/...'

Or:

bash

fatal: could not create work tree dir 'repo-name': Permission denied

Solution:

On GitHub-hosted Linux runners, steps run as a non-root user with passwordless sudo. If you need elevated permissions:

```yaml
- name: Fix permissions
  run: sudo chown -R $(whoami) /path/to/directory

# Or run the step as root
- name: Run as root
  run: |
    sudo apt-get update
    sudo apt-get install -y package-name
```
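
Before reaching for `sudo chown`, it can help to confirm which directory is actually unwritable and who owns it. The following is a minimal sketch of such a check; the function name `check_writable` is hypothetical and not part of any GitHub tooling.

```shell
# check_writable DIR
# Succeeds if the current user can write to DIR. Otherwise prints the
# current user plus the directory's owner and mode to help diagnose EACCES.
check_writable() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 2
  fi
  if [ -w "$dir" ]; then
    echo "writable: $dir"
    return 0
  fi
  echo "not writable: $dir (current user: $(id -un))"
  ls -ld "$dir"
  return 1
}

# Example usage inside a run step:
#   check_writable "$HOME/.npm" || true
```

Called from a debug step, this shows whether the ownership fix above is needed at all.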

For Git operations, ensure your token has correct permissions:

yaml

- name: Checkout
  uses: actions/checkout@v4
  with:
    token: ${{ secrets.GITHUB_TOKEN }}  # Has limited permissions

If you need to push changes:

yaml

- name: Push changes
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git push
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Or use PAT for more permissions

## Fix 3: Step Timeout Failures

Some steps take longer than expected:

bash

Error: The action 'Build' has timed out after 360 minutes

Diagnosis:

Your step or job might be hanging. Add timeout and debug steps:

yaml

jobs:
  build:
    timeout-minutes: 30  # Job-level timeout
    steps:
      - name: Long running step
        timeout-minutes: 10  # Step-level timeout
        run: ./build.sh

Solution:

For genuinely long builds, raise the limit you set above. Jobs default to a 360-minute timeout, so pick a value that fits the build:

yaml

jobs:
  build:
    timeout-minutes: 90  # Raised from 30; the default is 360

For stuck processes, add process monitoring:

yaml

- name: Debug running processes
  if: always()  # Runs even if previous steps fail
  run: |
    ps aux
    docker ps -a
    df -h
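
A single command inside a step can also be bounded, so that a hang surfaces in seconds rather than at the job limit. The sketch below assumes GNU coreutils `timeout` is available on the runner (it exits with status 124 when the limit is hit); the wrapper name `run_with_timeout` is hypothetical.

```shell
# run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND under coreutils `timeout`. If the limit is exceeded,
# `timeout` kills the command and exits with status 124, which is
# reported on stderr and passed back to the caller.
run_with_timeout() {
  limit=$1
  shift
  status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}

# Example usage inside a run step:
#   run_with_timeout 600 ./build.sh
```

Because the wrapper returns the original exit status, the step still fails when the command times out or errors.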

## Fix 4: Environment Variable Issues

Missing or incorrect environment variables:

bash

Error: API_KEY is not defined

Diagnosis:

Add a debug step to check your environment:

yaml

- name: Debug environment
  run: |
    echo "Node version: $(node --version)"
    echo "NPM version: $(npm --version)"
    echo "Working directory: $(pwd)"
    echo "API_KEY is set: ${API_KEY:+yes}"  # Prints "yes" only if set and non-empty; never echo the value itself

Solution:

Set environment variables correctly:

yaml

jobs:
  build:
    env:
      NODE_ENV: test
      API_URL: https://api.example.com
    steps:
      - name: Use environment
        run: echo "API URL is $API_URL"
        env:
          PER_STEP_VAR: value

For secrets:

yaml

- name: Deploy
  run: ./deploy.sh
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

Important: Never echo secret values. They're masked in logs, but still don't print them.
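
One way to check for missing variables without printing any values is to test only whether each name is set. This is a minimal sketch; the function name `require_env` is hypothetical, and it relies on `printenv`, which is assumed to be available on the runner.

```shell
# require_env NAME [NAME...]
# Fails if any named environment variable is unset or empty.
# Only variable names are printed, never their values.
require_env() {
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    else
      echo "set: $name"
    fi
  done
  return "$missing"
}

# Example usage at the top of a deploy script:
#   require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || exit 1
```

Failing early with the variable name usually gives a clearer log than an error from deep inside the deploy tool.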

## Fix 5: Docker Build Failures

Docker-related errors are common:

bash

Error: denied: permission_denied: write access to repository

Or:

bash

Error: no space left on device

Solution for registry permission:

```yaml
- name: Login to Docker Hub
  uses: docker/login-action@v3
  with:
    username: ${{ secrets.DOCKERHUB_USERNAME }}
    password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: user/image:latest
```

For GitHub Container Registry:

yaml

- name: Login to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}

Solution for disk space:

Add disk cleanup:

yaml

- name: Free disk space
  run: |
    sudo rm -rf /usr/share/dotnet
    sudo rm -rf /usr/local/lib/android
    sudo rm -rf /opt/ghc
    df -h

Or use a specialized action:

yaml

- name: Free disk space
  uses: jlumbroso/free-disk-space@main
  with:
    tool-cache: false
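
To find out whether cleanup is needed before a large build starts, a step can check the available space first. The sketch below is one way to do that with `df`; the function names and the threshold in the usage example are hypothetical.

```shell
# free_kb PATH
# Prints the free space, in KiB, on the filesystem that holds PATH.
# `df -Pk` gives the portable one-line-per-filesystem format; column 4
# is the available space.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_space PATH MIN_KB
# Fails when fewer than MIN_KB KiB are available at PATH.
require_free_space() {
  avail=$(free_kb "$1")
  if [ "$avail" -lt "$2" ]; then
    echo "only ${avail} KiB free at $1, need $2" >&2
    return 1
  fi
  echo "${avail} KiB free at $1"
}

# Example usage (10 GiB threshold, chosen arbitrarily):
#   require_free_space . 10485760 || exit 1
```

Failing early here points directly at disk space instead of a mid-build "no space left on device".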

## Fix 6: Matrix Build Failures

Matrix builds fail for one configuration but pass others:

yaml

strategy:
  matrix:
    node: [16, 18, 20]

Diagnosis:

Check which matrix combination failed:

yaml

- name: Debug matrix
  run: |
    echo "Testing with Node ${{ matrix.node }}"
    echo "Running on ${{ runner.os }}"

Solution:

Keep the other matrix jobs running when one fails, so you can see exactly which combinations are broken:

yaml

strategy:
  fail-fast: false  # Continue other matrix jobs even if one fails
  matrix:
    node: [16, 18, 20]

Or exclude problematic combinations:

yaml

strategy:
  matrix:
    os: [ubuntu-latest, windows-latest]
    node: [16, 18, 20]
    exclude:
      - os: windows-latest
        node: 16  # Skip this combination
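
To see which jobs a matrix with an `exclude` entry produces, it can help to write the expansion out by hand. The sketch below is a plain shell illustration of the 2 x 3 matrix above minus the excluded pair; it is not how GitHub computes the matrix, and the function name is hypothetical.

```shell
# expand_matrix
# Prints one "os node-version" line per job the matrix above would create,
# skipping the excluded windows-latest / Node 16 combination.
expand_matrix() {
  for os in ubuntu-latest windows-latest; do
    for node in 16 18 20; do
      if [ "$os" = "windows-latest" ] && [ "$node" = "16" ]; then
        continue  # mirrors the `exclude` entry
      fi
      echo "$os node-$node"
    done
  done
}

# Example usage:
#   expand_matrix
```

Six combinations minus one exclusion leaves five jobs, which should match the job count shown on the run page.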

## Fix 7: Cache Issues

Sometimes cache causes problems:

bash

Error: Unable to restore cache

Solution:

Key the cache on the lockfile hash so it is invalidated whenever dependencies change:

yaml

- name: Cache node modules
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-

To force cache refresh, update the key:

yaml

key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}-v2
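
The reason this works is that a different key is a cache miss. The sketch below imitates the structure of the key in plain shell: a prefix, a hash of the lockfile, and an optional manual suffix. It is only an analogy (the `hashFiles` expression computes its own hash over the matched files), and the function name `cache_key` is hypothetical. It assumes `sha256sum` is available, as on Linux runners.

```shell
# cache_key PREFIX LOCKFILE [VERSION]
# Builds a key from a prefix, the SHA-256 of LOCKFILE, and an optional
# manual version suffix, mirroring the structure of the key above.
cache_key() {
  hash=$(sha256sum "$2" | awk '{ print $1 }')
  key="$1-node-$hash"
  if [ -n "${3:-}" ]; then
    key="$key-$3"
  fi
  echo "$key"
}

# Example usage:
#   cache_key Linux package-lock.json      # changes when the lockfile changes
#   cache_key Linux package-lock.json v2   # changes when you bump the suffix
```

Either a lockfile change or a suffix bump yields a new key, so the old cache entry is simply never matched again.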

## Debugging Workflow Files

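When the default logs don't show enough detail, enable debug logging. Note that these flags have no effect under `env:` in the workflow file. They only work when defined as repository secrets or variables (Settings > Secrets and variables > Actions), or when you re-run the job with "Enable debug logging" checked:

yaml

# Define these as repository secrets or variables, not in the workflow file
ACTIONS_STEP_DEBUG: true
ACTIONS_RUNNER_DEBUG: true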

Or open an interactive shell on the runner with the third-party tmate action:

yaml

- name: Debug with tmate
  if: failure()
  uses: mxschmitt/action-tmate@v3
  timeout-minutes: 15

This gives you SSH access to the runner for interactive debugging.

## Quick Reference: Common Failure Patterns

| Error Pattern | Cause | Solution |
| --- | --- | --- |
| `npm ERR! 404` | Missing package | Check package name, registry |
| `EACCES` | Permission denied | Fix file/directory permissions |
| `ETIMEDOUT` | Network timeout | Check external services |
| `command not found` | Tool not installed | Add setup step |
| `ENOMEM` | Out of memory | Optimize build, reduce memory use |
| `SIGTERM` | Process killed | Check for timeout, resource limits |
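
The patterns above can also be applied mechanically to a log line. The sketch below maps a line to the matching cause and first thing to check; the function name `classify_error` is hypothetical, and the list covers only the patterns in this quick reference.

```shell
# classify_error LINE
# Maps a single log line to the cause and first check from the quick
# reference. Patterns are tried top to bottom; the first match wins.
classify_error() {
  case "$1" in
    *"npm ERR! 404"*)      echo "Missing package: check package name, registry" ;;
    *EACCES*)              echo "Permission denied: fix file/directory permissions" ;;
    *ETIMEDOUT*)           echo "Network timeout: check external services" ;;
    *"command not found"*) echo "Tool not installed: add setup step" ;;
    *ENOMEM*)              echo "Out of memory: optimize build" ;;
    *SIGTERM*)             echo "Process killed: check for timeout, resource limits" ;;
    *)                     echo "Unknown: read the full step log" ;;
  esac
}

# Example usage:
#   classify_error "bash: line 1: terraform: command not found"
```

This is only a first-pass triage; the surrounding log lines still need to be read to confirm the cause.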

## Preventing Failures

Add a linting step to catch issues early:

```yaml
name: CI

on: push

jobs:
  lint-workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate workflow
        run: |
          # Install actionlint
          go install github.com/rhysd/actionlint/cmd/actionlint@latest
          actionlint .github/workflows/*.yml
```

This catches syntax errors and common misconfigurations before they cause failures in your actual workflows.
