git filter-branch rewrites history one commit at a time, so a run can take hours on a large repository. Timeouts, memory exhaustion, and slow per-commit processing make it impractical for big repos.

The Error

Running filter-branch:

bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty -- --all

You see:

bash
Rewrite abc123... (1/5000)
Rewrite def456... (2/5000)
...
Rewrite ghi789... (1250/5000)
fatal: write-tree: unable to write object
fatal: filter-branch: command returned error: 128

Or the process appears to hang:

```bash
git filter-branch --tree-filter 'sed -i s/old/new/g *.txt' -- --all
```

It runs for hours with no visible progress.

Or:

bash
Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f

Why Filter-Branch Fails

Common causes:

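  • Large number of commits - Every commit is rewritten in sequence, so thousands of commits take hours
  • Tree filter is slow - Each commit's tree is checked out to disk before the filter command runs
  • Memory exhaustion - Each rewrite needs significant memory
  • Backup ref conflicts - A previous run leaves backups in refs/original, and a new run without --force refuses to overwrite them
  • Index filter complexity - The filter command runs once per commit, so any extra cost is multiplied by the commit count
  • Per-commit shell overhead - Every filter, including the environment filter, is evaluated in a shell for each commit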

Diagnosis Steps

Count commits to process:

```bash
git rev-list --all --count
```

Returns the number of commits (e.g., 50000).

Check refs/original:

```bash
ls .git/refs/original/
git for-each-ref refs/original/
```

Check repository size:

```bash
git count-objects -v
```

Test filter on single commit:

```bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  HEAD~1..HEAD
```
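
The read-only checks in this section can be combined into one helper. This is a minimal sketch: the function name `filter_branch_preflight` is hypothetical, and it uses only stock Git commands.

```bash
# Hypothetical helper: summarize the numbers that predict a slow or failing run.
# Usage: filter_branch_preflight /path/to/repo
filter_branch_preflight() (
  cd "$1" || exit 1
  # Commits reachable from any ref: filter-branch visits each one.
  echo "commits: $(git rev-list --all --count)"
  # Leftover backups from an earlier run (packed refs are counted too).
  echo "backup_refs: $(git for-each-ref refs/original/ | wc -l | tr -d ' ')"
  # Loose and packed object sizes, in KiB.
  git count-objects -v | grep -E '^(size|size-pack):'
)
```

A high commit count together with a non-zero `backup_refs` value suggests both a long run and a backup conflict unless `--force` is used.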

Solution 1: Use git-filter-repo

git-filter-repo is much faster, and the Git project's own filter-branch documentation recommends it:

```bash
# Install git-filter-repo
pip install git-filter-repo

# Remove file from history
git filter-repo --invert-paths --path sensitive.txt

# Remove directory
git filter-repo --invert-paths --path sensitive-dir/

# Replace text in all files
git filter-repo --replace-text expressions.txt
```

expressions.txt format (the text after ==> is inserted verbatim, so the replacement takes no literal: prefix):

```
literal:old_password==>new_password
regex:secret_[a-z]+==>REDACTED
```

Why git-filter-repo is better:

- 10-100x faster than filter-branch
- Handles large repos easily
- No backup refs clutter
- Built-in safety checks
- Active maintenance

Solution 2: Remove refs/original Before Re-run

Without --force (-f), filter-branch refuses to run if refs/original already exists:

bash
Cannot create a new backup.
A previous backup already exists in refs/original/

Remove backup refs:

```bash
git for-each-ref --format='%(refname)' refs/original/ | while read -r ref; do
  git update-ref -d "$ref"
done
```

Or delete the directory (this misses backup refs that have been packed into .git/packed-refs):

```bash
rm -rf .git/refs/original/
```

Then run filter-branch:

```bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty -- --all
```
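
Backup refs can live in `.git/packed-refs` as well as in loose files, so deleting the directory alone may leave some behind. The sketch below wraps the `git update-ref -d` loop in a hypothetical helper named `drop_backup_refs` and prints how many backup refs remain afterwards.

```bash
# Hypothetical helper: delete every ref under refs/original/, loose or packed.
# Usage: drop_backup_refs /path/to/repo
drop_backup_refs() (
  cd "$1" || exit 1
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
  # Print the number of backup refs left; 0 means none remain.
  git for-each-ref refs/original/ | wc -l | tr -d ' '
)
```

Because it goes through `git update-ref`, the helper works the same way whether or not `git gc` or `git pack-refs` has run since the last rewrite.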

Solution 3: Use Index-Filter Instead of Tree-Filter

Index-filter is much faster:

```bash
# Slow - tree filter checks out each commit
# (rm -f so commits that lack the file do not abort the run)
git filter-branch --tree-filter 'rm -f sensitive.txt' -- --all

# Fast - index filter modifies index directly
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  -- --all
```

Index-filter operates on the index without checking out files.

Solution 4: Batch Processing

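Splitting the history into commit ranges does not work: each rewrite changes commit hashes, so later ranges no longer connect to the rewritten history. Split the work by branch instead. As long as the filter is deterministic, commits shared between branches are rewritten to the same new hashes:

```bash
# List the branches to rewrite
git for-each-ref --format='%(refname)' refs/heads/ > branches.txt

# Rewrite 10 branches per run
total=$(wc -l < branches.txt)
batch=10
for i in $(seq 1 "$batch" "$total"); do
  # Left unquoted below on purpose: each ref name becomes its own argument
  refs=$(sed -n "${i},$((i + batch - 1))p" branches.txt)
  git filter-branch --force --index-filter \
    'git rm --cached --ignore-unmatch sensitive.txt' \
    --prune-empty -- $refs || break
done
```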

Solution 5: Increase Git Memory Limits

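These settings cap how much memory Git uses for packing and for mapping pack files, which matters most during the git gc that follows a rewrite. The commands here apply to every repository; omit --global to change only the current one: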

bash
git config --global pack.windowMemory 512m
git config --global pack.packSizeLimit 512m
git config --global pack.deltaCacheSize 512m
git config --global core.packedGitLimit 512m
git config --global core.packedGitWindowSize 512m
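
If changing the global configuration is undesirable, the same limits can be scoped to one repository. This is a sketch under that assumption: the helper name `set_pack_limits` is hypothetical, and it sets only three of the keys listed above.

```bash
# Hypothetical helper: apply memory limits to one repository (no --global)
# and print what was stored.
# Usage: set_pack_limits /path/to/repo 512m
set_pack_limits() (
  cd "$1" || exit 1
  limit=${2:-512m}
  for key in pack.windowMemory pack.deltaCacheSize core.packedGitLimit; do
    git config "$key" "$limit"
  done
  # --get-regexp matches against lowercased key names.
  git config --local --get-regexp '^(pack|core\.packedgit)'
)
```

The values end up in the repository's own `.git/config`, so they disappear with the clone and do not affect other projects.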

Solution 6: Process Single Branch

Reduce scope to one branch:

bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty main

This replaces -- --all, which rewrites every branch and tag.
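
Before narrowing the scope, it helps to know which branches contain the file at all. The following sketch lists them; `branches_touching` is a hypothetical name, and only local branches under `refs/heads/` are inspected.

```bash
# Hypothetical helper: list local branches whose history touches a path.
# Usage: branches_touching /path/to/repo sensitive.txt
branches_touching() (
  cd "$1" || exit 1
  git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r branch; do
      # rev-list prints a commit only if some commit on the branch changed the path.
      if [ -n "$(git rev-list -1 "$branch" -- "$2")" ]; then
        echo "$branch"
      fi
    done
)
```

Branches that do not appear in the output have never contained the path and can be left out of the rewrite.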

Solution 7: Force Fresh Start

When previous runs interfere:

```bash
# Remove all backup refs
rm -rf .git/refs/original/

# Remove filter-branch temp files
rm -rf .git-rewrite/

# Force garbage collection
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Run fresh
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty -- --all
```

Solution 8: Use Environment Filter for Author and Committer Changes

The environment filter rewrites commit metadata such as author and committer email. It cannot change file contents:

```bash
git filter-branch --force --env-filter '
OLD_EMAIL="old@example.com"
NEW_EMAIL="new@example.com"

if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
  export GIT_AUTHOR_EMAIL="$NEW_EMAIL"
fi
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
  export GIT_COMMITTER_EMAIL="$NEW_EMAIL"
fi
' -- --all
```

The environment filter only rewrites commit metadata and never checks out a tree, so it runs much faster than a tree-filter.
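
To see how many commits an identity rewrite would affect, and to confirm afterwards that none remain, the commits carrying a given email can be counted. A minimal sketch, with the hypothetical helper name `count_commits_with_email`:

```bash
# Hypothetical helper: count commits on any ref that carry an email
# as author or committer.
# Usage: count_commits_with_email /path/to/repo old@example.com
count_commits_with_email() (
  cd "$1" || exit 1
  git log --all --format='%ae %ce' |
    awk -v e="$2" '$1 == e || $2 == e { n++ } END { print n + 0 }'
)
```

Note that `--all` also walks `refs/original/`, so the count only drops to zero once the backup refs have been removed.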

Solution 9: Use BFG Repo-Cleaner

BFG is faster than filter-branch:

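```bash
# Install BFG
# Download from https://rtyley.github.io/bfg-repo-cleaner/

# Remove a file from history (matched by file name, not by path)
java -jar bfg.jar --delete-files sensitive.txt

# Remove a directory from history
java -jar bfg.jar --delete-folders sensitive-dir

# Expire reflogs and prune the now-unreachable objects
git reflog expire --expire=now --all && git gc --prune=now --aggressive

# Replace text
java -jar bfg.jar --replace-text replacements.txt
```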

Verification

Verify file removed from history:

```bash
git log --all --full-history -- sensitive.txt
```

Should show no results.

Verify commit count matches:

```bash
git rev-list --all --count
```

Should match original (minus pruned empty commits).

Verify objects cleaned:

```bash
git fsck --full
git count-objects -v
```

Search for sensitive content (piping through xargs avoids the argument-length limit on large histories):

```bash
git rev-list --all | xargs git grep "password"
```

Should find nothing.

Verify all branches intact:

```bash
git branch -a
```
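
The history check can be turned into a pass/fail test for scripts or CI. This is a sketch only; `verify_path_gone` is a hypothetical name, and it relies on the same `git log --all --full-history` query used in this section.

```bash
# Hypothetical helper: succeed only if no commit reachable from any ref
# touches the given path.
# Usage: verify_path_gone /path/to/repo sensitive.txt
verify_path_gone() (
  cd "$1" || exit 1
  hits=$(git log --all --full-history --format=%H -- "$2")
  if [ -n "$hits" ]; then
    echo "still present in $(echo "$hits" | wc -l | tr -d ' ') commit(s)" >&2
    exit 1
  fi
  echo "clean"
)
```

Since `--all` includes `refs/original/`, the check keeps failing until the backup refs from the rewrite have been deleted.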

Post-Filter Cleanup

After filter-branch succeeds:

```bash
# Remove backup refs
rm -rf .git/refs/original/

# Expire reflog
git reflog expire --expire=now --all

# Aggressive garbage collection
git gc --prune=now --aggressive

# Force push to remote (coordinate with team)
git push --force --all origin
git push --force --tags origin
```

Important: Coordinate with team before force pushing. Everyone must reclone.

Filter-Branch vs Alternatives Comparison

| Method | Speed | Ease | Large Repos |
|---|---|---|---|
| filter-branch | Very slow | Complex | Often fails |
| git-filter-repo | Very fast | Simple | Handles well |
| BFG | Fast | Simple | Handles well |
| Manual rebase | Slow | Complex | Small repos |

Best Practices

Prefer git-filter-repo:

```bash
pip install git-filter-repo
git filter-repo --invert-paths --path file.txt
```

Use it for every new history rewrite.

Test on small subset first:

```bash
git filter-branch --index-filter '...' HEAD~5..HEAD
```

Verify filter works before full run.

Backup before rewriting:

```bash
git clone --mirror /path/to/repo /backup/repo.git
```
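
A backup is only useful if it is complete, so it is worth confirming that every ref made it into the mirror. A minimal sketch, assuming local paths for both repositories; `backup_repo` is a hypothetical name.

```bash
# Hypothetical helper: mirror-clone a repository and confirm every ref was copied.
# Usage: backup_repo /path/to/repo /backup/repo.git
backup_repo() (
  git clone --quiet --mirror "$1" "$2" || exit 1
  src_refs=$(git -C "$1" for-each-ref --format='%(objectname) %(refname)')
  dst_refs=$(git -C "$2" for-each-ref --format='%(objectname) %(refname)')
  # Exit status 0 only if both repositories list identical refs.
  [ "$src_refs" = "$dst_refs" ]
)
```

Running `backup_repo /path/to/repo /backup/repo.git && echo "backup verified"` prints the message only when the ref lists match.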

Coordinate with team: Filter operations require everyone to reclone.

Document what was filtered: Keep record of removed files/changes for team awareness.

Common Scenarios

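Remove accidentally committed secret:

```bash
git filter-repo --invert-paths --path .env
# filter-repo removes the origin remote after rewriting; add it back before pushing
git remote add origin <repository-url>
git push --force origin main
```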

Change author email:

```bash
git filter-repo --email-callback '
  return email.replace(b"old@example.com", b"new@example.com")
'
```

Remove large binary from history:

```bash
git filter-repo --invert-paths --path assets/large-video.mp4
git gc --prune=now --aggressive
```