Git filter-branch rewrites history but can take hours for large repositories. Timeouts, memory exhaustion, and slow performance make it impractical for big repos.
The Error
Running filter-branch:
```bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty -- --all
```
You see:
```
Rewrite abc123... (1/5000)
Rewrite def456... (2/5000)
...
Rewrite ghi789... (1250/5000)
fatal: write-tree: unable to write object
fatal: filter-branch: command returned error: 128
```
Or the process hangs:
```bash
git filter-branch --tree-filter 'sed -i "s/old/new/g" *.txt' -- --all
```
Runs indefinitely with no visible progress.
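Or it refuses to start because a backup from an earlier run is still present (this happens when `--force` is omitted):
```
Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f
```
Why Filter-Branch Fails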
Common causes:
- Large number of commits - Processing thousands of commits takes hours
- Tree filter is slow - Checking out each tree is expensive
- Memory exhaustion - Each rewrite needs significant memory
- Backup ref conflicts - Previous runs leave refs/original
- Index filter complexity - Complex commands slow down rewriting
- Per-commit shell overhead - Every filter is evaluated in a shell once per commit, so even a cheap command adds up
Diagnosis Steps
Count commits to process:
```bash
git rev-list --all --count
```
Returns the number of commits (e.g., 50000).
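Because filter-branch evaluates its filters once per commit, total run time grows roughly linearly with this count. As a rough sketch, assuming you have timed the filter on a few commits yourself, a small helper can turn the count into a ballpark duration. The function name `estimate_rewrite_minutes` and the per-commit timing are illustrative assumptions, not part of Git.

```bash
# Hypothetical helper: rough linear estimate of total rewrite time.
# $1 = number of commits (e.g. from `git rev-list --all --count`)
# $2 = milliseconds per commit (an assumption you supply from a trial run)
estimate_rewrite_minutes() {
    commits=$1
    ms_per_commit=$2
    # Integer arithmetic, rounded up to whole minutes.
    echo $(( (commits * ms_per_commit + 59999) / 60000 ))
}
```

For example, `estimate_rewrite_minutes "$(git rev-list --all --count)" 200` gives a ballpark for a filter that takes about 200 ms per commit. With 50000 commits at that rate it prints 167.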
Check refs/original:
```bash
ls .git/refs/original/
git for-each-ref refs/original/
```
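The two commands can disagree: `ls` only sees loose ref files, while refs that have been packed (for example by `git pack-refs` or a gc) live in `.git/packed-refs`. The sketch below wraps the more reliable check in a function that scripts can test. The name `has_backup_refs` is made up for this example.

```bash
# Hypothetical helper: succeeds if any backup ref exists under refs/original/,
# whether it is stored as a loose file or inside .git/packed-refs.
has_backup_refs() {
    [ -n "$(git for-each-ref --count=1 --format='%(refname)' refs/original/)" ]
}
```

A script could then run `has_backup_refs && echo "stale backup refs present"` before starting a new rewrite.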
Check repository size:
```bash
git count-objects -v
```
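The `size` and `size-pack` fields in that output are reported in KiB. As a small sketch, the helper below (the name `repo_size_mib` is an assumption for this example) adds the two and prints a single figure in MiB, which is easier to compare before and after a rewrite.

```bash
# Hypothetical helper: reads `git count-objects -v` output on stdin and prints
# the total on-disk object size (loose + packed) in MiB, rounded down.
# Git reports both `size` and `size-pack` in KiB.
repo_size_mib() {
    awk -F': ' '$1 == "size" || $1 == "size-pack" { kib += $2 }
                END { printf "%d\n", kib / 1024 }'
}
```

Typical use is `git count-objects -v | repo_size_mib`, run once before the rewrite and once after cleanup.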
Test the filter on a single commit (in a scratch clone, since this rewrites the current branch):
```bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  -- HEAD~1..HEAD
```
Solution 1: Use git-filter-repo (Recommended)
git-filter-repo is much faster, and the Git documentation recommends it over filter-branch:
```bash
# Install git-filter-repo
pip install git-filter-repo

# Remove a file from history
git filter-repo --invert-paths --path sensitive.txt

# Remove a directory
git filter-repo --invert-paths --path sensitive-dir/

# Replace text in all files
git filter-repo --replace-text expressions.txt
```
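expressions.txt format (one rule per line; the `literal:` or `regex:` prefix applies only to the pattern on the left of `==>`, and everything to the right is the replacement text):
```
literal:old_password==>new_password
regex:secret_[a-z]+==>REDACTED
```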
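When there are many secrets to scrub, writing the rules by hand is error-prone. The sketch below generates the file from a plain list, assuming every secret should be matched literally and replaced with the fixed marker `REDACTED`. The function name `make_expressions` is hypothetical.

```bash
# Hypothetical helper: turns a list of secrets (one per line on stdin) into
# replacement rules for `git filter-repo --replace-text`.
# Assumption: each secret is a literal string and is replaced with REDACTED.
make_expressions() {
    while IFS= read -r secret; do
        # Skip blank lines so they do not become empty rules.
        [ -n "$secret" ] || continue
        printf 'literal:%s==>REDACTED\n' "$secret"
    done
}
```

For instance, `make_expressions < secrets.txt > expressions.txt` would build the rules file from a `secrets.txt` you maintain yourself. Keep both files out of the repository.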
Why git-filter-repo is better:
- 10-100x faster than filter-branch
- Handles large repos easily
- No backup refs clutter
- Built-in safety checks
- Active maintenance
Solution 2: Remove refs/original Before Re-run
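Without `--force`, filter-branch refuses to run if refs/original/ already exists:
```
Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f
```
Remove the backup refs: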
```bash
git for-each-ref --format='%(refname)' refs/original/ | while read -r ref; do
  git update-ref -d "$ref"
done
```
Or delete them directly (this misses refs that have been packed into .git/packed-refs):
```bash
rm -rf .git/refs/original/
```
Then run filter-branch:
```bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty -- --all
```
Solution 3: Use Index-Filter Instead of Tree-Filter
Index-filter is much faster:
```bash
# Slow - tree filter checks out each commit
# (rm -f so commits that lack the file do not abort the run)
git filter-branch --tree-filter 'rm -f sensitive.txt' -- --all

# Fast - index filter modifies the index directly
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  -- --all
```
Index-filter operates on the index without checking out files.
Solution 4: Batch Processing
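filter-branch cannot rewrite arbitrary commit ranges independently: it needs a ref to update, and rewriting older commits changes the hash of every descendant. A workable way to split the job is to rewrite one branch at a time, so a failure only costs that branch's run:
```bash
# Rewrite each local branch separately; stop at the first failure.
# The same filter applied to the same commits yields the same new hashes,
# so history shared between branches stays shared.
git for-each-ref --format='%(refname)' refs/heads/ > branches.txt
while read -r ref; do
  git filter-branch --force --index-filter \
    'git rm --cached --ignore-unmatch sensitive.txt' \
    --prune-empty -- "$ref" < /dev/null || break
done < branches.txt
```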
Solution 5: Increase Git Memory Limits
Configure for large operations:
```bash
git config --global pack.windowMemory 512m
git config --global pack.packSizeLimit 512m
git config --global pack.deltaCacheSize 512m
git config --global core.packedGitLimit 512m
git config --global core.packedGitWindowSize 512m
```
Solution 6: Process Single Branch
Reduce scope to one branch:
```bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty main
```
This replaces `-- --all`, which processes every branch.
Solution 7: Force Fresh Start
When previous runs interfere:
```bash
# Remove all backup refs
rm -rf .git/refs/original/

# Remove filter-branch temp files
rm -rf .git-rewrite/

# Force garbage collection
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Run fresh
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch sensitive.txt' \
  --prune-empty -- --all
```
Solution 8: Use Environment Filter for Author Changes
To rewrite author or committer metadata (env-filter does not touch file contents):
```bash
git filter-branch --force --env-filter '
OLD_EMAIL="old@example.com"
NEW_EMAIL="new@example.com"

if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
  export GIT_AUTHOR_EMAIL="$NEW_EMAIL"
fi
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
  export GIT_COMMITTER_EMAIL="$NEW_EMAIL"
fi
' -- --all
```
This is faster than tree-filter for author changes.
Solution 9: Use BFG Repo-Cleaner
BFG is faster than filter-branch:
```bash
# Install BFG
# Download from https://rtyley.github.io/bfg-repo-cleaner/

# Remove a file from history
java -jar bfg.jar --delete-files sensitive.txt

# Remove a directory from history
java -jar bfg.jar --delete-folders sensitive-dir

# Replace text
java -jar bfg.jar --replace-text replacements.txt

# Expire old objects after BFG finishes
git reflog expire --expire=now --all && git gc --prune=now --aggressive
```
Verification
Verify the file is removed from history:
```bash
git log --all --full-history -- sensitive.txt
```
Should show no results.
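For scripted checks (for example in a pre-push hook or CI job), the same query can be wrapped so it yields an exit status instead of output to read by eye. This is a minimal sketch; `path_in_history` is a made-up name.

```bash
# Hypothetical helper: succeeds if any commit reachable from any ref still
# touches the given path, and fails once the path is gone from history.
path_in_history() {
    [ -n "$(git log --all --full-history -n 1 --format=%H -- "$1")" ]
}
```

After a rewrite, `path_in_history sensitive.txt && echo "still present"` should print nothing.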
Verify the commit count matches:
```bash
git rev-list --all --count
```
Should match the original (minus pruned empty commits).
Verify objects are cleaned:
```bash
git fsck --full
git count-objects -v
```
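Search for sensitive content (piping through xargs avoids an "Argument list too long" error on repositories with many commits):
```bash
git rev-list --all | xargs git grep "password"
```
Should find nothing.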
Verify all branches are intact:
```bash
git branch -a
```
Post-Filter Cleanup
After filter-branch succeeds:
```bash
# Remove backup refs
rm -rf .git/refs/original/

# Expire reflog
git reflog expire --expire=now --all

# Aggressive garbage collection
git gc --prune=now --aggressive

# Force push to remote (coordinate with team)
git push --force --all origin
git push --force --tags origin
```
Important: Coordinate with team before force pushing. Everyone must reclone.
Filter-Branch vs Alternatives Comparison
| Method | Speed | Ease | Large Repos |
|---|---|---|---|
| filter-branch | Very slow | Complex | Often fails |
| git-filter-repo | Very fast | Simple | Handles well |
| BFG | Fast | Simple | Handles well |
| Manual rebase | Slow | Complex | Small repos |
Best Practices
Prefer git-filter-repo:
```bash
pip install git-filter-repo
git filter-repo --invert-paths --path file.txt
```
Use it for any new history rewrite.
Test on a small subset first:
```bash
git filter-branch --index-filter '...' HEAD~5..HEAD
```
Verify the filter works before the full run.
Back up before rewriting:
```bash
git clone --mirror /path/to/repo /backup/repo.git
```
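A backup is only useful if it is intact and has not silently overwritten an older one. The sketch below wraps the mirror clone with those two checks. The function name `backup_repo` and its argument order are assumptions for illustration.

```bash
# Hypothetical helper: creates a mirror backup and checks that it is readable.
# $1 = path to the repository, $2 = destination for the bare mirror.
backup_repo() {
    src=$1
    dest=$2
    # Refuse to overwrite an existing backup.
    if [ -e "$dest" ]; then
        echo "backup destination already exists: $dest" >&2
        return 1
    fi
    # Clone every ref, then verify object integrity in the copy.
    git clone --quiet --mirror "$src" "$dest" &&
        git -C "$dest" fsck --full >/dev/null 2>&1
}
```

Called as `backup_repo /path/to/repo /backup/repo.git`, it returns non-zero if the destination exists, the clone fails, or `git fsck` reports corruption.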
Coordinate with team: Filter operations require everyone to reclone.
Document what was filtered: Keep record of removed files/changes for team awareness.
Common Scenarios
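Remove an accidentally committed secret:
```bash
git filter-repo --invert-paths --path .env
# filter-repo removes the origin remote as a safeguard; add it back before pushing
git remote add origin <repository-url>
git push --force origin main
```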
Change author email:
```bash
git filter-repo --email-callback '
return email.replace(b"old@example.com", b"new@example.com")
'
```
Remove a large binary from history:
```bash
git filter-repo --invert-paths --path assets/large-video.mp4
git gc --prune=now --aggressive
```
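Before removing a large binary it helps to know which blobs actually dominate the history, since the biggest offender is not always the file you expect. The sketch below lists the largest blobs reachable from any ref using standard Git plumbing. The function name `largest_blobs` is hypothetical.

```bash
# Hypothetical helper: prints the N largest blobs reachable from any ref as
# "<size-in-bytes> <path>", largest first. $1 = N (defaults to 10).
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { size = $2; sub(/^blob [0-9]+ /, ""); print size, $0 }' |
        sort -k1,1nr |
        head -n "${1:-10}"
}
```

Running `largest_blobs 5` in the repository shows the five biggest files ever committed; the paths it prints can be passed to `git filter-repo --invert-paths --path`.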