order). This is a very destructive
approach, so *make a backup* or go back to cloning it. You have been
warned.
* Remove the original refs backed up by git-filter-branch: say `git
for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
update-ref -d`.
* Expire all reflogs with `git reflog expire --expire=now --all`.
* Garbage collect all unreferenced objects with `git gc --prune=now`
(or if your git-gc is not new enough to support arguments to
`--prune`, use `git repack -ad; git prune` instead).
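
As a rough sketch, the three steps above could be collected into a
single shell function. The name `shrink_repo` and its optional
repository-path argument are hypothetical (git provides no such
helper), and the sketch assumes a git recent enough to support both
`git -C` and `--prune=now`. It is just as destructive as running the
steps by hand, so the backup warning still applies.

```shell
# Hypothetical helper, not part of git: runs the three cleanup steps
# in the repository given as $1 (default: the current directory).
# DESTRUCTIVE: make a backup before calling it.
shrink_repo() {
    repo=${1:-.}

    # Step 1: delete the backup refs that git-filter-branch stored
    # under refs/original/. The loop simply does nothing if there
    # are none.
    git -C "$repo" for-each-ref --format="%(refname)" refs/original/ |
    while read -r ref
    do
        git -C "$repo" update-ref -d "$ref"
    done || return

    # Step 2: expire all reflog entries so that they no longer keep
    # the old commits reachable.
    git -C "$repo" reflog expire --expire=now --all || return

    # Step 3: garbage collect everything that is now unreferenced.
    git -C "$repo" gc --prune=now
}
```

Defining the function changes nothing by itself; the cleanup happens
only when it is called, e.g. as `shrink_repo /path/to/repo`.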
[[PERFORMANCE]]
PERFORMANCE
-----------
git-filter-branch is glacially slow, and its design makes it
impossible for a backward-compatible implementation to ever be fast:
* In editing files, git-filter-branch by design checks out each and
every commit as it existed in the original repo. If your repo has
`10^5` files and `10^5` commits, but each commit only modifies five
files, then git-filter-branch will make you do `10^10` modifications,
despite only having (at most) `5*10^5` unique blobs.
* If you try to cheat by making git-filter-branch work only on
files modified in a commit, then two things happen:
** you run into problems with deletions whenever the user is simply
trying to rename files (because attempting to delete files that
don't exist looks like a no-op; it takes some chicanery to remap
deletes across file renames when the renames happen via arbitrary
user-provided shell)
** even if you succeed at the map-deletes-for-renames chicanery, you
still technically violate backward compatibility because users
are allowed to filter files in ways that depend upon topology of
commits instead of filtering solely based on file contents or
names (though this has not been observed in the wild).
* Even if you don't need to edit files but only want to e.g. rename or
remove some and thus can avoid checking out each file (i.e. you can
use `--index-filter`), you still are passing shell snippets for your
filters. This means that for every commit, you have to have a
prepared git repo where those filters can be run. That's a
significant setup.
* Further, several additional files are created or updated per commit
by git-filter-branch. Some of these are for supporting the
convenience functions provided by git-filter-branch (such as map()),
while others are for keeping track of internal state (but could have
also been accessed by user filters; one of git-filter-branch's
regression tests does so). This essentially amounts to using the
filesystem as an IPC mechanism between git-filter-branch and the
user-provided filters. Disks tend to be a slow IPC mechanism, and
writing these files also effectively represents a forced
synchronization point between separate processes that we hit with
every commit.
* The user-provided shell commands will likely involve a pipeline of
commands, resulting in the creation of many processes per commit.
Creating and running another process takes a widely varying amount
of time between operating systems, but on any platform it is very
slow relative to invoking a function.
* git-filter-branch itself is written in shell, which is kind of slow.
This is the one performance issue that could be backward-compatibly
fixed, but compared to the above problems that are intrinsic to the
design of git-filter-branch, the language of the tool itself is a
relatively minor issue.
** Side note: Unfortunately, people tend to fixate on the
written-in-shell aspect and periodically ask if git-filter-branch
could be rewritten in another language to fix the performance
issues. Not only does that ignore the bigger intrinsic problems
with the design, it'd help less than you'd expect: if
git-filter-branch itself were not shell, then the convenience
functions (map(), skip_commit(), etc.) and the `--setup` argument
could no longer be executed once at the beginning