Creating and running another process takes a widely varying amount
of time between operating systems, but on any platform it is very
slow relative to invoking a function.
* git-filter-branch itself is written in shell, which is relatively slow.
This is the one performance issue that could be backward-compatibly
fixed, but compared to the above problems that are intrinsic to the
design of git-filter-branch, the language of the tool itself is a
relatively minor issue.
** Side note: Unfortunately, people tend to fixate on the
written-in-shell aspect and periodically ask if git-filter-branch
could be rewritten in another language to fix the performance
issues. Not only does that ignore the bigger intrinsic problems
with the design, it'd help less than you'd expect: if
git-filter-branch itself were not shell, then the convenience
functions (map(), skip_commit(), etc.) and the `--setup` argument
could no longer be executed once at the beginning of the program
but would instead need to be prepended to every user filter (and
thus re-executed with every commit).
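+
To illustrate the side note, here is a minimal sketch of the
"define once, evaluate per commit" pattern. It is not the actual
git-filter-branch code: `lookup`, `user_filter`, `setup_runs`, and the
commit names are hypothetical stand-ins chosen for illustration.
+
```shell
# Hypothetical stand-in for a convenience function such as map().
# It is defined exactly once, before any filter runs.
setup_runs=0
lookup() { echo "new-$1"; }
setup_runs=$((setup_runs + 1))

# Hypothetical user filter, kept as a string and evaluated per commit.
user_filter='result="$result $(lookup "$commit")"'

result=""
for commit in c1 c2 c3; do
    # The filter sees lookup() because it runs in the same shell.
    eval "$user_filter"
done
result=${result# }   # strip the leading space

echo "setup ran $setup_runs time(s); result: $result"
```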
The https://github.com/newren/git-filter-repo/[git filter-repo] tool is
an alternative to git-filter-branch which does not suffer from these
performance problems or the safety problems (mentioned below). For those
with existing tooling which relies upon git-filter-branch, 'git
filter-repo' also provides
https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
a drop-in git-filter-branch replacement (with a few caveats). While
filter-lamely suffers from all the same safety issues as
git-filter-branch, it at least ameliorates the performance issues a
little.
[[SAFETY]]
SAFETY
------
git-filter-branch is riddled with gotchas resulting in various ways to
easily corrupt repos or end up with a mess worse than what you started
with:
* Someone can have a set of "working and tested filters" which they
document or provide to a coworker, who then runs them on a different
OS where the same commands are not working/tested (some examples in
the git-filter-branch manpage are also affected by this).
BSD vs. GNU userland differences can really bite. If lucky, error
messages are spewed. But just as likely, the commands either don't
do the filtering requested, or silently corrupt the history by making
some unwanted change. The unwanted change may only affect a few commits,
so it's not necessarily obvious either. (The fact that problems
won't necessarily be obvious means they are likely to go unnoticed
until the rewritten history is in use for quite a while, at which
point it's really hard to justify another flag-day for another
rewrite.)
* Filenames with spaces are often mishandled by shell snippets since
they cause problems for shell pipelines. Not everyone is familiar
with find -print0, xargs -0, git-ls-files -z, etc. Even people who
are familiar with these may assume such flags are not relevant
because someone else renamed any such files in their repo back
before the person doing the filtering joined the project. And
often, even those familiar with handling arguments with spaces may
not do so just because they aren't in the mindset of thinking about
everything that could possibly go wrong.
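+
The difference can be seen in a small sketch that runs in a scratch
directory rather than a repository. The file names and the variables
`naive_count` and `safe_count` are assumptions made for illustration.
+
```shell
# Scratch directory with one plain name and one name containing a space.
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/has space.txt"

# Naive pipeline: xargs splits on whitespace, so "has space.txt"
# is handed to the command as two separate arguments.
naive_count=$(cd "$tmp" && find . -type f | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited pipeline: each filename arrives as exactly one argument.
safe_count=$(cd "$tmp" && find . -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

rm -rf "$tmp"
echo "naive saw $naive_count arguments, NUL-delimited saw $safe_count"
```
+
With two files on disk, the naive pipeline reports three arguments
while the NUL-delimited one reports two.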
* Non-ASCII filenames can be silently removed despite being in a
desired directory. Keeping only wanted paths is often done using
pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
ls-files will only quote filenames if needed, so folks may not
notice that one of the files didn't match the regex (at least not
until it's much too late). Yes, someone who knows about
core.quotePath can avoid this (unless they have other special
characters like \t, \n, or "), and people who use ls-files -z with
something other than grep can avoid this, but that doesn't mean they
will.
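+
The quoting problem can be reproduced in a throwaway repository. This
is a sketch under stated assumptions: the repository, the file names,
and the variables `unsafe_leftover` and `safe_leftover` are
hypothetical, and nothing is actually removed. Disabling
core.quotePath only helps for names without characters such as \t, \n,
or ", as noted above.
+
```shell
# Throwaway repository with a plain and a non-ASCII name in WANTED_DIR.
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir "$repo/WANTED_DIR"
touch "$repo/WANTED_DIR/plain.txt" "$repo/WANTED_DIR/café.txt"
git -C "$repo" add .

# With quoting on (the default), the non-ASCII path is printed inside
# double quotes, so it no longer starts with WANTED_DIR/ and survives
# the grep -v; a following "xargs git rm" would try to delete it.
unsafe_leftover=$(git -C "$repo" -c core.quotePath=true ls-files |
    grep -v '^WANTED_DIR/' || true)

# With quoting off, the path is printed verbatim and is filtered out.
safe_leftover=$(git -C "$repo" -c core.quotePath=false ls-files |
    grep -v '^WANTED_DIR/' || true)

rm -rf "$repo"
echo "quoted listing leaves:   ${unsafe_leftover:-<nothing>}"
echo "unquoted listing leaves: ${safe_leftover:-<nothing>}"
```
+
Any path left over after the `grep -v` is one that the removal step
would act on, so the first listing marks a wanted file for deletion
while the second marks none.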
* Similarly, when moving files around, one can find that