8th chunk of `Documentation/git-filter-branch.adoc`
  Creating and running another process takes a widely varying amount
  of time between operating systems, but on any platform it is very
  slow relative to invoking a function.

* git-filter-branch itself is written in shell, which is kind of slow.
  This is the one performance issue that could be backward-compatibly
  fixed, but compared to the above problems that are intrinsic to the
  design of git-filter-branch, the language of the tool itself is a
  relatively minor issue.

  ** Side note: Unfortunately, people tend to fixate on the
     written-in-shell aspect and periodically ask if git-filter-branch
     could be rewritten in another language to fix the performance
     issues.  Not only does that ignore the bigger intrinsic problems
     with the design, it'd help less than you'd expect: if
     git-filter-branch itself were not shell, then the convenience
     functions (map(), skip_commit(), etc.) and the `--setup` argument
     could no longer be executed once at the beginning of the program
     but would instead need to be prepended to every user filter (and
     thus re-executed with every commit).

The https://github.com/newren/git-filter-repo/[git filter-repo] tool is
an alternative to git-filter-branch which does not suffer from these
performance problems or the safety problems (mentioned below). For those
with existing tooling which relies upon git-filter-branch, 'git
filter-repo' also provides
https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
a drop-in git-filter-branch replacement (with a few caveats).  While
filter-lamely suffers from all the same safety issues as
git-filter-branch, it at least ameliorates the performance issues a
little.

[[SAFETY]]
SAFETY
------

git-filter-branch is riddled with gotchas resulting in various ways to
easily corrupt repos or end up with a mess worse than what you started
with:

* Someone can have a set of "working and tested filters" which they
  document or provide to a coworker, who then runs them on a different
  OS where the same commands are not working/tested (some examples in
  the git-filter-branch manpage are also affected by this).
  BSD vs. GNU userland differences can really bite.  If you are lucky,
  error messages are spewed.  But just as likely, the commands either
  don't do the filtering requested, or silently corrupt the history by
  making some unwanted change.  The unwanted change may only affect a
  few commits, so it's not necessarily obvious either.  (The fact that problems
  won't necessarily be obvious means they are likely to go unnoticed
  until the rewritten history is in use for quite a while, at which
  point it's really hard to justify another flag-day for another
  rewrite.)

* Filenames with spaces are often mishandled by shell snippets since
  they cause problems for shell pipelines.  Not everyone is familiar
  with find -print0, xargs -0, git-ls-files -z, etc.  Even people who
  are familiar with these may assume such flags are not relevant
  because someone else renamed any such files in their repo back
  before the person doing the filtering joined the project.  And
  often, even those familiar with handling arguments with spaces may
  not do so just because they aren't in the mindset of thinking about
  everything that could possibly go wrong.
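+
A minimal sketch of the difference, assuming a POSIX shell with `find` and an `xargs` that supports `-0`.  The scratch directory and file names are made up for illustration, and `sh -c 'echo $#' sh` is used only to count how many arguments `xargs` passes on:
+
```shell
#!/bin/sh
# Scratch directory with one file whose name contains a space
# (both names are illustrative).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > "my file.txt"
: > plain.txt

# Default xargs splits its input on blanks and newlines, so
# "./my file.txt" arrives as two separate arguments.
naive_count=$(find . -type f | xargs sh -c 'echo $#' sh)

# NUL-delimited names survive intact: one argument per file.
safe_count=$(find . -type f -print0 | xargs -0 sh -c 'echo $#' sh)

echo "naive=$naive_count safe=$safe_count"
```
+
With one of the two names containing a space, the naive pipeline reports three arguments and the NUL-delimited one reports two.  In a filter, `git ls-files -z` plays the same role as `find -print0`.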

* Non-ASCII filenames can be silently removed despite being in a
  desired directory.  Keeping only wanted paths is often done using
  pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
  ls-files will only quote filenames if needed, so folks may not
  notice that one of the files didn't match the regex (at least not
  until it's much too late).  Yes, someone who knows about
  core.quotePath can avoid this (unless they have other special
  characters like \t, \n, or "), and people who use ls-files -z with
  something other than grep can avoid this, but that doesn't mean they
  will.
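+
As a rough illustration of the quoting problem, the following sketch builds a scratch repository (`WANTED_DIR`, `other`, and the file names are illustrative) and compares what the pipeline above would select for removal with and without path quoting.  `core.quotePath=true` is git's default; it is set explicitly here only so the demonstration does not depend on local configuration.  Nothing is actually removed:
+
```shell
#!/bin/sh
# Scratch repository; directory and file names are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir WANTED_DIR other
: > WANTED_DIR/plain.txt
# File name with a non-ASCII character (UTF-8 bytes for e-acute).
: > "WANTED_DIR/$(printf 'caf\303\251').txt"
: > other/unwanted.txt
git add .

# Quoted output: the non-ASCII path is printed as
# "WANTED_DIR/caf\303\251.txt" including the double quotes, so the
# anchored regex fails to exclude it and it lands in the removal list.
naive=$(git -c core.quotePath=true ls-files | grep -v '^WANTED_DIR/')

# Unquoted output: bytes above 0x80 are printed verbatim, so the same
# regex excludes both files under WANTED_DIR/.
unquoted=$(git -c core.quotePath=false ls-files | grep -v '^WANTED_DIR/')

printf 'naive removal list:\n%s\n' "$naive"
printf 'unquoted removal list:\n%s\n' "$unquoted"
```
+
The first list contains the quoted `WANTED_DIR` entry, so a wanted file would be removed; the second contains only `other/unwanted.txt`.  Names containing a tab, newline, or double quote are still quoted even with `core.quotePath=false`, so `ls-files -z` with a NUL-aware consumer is the more robust route.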

* Similarly, when moving files around, one can find that
