git

11th chunk of `Documentation/git-filter-branch.adoc`
 helping users find what unwanted crud
  they should delete, which means they are much more likely to have
  incomplete or partial cleanups that sometimes result in confusion
  and people wasting time trying to understand.  (For example, folks
  tend to just look for big files to delete instead of big directories
  or extensions, and once they do so, then sometime later folks using
  the new repository who are going through history will notice a build
  artifact directory that has some files but not others, or a cache of
  dependencies (node_modules or similar) which couldn't have ever been
  functional since it's missing some files.)

* If --prune-empty isn't specified, then the filtering process can
  create hordes of confusing empty commits.

* If --prune-empty is specified, then intentionally placed empty
  commits from before the filtering operation are also pruned instead
  of just pruning commits that became empty due to filtering rules.

* If --prune-empty is specified, sometimes empty commits are missed
  and left around anyway (a somewhat rare bug, but it happens...)

* A minor issue, but users who have a goal to update all names and
  emails in a repository may be led to --env-filter which will only
  update authors and committers, missing taggers.

* If the user provides a --tag-name-filter that maps multiple tags to
  the same name, no warning or error is provided; git-filter-branch
  simply overwrites each tag in some undocumented pre-defined order
  resulting in only one tag at the end.  (A git-filter-branch
  regression test requires this surprising behavior.)
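
Since no warning is given when a --tag-name-filter maps several tags
to one name, it may help to check a candidate filter for collisions
before running it.  The following is only a minimal sketch:
`find_tag_collisions` is a hypothetical helper, not part of git, and
it assumes the filter reads the old tag name on standard input and
writes the new name on standard output.

```shell
# Hypothetical helper (not part of git): print every output name that
# more than one input tag would be mapped to by the given filter.
#   $1    - candidate filter command (old name on stdin, new name on stdout)
#   stdin - tag names, one per line
find_tag_collisions() {
    filter=$1
    while IFS= read -r tag; do
        # Feed each tag name to the filter separately.
        printf '%s\n' "$tag" | sh -c "$filter"
    done | sort | uniq -d
}

# Example with made-up tag names: "release-1.0" would be renamed onto
# the already existing "v1.0".
printf '%s\n' v1.0 release-1.0 v2.0 |
    find_tag_collisions "sed 's/^release-/v/'"
# prints: v1.0
```

In a real repository the list of names could come from `git tag -l`;
empty output means the sketch found no colliding names.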

Also, the poor performance of git-filter-branch often leads to safety
issues:

* Coming up with the correct shell snippet to do the filtering you
  want is sometimes difficult unless you're just doing a trivial
  modification such as deleting a couple of files.  Unfortunately, people
  often learn if the snippet is right or wrong by trying it out, but
  the rightness or wrongness can vary depending on special
  circumstances (spaces in filenames, non-ASCII filenames, funny
  author names or emails, invalid timezones, presence of grafts or
  replace objects, etc.), meaning they may have to wait a long time,
  hit an error, then restart.  The performance of git-filter-branch is
  so bad that this cycle is painful, reducing the time available to
  carefully re-check (to say nothing about what it does to the
  patience of the person doing the rewrite even if they do technically
  have more time available).  This problem is further compounded because
  errors from broken filters may not be shown for a long time and/or
  get lost in a sea of output.  Even worse, broken filters often just
  result in silently incorrect rewrites.

* To top it all off, even when users finally find working commands,
  they naturally want to share them.  But they may be unaware that
  their repo didn't have some special cases that someone else's does.
  So, when someone else with a different repository runs the same
  commands, they get hit by the problems above.  Or, the user just
  runs commands that really were vetted for special cases, but they
  run them on a different OS where they don't work, as noted above.
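
As an illustration of how a special case such as a space in a filename
can turn a plausible snippet into a silently incomplete one, here is a
small sketch that runs two candidate snippets in a scratch directory.
`try_snippet` and the file names are hypothetical and chosen only for
this example; the snippets are of the kind one might pass to
--tree-filter, but the sketch itself does not invoke git.

```shell
# Hypothetical helper: run a candidate snippet in a scratch directory
# that contains one awkward file name, then list what is left.
try_snippet() {
    snippet=$1
    scratch=$(mktemp -d) || return 1
    (
        cd "$scratch" || exit 1
        : > plain.log
        : > 'has space.log'
        : > keep.txt
        # Errors are discarded on purpose, standing in for messages that
        # get lost in a sea of output during a long rewrite.
        sh -c "$snippet" >/dev/null 2>&1
        printf '%s\n' *
    )
    rm -rf "$scratch"
}

# Word splitting breaks "./has space.log" into two nonexistent names.
naive='for f in $(find . -name "*.log"); do rm "$f"; done'
# Letting find pass the names to rm avoids the splitting.
robust='find . -name "*.log" -exec rm -f {} +'

try_snippet "$naive"     # leaves "has space.log" and "keep.txt"
try_snippet "$robust"    # leaves only "keep.txt"
```

Both snippets behave identically on a repository without such
filenames, which is why a command that worked for one user can
misbehave for another.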

GIT
---
Part of the linkgit:git[1] suite

Title: Git Filter Branch Limitations and Safety Concerns
Summary
The text outlines various issues with git filter-branch, including incomplete cleanups, creation of empty commits, unexpected behavior with tag name filters, and poor performance leading to safety issues, such as difficult debugging, silent incorrect rewrites, and sharing of commands that may not work in different repositories or environments.