helping users find what unwanted crud
they should delete, which means they are much more likely to have
incomplete or partial cleanups that sometimes result in confusion
and people wasting time trying to understand. (For example, folks
tend to just look for big files to delete instead of big directories
or extensions, and once they do so, then sometime later folks using
the new repository who are going through history will notice a build
artifact directory that has some files but not others, or a cache of
dependencies (node_modules or similar) which couldn't have ever been
functional since it's missing some files.)
* If --prune-empty isn't specified, then the filtering process can
create hordes of confusing empty commits.
* If --prune-empty is specified, then intentionally placed empty
commits from before the filtering operation are also pruned instead
of just pruning commits that became empty due to filtering rules.
* If --prune-empty is specified, sometimes empty commits are missed
and left around anyway (a somewhat rare bug, but it happens...)
* A minor issue, but users whose goal is to update all names and
emails in a repository may be led to --env-filter, which will only
update authors and committers, missing taggers.
* If the user provides a --tag-name-filter that maps multiple tags to
the same name, no warning or error is provided; git-filter-branch
simply overwrites each tag in some undocumented predefined order,
resulting in only one tag at the end. (A git-filter-branch
regression test requires this surprising behavior.)
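
Because no warning is given for colliding tag names, one possible
precaution is to run the intended filter over the existing tag names
before rewriting and look for duplicate results. The following is a
minimal sketch, not part of git-filter-branch; `tag_filter` and
`find_tag_collisions` are hypothetical helper names, and the `sed`
expression merely stands in for whatever command would be passed to
--tag-name-filter (which reads the old tag name on standard input and
writes the new name to standard output).

```shell
#!/bin/sh

# Hypothetical stand-in for the command given to --tag-name-filter:
# reads one tag name on stdin, prints the rewritten name on stdout.
tag_filter() {
    sed -e 's/^v//' -e 's/^release-//'
}

# Hypothetical helper: reads tag names (one per line) on stdin and
# prints every rewritten name that more than one input tag maps to.
find_tag_collisions() {
    while IFS= read -r tag; do
        printf '%s\n' "$tag" | tag_filter
    done | sort | uniq -d
}

# Sample input; "v1.0" and "release-1.0" both map to "1.0".
printf '%s\n' v1.0 release-1.0 v2.0 | find_tag_collisions
```

In a real repository the sample input could be replaced with
`git tag -l | find_tag_collisions`; empty output suggests that the
filter does not map two existing tags to the same name.
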
Also, the poor performance of git-filter-branch often leads to safety
issues:
* Coming up with the correct shell snippet to do the filtering you
want is sometimes difficult unless you're just doing a trivial
modification such as deleting a couple of files. Unfortunately, people
often learn if the snippet is right or wrong by trying it out, but
the rightness or wrongness can vary depending on special
circumstances (spaces in filenames, non-ASCII filenames, funny
author names or emails, invalid timezones, presence of grafts or
replace objects, etc.), meaning they may have to wait a long time,
hit an error, then restart. The performance of git-filter-branch is
so bad that this cycle is painful, reducing the time available to
carefully re-check (to say nothing about what it does to the
patience of the person doing the rewrite even if they do technically
have more time available). This problem is further compounded because
errors from broken filters may not be shown for a long time and/or
get lost in a sea of output. Even worse, broken filters often just
result in silent incorrect rewrites.
* To top it all off, even when users finally find working commands,
they naturally want to share them. But they may be unaware that
their repo didn't have some special cases that someone else's does.
So, when someone else with a different repository runs the same
commands, they get hit by the problems above. Or, the user just
runs commands that really were vetted for special cases, but they
run them on a different OS where they don't work, as noted above.
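
As an illustration of how a snippet can appear correct on one
repository and silently misbehave on another, the sketch below
compares two ways of iterating over files in a directory. It is a
standalone demonstration rather than a filter: `count_naive` and
`count_safe` are hypothetical helper names, and the file names are
made up. The naive loop relies on word splitting, so a filename
containing a space is treated as two entries.

```shell
#!/bin/sh

# Naive: the unquoted command substitution is split on whitespace,
# so "with space.txt" is seen as two separate words.
count_naive() {
    n=0
    for f in $(ls "$1"); do
        n=$((n + 1))
    done
    echo "$n"
}

# Safer: a quoted glob yields exactly one word per directory entry.
count_safe() {
    n=0
    for f in "$1"/*; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

dir=$(mktemp -d)
: > "$dir/plain.txt"
: > "$dir/with space.txt"

count_naive "$dir"   # prints 3, although only two files exist
count_safe "$dir"    # prints 2

rm -rf "$dir"
```

Both loops give the same answer on a directory without such
filenames, which is why a snippet tested on only one repository can
seem safe to share.
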
GIT
---
Part of the linkgit:git[1] suite