flags are not relevant
because someone else renamed any such files in their repo back
before the person doing the filtering joined the project. And
often, even those familiar with handling arguments with spaces may
not do so just because they aren't in the mindset of thinking about
everything that could possibly go wrong.
* Non-ASCII filenames can be silently removed despite being in a
desired directory. Keeping only wanted paths is often done using
pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
ls-files will only quote filenames if needed, so folks may not
notice that one of the files didn't match the regex (at least not
until it's much too late). Yes, someone who knows about
core.quotePath can avoid this (unless they have other special
characters like \t, \n, or "), and people who use ls-files -z with
something other than grep can avoid this, but that doesn't mean they
will.
* Similarly, when moving files around, one can find that filenames
with non-ASCII or special characters end up in a different
directory, one that includes a double quote character. (This is
technically the same issue as above with quoting, but perhaps an
interesting different way that it can and has manifested as a
problem.)
* It's far too easy to accidentally mix up old and new history. It's
still possible with any tool, but git-filter-branch almost
invites it. If lucky, the only downside is users getting frustrated
that they don't know how to shrink their repo and remove the old
stuff. If unlucky, they merge old and new history and end up with
multiple "copies" of each commit, some of which have unwanted or
sensitive files and others which don't. This comes about in
multiple different ways:
** the default of doing only a partial history rewrite ('--all' is not
the default and few examples show it)
** the fact that there's no automatic post-run cleanup
** the fact that --tag-name-filter (when used to rename tags) doesn't
remove the old tags but just adds new ones with the new name
** the fact that little educational information is provided to inform
users of the ramifications of a rewrite and how to avoid mixing old
and new history. For example, this man page discusses how users
need to understand that they need to rebase their changes for all
their branches on top of new history (or delete and reclone), but
that's only one of multiple concerns to consider. See the
"DISCUSSION" section of the git filter-repo manual page for more
details.
* Annotated tags can be accidentally converted to lightweight tags,
due to either of two issues:
** Someone can do a history rewrite, realize they messed up, restore
from the backups in refs/original/, and then redo their
git-filter-branch command. (The backup in refs/original/ is not a
real backup; it dereferences tags first.)
** Running git-filter-branch with either --tags or --all in your
<rev-list-options>. In order to retain annotated tags as
annotated, you must use --tag-name-filter (and must not have
restored from refs/original/ in a previously botched rewrite).
* Any commit messages that specify an encoding will become corrupted
by the rewrite; git-filter-branch ignores the encoding, takes the
original bytes, and feeds them to commit-tree without telling it the
proper encoding. (This happens whether or not --msg-filter is
used.)
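
The non-ASCII filename pitfall described above can be reproduced in a
throwaway repository. The following is only a sketch under stated
assumptions: `WANTED_DIR`, the `other` directory, and the file names are
made up for the demonstration, and the second pipeline is one possible
way of sidestepping the quoting (an exclude pathspec plus `-z` and
`xargs -0`), not the only one. It does not address the other issues in
this list.

```shell
#!/bin/sh
# Sketch only: reproduces the quoting pitfall in a scratch repository.
# WANTED_DIR, "other" and the file names are hypothetical.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
# core.quotePath=true is the default; set it explicitly so the demo does
# not depend on the user's global configuration.
git config core.quotePath true
mkdir WANTED_DIR other

# Build a non-ASCII file name ("na", U+00EF, "ve.txt" in UTF-8) from
# octal escapes so the script itself stays plain ASCII.
name=$(printf 'na\303\257ve.txt')
: > "WANTED_DIR/$name"
: > WANTED_DIR/plain.txt
: > other/junk.txt
git add .

# Fragile: ls-files prints the non-ASCII path inside double quotes, so it
# does not match ^WANTED_DIR/ and is selected for removal together with
# other/junk.txt.
fragile=$(git ls-files | grep -v '^WANTED_DIR/')
printf 'fragile pipeline would remove:\n%s\n' "$fragile"

# Sturdier: let git match paths with an exclude pathspec, and hand
# NUL-terminated, unquoted names to xargs -0.
git ls-files -z -- . ':(exclude)WANTED_DIR' | xargs -0 git rm -q --cached --

# tr is acceptable here only because the demo names contain no newlines.
kept=$(git ls-files -z | tr '\0' '\n')
printf 'still tracked after the sturdier pipeline:\n%s\n' "$kept"
```

In the fragile listing, the entry for the non-ASCII file begins with a
double quote, which is why the `^WANTED_DIR/` regex never sees it as
being inside the wanted directory.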