Git Filter Branch Safety Issues

flags are not relevant because someone else renamed any such files in their repo back before the person doing the filtering joined the project. And often, even those familiar with handling arguments with spaces may not do so just because they aren't in the mindset of thinking about everything that could possibly go wrong.

* Non-ascii filenames can be silently removed despite being in a desired directory. Keeping only wanted paths is often done using pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`. ls-files will only quote filenames if needed, so folks may not notice that one of the files didn't match the regex (at least not until it's much too late). Yes, someone who knows about core.quotePath can avoid this (unless they have other special characters like \t, \n, or "), and people who use ls-files -z with something other than grep can avoid this, but that doesn't mean they will.

* Similarly, when moving files around, one can find that filenames with non-ascii or special characters end up in a different directory, one that includes a double quote character. (This is technically the same issue as above with quoting, but perhaps an interesting different way that it can and has manifested as a problem.)

* It's far too easy to accidentally mix up old and new history. It's still possible with any tool, but git-filter-branch almost invites it. If lucky, the only downside is users getting frustrated that they don't know how to shrink their repo and remove the old stuff. If unlucky, they merge old and new history and end up with multiple "copies" of each commit, some of which have unwanted or sensitive files and others which don't. This comes about in multiple different ways:
** the default to only doing a partial history rewrite ('--all' is not the default and few examples show it)
** the fact that there's no automatic post-run cleanup
** the fact that --tag-name-filter (when used to rename tags) doesn't remove the old tags but just adds new ones with the new name
** the fact that little educational information is provided to inform users of the ramifications of a rewrite and how to avoid mixing old and new history. For example, this man page discusses how users need to understand that they need to rebase their changes for all their branches on top of new history (or delete and reclone), but that's only one of multiple concerns to consider. See the "DISCUSSION" section of the git filter-repo manual page for more details.

* Annotated tags can be accidentally converted to lightweight tags, due to either of two issues:
** Someone can do a history rewrite, realize they messed up, restore from the backups in refs/original/, and then redo their git-filter-branch command. (The backup in refs/original/ is not a real backup; it dereferences tags first.)
** Running git-filter-branch with either --tags or --all in your <rev-list-options>. In order to retain annotated tags as annotated, you must use --tag-name-filter (and must not have restored from refs/original/ in a previously botched rewrite).

* Any commit messages that specify an encoding will become corrupted by the rewrite; git-filter-branch ignores the encoding, takes the original bytes, and feeds it to commit-tree without telling it the proper encoding. (This happens whether or not --msg-filter is used.)
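
The filename-quoting pitfall described above can be made concrete with a small simulation. The sketch below does not call git. It assumes that `quote_path` (a hypothetical helper, not a git API) is a close enough approximation of how git quotes paths in line-oriented output when core.quotePath is left at its default. The names `naive_removals` and `raw_removals` and the sample paths are likewise invented for illustration.

```python
import re

# Hypothetical helper, not part of git: approximates the C-style quoting git
# applies to paths in line-oriented output when core.quotePath is at its default.
_NAMED_ESCAPES = {
    0x07: "\\a", 0x08: "\\b", 0x09: "\\t", 0x0A: "\\n",
    0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r", 0x22: '\\"', 0x5C: "\\\\",
}


def quote_path(path):
    """Return the path as a quoted text line, or unchanged if no quoting is needed."""
    pieces = []
    quoted = False
    for byte in path.encode("utf-8"):
        if byte in _NAMED_ESCAPES:
            pieces.append(_NAMED_ESCAPES[byte])
            quoted = True
        elif byte < 0x20 or byte >= 0x7F:
            # Control bytes and non-ASCII bytes become three-digit octal escapes.
            pieces.append("\\%03o" % byte)
            quoted = True
        else:
            pieces.append(chr(byte))
    body = "".join(pieces)
    return '"' + body + '"' if quoted else body


def naive_removals(paths, wanted_prefix="WANTED_DIR/"):
    """Mimic `git ls-files | grep -v ^WANTED_DIR/` by filtering the quoted lines."""
    pattern = re.compile("^" + re.escape(wanted_prefix))
    return [line for line in map(quote_path, paths) if not pattern.search(line)]


def raw_removals(paths, wanted_prefix="WANTED_DIR/"):
    """Filter the unquoted names, as a NUL-delimited listing would make possible."""
    return [p for p in paths if not p.startswith(wanted_prefix)]


# Sample paths (assumed for illustration); the second one is "café.txt".
paths = ["WANTED_DIR/plain.txt", "WANTED_DIR/caf\u00e9.txt", "other/junk.txt"]

print(naive_removals(paths))  # includes the quoted WANTED_DIR entry
print(raw_removals(paths))    # only the path outside WANTED_DIR
```

In this simulation the quoted line begins with a double quote rather than `WANTED_DIR/`, so the regex does not recognize it as a wanted path and it lands in the removal list. Filtering the unquoted names avoids that, which is the idea behind using `ls-files -z` with a tool that handles NUL-delimited input.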

In summary, git filter-branch carries several safety risks: it can silently remove non-ASCII filenames, it makes it easy to mix up old and new history, it can convert annotated tags to lightweight tags, and it corrupts commit messages that specify an encoding. Each of these calls for caution and careful consideration when using this tool.