Git Filter Branch Limitations and Safety Concerns

helping users find what unwanted crud they should delete, which means they are much more likely to have incomplete or partial cleanups that sometimes result in confusion and people wasting time trying to understand. (For example, folks tend to just look for big files to delete instead of big directories or extensions, and once they do so, then sometime later folks using the new repository who are going through history will notice a build artifact directory that has some files but not others, or a cache of dependencies (node_modules or similar) which couldn't have ever been functional since it's missing some files.)

* If --prune-empty isn't specified, then the filtering process can create hordes of confusing empty commits.

* If --prune-empty is specified, then intentionally placed empty commits from before the filtering operation are also pruned, instead of just pruning commits that became empty due to filtering rules.

* If --prune-empty is specified, sometimes empty commits are missed and left around anyway (a somewhat rare bug, but it happens...).

* A minor issue, but users who have a goal to update all names and emails in a repository may be led to --env-filter, which will only update authors and committers, missing taggers.

* If the user provides a --tag-name-filter that maps multiple tags to the same name, no warning or error is provided; git-filter-branch simply overwrites each tag in some undocumented pre-defined order, resulting in only one tag at the end. (A git-filter-branch regression test requires this surprising behavior.)

Also, the poor performance of git-filter-branch often leads to safety issues:

* Coming up with the correct shell snippet to do the filtering you want is sometimes difficult unless you're just doing a trivial modification such as deleting a couple of files. Unfortunately, people often learn if the snippet is right or wrong by trying it out, but the rightness or wrongness can vary depending on special circumstances (spaces in filenames, non-ASCII filenames, funny author names or emails, invalid timezones, presence of grafts or replace objects, etc.), meaning they may have to wait a long time, hit an error, then restart. The performance of git-filter-branch is so bad that this cycle is painful, reducing the time available to carefully re-check (to say nothing about what it does to the patience of the person doing the rewrite even if they do technically have more time available). This problem is compounded because errors from broken filters may not be shown for a long time and/or get lost in a sea of output. Even worse, broken filters often just result in silent incorrect rewrites.

* To top it all off, even when users finally find working commands, they naturally want to share them. But they may be unaware that their repo didn't have some special cases that someone else's does. So, when someone else with a different repository runs the same commands, they get hit by the problems above. Or, the user just runs commands that really were vetted for special cases, but runs them on a different OS where they don't work, as noted above.

GIT
---
Part of the linkgit:git[1] suite
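
Because a --tag-name-filter that maps several tags to one name overwrites tags without warning, it can help to dry-run the mapping against the existing tag names before rewriting anything. The following is a minimal sketch, not part of git: the function name find_tag_collisions is hypothetical, and it assumes the filter is a shell command that reads the original tag name on stdin and writes the new name on stdout. git-filter-branch evaluates filters in its own environment, so treat the result as an approximation.

```shell
# Hypothetical helper (not a git command): reads original tag names on
# stdin, one per line, and runs the candidate filter command once per
# tag with that tag name on its stdin. Prints every rewritten name that
# more than one original tag maps to.
find_tag_collisions() {
    filter=$1
    while IFS= read -r tag; do
        # Feed one tag name to the filter, as a separate process per tag.
        printf '%s\n' "$tag" | sh -c "$filter"
    done | sort | uniq -d   # uniq -d keeps only names that occur repeatedly
}

# Example use inside a repository (assumed invocation):
#   git tag -l | find_tag_collisions "sed -e 's/^release-/v/'"
```

Empty output means no two tags were mapped to the same name by this filter. Any name that is printed identifies tags that would overwrite each other in the rewrite.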

The text outlines various issues with git filter-branch, including incomplete cleanups, creation of empty commits, unexpected behavior with tag name filters, and poor performance leading to safety issues, such as difficult debugging, silent incorrect rewrites, and sharing of commands that may not work in different repositories or environments.