9th chunk of `Documentation/git-filter-branch.adoc`
 flags are not relevant
  because someone else renamed any such files in their repo back
  before the person doing the filtering joined the project.  And
  often, even those familiar with handling arguments with spaces may
  not do so just because they aren't in the mindset of thinking about
  everything that could possibly go wrong.

* Non-ASCII filenames can be silently removed despite being in a
  desired directory.  Keeping only wanted paths is often done using
  pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
  ls-files quotes filenames only when needed, and a quoted path begins
  with a double quote rather than with the directory name, so folks may
  not notice that one of the files didn't match the regex (at least not
  until it's much too late).  Yes, someone who knows about
  core.quotePath can avoid this (unless they have other special
  characters like \t, \n, or "), and people who use ls-files -z with
  something other than grep can avoid this, but that doesn't mean they
  will.
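+
The quoting trap described above can be reproduced in a throwaway
repository.  The following is a minimal sketch, not part of
git-filter-branch itself: the temporary repository and the file names
are made up for illustration, and the exclude pathspec is only one
possible way to avoid line-oriented filtering of path names.
+
```shell
set -e
repo=$(mktemp -d)                     # throwaway repository (hypothetical)
cd "$repo"
git init -q .
mkdir WANTED_DIR
name=$(printf 'caf\303\251.txt')      # a non-ASCII name, spelled as UTF-8 bytes
: > WANTED_DIR/plain.txt
: > "WANTED_DIR/$name"
git add .

# Default quoting: the non-ASCII path is printed inside double quotes with
# octal escapes, so it no longer starts with WANTED_DIR/ and the grep
# filter selects it for removal even though it lives in the wanted directory.
unwanted=$(git -c core.quotePath=true ls-files | grep -v '^WANTED_DIR/' || true)
printf 'selected for removal: %s\n' "$unwanted"

# With -z, paths are NUL-terminated and never quoted.  Here an exclude
# pathspec does the filtering inside Git, and only the NUL terminators
# are counted, so no line-oriented tool ever parses the names.
leftover=$(git ls-files -z -- . ':(exclude)WANTED_DIR' | tr -cd '\0' | wc -c | tr -d ' ')
printf 'paths outside WANTED_DIR: %s\n' "$leftover"
```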

* Similarly, when moving files around, one can find that filenames
  with non-ASCII or special characters end up in a different
  directory, one that includes a double quote character.  (This is
  technically the same quoting issue as above, but it is a different
  and perhaps interesting way in which the problem can manifest, and
  has manifested.)

* It's far too easy to accidentally mix up old and new history.  It's
  still possible with any tool, but git-filter-branch almost
  invites it.  If lucky, the only downside is users getting frustrated
  that they don't know how to shrink their repo and remove the old
  stuff.  If unlucky, they merge old and new history and end up with
  multiple "copies" of each commit, some of which contain unwanted or
  sensitive files and others of which don't.  This comes about in
  several ways:

  ** the default of doing only a partial history rewrite ('--all' is not
     the default, and few examples show it)

  ** the fact that there's no automatic post-run cleanup

  ** the fact that --tag-name-filter (when used to rename tags) doesn't
     remove the old tags but just adds new ones with the new name

  ** the fact that little educational information is provided to inform
     users of the ramifications of a rewrite and how to avoid mixing old
     and new history.  For example, this man page discusses how users
     need to understand that they need to rebase their changes for all
     their branches on top of new history (or delete and reclone), but
     that's only one of multiple concerns to consider.  See the
     "DISCUSSION" section of the git filter-repo manual page for more
     details.
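+
As an illustration of the missing post-run cleanup mentioned in this
list, the sketch below checks whether pre-rewrite commits are still
referenced.  It makes several assumptions: no real rewrite is run, the
backup ref is simulated with `git update-ref`, and the ref name
`refs/original/refs/heads/example` is hypothetical.
+
```shell
set -e
repo=$(mktemp -d)                     # throwaway repository (hypothetical)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m 'old history'
old=$(git rev-parse HEAD)

# Stand-in for a rewrite: keep a backup ref to the old commit (simulated
# here by hand), then replace the branch tip with a new commit.
git update-ref refs/original/refs/heads/example "$old"
git commit -q --amend --allow-empty -m 'new history'

# Any ref listed here still points at pre-rewrite history.
leftovers=$(git for-each-ref --format='%(refname)' refs/original/)
printf 'leftover backup refs: %s\n' "$leftovers"

# The old commit is still reachable from some ref, so it can still be
# merged, pushed, or fetched alongside the new history.
still_reachable=$(git rev-list --all | grep -c "^$old\$" || true)
printf 'old commit reachable from refs: %s\n' "$still_reachable"
```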

* Annotated tags can be accidentally converted to lightweight tags,
  due to either of two issues:

  ** Someone can do a history rewrite, realize they messed up, restore
     from the backups in refs/original/, and then redo their
     git-filter-branch command.  (The backup in refs/original/ is not a
     real backup; it dereferences tags first.)

  ** Running git-filter-branch with either --tags or --all in your
     <rev-list-options>.  In order to retain annotated tags as
     annotated, you must use --tag-name-filter (and must not have
     restored from refs/original/ in a previously botched rewrite).
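+
One way to notice such a conversion is to record the object type behind
each tag before and after a rewrite.  The sketch below is only an
illustration in a throwaway repository; the tag names `v1.0` and
`v1.0-light` are hypothetical.
+
```shell
set -e
repo=$(mktemp -d)                     # throwaway repository (hypothetical)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m 'initial'
git tag -a -m 'annotated release tag' v1.0   # annotated: creates a tag object
git tag v1.0-light                           # lightweight: just a ref to the commit

# An annotated tag ref resolves to a tag object; a lightweight tag ref
# resolves directly to a commit.
annotated_type=$(git cat-file -t refs/tags/v1.0)
light_type=$(git cat-file -t refs/tags/v1.0-light)
printf 'v1.0: %s\nv1.0-light: %s\n' "$annotated_type" "$light_type"

# The same information for every tag at once; saving this listing before
# a rewrite and comparing it afterwards reveals converted tags.
git for-each-ref --format='%(refname:short) %(objecttype)' refs/tags
```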

* Any commit messages that specify an encoding will become corrupted
  by the rewrite; git-filter-branch ignores the encoding, takes the
  original bytes, and feeds them to commit-tree without telling it the
  proper encoding.  (This happens whether or not --msg-filter is
  used.)
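
To judge whether the commit-message encoding issue in the last item
applies to a given repository, one can look for commits that carry an
encoding header before rewriting.  The following is a minimal sketch in
a throwaway repository; the commit messages and the choice of ISO-8859-1
are assumptions made for illustration.

```shell
set -e
repo=$(mktemp -d)                     # throwaway repository (hypothetical)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m 'commit without an encoding header'
# A non-UTF-8 i18n.commitEncoding makes Git record an encoding header.
git -c i18n.commitEncoding=ISO-8859-1 commit -q --allow-empty \
    -m 'commit with an encoding header'

# %e expands to the encoding header and is empty when there is none, so
# the distinct non-empty values are the encodings present in history.
encodings=$(git log --all --format='%e' | sed '/^$/d' | sort -u)
printf 'encodings declared in history: %s\n' "$encodings"
```

If this prints nothing after the colon, no commit reachable from any ref
declares an encoding.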

Title: Git Filter Branch Safety Issues
Summary
The text outlines several safety concerns with using git filter-branch, including the potential for silently removing non-ASCII filenames, accidentally mixing up old and new history, converting annotated tags to lightweight tags, and corrupting commit messages with specified encodings, highlighting the need for caution and careful consideration when using this tool.