3rd chunk of `Documentation/gitdiffcore.adoc`
 resulting contents of
file0 are compared, and if they are similar enough, they are
changed to:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:100644 100644 0123456... bcd3456... C100 fileY file0
------------------------------------------------

In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break determines whether two files are
"similar enough".  The similarity threshold defaults to 50% and can
be changed by giving a number after the "-M" or "-C" option (e.g.
"-M8" to tell it to use 8/10 = 80%).
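
As a rough illustration of how the number after "-M" or "-C" maps to a
threshold, the sketch below reads bare digits as the digits after a decimal
point, so "8" means 0.8.  The helper name `parse_score` is hypothetical, and
the handling of a trailing "%" is an assumption that this section does not
spell out; it is not git's actual parser.

```python
def parse_score(text, default=0.5):
    """Turn the number given after -M or -C into a fraction in [0, 1].

    Hypothetical helper.  Bare digits are read as the digits after a
    decimal point ("8" -> 0.8, "75" -> 0.75).  A trailing "%" (an
    assumption in this sketch) makes the number a plain percentage.
    """
    if text == "":
        return default                  # no number given: the 50% default
    if text.endswith("%"):
        value = int(text[:-1]) / 100
    else:
        value = int(text) / 10 ** len(text)
    return min(value, 1.0)              # never exceed 100%


print(parse_score("8"))   # the "-M8" case from the text, i.e. 8/10
print(parse_score(""))    # plain "-M" falls back to the default
```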

Note that when rename detection is on but both copy and break
detection are off, rename detection adds a preliminary step that first
checks if files are moved across directories while keeping their
filename the same.  If there is a file added to a directory whose
contents are sufficiently similar to a file with the same name that got
deleted from a different directory, it will mark them as renames and
exclude them from the later quadratic step (the one that pairwise
compares all unmatched files to find the "best" matches, determined by
the highest content similarity).  So, for example, if a deleted
docs/ext.txt and an added docs/config/ext.txt are similar enough, they
will be marked as a rename and prevent an added docs/ext.md that may
be even more similar to the deleted docs/ext.txt from being considered
as the rename destination in the later step.  For this reason, the
preliminary "match same filename" step uses a somewhat higher
threshold before it marks a file pair as a rename and stops
considering other candidates for better matches.  At most one
comparison is done per file in this preliminary pass, so if several
ext.txt files remain throughout the directory hierarchy after exact
rename detection, this preliminary step may be skipped for those
files.
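
The preliminary pass described above can be sketched as follows.  This is
only an illustration under stated assumptions: the function name
`match_same_basename`, the `similarity` callback, and the size of the
threshold bump are all hypothetical, and the text does not say how much
higher the real threshold is.

```python
import posixpath
from collections import defaultdict


def match_same_basename(deleted, added, similarity, threshold=0.5, bump=0.25):
    """Pair deleted and added paths that share a basename (sketch).

    `similarity(src, dst)` is a caller-supplied stand-in that returns a
    score in [0, 1].  `bump` models the "higher threshold" mentioned in
    the text; its value here is an assumption.
    """
    deleted_by_name = defaultdict(list)
    added_by_name = defaultdict(list)
    for path in deleted:
        deleted_by_name[posixpath.basename(path)].append(path)
    for path in added:
        added_by_name[posixpath.basename(path)].append(path)

    renames = {}
    for name, sources in deleted_by_name.items():
        targets = added_by_name.get(name, [])
        # At most one comparison per file: skip ambiguous basenames.
        if len(sources) != 1 or len(targets) != 1:
            continue
        if similarity(sources[0], targets[0]) >= threshold + bump:
            renames[sources[0]] = targets[0]
    return renames


# The example from the text, with made-up similarity scores.
scores = {
    ("docs/ext.txt", "docs/config/ext.txt"): 0.80,
    ("docs/ext.txt", "docs/ext.md"): 0.95,
}
result = match_same_basename(
    ["docs/ext.txt"],
    ["docs/config/ext.txt", "docs/ext.md"],
    lambda src, dst: scores[(src, dst)],
)
print(result)
```

In this sketch docs/ext.md is never compared at all, even though its made-up
score is higher, because the same-basename pair is settled first and removed
from the later pairwise step.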

Note.  When the "-C" option is used with the `--find-copies-harder`
option, 'git diff-{asterisk}' commands feed unmodified filepairs to
the diffcore mechanism as well as modified ones.  This lets the copy
detector consider unmodified files as copy source candidates, at
the expense of making it slower.  Without `--find-copies-harder`,
'git diff-{asterisk}' commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.
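
The effect on the set of copy sources can be pictured with a small sketch.
The function and parameter names are hypothetical and the code only models
which paths are eligible; it does not reflect how git represents filepairs
internally.

```python
def copy_source_candidates(modified, unmodified, find_copies_harder=False):
    """Return the paths the copy detector may consider as copy sources.

    Hypothetical model: files modified in the changeset are always
    eligible; unmodified files are added only when the caller asks for
    the slower, more thorough search.
    """
    candidates = list(modified)
    if find_copies_harder:
        candidates.extend(unmodified)
    return candidates


print(copy_source_candidates(["main.c"], ["util.c", "README"]))
print(copy_source_candidates(["main.c"], ["util.c", "README"],
                             find_copies_harder=True))
```

The second call has more candidates to compare against every new file, which
is where the extra cost comes from.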


diffcore-merge-broken: For Putting Complete Rewrites Back Together
------------------------------------------------------------------

This transformation is used to merge filepairs broken by
diffcore-break, and not transformed into rename/copy by
diffcore-rename, back into a single modification.  This always
runs when diffcore-break is used.

For the purpose of merging broken filepairs back, it uses a
different "extent of changes" computation from the ones used by
diffcore-break and diffcore-rename.  It counts only the deletion
from the original, and does not count insertion.  If you removed
only 10 lines from a 100-line document, even if you added 910
new lines to make a new 1000-line document, you did not do a
complete rewrite.  diffcore-break breaks such a case in order to
help diffcore-rename consider such filepairs as candidates for
rename/copy detection, but if filepairs broken that way were not
matched with other filepairs to create a rename/copy, then this
transformation merges them back into the original
"modification".
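
The deletion-only computation can be sketched like this.  The function name
`merge_back` is hypothetical and sizes are counted in lines to match the
example in the text; the unit git actually measures is not stated here, so
treat that as an assumption.

```python
def merge_back(original_size, deleted, merge_score=0.8):
    """Decide whether a broken pair is merged back into one modification.

    Only material deleted from the original counts; insertions are
    ignored.  The pair stays broken only if more than `merge_score` of
    the original was deleted.
    """
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    deleted_fraction = deleted / original_size
    return deleted_fraction <= merge_score


# The example from the text: 10 of 100 original lines removed.  The 910
# added lines do not enter the computation at all.
print(merge_back(original_size=100, deleted=10))
# Removing 90 of the 100 original lines counts as a complete rewrite.
print(merge_back(original_size=100, deleted=90))
```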

The "extent of changes" parameter can be tweaked from the
default 80% (that is, unless more than 80% of the original
material is deleted, the broken pairs are merged back into a
single modification) by giving a second number to the -B option,
like this:

* -B50/60 (give 50% "break score" to diffcore-break, use 60%
  for diffcore-merge-broken).

* -B/60 (the same as above, since diffcore-break defaults to 50%).
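
To make the two forms above concrete, here is a minimal sketch that splits a
"-B" argument into its two scores.  The name `parse_break_option` is
hypothetical, and reading the digits as a decimal fraction ("50" as 0.50)
mirrors the "-M8" example earlier in this document; it is an assumption
rather than a description of git's parser.

```python
def parse_break_option(arg, break_default=0.5, merge_default=0.8):
    """Split "-B<n>/<m>" into (break score, merge score) as fractions.

    Hypothetical helper.  Either number may be omitted, in which case
    the documented default (50% for breaking, 80% for merging back)
    is used.
    """
    if not arg.startswith("-B"):
        raise ValueError("expected an argument starting with -B")
    first, _, second = arg[2:].partition("/")

    def score(digits, default):
        # Digits are read as the part after a decimal point (assumption).
        return int(digits) / 10 ** len(digits) if digits else default

    return score(first, break_default), score(second, merge_default)


print(parse_break_option("-B50/60"))
print(parse_break_option("-B/60"))   # same scores as the line above
print(parse_break_option("-B"))      # both defaults
```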

Note that an earlier implementation left a broken pair as separate
creation and deletion patches.  This was an unnecessary hack, and
the latest implementation always merges all the
