Home Explore Blog CI



git

8th chunk of `Documentation/git-bisect-lk2009.adoc`
92ec9b5d46abb8c6fe20860350ba1da608518eb35b19f5dd0000000100000fa1
 are not ancestor
of the new "bad" commit.

So when a commit is marked as "bad" we know we can remove all the
commits in the graph except those that are ancestors of the new "bad"
commit. This means that:

-------------
information_if_bad(X) = N - number_of_ancestors(X)  (TRUE)
-------------

where N is the number of commits in the (cleaned up) graph.

So in the end this means that to find the best bisection commits we
should maximize the function:

-------------
f(X) = min(number_of_ancestors(X), N - number_of_ancestors(X))
-------------

And this is nice because at step 2) we compute number_of_ancestors(X)
and so at step 3) we compute f(X).

Let's take the following graph as an example:

-------------
            G-H-I-J
           /       \
A-B-C-D-E-F         O
           \       /
            K-L-M-N
-------------

If we compute the following non optimal function on it:

-------------
g(X) = min(number_of_ancestors(X), number_of_descendants(X))
-------------

we get:

-------------
            4 3 2 1
            G-H-I-J
1 2 3 4 5 6/       \0
A-B-C-D-E-F         O
           \       /
            K-L-M-N
            4 3 2 1
-------------

but with the algorithm used by git bisect we get:

-------------
            7 7 6 5
            G-H-I-J
1 2 3 4 5 6/       \0
A-B-C-D-E-F         O
           \       /
            K-L-M-N
            7 7 6 5
-------------

So we chose G, H, K or L as the best bisection point, which is better
than F. Because if for example L is bad, then we will know not only
that L, M and N are bad but also that G, H, I and J are not the first
bad commit (since we suppose that there is only one first bad commit
and it must be an ancestor of L).

So the current algorithm seems to be the best possible given what we
initially supposed.

Skip algorithm
~~~~~~~~~~~~~~

When some commits have been skipped (using "git bisect skip"), then
the bisection algorithm is the same for step 1) to 3). But then we use
roughly the following steps:

6) sort the commit by decreasing associated value

7) if the first commit has not been skipped, we can return it and stop
   here

8) otherwise filter out all the skipped commits in the sorted list

9) use a pseudo random number generator (PRNG) to generate a random
   number between 0 and 1

10) multiply this random number with its square root to bias it toward
    0

11) multiply the result by the number of commits in the filtered list
    to get an index into this list

12) return the commit at the computed index

Skip algorithm discussed
~~~~~~~~~~~~~~~~~~~~~~~~

After step 7) (in the skip algorithm), we could check if the second
commit has been skipped and return it if it is not the case. And in
fact that was the algorithm we used from when "git bisect skip" was
developed in Git version 1.5.4 (released on February 1st 2008) until
Git version 1.6.4 (released July 29th 2009).

But Ingo Molnar and H. Peter Anvin (another well known linux kernel
developer) both complained that sometimes the best bisection points
all happened to be in an area where all the commits are
untestable. And in this case the user was asked to test many
untestable commits, which could be very inefficient.

Indeed untestable commits are often untestable because a breakage was
introduced at one time, and that breakage was fixed only after many
other commits were introduced.

This breakage is of course most of the time unrelated to the breakage
we are trying to locate in the commit graph. But it prevents us to
know if the interesting "bad behavior" is present or not.

So it is a fact that commits near an untestable commit have a high
probability of being untestable themselves. And the best bisection
commits are often found together too (due to the bisection algorithm).

This is why it is a bad idea to just chose the next best unskipped
bisection commit when the first one has been skipped.

We found that most commits on the graph may give quite a lot of
information when they are tested. And the commits that

Title: Git Bisect Algorithm with Skipped Commits
Summary
This text describes how the Git bisect algorithm works when some commits have been skipped using 'git bisect skip', including the modifications to the algorithm to handle skipped commits, and discusses the reasoning behind the changes made to the algorithm to improve its efficiency and reduce the number of untestable commits that need to be tested.