are not ancestors
of the new "bad" commit.
So when a commit is marked as "bad" we know we can remove all the
commits in the graph except those that are ancestors of the new "bad"
commit. This means that:
-------------
information_if_bad(X) = N - number_of_ancestors(X) (TRUE)
-------------
where N is the number of commits in the (cleaned up) graph.
So in the end this means that to find the best bisection commits we
should maximize the function:
-------------
f(X) = min(number_of_ancestors(X), N - number_of_ancestors(X))
-------------
And this is nice because at step 2) we compute number_of_ancestors(X),
so at step 3) computing f(X) costs nothing extra.
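As a sketch (the function name is mine, git computes the same quantity internally), step 3) then amounts to:

```python
def bisect_value(n_ancestors, n_total):
    """Guaranteed information from testing X, whatever the outcome.

    n_ancestors: number_of_ancestors(X), computed at step 2).
    n_total:     N, the number of commits in the cleaned up graph.
    """
    # "good" eliminates the n_ancestors ancestors of X;
    # "bad" eliminates the n_total - n_ancestors non-ancestors.
    return min(n_ancestors, n_total - n_ancestors)
```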
Let's take the following graph as an example:
-------------
            G-H-I-J
           /       \
A-B-C-D-E-F         O
           \       /
            K-L-M-N
-------------
If we compute the following non-optimal function on it:
-------------
g(X) = min(number_of_ancestors(X), number_of_descendants(X))
-------------
we get:
-------------
            4 3 2 1
            G-H-I-J
1 2 3 4 5 6/       \0
A-B-C-D-E-F         O
           \       /
            K-L-M-N
            4 3 2 1
-------------
but with the algorithm used by git bisect we get:
-------------
            7 7 6 5
            G-H-I-J
1 2 3 4 5 6/       \0
A-B-C-D-E-F         O
           \       /
            K-L-M-N
            7 7 6 5
-------------
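The values in the two annotated graphs above can be reproduced with a short sketch (the `parents` encoding and function names are mine, not git's internals; ancestor counts include the commit itself, descendant counts do not):

```python
# Hypothetical encoding of the example history: commit -> list of parents.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["F"], "L": ["K"], "M": ["L"], "N": ["M"],
    "O": ["J", "N"],
}
N = len(parents)

def ancestors(x):
    # Ancestors of x, including x itself.
    seen, stack = set(), [x]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def g(x):
    # Non-optimal: min(ancestors incl. x, descendants excl. x).
    a = len(ancestors(x))
    d = sum(1 for y in parents if x in ancestors(y)) - 1
    return min(a, d)

def f(x):
    # What git bisect maximizes: min(ancestors, non-ancestors).
    a = len(ancestors(x))
    return min(a, N - a)

print([f(x) for x in "GHKL"])  # all four reach the top value 7
print(f("F"), g("F"))          # F only reaches 6 under either function
```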
So we choose G, H, K or L as the best bisection point, which is better
than F, because if for example L is bad, then we will know not only
that L, M and N are bad but also that G, H, I and J are not the first
bad commit (since we suppose that there is only one first bad commit
and it must be an ancestor of L).
So the current algorithm seems to be the best possible given what we
initially supposed.
Skip algorithm
~~~~~~~~~~~~~~
When some commits have been skipped (using "git bisect skip"), then
the bisection algorithm is the same for steps 1) to 3). But then we use
roughly the following steps:
6) sort the commits by decreasing associated value
7) if the first commit has not been skipped, we can return it and stop
here
8) otherwise filter out all the skipped commits in the sorted list
9) use a pseudo random number generator (PRNG) to generate a random
number between 0 and 1
10) multiply this random number by its square root to bias it toward 0
11) multiply the result by the number of commits in the filtered list
to get an index into this list
12) return the commit at the computed index
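Steps 6) to 12) can be sketched like this (a simplified model of the selection, not git's actual C implementation; the names are made up):

```python
import math
import random

def choose_skip_fallback(sorted_commits, skipped):
    """Pick a commit to test when some commits have been skipped.

    sorted_commits: commits sorted by decreasing associated value (step 6).
    skipped: the set of commits marked with "git bisect skip".
    """
    if sorted_commits[0] not in skipped:    # step 7): best one is testable
        return sorted_commits[0]
    # step 8): drop the skipped commits
    candidates = [c for c in sorted_commits if c not in skipped]
    r = random.random()                     # step 9): r in [0, 1)
    r *= math.sqrt(r)                       # step 10): r**1.5 biases toward 0
    # steps 11) and 12): scale to an index and return that commit
    return candidates[int(r * len(candidates))]
```

Because the random value is biased toward 0 and the list is sorted by decreasing value, high-value commits stay more likely to be picked while still spreading the choice away from the skipped area.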
Skip algorithm discussed
~~~~~~~~~~~~~~~~~~~~~~~~
After step 7) (in the skip algorithm), we could check if the second
commit has been skipped and return it if it is not the case. And in
fact that was the algorithm we used from when "git bisect skip" was
developed in Git version 1.5.4 (released on February 1st 2008) until
Git version 1.6.4 (released July 29th 2009).
But Ingo Molnar and H. Peter Anvin (another well-known Linux kernel
developer) both complained that sometimes the best bisection points
all happened to be in an area where all the commits are
untestable. And in this case the user was asked to test many
untestable commits, which could be very inefficient.
Indeed untestable commits are often untestable because a breakage was
introduced at one time, and that breakage was fixed only after many
other commits were introduced.
This breakage is of course most of the time unrelated to the breakage
we are trying to locate in the commit graph. But it prevents us from
knowing if the interesting "bad behavior" is present or not.
So it is a fact that commits near an untestable commit have a high
probability of being untestable themselves. And the best bisection
commits are often found together too (due to the bisection algorithm).
This is why it is a bad idea to just choose the next best unskipped
bisection commit when the first one has been skipped.
We found that most commits on the graph may give quite a lot of
information when they are tested. And the commits that