Advanced Git Bisect Algorithm

bisection points all happened to be in an area where all the commits are untestable. And in this case the user was asked to test many untestable commits, which could be very inefficient. Indeed untestable commits are often untestable because a breakage was introduced at one time, and that breakage was fixed only after many other commits were introduced. This breakage is of course most of the time unrelated to the breakage we are trying to locate in the commit graph. But it prevents us to know if the interesting "bad behavior" is present or not. So it is a fact that commits near an untestable commit have a high probability of being untestable themselves. And the best bisection commits are often found together too (due to the bisection algorithm). This is why it is a bad idea to just chose the next best unskipped bisection commit when the first one has been skipped. We found that most commits on the graph may give quite a lot of information when they are tested. And the commits that will not on average give a lot of information are the one near the good and bad commits. So using a PRNG with a bias to favor commits away from the good and bad commits looked like a good choice. One obvious improvement to this algorithm would be to look for a commit that has an associated value near the one of the best bisection commit, and that is on another branch, before using the PRNG. Because if such a commit exists, then it is not very likely to be untestable too, so it will probably give more information than a nearly randomly chosen one. Checking merge bases ~~~~~~~~~~~~~~~~~~~~ There is another tweak in the bisection algorithm that has not been described in the "bisection algorithm" above. We supposed in the previous examples that the "good" commits were ancestors of the "bad" commit. But this is not a requirement of "git bisect". Of course the "bad" commit cannot be an ancestor of a "good" commit, because the ancestors of the good commits are supposed to be "good". And all the "good" commits must be related to the bad commit. They cannot be on a branch that has no link with the branch of the "bad" commit. But it is possible for a good commit to be related to a bad commit and yet not be neither one of its ancestor nor one of its descendants. For example, there can be a "main" branch, and a "dev" branch that was forked of the main branch at a commit named "D" like this: ------------- A-B-C-D-E-F-G <--main \ H-I-J <--dev ------------- The commit "D" is called a "merge base" for branch "main" and "dev" because it's the best common ancestor for these branches for a merge. Now let's suppose that commit J is bad and commit G is good and that we apply the bisection algorithm like it has been previously described. As described in step 1) b) of the bisection algorithm, we remove all the ancestors of the good commits because they are supposed to be good too. So we would be left with only: ------------- H-I-J ------------- But what happens if the first bad commit is "B" and if it has been fixed in the "main" branch by commit "F"? The result of such a bisection would be that we would find that H is the first bad commit, when in fact it's B. So that would be wrong! And yes it can happen in practice that people working on one branch are not aware that people working on another branch fixed a bug! It could also happen that F fixed more than one bug or that it is a revert of some big development effort that was not ready to be released. In fact development teams often maintain both a development branch and a maintenance branch, and it would be quite easy for them if "git bisect" just worked when they want to bisect a regression on the development branch that is not on the maintenance branch. They should be able to start bisecting using: ------------- $ git bisect start dev main ------------- To enable that additional nice feature, when a bisection is started and when some good commits are not ancestors of the bad commit,

This text discusses the limitations of the standard Git bisect algorithm when dealing with untestable commits and multiple branches, and describes additional tweaks and improvements to the algorithm, including the use of a pseudo-random number generator to favor commits away from good and bad commits, and the consideration of merge bases to handle cases where good commits are not ancestors of bad commits, allowing for more accurate and efficient bisecting in complex branch scenarios.