Understanding the Git Bisect Algorithm

any commit graph, you can see the number associated with each commit using "git rev-list --bisect-all". For example, for the above graph, a command like: ------------- $ git rev-list --bisect-all BAD --not GOOD1 GOOD2 ------------- would output something like: ------------- e15b73ad3db9b48d7d1ade32f8cd23a751fe0ace (dist=3) 15722f2fa328eaba97022898a305ffc8172db6b1 (dist=2) 78e86cf3e850bd755bb71831f42e200626fbd1e0 (dist=2) a1939d9a142de972094af4dde9a544e577ddef0e (dist=2) 070eab2303024706f2924822bfec8b9847e4ac1b (dist=1) a3864d4f32a3bf5ed177ddef598490a08760b70d (dist=1) a41baa717dd74f1180abf55e9341bc7a0bb9d556 (dist=1) 9e622a6dad403b71c40979743bb9d5be17b16bd6 (dist=0) ------------- Bisection algorithm discussed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First let's define "best bisection point". We will say that a commit X is a best bisection point or a best bisection commit if knowing its state ("good" or "bad") gives as much information as possible whether the state of the commit happens to be "good" or "bad". This means that the best bisection commits are the commits where the following function is maximum: ------------- f(X) = min(information_if_good(X), information_if_bad(X)) ------------- where information_if_good(X) is the information we get if X is good and information_if_bad(X) is the information we get if X is bad. Now we will suppose that there is only one "first bad commit". This means that all its descendants are "bad" and all the other commits are "good". And we will suppose that all commits have an equal probability of being good or bad, or of being the first bad commit, so knowing the state of c commits gives always the same amount of information wherever these c commits are on the graph and whatever c is. (So we suppose that these commits being for example on a branch or near a good or a bad commit does not give more or less information). Let's also suppose that we have a cleaned up graph like one after step 1) in the bisection algorithm above. This means that we can measure the information we get in terms of number of commit we can remove from the graph.. And let's take a commit X in the graph. If X is found to be "good", then we know that its ancestors are all "good", so we want to say that: ------------- information_if_good(X) = number_of_ancestors(X) (TRUE) ------------- And this is true because at step 1) b) we remove the ancestors of the "good" commits. If X is found to be "bad", then we know that its descendants are all "bad", so we want to say that: ------------- information_if_bad(X) = number_of_descendants(X) (WRONG) ------------- But this is wrong because at step 1) a) we keep only the ancestors of the bad commit. So we get more information when a commit is marked as "bad", because we also know that the ancestors of the previous "bad" commit that are not ancestors of the new "bad" commit are not the first bad commit. We don't know if they are good or bad, but we know that they are not the first bad commit because they are not ancestor of the new "bad" commit. So when a commit is marked as "bad" we know we can remove all the commits in the graph except those that are ancestors of the new "bad" commit. This means that: ------------- information_if_bad(X) = N - number_of_ancestors(X) (TRUE) ------------- where N is the number of commits in the (cleaned up) graph. So in the end this means that to find the best bisection commits we should maximize the function: ------------- f(X) = min(number_of_ancestors(X), N - number_of_ancestors(X)) ------------- And this is nice because at step 2) we compute number_of_ancestors(X) and so at step 3) we compute f(X). Let's take the following graph as an example: ------------- G-H-I-J / \ A-B-C-D-E-F O \ / K-L-M-N ------------- If we compute the following non optimal function on it: ------------- g(X) = min(number_of_ancestors(X), number_of_descendants(X)) ------------- we get: -------------

This text explains the Git bisect algorithm, including how it defines the 'best bisection point' and calculates the information gained from knowing the state of a commit, using a function that maximizes the minimum of the number of ancestors and the number of commits that can be removed if the commit is bad, and provides examples to illustrate the concept.