Home Explore Blog CI



git

7th chunk of `Documentation/git-bisect-lk2009.adoc`
bf2c7ef5b19ca0c02b12fc8f79ec22c380d819800ff649f70000000100000fac
 any commit graph, you can see the number associated with each
commit using "git rev-list --bisect-all".

For example, for the above graph, a command like:

-------------
$ git rev-list --bisect-all BAD --not GOOD1 GOOD2
-------------

would output something like:

-------------
e15b73ad3db9b48d7d1ade32f8cd23a751fe0ace (dist=3)
15722f2fa328eaba97022898a305ffc8172db6b1 (dist=2)
78e86cf3e850bd755bb71831f42e200626fbd1e0 (dist=2)
a1939d9a142de972094af4dde9a544e577ddef0e (dist=2)
070eab2303024706f2924822bfec8b9847e4ac1b (dist=1)
a3864d4f32a3bf5ed177ddef598490a08760b70d (dist=1)
a41baa717dd74f1180abf55e9341bc7a0bb9d556 (dist=1)
9e622a6dad403b71c40979743bb9d5be17b16bd6 (dist=0)
-------------

Bisection algorithm discussed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First let's define "best bisection point". We will say that a commit X
is a best bisection point or a best bisection commit if knowing its
state ("good" or "bad") gives as much information as possible whether
the state of the commit happens to be "good" or "bad".

This means that the best bisection commits are the commits where the
following function is maximum:

-------------
f(X) = min(information_if_good(X), information_if_bad(X))
-------------

where information_if_good(X) is the information we get if X is good
and information_if_bad(X) is the information we get if X is bad.

Now we will suppose that there is only one "first bad commit". This
means that all its descendants are "bad" and all the other commits are
"good". And we will suppose that all commits have an equal probability
of being good or bad, or of being the first bad commit, so knowing the
state of c commits gives always the same amount of information
wherever these c commits are on the graph and whatever c is. (So we
suppose that these commits being for example on a branch or near a
good or a bad commit does not give more or less information).

Let's also suppose that we have a cleaned up graph like one after step
1) in the bisection algorithm above. This means that we can measure
   the information we get in terms of number of commit we can remove
   from the graph..

And let's take a commit X in the graph.

If X is found to be "good", then we know that its ancestors are all
"good", so we want to say that:

-------------
information_if_good(X) = number_of_ancestors(X)  (TRUE)
-------------

And this is true because at step 1) b) we remove the ancestors of the
"good" commits.

If X is found to be "bad", then we know that its descendants are all
"bad", so we want to say that:

-------------
information_if_bad(X) = number_of_descendants(X)  (WRONG)
-------------

But this is wrong because at step 1) a) we keep only the ancestors of
the bad commit. So we get more information when a commit is marked as
"bad", because we also know that the ancestors of the previous "bad"
commit that are not ancestors of the new "bad" commit are not the
first bad commit. We don't know if they are good or bad, but we know
that they are not the first bad commit because they are not ancestor
of the new "bad" commit.

So when a commit is marked as "bad" we know we can remove all the
commits in the graph except those that are ancestors of the new "bad"
commit. This means that:

-------------
information_if_bad(X) = N - number_of_ancestors(X)  (TRUE)
-------------

where N is the number of commits in the (cleaned up) graph.

So in the end this means that to find the best bisection commits we
should maximize the function:

-------------
f(X) = min(number_of_ancestors(X), N - number_of_ancestors(X))
-------------

And this is nice because at step 2) we compute number_of_ancestors(X)
and so at step 3) we compute f(X).

Let's take the following graph as an example:

-------------
            G-H-I-J
           /       \
A-B-C-D-E-F         O
           \       /
            K-L-M-N
-------------

If we compute the following non optimal function on it:

-------------
g(X) = min(number_of_ancestors(X), number_of_descendants(X))
-------------

we get:

-------------

Title: Understanding the Git Bisect Algorithm
Summary
This text explains the Git bisect algorithm, including how it defines the 'best bisection point' and calculates the information gained from knowing the state of a commit, using a function that maximizes the minimum of the number of ancestors and the number of commits that can be removed if the commit is bad, and provides examples to illustrate the concept.