Fighting regressions with git bisect
====================================
:Author: Christian Couder
:Email: chriscool@tuxfamily.org
:Date: 2009/11/08

Abstract
--------

"git bisect" enables software users and developers to easily find the
commit that introduced a regression. We show why it is important to
have good tools to fight regressions. We describe how "git bisect"
works from the outside and the algorithms it uses inside. Then we
explain how to take advantage of "git bisect" to improve current
practices. And we discuss how "git bisect" could improve in the
future.


Introduction to "git bisect"
----------------------------

Git is a Distributed Version Control System (DVCS) created by Linus
Torvalds and maintained by Junio Hamano.

In Git, as in many other Version Control Systems (VCS), the different
states of the data managed by the system are called commits. And, as
VCS are mostly used to manage software source code, "interesting"
changes of behavior in the software are sometimes introduced in some
commits.

In fact, people are especially interested in commits that introduce a
"bad" behavior, called a bug or a regression. They are interested in
these commits because a commit (hopefully) contains a very small set
of source code changes. And it's much easier to understand and
properly fix a problem when you only need to check a very small set of
changes than when you don't know where to look in the first place.

So to help people find commits that introduce a "bad" behavior, the
"git bisect" set of commands was invented. And it follows of course
that in "git bisect" parlance, commits where the "interesting
behavior" is present are called "bad" commits, while other commits are
called "good" commits. And a commit that introduces the behavior we are
interested in is called a "first bad commit". Note that there could be
more than one "first bad commit" in the commit space we are searching.

So "git bisect" is designed to help find a "first bad commit". And to
be as efficient as possible, it tries to perform a binary search.
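
To make the idea of a binary search over commits concrete, here is a
minimal sketch in Python. It is not how "git bisect" is implemented.
It assumes a strictly linear history in which the first commit is
known to be "good", the last one is known to be "bad", and every
commit after the "first bad commit" is also "bad". The names
`find_first_bad`, `is_bad` and the `c0` ... `c15` commit labels are
hypothetical and exist only for this illustration.

```python
from typing import Callable, Sequence


def find_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first "bad" commit of a linear history.

    Assumptions (for this sketch only): commits[0] is known "good",
    commits[-1] is known "bad", and once a commit is "bad" all later
    commits are "bad" too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known "good", hi: known "bad"
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or before it
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Hypothetical history of 16 commits where "c11" introduced the regression.
history = [f"c{i}" for i in range(16)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record each commit we had to test
    return int(commit[1:]) >= 11


print(find_first_bad(history, is_bad), len(tested))  # prints: c11 4
```

With 16 commits, only 4 of them have to be tested, instead of up to 14
when checking the commits between the two known ends one by one. Real
Git histories are graphs rather than straight lines, so this sketch
only conveys the halving idea.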


Fighting regressions overview
-----------------------------

Regressions: a big problem
~~~~~~~~~~~~~~~~~~~~~~~~~~

Regressions are a big problem in the software industry. But it's
difficult to put some real numbers behind that claim.

There are some numbers about bugs in general, like a NIST study in
2002 <<1>> that said:

_____________
Software bugs, or errors, are so prevalent and so detrimental that
they cost the U.S. economy an estimated $59.5 billion annually, or
about 0.6 percent of the gross domestic product, according to a newly
released study commissioned by the Department of Commerce's National
Institute of Standards and Technology (NIST). At the national level,
over half of the costs are borne by software users and the remainder
by software developers/vendors.  The study also found that, although
all errors cannot be removed, more than a third of these costs, or an
estimated $22.2 billion, could be eliminated by an improved testing
infrastructure that enables earlier and more effective identification
and removal of software defects. These are the savings associated with
finding an increased percentage (but not 100 percent) of errors closer
to the development stages in which they are introduced. Currently,
over half of all errors are not found until "downstream" in the
development process or during post-sale software use.
_____________

And then:

_____________
Software developers already spend approximately 80 percent of
development costs on identifying and correcting defects, and yet few
products of any type other than software are shipped with such high
levels of errors.
_____________

Finally, the conclusion started with:

_____________
The path to higher software quality is significantly improved software
testing.
_____________

Other estimates say that 80% of the cost related to software goes to
maintenance <<2>>.

Though, according to Wikipedia <<3>>:

_____________
