The Cost and Challenge of Regressions in Software Development

eliminated by an improved testing infrastructure that enables earlier and more effective identification and removal of software defects. These are the savings associated with finding an increased percentage (but not 100 percent) of errors closer to the development stages in which they are introduced. Currently, over half of all errors are not found until "downstream" in the development process or during post-sale software use.
_____________

And then:

_____________
Software developers already spend approximately 80 percent of development costs on identifying and correcting defects, and yet few products of any type other than software are shipped with such high levels of errors.
_____________

Eventually the conclusion started with:

_____________
The path to higher software quality is significantly improved software testing.
_____________

There are other estimates saying that 80% of the cost related to software is about maintenance <<2>>.

However, according to Wikipedia <<3>>:

_____________
A common perception of maintenance is that it is merely fixing bugs. However, studies and surveys over the years have indicated that the majority, over 80%, of the maintenance effort is used for non-corrective actions (Pigosky 1997). This perception is perpetuated by users submitting problem reports that in reality are functionality enhancements to the system.
_____________

But we can guess that improving existing software is very costly because you have to watch out for regressions. At least this would make the above studies consistent with one another.

Of course, some software is developed, then used for some time without being improved much, and finally thrown away. In this case, regressions may not be a big problem. On the other hand, a lot of big software is continually developed and maintained for years or even decades by many contributors.
And as there are often many people who depend (sometimes critically) on such software, regressions are a really big problem.

One such piece of software is the Linux kernel. If we look at the Linux kernel, we can see that a lot of time and effort is spent fighting regressions. The release cycle starts with a two-week merge window. Then the first release candidate (rc) version is tagged. After that, about 7 or 8 more rc versions appear, roughly one week apart, before the final release.

The time between the first rc release and the final release is supposed to be used to test rc versions and fight bugs, especially regressions. This period is more than 80% of the release cycle time. But this is not the end of the fight, as of course it continues after the release.

And then this is what Ingo Molnar (a well-known Linux kernel developer) says about his use of git bisect:

_____________
I most actively use it during the merge window (when a lot of trees get merged upstream and when the influx of bugs is the highest) - and yes, there have been cases that i used it multiple times a day. My average is roughly once a day.
_____________

So regressions are fought all the time by developers, and indeed it is well known that bugs should be fixed as soon as possible, that is, as soon as they are found. That's why it is worth having good tools for this purpose.

Other tools to fight regressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So what are the tools used to fight regressions? They are nearly the same as those used to fight regular bugs. The only specific tools are test suites and tools similar to "git bisect".

Test suites are very nice. But when they are used alone, they are supposed to be used so that all the tests are checked after each commit. This means that they are not very efficient, because many tests are run for no interesting result, and they suffer from combinatorial explosion.
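
To make the combinatorial explosion concrete, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical project whose configurations are all combinations of independent boolean options, and counts the test runs needed to check every test in every configuration after every commit. For contrast, it also counts the worst-case number of tests an idealized binary search over a linear range of commits needs to find the first bad commit, which is the basic idea behind tools similar to "git bisect". The function names and the numbers in the usage lines are illustrative assumptions, not measurements.

```python
def exhaustive_runs(num_tests: int, num_bool_options: int, num_commits: int) -> int:
    """Test runs needed to check every test, in every configuration, after every commit.

    Hypothetical model: each configuration is one combination of independent
    boolean options, so there are 2 ** num_bool_options configurations.
    """
    configurations = 2 ** num_bool_options
    return num_tests * configurations * num_commits


def bisect_steps(num_commits: int) -> int:
    """Worst-case number of tests an idealized binary search needs to find the
    first bad commit among num_commits candidates on a linear history.

    This equals ceil(log2(num_commits)), computed with integers to avoid
    floating-point rounding.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical project: 1000 tests, 10 boolean options, 100 new commits.
    print(exhaustive_runs(1000, 10, 100))  # 102400000
    print(bisect_steps(100))               # 7
```

The exact figures do not matter much: the exhaustive approach grows multiplicatively with the number of tests, configurations, and commits, while the binary search grows only logarithmically with the number of commits it has to search.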

In fact, the problem is that big software often has many different configuration options, and that each test case should pass for each
