Re: [Ksummit-discuss] bug-introducing patches

From: Willy Tarreau
Date: Thu May 03 2018 - 11:57:19 EST


On Thu, May 03, 2018 at 08:27:48AM -0700, James Bottomley wrote:
> It's also a sad fact that a lot of things which look like obvious fixes
> actually turn out not to be so with later testing. This is why the
> user visibility test is paramount. If a bug fix has no real user
> visible effects, it's often better to defer it no matter how obvious it
> looks, which is why the static code checkers often get short shrift
> before a merge window.
>
> A script measuring user visibility would be nice, but looks a bit
> complex ...

I totally agree with this and it matches my experience in haproxy. We
have had series of fixes that broke something else in very subtle ways
that made us want to improve non-reg, but many of the times we noted
that reg testing would hardly spot them given that the failures require
so many conditions to happen only once every million that it's hopeless.
It's just that some users are (un)lucky enough to meet all the conditions
at once very often and to be very sensitive to one error per million.

User exposure is needed. Having multiple stable release ensures everyone
gets their expected level of trust. Those on -rc want to see bugs before
they hit their users. Regressions are bad and require self-moderation and
self-estimation of the amount of trust in one's code, but they're better
in -rc than in -stable. I do happen to write some fixes I'm not totally
sure about and prefer not to backport them immediately. Users value
transparency because that helps them take safe decisions. If I say "this
is my fix, but I'd love more testing as I'm not yet sold on it", I'll get
some testers, but not the ones complaining that I broke their setup. Only
later it makes sense to progressively backport.

I have broken stable releases many times with failed backports. Almost
every time it was my fault due to incomplete testing. I could argue that
once you've built one hundred times in a week-end you're probably a bit
more lenient about next builds, or whatever. But in the end I was the
one breaking a working version. Seeing my branches picked up by Guenter
was a huge relief and it started to spot many build issues that I could
not figure myself. It doesn't make remaining bugs less important but at
least they are easier to swallow, to spot and to address.

What's not acceptable is rushed fixes that have obvious side effects that
could have been caught by closer analysis or better testing. It always
happens but it must not happen too often for the same person/subsystem.
This I think is where the line must be drawn. When Linus shouts once in
a while, it's a reminder for all others. Tune the potentiometer of his
detection threshold a bit lower and we'll get less regressions because
it's never pleasant to be called stupid in public like he does.

Willy