Re: bug-introducing patches (or: -rc cycles suck)

From: Sasha Levin
Date: Tue May 01 2018 - 13:21:38 EST


On Tue, May 01, 2018 at 06:50:51PM +0200, Willy Tarreau wrote:
>On Tue, May 01, 2018 at 04:19:35PM +0000, Sasha Levin wrote:
>> On Mon, Apr 30, 2018 at 09:09:18PM +0200, Willy Tarreau wrote:
>> >Hi Sasha,
>> >
>> >On Mon, Apr 30, 2018 at 05:58:30PM +0000, Sasha Levin wrote:
>> >> - For some reason, the odds of a -rc commit to be targetted for -stable is
>> >> over 20%, while for merge window commits it's about 3%. I can't quite
>> >> explain why that happens, but this would suggest that -rc commits end up
>> >> hurting -stable pretty badly.
>> >
>> >Often, merge window collects work that has been done during the previous
>> >cycle and which is prepared to target this merge window. Fixes that happen
>> >during this period very likely tend to either be remerged with the patches
>> >before they are submitted if they concern the code to be submitted, or are
>> >delayed to after the work gets merged. As a result few of the pre-rc1 patches
>> >get backported while the next ones mostly contain fixes. By the way, you
>> >probably also noticed it when backporting patches to your stable releases,
>> >the mainline commit almost never comes from a merge window.
>>
>> I'm not sure I understand/agree with this explanation. You're saying
>> that commits that fix issues in newly introduced features got folded in
>> the feature before it was sent during the merge window, so then there
>> was no need for them to be tagged for stable?
>
>No, what I mean is that often a developer is either in development mode
>or in bug-fixing mode but it's often quite hard to quickly switch between
>the two. So when you're finishing what you were doing to meet the merge
>window deadline and you receive bug fixes, it's natural to hold on a few
>fixes because it's hard to switch to the review mode. However, if some
>fixes concern the code you're about to submit, it's not bug fixing but
>fixes for your development in progress and that doesn't require as much
>effort, so these updates can often be remerged before being submitted.

I see. But then, wouldn't there be a spike of -stable patches for -rc1
and -rc2?

[snip]
>> It also appears that pretty much the same ratio of commits are tagged
>> for -stable accross all -rc cycles, so there are no spikes at any point
>> during the cycle, which seems to suggest that there is no particular
>> relationship between when a -stable commit is created to the stage in a
>> release cycle of the current kernel.
>
>Not much surprising to me. After all, -rc are "let's observe and fix",
>and it's expected that bugs are randomly met and fixed during that
>period.

So for bugs found and fixed during -rc6+, those fixes should be merged
around the next merge window (time for reviews + time to bake in
stable), but this doesn't seem to happen. Maybe the lack of -stable
commits during merge windows is a symptom of the problem?

>> >I think that you'll also notice that fixes that address bugs introduced
>> >during the merge window of the same version will more often introduce
>> >bugs than the ones which address 6-months old bugs which require some
>> >deeper thinking. In short it indicates that we tend to believe we are
>> >better than we really are, especially very late at night.
>>
>> I very much agree. I also think that "upper-level" maintainers, and
>> Linus in particular have to stop this behavior.
>
>Well it's easier said than done. You don't really choose when you can
>become creative or efficient. For some people it's when everyone else
>is asleep, for others it's when they can have 8 uninterrupted hours
>in front of them to work on a complex bug. I think it's more efficient
>to let people be aware of their limits than to try to overcome them.
>The typical thought "I'm too stupid now, let's go to bed" followed the
>next morning with a review starting to think "what did I break last
>night" is already quite profitable provided people are humble enough
>to think like this.

I'm not saying that patches should be rejected, they should just be told
to spend more time in -next gathering reviews and tests.

Linus would basically say "resend this once the patch has been in -next
for 21 days". That's all.

Heck, we could automate this and check pull requests send to Linus and
warn about "new" patches.

>> Yes, folks who do these
>> patches are often very familiar with the subsystem, but this doesn't
>> mean that they don't make mistakes.
>
>But we all do mistakes all the time. And quite frankly I find that the
>recent kernels quality in the early stages after the release is much
>better than what it used to be. Kernels build fine, boot fine on most
>hardware, and after a few stable versions you can really start to
>forget to update them because you don't meet the crashes anymore. Just
>a simple example (please don't reproduce, I'm not proud of it), when
>I replaced my PC, it came with 4.4.6. I thought "I'll have to upgrade
>next week". But I had so many trouble with its crappy bogus BIOS that
>I was afraid to reboot it. Then I had hundreds of xterms spread over
>multiple displays and it was never the best moment to reboot. Finally
>it happened 550 days later. Yes, the 6th maintenance release of 4.4
>lasted 550 days on a developers machine doing all sort of stuff without
>even a scary message in dmesg. Of course in terms of security it's
>terrible. But we didn't see this level of stability in 2.6.x nor in
>the early 3.x versions.
>
>> It's as if during -rc cycles all rules are void and bug fixes are now
>> no be collected and merged in as fast as humanly possible without any
>> regard to how well these fixes were tested.
>
>These stages are supposed to serve to collect fixes, and fixes are
>supposed to be tested. Often it's worse to let a fix rot somewhere
>than to get it. At the very least by merging it you expose it more
>quickly and you have more chances to know if you missed anything.

Linus's tree isn't a testing tree anymore. In reality, it's just a
cadence/sync point for kernel devs.

Integration and testing happen in -next. The various bots we have run on
-next, most folks are doing their custom testing on -next (we do...).

Linus's tree is was "demoted" as a result of the significant improvement
in testing automation and the capacity to be able to test a fast
changing tree such as -next.

So keeping patches out of Linus's tree isn't really equals to "letting
them rot", it just means "let them get more testing".

>I remember in the past some people arguing that we shouldn't backport
>fixes that haven't experienced a release yet, but that would make the
>situation even worse, with no stable fix for the 3 months following a
>release. The overall amount of reverts in stable kernels remains very
>low, which indicates to me that the overall quality is quite good,
>eventhough the process causes gray hair to the involved people (well
>for those still having hair).

Right, the statistics didn't support the policy change. The -stable
kernel is better at not introducing bugs because (IMO) the commits get
even more reviews.

Countless times my requests for reviews of -stable commits have
uncovered a bug in mainline.

>That's overall why I think that your work can be useful to raise
>awareness of what behaviours decrease the level of quality so that
>everyone can try to improve a bit, but I don't think there is that
>much to squeeze without hitting the wall of what a human brain can
>reasonably deal with. And extra process is a mental pressure just
>like dealing with bugs, so comes a point where process competes
>with quality.

I'm trying to come up with a way where, similar to AUTOSEL, humans won't
need to do much more work.

I'm also not advocating for *more* process, I'm advocating for a
*different* process.

Linus, as he already stated himself, is looking at how long a patch
spent in -next before he pulls it. I'm suggesting to improve and build
on that. Look at how long a patch was in -next, how many people reviewed
it, how much mailing list discussion it triggered, etc.

What I want to end up with is a tool to make maintainer's life easier by
highlighting "dangerous" patches that require more careful review. It's
much more time efficient to keep bugs out than deal with them later.