Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

From: Dmitry Vyukov
Date: Thu Jan 18 2018 - 08:07:15 EST


On Thu, Jan 18, 2018 at 2:01 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree than linux-next should be the one.
>>
>> Well, there's been some controversy about that. The problem is that
>> it's often not clear if this is long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report. And so this has caused
>> some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
> - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.

/\/\/\/\/\/\/\/\/\/\/\/\

While we are here, you can help syzkaller to test your subsystem
better (or at all). It frequently requires domain expertise which we
don't have for all kernel subsystems (sometimes we don't even know
they exist). It can be as simple as this (for /dev/ashmem):
https://github.com/google/syzkaller/blob/master/sys/linux/ashmem.txt


> - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
> - generic syzkaller smartness will be improved over time.
> - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.
>
> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.
>
> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that will manifest as multiple random crashes. FWIW I don't
> remember that I ever seen this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Gruenter expressed opposite opinion,
> (2) I don't see what it will change dramatically, (2) as far as I
> understand Linus actually relies on linux-next giving some concrete
> testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that that will
> be testing targeted specifically at bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc (though it will cover a part
> of net). Also note that the latest email format includes set of tree
> where the crash happened, so if you see "upstream" or "upstream and
> bpf-next", nothing really changes, you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> that tree.