Re: INFO: rcu detected stall in sys_sendfile64 (2)

From: Dmitry Vyukov
Date: Wed Mar 20 2019 - 09:45:36 EST


On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> >>> Also, humans can sometimes find simpler C reproducers than the ones syzbot
> >>> provides. It would be nice if syzbot could accept and use a user-defined C
> >>> reproducer for testing.
> >>
> >> It would be more useful to accept patches that make syzkaller create
> >> better reproducers from these people. Manual work is not scalable. We
> >> would need 10 reproducers per day for a dozen OSes (incl. some
> >> private kernels/branches). Anybody is free to run syzkaller manually
> >> and do full manual (perfect) reporting. But for us it became clear
> >> very early that it wouldn't work. Then, see above: while that human is
> >> sleeping/on a weekend/on vacation, syzbot will already have bisected its own
> >> reproducer. Adding a manual reproducer later won't help in any way.
> >> syzkaller already does lots of smart work for reproducers. Let's not
> >> give up on the last mile and switch back to all manual work.
> >>
> >
> > Well, it's very tough and not many people are familiar with the syzkaller
> > codebase, let alone have time to contribute.
>
> Right. I don't read/write Go programs. I don't have access to environments
> for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting,
> I can't afford checking such reports posted to e.g. LKML. I can afford
> checking only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot. But I felt that
> the article does not explain what to do when syzbot finds a bug. If people find
> a crash by running syzbot in their environments, it would be nice if they could
> export the report and import it to https://syzkaller.appspot.com/ (i.e. the
> dashboard acts like a bugzilla).


Problem 1 (smaller). Neither providing a custom program nor manually
specifying the bisection range (as you suggested in another thread
https://groups.google.com/d/msg/syzkaller-bugs/nFeC8-UG1gg/1OTVIuzBAgAJ)
will make kernel bug bisection reliable. The problems with kernel
bisection are deeper. Consider a bug that is inherently hard to
trigger: even if one provides their own reproducer, it's still hard to
trigger, and bisection can diverge. What happened in the other bug:
bisection diverged because the reproducer triggered another bug. Now
consider that this happens within the bisection range. Even if you
give your own range, it won't help. And there are lots of other problems,
like large ranges where the kernel build is broken.
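
To make the divergence point concrete, here is a toy model (a sketch, not
syzbot's actual bisection code; the commit numbers and bug ranges are made
up). Commits are plain integers, `bisect` is a git-bisect-style binary
search over (good, bad], and each oracle function stands in for "build this
commit, boot it, run the reproducer, report whether it crashed".

```go
package main

import "fmt"

// bisect performs a git-bisect-style binary search. It assumes commit `good`
// is known not to crash and commit `bad` is known to crash, and returns the
// first commit in (good, bad] that isBad reports as crashing.
func bisect(good, bad int, isBad func(commit int) bool) int {
	for bad-good > 1 {
		mid := good + (bad-good)/2
		if isBad(mid) {
			bad = mid
		} else {
			good = mid
		}
	}
	return bad
}

func main() {
	const culprit = 700 // hypothetical commit that introduced the bug we are after

	// Ideal oracle: the reproducer crashes exactly on commits containing the bug.
	ideal := func(c int) bool { return c >= culprit }

	// Hypothetical unrelated bug introduced at 400 and fixed at 600; the
	// reproducer crashes on it too, and any crash is counted as "bad".
	otherBug := func(c int) bool { return c >= culprit || (c >= 400 && c < 600) }

	// Hypothetical flaky reproducer that fails to trigger the bug on commit 750.
	flaky := func(c int) bool { return c >= culprit && c != 750 }

	fmt.Println(bisect(0, 1000, ideal))    // 700: correct culprit
	fmt.Println(bisect(0, 1000, otherBug)) // 400: blames the unrelated bug
	fmt.Println(bisect(0, 1000, flaky))    // 751: lands past the real culprit
}
```

A single unrelated crash inside the range, or a single missed trigger, is
enough for the search to converge confidently on the wrong commit, and a
hand-picked range or reproducer does not remove that failure mode.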

Accepting custom reproducers or ranges would also introduce problems of its
own: e.g. it's very easy to give syzbot a reproducer that doesn't actually
trigger the bug in syzbot's environment (because you can't match that
environment precisely).

Also: if you can't bisect locally and can't test, how would you know the
right range in general? Again, that one bug was a single corner case.

Also: a semi-manual process will lead to some suboptimal results too,
and then other kernel developers will come and ask questions, and
somebody will need to answer them. But in this case syzbot
is not even accountable for what happened.

I don't think there is a simple substitute for a qualified engineer
doing their job (guiding each step of bisection manually).
It's possible to imagine a very complex workflow (also super hard to
implement, test and maintain) that would allow this. But at that point
it mostly becomes offloading the build/boot/test of a given configuration
to the cloud. And this brings us to the second problem.

Problem 2. What you are proposing effectively looks like some kind of
custom workload offloading service for kernel developers, except that
instead of console commands (raw cloud VMs) it has a somewhat higher-level
interface (e.g. here is the kernel config, compiler, command line,
sysctls, machine configuration and test case; go build and test it).
I don't think this should be bolted on top of syzbot.

Developing and running syzbot is already a _huge_ amount of work
(frequently thankless). I simply cannot take on developing, testing,
deploying, maintaining and operating another service. And that service
would involve much more complex human interactions, so it would be much
more complex overall.

If such a service is provided, I think it needs to run on Linux
Foundation infrastructure that runs CI and other testing. Yes, I know,
it does not exist. But that would be the right place. It would benefit
work on all other kernel bugs too. Lots of things people attribute to
syzbot are really not specific to syzbot in any way. For example, that
service would help with bisection of all other bugs too. And it seems
that a much simpler solution would be to just provide free VMs for
developers, because your main point seems to be "I would like to do
something custom, but I don't have the resources for that". This is out of
scope for syzbot.

The current syzbot scope is: automating as much as possible, solving
common cases at scale (including other OSes and kernel branches), and
bringing developers enough information to pick up the bug from there
and do any custom work necessary to debug and fix it (there will
always be custom work! even a perfect bisection can get you nowhere
with respect to root-causing, and there are still bugs without
reproducers). We can solve some surrounding problems too, _iff_ they
are common enough, have a high bang for the buck, are reasonably easy
to implement and don't impose a long-term maintenance toll. This one
does not look like such a problem.
Sorry.