Re: WARNING in get_pi_state

From: Dmitry Vyukov
Date: Tue Oct 31 2017 - 06:21:26 EST


On Tue, Oct 31, 2017 at 1:08 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Oct 31, 2017 at 12:29:50PM +0300, Dmitry Vyukov wrote:
>> I understand your sentiment, but it's definitely not _at all_. The
>> system compiled this exact code, ran it and triggered the bug on it.
>> Do you have suggestions on how to make this code more portable? How
>> would this setup look on your system?
>
> So I don't see the point of that tun stuff; what was it supposed to do?
>
> All it ever did after creation was flush_tun(), which reads until empty.
> But given nobody would ever write into it, that's an 'expensive' NO-OP.

See the text below.
The bot does try to minimize both the programs and the features used
(e.g. these clunky NONFAILING macros and the filesystem business). But
if it takes 100 seconds to reproduce, then minimization is hard to do.
Consider bisecting such bugs: that will also be hard and unreliable,
and you can end up with a wrong commit.
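
To illustrate why an unreliable reproducer hurts minimization, here is
a minimal Python sketch of a greedy "drop one feature and retry" loop.
It is not syzkaller's actual code: the names minimize and
make_flaky_repro, the feature names, and the probabilities are all
hypothetical and chosen only for illustration.

```python
import random


def minimize(features, reproduces, attempts=1):
    """Greedily drop features from a reproducer.

    A feature is dropped only if the crash still reproduces without it
    in at least one of `attempts` runs; otherwise it is kept as
    "apparently necessary".
    """
    kept = list(features)
    for feature in list(features):
        candidate = [f for f in kept if f != feature]
        # any() stops at the first run that reproduces the crash.
        if any(reproduces(candidate) for _ in range(attempts)):
            kept = candidate  # crash still happens, feature was not needed
    return kept


def make_flaky_repro(required, hit_probability, rng):
    """Hypothetical stand-in for one run of a racy reproducer.

    The run "crashes" with probability `hit_probability`, and only if
    every truly required feature is still present.
    """
    def reproduces(features):
        return set(required) <= set(features) and rng.random() < hit_probability
    return reproduces


if __name__ == "__main__":
    rng = random.Random(0)
    features = ["futex", "tun", "fs"]  # hypothetical feature names
    flaky = make_flaky_repro(["futex"], hit_probability=0.3, rng=rng)
    # With a single attempt per candidate, unneeded features often
    # survive; more attempts help, but each one costs a full run.
    print(minimize(features, flaky, attempts=1))
    print(minimize(features, flaky, attempts=20))
```

If a single run hits the race with probability p, a truly unnecessary
feature is wrongly kept with probability (1 - p)^attempts, so lowering
that error rate means more runs, and each run may take minutes.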

See this for an example of a much tidier reproducer:
https://groups.google.com/forum/#!topic/syzkaller-bugs/9nYn7hpNpEk
But that's a single-threaded bug that triggers instantly each time you
run the program.


>> We do try hard to get rid of unnecessary stuff in reproducers. I think
>> what happened in this case is the following. This is a hard-to-reproduce
>> race. The bot was able to reproduce the crash on the initial
>> program that uses tun, then tried to get rid of the tun code and
>> re-reproduce it, but it did not reproduce this time, so it concluded
>> that the tun code is somehow necessary here. That's an unfortunate
>> consequence of testing complex concurrent code. It may become somewhat
>> better once we have KTSAN, the race detector.
>
> I ripped out the tun bits and it reproduced in ~100 seconds. I've now
> got it running for well over 30m on the fixed kernel while I'm trying to
> come up with a comprehensible Changelog ;-)