Re: kernel panic: corrupted stack end in wb_workfn
From: Dmitry Vyukov
Date: Wed Mar 20 2019 - 06:38:35 EST
On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> >> From bisection log:
> >>
> >> testing release v4.17
> >> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> >> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> >> run #2: crashed: kernel panic: Out of memory and no killable processes...
> >> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #8: crashed: kernel panic: Out of memory and no killable processes...
> >> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> >> testing release v4.16
> >> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> >> run #0: OK
> >> run #1: OK
> >> run #2: OK
> >> run #3: OK
> >> run #4: OK
> >> run #5: crashed: kernel panic: Out of memory and no killable processes...
> >> run #6: OK
> >> run #7: crashed: kernel panic: Out of memory and no killable processes...
> >> run #8: OK
> >> run #9: OK
> >> testing release v4.15
> >> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> >> all runs: OK
> >> # git bisect start v4.16 v4.15
> >>
> >> Why did bisection start between 4.16 and 4.15 instead of between 4.17 and 4.16?
> >
> > Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > looks like the right range, no?
>
> No, syzbot should bisect between 4.16 and 4.17 for this bug, because
> "Stack corruption" can't manifest as "Out of memory and no killable processes".
>
> "kernel panic: Out of memory and no killable processes..." is completely
> unrelated to "kernel panic: corrupted stack end in wb_workfn".

Do you think it is possible to code such a predicate? Looking at the
examples we have, distinguishing different bugs does not look feasible
to me. If the predicate is not accurate, you just trade one set of
false positives for another set of false positives, and then you are at
the beginning of an infinite slippery slope of refining it.
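
A minimal sketch of the two predicates, assuming a simplified model in which each tested release is judged only by the crash titles of its ten runs in the log quoted above. The function names and data layout are hypothetical and are not syzbot's actual code.

```python
# Hypothetical sketch: two ways a bisection tool might decide whether a
# tested release is "bad", applied to the crash titles from the log above.
# any_crash/same_title/first_good are illustrative names, not syzbot code.

ORIGINAL = "kernel panic: corrupted stack end in wb_workfn"
STACK_WT = "kernel panic: corrupted stack end in worker_thread"
OOM = "kernel panic: Out of memory and no killable processes..."

# Outcome of runs #0..#9 per release; None means the run was OK.
RUNS = {
    "v4.17": [ORIGINAL, STACK_WT, OOM, ORIGINAL, ORIGINAL, ORIGINAL,
              ORIGINAL, ORIGINAL, OOM, ORIGINAL],
    "v4.16": [None, None, None, None, None, OOM, None, OOM, None, None],
    "v4.15": [None] * 10,
}


def any_crash(runs):
    """Bad if any run crashed, whatever the title (what the log suggests)."""
    return any(title is not None for title in runs)


def same_title(runs, original=ORIGINAL):
    """Bad only if some run crashed with exactly the original title."""
    return any(title == original for title in runs)


def first_good(predicate):
    """Walk releases newest to oldest and return the first one judged good."""
    for release in ("v4.17", "v4.16", "v4.15"):
        if not predicate(RUNS[release]):
            return release
    return None


print(first_good(any_crash))   # v4.15: bisect between v4.16 (bad) and v4.15 (good)
print(first_good(same_title))  # v4.16: bisect between v4.17 (bad) and v4.16 (good)

# Exact matching also counts the worker_thread variant on v4.17 as a
# different bug, although it is plausibly the same stack corruption.
print(sum(title == ORIGINAL for title in RUNS["v4.17"]))  # 7
```

Matching on the title moves the range to 4.16..4.17, as Tetsuo suggests, but exact matching also discards run #1 on v4.17, which is plausibly the same corruption. Loosening the match to catch that case is the first step of the refining described above.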

Also, if we see a different bug (assuming we can distinguish them),
does it mean that the original bug is not present? Or is it also
present, and we just hit the other one first? This also does not look
feasible to answer. And if you give a wrong answer, bisection goes the
wrong way and we are back where we started, just with more complex code
and with things being even harder to explain to other people.

I mean, yes, I agree, kernel bug bisection won't be perfect. But do
you see anything actionable here?