Re: kernel panic: corrupted stack end in wb_workfn
From: Tetsuo Handa
Date: Thu Mar 21 2019 - 07:43:58 EST
On 2019/03/21 18:51, Dmitry Vyukov wrote:
>>> Lots of bugs (half?) manifest differently. On top of this, titles
>>> change as we go back in history. On top of this, if we see a different
>>> bug, it does not mean that the original bug is also not there.
>>> This will sure solve some subset of cases better than the current
>>> logic. But I feel that that subset is smaller than what the current
>>> logic solves.
>>
>> Counter-examples come up in basically every other bisection.
>> For example:
>>
>> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
>> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
>> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.19
>> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.18
>> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
>> testing release v4.17
>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
>
>
> And to make things even more interesting, this later changes to "BUG:
> unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":
>
> testing release v4.12
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: general protection fault in refcount_sub_and_test
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: crashed: BUG: unable to handle kernel NULL pointer
> dereference in vb2_vmalloc_put
>
> And since the original bug is in vb2 subsystem
> (https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
> it's actually not clear even for me, if we should treat it as the same
> bug or not. May be different manifestation of the same root cause, or
> a different bug around.
>
Well, maybe we should use reproducers to check whether each not-yet-fixed
problem is reproducible with old kernels, rather than trying to find the
specific commit that is causing a specific problem?
I think there are two patterns in which syzbot starts reporting a problem.
(a) a commit which causes one or more problems is merged into a codebase which
syzbot was already testing, because syzbot already knew what/how to test
in that codebase.
(b) a commit which causes one or more problems was already there in a codebase
where syzbot did not know until now what/how to test in that codebase.
(a) tends to require testing new kernels (i.e. bisection range is narrow) whereas
(b) tends to require testing old kernels (i.e. bisection range is wide).
Regarding case (b), it is difficult for developers to guess when the problem
started, and I think that (b) tends to confuse automatic bisection attempts.
Therefore, instead of trying to find the specific commit for a specific problem
using the "git bisect" approach, try running all reproducers (gathered from all
problems) on each release (e.g. each git tag) and append the reproduced crashes
to the
  Manager | Time | Kernel | Commit | Syzkaller | Config | Log | Report | Syz repro | C repro | Maintainers
table for each not-yet-fixed problem in the dashboard interface. That is, if
running repro1 from problem1 on some old kernel reproduces a crash for problem2,
append the crash to problem2's table. Maybe we want to use a new table with only
  Kernel | Commit | Syzkaller | Config | Log | Report | Syz repro | C repro
entries, because what we want to know is the oldest kernel release on which the
problem reproduces, which helps in guessing when the problem started.
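
To make the idea concrete, here is a rough sketch in Python. It is not
syzbot code. The names collect_crashes, oldest_release and run_repro are
hypothetical, and run_repro stands in for whatever would build and boot a
release and run one reproducer on it. The sketch also assumes a crash can be
matched to a problem by its title, which (as the logs quoted above show) does
not always hold across kernel versions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical hook: run_repro(release, repro) returns the title of the
# crash observed on that release, or None if the kernel did not crash.
RunRepro = Callable[[str, str], Optional[str]]

# One table row: (release the crash was seen on, problem the repro came from).
Row = Tuple[str, str]


def collect_crashes(
    releases: List[str],
    repros: Dict[str, str],
    run_repro: RunRepro,
) -> Dict[str, List[Row]]:
    """Run every reproducer on every release.

    releases: git tags, ordered oldest first.
    repros:   maps a problem title to the reproducer gathered for it.

    Each crash is filed under the problem whose title it matches, which
    may differ from the problem the reproducer was gathered from.
    """
    table: Dict[str, List[Row]] = {}
    for release in releases:
        for source_problem, repro in repros.items():
            title = run_repro(release, repro)
            if title is None:
                continue  # this reproducer did not crash this release
            table.setdefault(title, []).append((release, source_problem))
    return table


def oldest_release(table: Dict[str, List[Row]], problem: str) -> Optional[str]:
    """Return the oldest release on which `problem` was reproduced, if any."""
    rows = table.get(problem)
    # Rows are appended in release order, so the first row is the oldest.
    return rows[0][0] if rows else None
```

Because the outer loop walks the releases from oldest to newest, the first row
of each problem's table is the oldest release on which that problem was seen,
regardless of which problem's reproducer triggered it.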