Re: general protection fault in wb_workfn (2)

From: Dmitry Vyukov
Date: Thu Jun 07 2018 - 14:47:03 EST

On Tue, Jun 5, 2018 at 3:45 PM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> Dmitry, can you assign VM resources for a git tree for this bug? This bug wants to fight
> against ...

Hi Tetsuo,

Most of the reasons for not doing it still stand. A syzkaller instance
will produce not just this bug, it will produce hundreds of different
bugs. Then the question is: what to do with these bugs? Report all to
mailing lists?
I think the solution here is just to run syzkaller instance locally.
It's just a program anybody can run it on any kernel with any custom
patches. Moreover for local instance it's also possible to limit set
of tested syscalls to increase probability of hitting this bug and at
the same time filter out most of other bugs.

Do we have any idea about the guilty subsystem? You mentioned
bdi_unregister, why? What would be the set of syscalls to concentrate
I will do a custom run when I get around to it, if nobody else beats me to it.

> On 2018/06/01 1:56, Jens Axboe wrote:
>> On 5/31/18 7:42 AM, Jan Kara wrote:
>>> On Thu 31-05-18 22:19:44, Tetsuo Handa wrote:
>>>> On 2018/05/31 20:42, Jan Kara wrote:
>>>>> On Thu 31-05-18 01:00:08, Tetsuo Handa wrote:
>>>>>> So, we have no idea what is happening...
>>>>>> Then, what about starting from temporary debug printk() patch shown below?
>>>>>> >From 4f70f72ad3c9ae6ce1678024ef740aca4958e5b0 Mon Sep 17 00:00:00 2001
>>>>>> From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
>>>>>> Date: Wed, 30 May 2018 09:57:10 +0900
>>>>>> Subject: [PATCH] bdi: Add temporary config for debugging wb_workfn() versus
>>>>>> bdi_unregister() race bug.
>>>>>> syzbot is hitting NULL pointer dereference at wb_workfn() [1]. But due to
>>>>>> limitations that syzbot cannot find reproducer for this bug (frequency is
>>>>>> once or twice per a day) nor we can't capture vmcore in the environment
>>>>>> which syzbot is using, for now we need to rely on printk() debugging.
>>>>>> [1]
>>>>>> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
>>>>> Hum a bit ugly solution but if others are fine with this, I can live with
>>>>> it for a while as well. Or would it be possible for syzkaller to just test
>>>>> some git tree where this patch is included? Then we would not even have to
>>>>> have the extra config option...
>>>> If syzbot can reproduce this bug that way. While it is possible to add/remove
>>>> git trees syzbot tests, frequently adding/removing trees is bothering.
>>>> syzbot can enable extra config option. Maybe the config name should be
>>>> something like CONFIG_DEBUG_FOR_SYZBOT rather than individual topic.
>>>> I think that syzbot is using many VM instances. I don't know how many
>>>> instances will be needed for reproducing this bug within reasonable period.
>>>> More git trees syzbot tests, (I assume that) longer period will be needed
>>>> for reproducing this bug. The most reliable way is to use the shared part
>>>> of all trees (i.e. linux.git).
>>> I understand this, I'd be just a bit reluctant to merge temporary debug
>>> patches like this to Linus' tree only to revert them later just because
>>> syzkaller... What do others think?
>> I guess I don't understand why having it in Linus's tree would make
>> a difference to syzkaller?
>> If there is a compelling reason why that absolutely has to be done,
>> I don't think it should be accompanied by a special Kconfig option
>> for it. It should just be on unconditionally, with the intent to
>> remove it before release.