Re: general protection fault in wb_workfn (2)

From: Dmitry Vyukov
Date: Fri Jun 08 2018 - 10:45:59 EST

On Fri, Jun 8, 2018 at 4:31 AM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> Dmitry Vyukov wrote:
>> On Tue, Jun 5, 2018 at 3:45 PM, Tetsuo Handa
>> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>> > Dmitry, can you assign VM resources for a git tree for this bug? This bug wants to fight
>> > against ...
>> Hi Tetsuo,
>> Most of the reasons for not doing it still stand. A syzkaller instance
>> will produce not just this bug, it will produce hundreds of different
>> bugs. Then the question is: what to do with these bugs? Report all to
>> mailing lists?
> Is it possible to add the linux-next.git tree as a target for fuzzing? If yes,
> we could easily try debug patches, in addition to finding bugs earlier than now.

syzbot tested linux-next and mmotm initially, but they were removed at
the request of kernel developers. See:
Indeed, linux-next produces around 50 assorted one-off unexplainable
bug reports.

>> I think the solution here is just to run a syzkaller instance locally.
>> It's just a program; anybody can run it on any kernel with any custom
>> patches. Moreover, for a local instance it's also possible to limit the set
>> of tested syscalls to increase the probability of hitting this bug and at
>> the same time filter out most of the other bugs.
> If this bug is reproducible with VM resources an individual developer can afford...
> Since my Linux development environment is VMware guests on a Windows PC, I can't
> run a VM instance that needs KVM acceleration. Also, due to security policy, I can't
> utilize external VM resources available on the Internet, nor can I use the ssh
> and git protocols. Speaking of this bug, even with a lot of VM instances, syzbot can
> reproduce it only once or twice per day. Thus, the question for me boils
> down to whether I can reproduce this bug using one VMware guest instance with 4GB
> of memory. Effectively, I don't have access to an environment for running a syzkaller
> instance...

Well, I don't know what to say, it does require some resources.

>> Do we have any idea about the guilty subsystem? You mentioned
>> bdi_unregister, why? What would be the set of syscalls to concentrate
>> on?
>> I will do a custom run when I get around to it, if nobody else beats me to it.
> Because bdi_unregister() does "bdi->dev = NULL;", and that is the NULL pointer
> wb_workfn() is hitting.

Right, wb_workfn is not a generic function; it's an fs-specific function.

Trying to reproduce this locally now.