Re: [syzbot] unexpected kernel reboot (4)

From: Dmitry Vyukov
Date: Thu Apr 22 2021 - 13:00:40 EST


On Thu, Apr 22, 2021 at 6:13 PM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2021/04/22 23:20, Dmitry Vyukov wrote:
> > I've prepared this syzkaller change:
> > https://github.com/google/syzkaller/pull/2550/files
>
> OK. Please merge and let's see whether syzkaller can find different ways.

Merged. Thanks for digging into this.

> In my environment, this problem behaves in a very puzzling way. While the
> reproducer I use is single threaded, changing timing via CONFIG_DEBUG_KOBJECT=y
> or even https://syzkaller.appspot.com/x/patch.diff?x=13d69ffed00000 avoids
> this problem. I can't narrow down what is happening.

This line in that patch:
- kill_cad_pid(SIGINT, 1);
suggests the syzkaller change can help... I think... this is good.


> > Re hibernation/suspend configs, you said disabling them is not
> > helping, right? Does it still make sense to disable them?
> > If these configs are enabled, we can at least find some bugs in the
> > suspend preparation code. However, as you noted, it will
> > immediately lead to "lost connection".
> > Ideally, we would somehow tweak hibernation/suspend to reach the
> > hibernation/suspend point and then immediately and automatically
> > resume.
>
> That would be one of the disable-specific-functionality changes.
>
> > This way we could test both the suspend and resume code, which
> > I assume can contain bugs, without causing "lost connection" at the
> > same time. I guess such a mode does not exist today... and I am not
> > sure what happens with TCP connections after this.
>
> I don't know whether ssh sessions can survive 10 seconds of
> hibernation/suspend. But maybe disabling hibernation/suspend configs
> until disable-specific-functionality changes are accepted makes sense.

We would need to disable CONFIG_SUSPEND and CONFIG_HIBERNATION. I am
wondering whether we will gain more than we lose... We will lose coverage
of these subsystems, but this will eliminate some of the "lost connection"
crashes. Do you have any estimate of how many "lost connection"s this
can prevent?
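For reference, a minimal sketch of what that change would look like in a
kernel .config, assuming the syzbot configs are plain .config files that get
refreshed with `make olddefconfig` afterwards. This is a hypothetical fragment,
not taken from the actual syzbot configs; "is not set" is the form .config
uses to record a disabled boolean option.

```
# Hypothetical fragment, not from the real syzbot configs.
# Disabling both options removes the suspend/hibernation entry points
# (and their coverage) along with the "lost connection" they trigger.
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
```

Running `make olddefconfig` after the edit would then recompute any options
that depend on these two, so the resulting config stays consistent.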