Re: INFO: task hung in fsnotify_connector_destroy_workfn (2)

From: Tetsuo Handa
Date: Mon Sep 24 2018 - 10:04:08 EST


On 2018/09/24 22:31, Dmitry Vyukov wrote:
> On Mon, Sep 24, 2018 at 3:19 PM, Jiri Kosina <jikos@xxxxxxxxxx> wrote:
>> On Sun, 16 Sep 2018, Amir Goldstein wrote:
>>
>>>>> syzbot found the following crash on:
>>>>>
>>>>> HEAD commit: 11da3a7f84f1 Linux 4.19-rc3
>>>>> git tree: upstream
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=141ffbca400000
>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=9917ff4b798e1a1e
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=6fb572170402d311dd39
>>>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=136a60ae400000
>>>>>
>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>>> Reported-by: syzbot+6fb572170402d311dd39@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>>
>>>>
>>>> Since it kept printk()ing for at least 93 seconds, it might have kept for 150 seconds.
>>>>
>>>> [ 174.614333] hid-generic 0000:0000:0000.0003: unknown main item tag 0x0
>>>> [ 268.196102] INFO: task kworker/u4:1:23 blocked for more than 140 seconds.
>>>>
>>>> Since the reproducer is simple, this might be lockup due to continuous printk().
>>>> If syzbot can reliably reproduce this problem using the reproducer, try
>>>> ratelimiting printk().
>>>
>>> Right.. and I was able to reproduce after setting CONFIG_HID_GENERIC=y
>>> but unless I am missing something, the core problem doesn't seem related to
>>> fsnotify and $SUBJECT, so CCing HID maintainer.
>>
>> Alright, so you're basically triggering a never-ending flow of kmsgs being
>> printed out from HID parser due to doing crazy things with the parser, and
>> that causes the issues for the kworker trying to flush them out.
>>
>> I guess the patch below fixes it, however the kworker should not really be
>> blocked by this I think ... adding a few more printk folks to double-check
>> why the kworker would get stuck due to massive printk() flood.
>
>
> The task hanged in synchronize_rcu(). I think any infinite loop
> (whether it prints or not) can cause this.

This problem is occurring on CONFIG_PREEMPT=y CONFIG_PREEMPT_RCU=y kernel.
I think that any insane printk() stress can delay waiters long enough to
fire khungtaskd watchdog. Though syzbot might be unable to reliably test
this problem (because syzbot has succeeded to trigger this problem for only
two times), why not to let syzbot try the patch anyway?