Re: INFO: task hung in fsnotify_connector_destroy_workfn (2)

From: Petr Mladek
Date: Tue Sep 25 2018 - 04:55:10 EST


On Mon 2018-09-24 15:19:35, Jiri Kosina wrote:
> On Sun, 16 Sep 2018, Amir Goldstein wrote:
>
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit: 11da3a7f84f1 Linux 4.19-rc3
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=141ffbca400000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=9917ff4b798e1a1e
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=6fb572170402d311dd39
> > > > compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=136a60ae400000
> > > >
> > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > Reported-by: syzbot+6fb572170402d311dd39@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > >
> > >
> > > Since it kept printk()ing for at least 93 seconds, it might have kept for 150 seconds.
> > >
> > > [ 174.614333] hid-generic 0000:0000:0000.0003: unknown main item tag 0x0
> > > [ 268.196102] INFO: task kworker/u4:1:23 blocked for more than 140 seconds.
> > >
> > > Since the reproducer is simple, this might be lockup due to continuous printk().
> > > If syzbot can reliably reproduce this problem using the reproducer, try
> > > ratelimiting printk().
> >
> > Right.. and I was able to reproduce after setting CONFIG_HID_GENERIC=y
> > but unless I am missing something, the core problem doesn't seem related to
> > fsnotify and $SUBJECT, so CCing HID maintainer.
>
> Alright, so you're basically triggering a never-ending flow of kmsgs being
> printed out from HID parser due to doing crazy things with the parser, and
> that causes the issues for the kworker trying to flush them out.

My understanding is that the flow of messages is long but it ends at
some point. For example, the many warnings for
"hid-generic 0000:0000:0000.0003" eventually ends with

[ 221.223267] hid-generic 0000:0000:0000.0003: hidraw0: <UNKNOWN> HID v0.00 Device [syz1] on syz1


Next, there are usually intermixed messages for two devices, e.g.

[ 174.776326] hid-generic 0000:0000:0000.0003: unknown main item tag 0x0
[ 174.782734] hid-generic 0000:0000:0000.0004: unknown main item tag 0x0

Therefore I believe that console_lock is being transfered and no
process is stuck in printk()/console_unlock().


> I guess the patch below fixes it, however the kworker should not really be
> blocked by this I think ... adding a few more printk folks to double-check
> why the kworker would get stuck due to massive printk() flood.

It seems that the many printk() calls just delay the entire
process and rcu gets blocked for too long. The messages are
the same all the time. So some rate limiting looks like
a reasonable solution.

Best Regards,
Petr