Re: BUG: workqueue lockup (2)
From: Eric Biggers
Date: Sat May 12 2018 - 23:30:26 EST

Hi Tetsuo,

On Sun, May 13, 2018 at 11:06:17AM +0900, Tetsuo Handa wrote:
> Eric Biggers wrote:
> > The bug that this reproducer reproduces was fixed a while ago by commit
> > 966031f340185e, so I'm marking this bug report fixed by it:
> >
> > #syz fix: n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
>
> Nope. Commit 966031f340185edd ("n_tty: fix EXTPROC vs ICANON interaction with
> TIOCINQ (aka FIONREAD)") is "Wed Dec 20 17:57:06 2017 -0800" but the last
> occurrence on linux.git (commit 008464a9360e31b1 ("Merge branch 'for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid")) is only a few days ago
> ("Wed May 9 10:49:52 2018 -1000").
>
> >
> > Note that the error message was not always "BUG: workqueue lockup"; it was also
> > sometimes like "watchdog: BUG: soft lockup - CPU#5 stuck for 22s!".
> >
> > syzbot still is hitting the "BUG: workqueue lockup" error sometimes, but it must
> > be for other reasons. None has a reproducer currently.
>
> The last occurrence on linux.git is considered as a duplicate of
>
> [upstream] INFO: rcu detected stall in n_tty_receive_char_special
> https://syzkaller.appspot.com/bug?id=3d7481a346958d9469bebbeb0537d5f056bdd6e8
>
> which we already have a reproducer at
> https://groups.google.com/d/msg/syzkaller-bugs/O4DbPiJZFBY/YCVPocx3AgAJ
> and debug output is available at
> https://groups.google.com/d/msg/syzkaller-bugs/O4DbPiJZFBY/TxQ7WS5ZAwAJ .
>
> We are currently waiting for comments from Peter Hurley who added that code.
>

Actually I did verify that the C reproducer is fixed by the commit I named, and
I also simplified the reproducer and turned it into an LTP test
(http://lists.linux.it/pipermail/ltp/2018-May/008071.html). As I said, syzbot
is still occasionally hitting the same "BUG: workqueue lockup" error, but
apparently for other reasons. The one on 008464a9360e31b even looks like it's
in the TTY layer too, and it could very well be a very similar bug, but based on
what I observed it's not the same bug that syzbot reproduced on f3b5ad89de16f5d.

Generally it's best to close syzbot bug reports once the original cause is
fixed, so that syzbot can continue to report other bugs with the same signature.
Otherwise they sit on the syzbot dashboard where few people are looking at them.
Of course, if you are up to it, you're certainly free to look into any of the
crashes already there even before a new bug report gets created.

Note also that a "workqueue lockup" can be caused by almost anything in the
kernel, I think. This one, for example, is probably in the sound subsystem:

https://syzkaller.appspot.com/text?tag=CrashReport&x=1767232b800000

Thanks!

Eric