Re: possible deadlock in snd_seq_deliver_event

From: Takashi Iwai
Date: Tue Oct 31 2017 - 06:03:54 EST


On Tue, 31 Oct 2017 09:39:41 +0100,
Dmitry Vyukov wrote:
>
> On Tue, Oct 31, 2017 at 11:10 AM, Takashi Iwai <tiwai@xxxxxxx> wrote:
> > On Sun, 29 Oct 2017 10:57:58 +0100,
> > Takashi Iwai wrote:
> >>
> >> On Fri, 27 Oct 2017 10:11:18 +0200,
> >> Dmitry Vyukov wrote:
> >> >
> >> > On Fri, Oct 27, 2017 at 10:09 AM, syzbot
> >> > <bot+7feb8de6b4d6bf810cf098bef942cc387e79d0ad@xxxxxxxxxxxxxxxxxxxxxxxxx>
> >> > wrote:
> >> > > Hello,
> >> > >
> >> > > syzkaller hit the following crash on
> >> > > 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
> >> > > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> >> > > compiler: gcc (GCC) 7.1.1 20170620
> >> > > .config is attached
> >> > > Raw console output is attached.
> >> > > C reproducer is attached
> >> > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> >> > > for information about syzkaller reproducers
> >> > >
> >> > >
> >> > > ============================================
> >> > > WARNING: possible recursive locking detected
> >> > > 4.14.0-rc1+ #88 Not tainted
> >> > > --------------------------------------------
> >> > > syzkaller883997/2981 is trying to acquire lock:
> >> > > (&grp->list_mutex){++++}, at: [<ffffffff83d4dd49>] deliver_to_subscribers
> >> > > sound/core/seq/seq_clientmgr.c:666 [inline]
> >> > > (&grp->list_mutex){++++}, at: [<ffffffff83d4dd49>]
> >> > > snd_seq_deliver_event+0x279/0x790 sound/core/seq/seq_clientmgr.c:807
> >> > >
> >> > > but task is already holding lock:
> >> > > (&grp->list_mutex){++++}, at: [<ffffffff83d4dd49>] deliver_to_subscribers
> >> > > sound/core/seq/seq_clientmgr.c:666 [inline]
> >> > > (&grp->list_mutex){++++}, at: [<ffffffff83d4dd49>]
> >> > > snd_seq_deliver_event+0x279/0x790 sound/core/seq/seq_clientmgr.c:807
> >> > >
> >> > > other info that might help us debug this:
> >> > > Possible unsafe locking scenario:
> >> > >
> >> > > CPU0
> >> > > ----
> >> > > lock(&grp->list_mutex);
> >> > > lock(&grp->list_mutex);
> >> > >
> >> > > *** DEADLOCK ***
> >> > >
> >> > > May be due to missing lock nesting notation
> >> > >
> >> > > 2 locks held by syzkaller883997/2981:
> >> > > #0: (register_mutex#4){+.+.}, at: [<ffffffff83d60ada>]
> >> > > odev_release+0x4a/0x70 sound/core/seq/oss/seq_oss.c:152
> >> > > #1: (&grp->list_mutex){++++}, at: [<ffffffff83d4dd49>]
> >> > > deliver_to_subscribers sound/core/seq/seq_clientmgr.c:666 [inline]
> >> > > #1: (&grp->list_mutex){++++}, at: [<ffffffff83d4dd49>]
> >> > > snd_seq_deliver_event+0x279/0x790 sound/core/seq/seq_clientmgr.c:807
> >> > >
> >> > > stack backtrace:
> >> > > CPU: 1 PID: 2981 Comm: syzkaller883997 Not tainted 4.14.0-rc1+ #88
> >> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> > > Google 01/01/2011
> >> > > Call Trace:
> >> > > __dump_stack lib/dump_stack.c:16 [inline]
> >> > > dump_stack+0x194/0x257 lib/dump_stack.c:52
> >> > > print_deadlock_bug kernel/locking/lockdep.c:1797 [inline]
> >> > > check_deadlock kernel/locking/lockdep.c:1844 [inline]
> >> > > validate_chain kernel/locking/lockdep.c:2453 [inline]
> >> > > __lock_acquire+0x1232/0x4620 kernel/locking/lockdep.c:3498
> >> > > lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
> >> > > down_read+0x96/0x150 kernel/locking/rwsem.c:23
> >> > > deliver_to_subscribers sound/core/seq/seq_clientmgr.c:666 [inline]
> >> > > snd_seq_deliver_event+0x279/0x790 sound/core/seq/seq_clientmgr.c:807
> >> > > snd_seq_kernel_client_dispatch+0x11e/0x150
> >> > > sound/core/seq/seq_clientmgr.c:2309
> >> > > dummy_input+0x2c4/0x400 sound/core/seq/seq_dummy.c:104
> >> > > snd_seq_deliver_single_event.constprop.11+0x2fb/0x940
> >> > > sound/core/seq/seq_clientmgr.c:621
> >> > > deliver_to_subscribers sound/core/seq/seq_clientmgr.c:676 [inline]
> >> > > snd_seq_deliver_event+0x318/0x790 sound/core/seq/seq_clientmgr.c:807
> >> > > snd_seq_kernel_client_dispatch+0x11e/0x150
> >> > > sound/core/seq/seq_clientmgr.c:2309
> >> > > dummy_input+0x2c4/0x400 sound/core/seq/seq_dummy.c:104
> >> > > snd_seq_deliver_single_event.constprop.11+0x2fb/0x940
> >> > > sound/core/seq/seq_clientmgr.c:621
> >> > > snd_seq_deliver_event+0x12c/0x790 sound/core/seq/seq_clientmgr.c:818
> >> > > snd_seq_kernel_client_dispatch+0x11e/0x150
> >> > > sound/core/seq/seq_clientmgr.c:2309
> >> > > snd_seq_oss_dispatch sound/core/seq/oss/seq_oss_device.h:150 [inline]
> >> > > snd_seq_oss_midi_reset+0x44b/0x700 sound/core/seq/oss/seq_oss_midi.c:481
> >> > > snd_seq_oss_synth_reset+0x398/0x980 sound/core/seq/oss/seq_oss_synth.c:416
> >> > > snd_seq_oss_reset+0x6c/0x260 sound/core/seq/oss/seq_oss_init.c:448
> >> > > snd_seq_oss_release+0x71/0x120 sound/core/seq/oss/seq_oss_init.c:425
> >> > > odev_release+0x52/0x70 sound/core/seq/oss/seq_oss.c:153
> >> > > __fput+0x333/0x7f0 fs/file_table.c:210
> >> > > ____fput+0x15/0x20 fs/file_table.c:244
> >> > > task_work_run+0x199/0x270 kernel/task_work.c:112
> >> > > exit_task_work include/linux/task_work.h:21 [inline]
> >> > > do_exit+0xa52/0x1b40 kernel/exit.c:865
> >> > > do_group_exit+0x149/0x400 kernel/exit.c:968
> >> > > SYSC_exit_group kernel/exit.c:979 [inline]
> >> > > SyS_exit_group+0x1d/0x20 kernel/exit.c:977
> >> > > entry_SYSCALL_64_fastpath+0x1f/0xbe
> >> > > RIP: 0033:0x442c58
> >> > > RSP: 002b:00007ffd15d4f8d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> >> > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000442c58
> >> > > RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> >> > > RBP: 0000000000000082 R08: 00000000000000e7 R09: ffffffffffffffd0
> >> > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ca0
> >> > > R13: 0000000000401d30 R14
> >> >
> >> > I've just re-reproduced this on upstream
> >> > 15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 (Oct 26):
> >> >
> >> > ============================================
> >> > WARNING: possible recursive locking detected
> >> > 4.14.0-rc6+ #10 Not tainted
> >> > --------------------------------------------
> >> > a.out/3062 is trying to acquire lock:
> >> > (&grp->list_mutex){++++}, at: [<ffffffff83d28879>]
> >> > deliver_to_subscribers sound/core/seq/seq_clientmgr.c:666 [inline]
> >> > (&grp->list_mutex){++++}, at: [<ffffffff83d28879>]
> >> > snd_seq_deliver_event+0x279/0x790 sound/core/seq/seq_clientmgr.c:807
> >> >
> >> > but task is already holding lock:
> >> > (&grp->list_mutex){++++}, at: [<ffffffff83d28879>]
> >> > deliver_to_subscribers sound/core/seq/seq_clientmgr.c:666 [inline]
> >> > (&grp->list_mutex){++++}, at: [<ffffffff83d28879>]
> >> > snd_seq_deliver_event+0x279/0x790 sound/core/seq/seq_clientmgr.c:807
> >> >
> >> > other info that might help us debug this:
> >> > Possible unsafe locking scenario:
> >> >
> >> > CPU0
> >> > ----
> >> > lock(&grp->list_mutex);
> >> > lock(&grp->list_mutex);
> >> >
> >> > *** DEADLOCK ***
> >> >
> >> > May be due to missing lock nesting notation
> >>
> >> Indeed, this looks more like simply a missing nesting annotation.
> >> A totally untested patch is below.
> >
> > FWIW, the official patch with a proper description is below.
>
>
> syzbot failed to extract a reproducer, so we are limited in testing
> capabilities. But if you see the problem in the code, let's proceed
> with the patch.

OK, the fix doesn't seem to cause a regression, so I'm going to queue
it.

> Can you also please follow the part below? It will greatly help to
> keep the process running, and lets the bot conclude when the fix has
> landed in all branches and report similar-looking issues in the
> future. Thanks.
>
> > syzbot will keep track of this bug report.
> > Once a fix for this bug is committed, please reply to this email with:
> > #syz fix: exact-commit-title
> > Note: all commands must start from beginning of the line.

Shall I wait until it lands in Linus' tree? Or does landing in the
subsystem tree suffice?

Also, I pasted the bot's From address as-is as the reporter, but it
looks a bit messy. How is this supposed to be done?


thanks,

Takashi

>
> > thanks,
> >
> > Takashi
> >
> > -- 8< --
> > From: Takashi Iwai <tiwai@xxxxxxx>
> > Subject: [PATCH] ALSA: seq: Fix nested rwsem annotation for lockdep splat
> >
> > syzkaller reported a lockdep splat due to a possible deadlock on the
> > grp->list_mutex of each sequencer client object.  This is actually a
> > false-positive report caused by missing nested-lock annotations: a
> > sequencer client may deliver an event directly to another client,
> > which then takes its own lock.
> >
> > To address this, the patch replaces the plain down_read() with
> > down_read_nested().  As the lock subclass, the already existing
> > "hop" count can be reused, since it indicates the depth of the
> > delivery call chain.
> >
> > Reference: http://lkml.kernel.org/r/089e082686ac9b482e055c832617@xxxxxxxxxx
> > Reported-by: syzbot <bot+7feb8de6b4d6bf810cf098bef942cc387e79d0ad@xxxxxxxxxxxxxxxxxxxxxxxxx>
> > Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
> > ---
> > sound/core/seq/seq_clientmgr.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c
> > index 6c9cba2166d9..d10c780dfd54 100644
> > --- a/sound/core/seq/seq_clientmgr.c
> > +++ b/sound/core/seq/seq_clientmgr.c
> > @@ -663,7 +663,7 @@ static int deliver_to_subscribers(struct snd_seq_client *client,
> > if (atomic)
> > read_lock(&grp->list_lock);
> > else
> > - down_read(&grp->list_mutex);
> > + down_read_nested(&grp->list_mutex, hop);
> > list_for_each_entry(subs, &grp->list_head, src_list) {
> > /* both ports ready? */
> > if (atomic_read(&subs->ref_count) != 2)
> > --
> > 2.14.2
> >
>