Re: tty: deadlock between n_tracerouter_receivebuf and flush_to_ldisc

From: One Thousand Gnomes
Date: Fri Jan 15 2016 - 11:33:29 EST


On Fri, 15 Jan 2016 08:51:39 +0100
Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:

> On Wed, Dec 30, 2015 at 11:44 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > Hello,
> >
> > I've got the following lock order inversion report on
> > 8513342170278468bac126640a5d2d12ffbff106 (Dec 28). Unfortunately I
> > failed to find a reproducer for it.
>
> Ping. This is the most frequent failure during my testing now. J
> Freyensee, you seem to wrote drivers/tty/n_tracerouter.c. Please take
> a look. Thanks.

This driver hasn't changed since forever, nor does it look like a bug in
n_tracerouter.

This first two threads of execution is for the old line discipline so
don't occur at the same time as executing tracerouter.

This is opening the port with the tracerouter discipline as you change
line discipline. The core code locks the tty and then takes the ldisc
lock. Closes the old ldisc, opens the new, opening the new takes the
router lock.

(tty_lock, tty_ldisc_lock, termios_sem, route_lock)


> > -> #1 (&port->buf.lock/1){+.+...}:
> > [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0
> > kernel/locking/lockdep.c:3585
> > [< inline >] __mutex_lock_common kernel/locking/mutex.c:518
> > [<ffffffff85c86ba6>] mutex_lock_nested+0xb6/0x940
> > kernel/locking/mutex.c:618
> > [<ffffffff82bc77ff>] n_tracerouter_open+0x1f/0x210
> > drivers/tty/n_tracerouter.c:75
> > [<ffffffff82bacd78>] tty_ldisc_open.isra.2+0x78/0xd0
> > drivers/tty/tty_ldisc.c:447
> > [<ffffffff82bad43a>] tty_set_ldisc+0x1ca/0xa30
> > drivers/tty/tty_ldisc.c:567
> > [< inline >] tiocsetd drivers/tty/tty_io.c:2650
> > [<ffffffff82b9483a>] tty_ioctl+0xb2a/0x2160 drivers/tty/tty_io.c:2883
> > [< inline >] vfs_ioctl fs/ioctl.c:43
> > [<ffffffff81750d41>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
> > [< inline >] SYSC_ioctl fs/ioctl.c:622
> > [<ffffffff8175158f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
> > [<ffffffff85c8eb36>] entry_SYSCALL_64_fastpath+0x16/0x7a
> > arch/x86/entry/entry_64.S:185

This is data being received by the new line discipline.

(buf->lock, routelock)

> >
> > -> #0 (routelock){+.+...}:
> > [< inline >] check_prev_add kernel/locking/lockdep.c:1853
> > [< inline >] check_prevs_add kernel/locking/lockdep.c:1958
> > [< inline >] validate_chain kernel/locking/lockdep.c:2144
> > [<ffffffff813ee02e>] __lock_acquire+0x345e/0x3ca0
> > kernel/locking/lockdep.c:3206
> > [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0
> > kernel/locking/lockdep.c:3585
> > [< inline >] __mutex_lock_common kernel/locking/mutex.c:518
> > [<ffffffff85c86ba6>] mutex_lock_nested+0xb6/0x940
> > kernel/locking/mutex.c:618
> > [<ffffffff82bc77c0>] n_tracerouter_receivebuf+0x20/0x40
> > drivers/tty/n_tracerouter.c:175
> > [< inline >] receive_buf drivers/tty/tty_buffer.c:454
> > [<ffffffff82bafc34>] flush_to_ldisc+0x584/0x7f0
> > drivers/tty/tty_buffer.c:517
> > [<ffffffff8133833c>] process_one_work+0x76c/0x13e0
> > kernel/workqueue.c:2030
> > [<ffffffff81339093>] worker_thread+0xe3/0xe90 kernel/workqueue.c:2162
> > [<ffffffff8134b63f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
> > [<ffffffff85c8eeef>] ret_from_fork+0x3f/0x70
> > arch/x86/entry/entry_64.S:468
> >

The lock inversion the log shows seems to be a problem in n_tty not in
the tracerouter code.

However that seems to be

tty_set_ldisc to N_TTY
takes tty lock
takes tty_ldisc lock
down termios_rwsem (R)
tty->ops->set_ldisc
down termios rwsem (W)

verus

receive_driver_bufs
buf->lock
termios rwsem (R)


which also looks fine

So I think it's just a false report caused by other changes in the tty
layer, and nothing to do with the tracerouter code at all.

Alan