Re: [PATCH] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c
From: Peter Hurley
Date: Wed Sep 30 2015 - 09:39:21 EST
On 09/28/2015 09:59 PM, Kosuke Tatsukawa wrote:
> My colleague ran into a program stall on a x86_64 server, where
> n_tty_read() was waiting for data even if there was data in the buffer
> in the pty. kernel stack for the stuck process looks like below.
> #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
> #1 [ffff88303d107bd0] schedule at ffffffff815c513e
> #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
> #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
> #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
> #5 [ffff88303d107dd0] tty_read at ffffffff81368013
> #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
> #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
> #8 [ffff88303d107f00] sys_read at ffffffff811a4306
> #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7
>
> There seems to be two problems causing this issue.
>
> First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
> updates ldata->commit_head using smp_store_release() and then checks
> the wait queue using waitqueue_active(). However, since there is no
> memory barrier, __receive_buf() could return without calling
> wake_up_interactive_poll(), and at the same time, n_tty_read() could
> start to wait in wait_woken() as in the following chart.
>
> __receive_buf() n_tty_read()
> ------------------------------------------------------------------------
> if (waitqueue_active(&tty->read_wait))
> /* Memory operations issued after the
> RELEASE may be completed before the
> RELEASE operation has completed */
> add_wait_queue(&tty->read_wait, &wait);
> ...
> if (!input_available_p(tty, 0)) {
> smp_store_release(&ldata->commit_head,
> ldata->read_head);
> ...
> timeout = wait_woken(&wait,
> TASK_INTERRUPTIBLE, timeout);
Thank you for reporting this, Tatsukawa-san.
While I agree that a read memory barrier is required before checking
waitqueue_active() in __receive_buf() and n_tty_receive_char_special(),
on x86/64 that sequence would never have been reordered by the CPU, and
thus cannot be the cause of the observed stall.
Further, the compiler should not have been able to reorder the
load of tty->read_wait before the store of ldata->commit_head since it
has no visibility into kill_fasync().
So I'm concerned that some other cause is to blame.
To eliminate a compiler problem, would you please attach a combined listing
for drivers/tty/n_tty.c for the machine this occurred on?
What kernel version was the stall observed?
> ------------------------------------------------------------------------
>
> The second problem is that n_tty_read() also lacks a memory barrier
> call and could also cause __receive_buf() to return without calling
> wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
> as in the chart below.
>
> __receive_buf() n_tty_read()
> ------------------------------------------------------------------------
> spin_lock_irqsave(&q->lock, flags);
> /* from add_wait_queue() */
> ...
> if (!input_available_p(tty, 0)) {
> /* Memory operations issued after the
> RELEASE may be completed before the
> RELEASE operation has completed */
> smp_store_release(&ldata->commit_head,
> ldata->read_head);
> if (waitqueue_active(&tty->read_wait))
> __add_wait_queue(q, wait);
> spin_unlock_irqrestore(&q->lock,flags);
> /* from add_wait_queue() */
> ...
> timeout = wait_woken(&wait,
> TASK_INTERRUPTIBLE, timeout);
This problem was introduced by the wait_woken() mess; prior to that
there was a full memory barrier in n_tty_read() loop to set the task state
_before evaluating conditions_.
Though, again, the sequence above is not possible at all on x86/64 since
neither the CPU nor the compiler can produce the hypothetical code in
n_tty_read() above.
I need to think further on the consequences of the missing full memory
barrier in n_tty_read().
Other notes below.
> ------------------------------------------------------------------------
>
> There are also other places in drivers/tty/n_tty.c which have similar
> calls to waitqueue_active(), so instead of adding many memory barrier
> calls, this patch simply removes the call to waitqueue_active(),
> leaving just wake_up*() behind.
>
> This fixes both problems because, even though the memory access before
> or after the spinlocks in both wake_up*() and add_wait_queue() can
> sneak into the critical section, it cannot go past it and the critical
> section assures that they will be serialized (please see "INTER-CPU
> ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
> better explanation). Moreover, the resulting code is much simpler.
>
> Latency measurement using a ping-pong test over a pty doesn't show any
> visible performance drop.
>
> Signed-off-by: Kosuke Tatsukawa <tatsu@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> drivers/tty/n_tty.c | 24 ++++++++----------------
> 1 files changed, 8 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index 20932cc..41eb3ab 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -266,8 +266,7 @@ static void n_tty_check_unthrottle(struct tty_struct *tty)
> return;
> n_tty_kick_worker(tty);
> n_tty_write_wakeup(tty->link);
> - if (waitqueue_active(&tty->link->write_wait))
> - wake_up_interruptible_poll(&tty->link->write_wait, POLLOUT);
> + wake_up_interruptible_poll(&tty->link->write_wait, POLLOUT);
Please remove this blob. I/O performed via the write_wait queue is not lockless.
> return;
> }
>
> @@ -343,8 +342,7 @@ static void n_tty_packet_mode_flush(struct tty_struct *tty)
> spin_lock_irqsave(&tty->ctrl_lock, flags);
> tty->ctrl_status |= TIOCPKT_FLUSHREAD;
> spin_unlock_irqrestore(&tty->ctrl_lock, flags);
> - if (waitqueue_active(&tty->link->read_wait))
> - wake_up_interruptible(&tty->link->read_wait);
> + wake_up_interruptible(&tty->link->read_wait);
Ok.
> }
> }
>
> @@ -1180,8 +1178,7 @@ static void n_tty_receive_break(struct tty_struct *tty)
> put_tty_queue('\0', ldata);
> }
> put_tty_queue('\0', ldata);
> - if (waitqueue_active(&tty->read_wait))
> - wake_up_interruptible_poll(&tty->read_wait, POLLIN);
> + wake_up_interruptible_poll(&tty->read_wait, POLLIN);
Please remove this blob and the one below.
I had a patch to completely remove this wakeup and the one below because
they're superfluous. I thought I submitted it some time ago; I'll find out
what happened to that patch.
Obviously not a cause of the observed stall because neither BRK nor
parity error can be generated on a pty.
> }
>
> /**
> @@ -1238,8 +1235,7 @@ static void n_tty_receive_parity_error(struct tty_struct *tty, unsigned char c)
> put_tty_queue('\0', ldata);
> } else
> put_tty_queue(c, ldata);
> - if (waitqueue_active(&tty->read_wait))
> - wake_up_interruptible_poll(&tty->read_wait, POLLIN);
> + wake_up_interruptible_poll(&tty->read_wait, POLLIN);
> }
>
> static void
> @@ -1382,8 +1378,7 @@ handle_newline:
> put_tty_queue(c, ldata);
> smp_store_release(&ldata->canon_head, ldata->read_head);
> kill_fasync(&tty->fasync, SIGIO, POLL_IN);
> - if (waitqueue_active(&tty->read_wait))
> - wake_up_interruptible_poll(&tty->read_wait, POLLIN);
> + wake_up_interruptible_poll(&tty->read_wait, POLLIN);
> return 0;
> }
> }
> @@ -1667,8 +1662,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp,
>
> if ((read_cnt(ldata) >= ldata->minimum_to_wake) || L_EXTPROC(tty)) {
> kill_fasync(&tty->fasync, SIGIO, POLL_IN);
> - if (waitqueue_active(&tty->read_wait))
> - wake_up_interruptible_poll(&tty->read_wait, POLLIN);
> + wake_up_interruptible_poll(&tty->read_wait, POLLIN);
> }
> }
>
> @@ -1887,10 +1881,8 @@ static void n_tty_set_termios(struct tty_struct *tty, struct ktermios *old)
> }
>
> /* The termios change make the tty ready for I/O */
> - if (waitqueue_active(&tty->write_wait))
> - wake_up_interruptible(&tty->write_wait);
> - if (waitqueue_active(&tty->read_wait))
> - wake_up_interruptible(&tty->read_wait);
> + wake_up_interruptible(&tty->write_wait);
> + wake_up_interruptible(&tty->read_wait);
n_tty_set_termios() cannot be concurrent with __receive_buf() or n_tty_read()
since this caller holds termios_rwsem write-locked. That said, this code
is not performance-sensitive so this is a don't-care.
Regards,
Peter Hurley
> }
>
> /**
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/