Re: [PATCH] tty: fix flush_to_ldisc() oops before tty_open is done
From: Greg KH
Date: Wed Oct 25 2017 - 03:00:41 EST
On Wed, Oct 25, 2017 at 10:15:35AM +0800, taoyuhong wrote:
> From: Yuhong Tao <taoyuhong@xxxxxxxxxx>
>
> When tty_open() opens a serial tty for the first time, there is a
> window after alloc_tty_struct() has been called but before
> tty->ops->open() has finished. During that window a serial driver such
> as pl011 on ARM can already schedule kworker threads that receive data
> through flush_to_ldisc(), so serial input arriving in that window can
> trigger a kernel oops.
> flush_to_ldisc() can also oops on a hung-up tty. It is not known why
> flush_to_ldisc() can run after the target tty has been hung up, but
> both situations produce the same result, which looks like this:
>
> [ 11.287911] [<ffff0000084ba198>] n_tty_receive_buf_common+0x58/0xa58
> [ 11.290161] [<ffff0000084baba8>] n_tty_receive_buf2+0x10/0x18
> [ 11.292162] [<ffff0000084bd0a0>] tty_ldisc_receive_buf+0x20/0x68
> [ 11.294318] [<ffff0000084bd68c>] flush_to_ldisc+0xd4/0xe8
> [ 11.296181] [<ffff0000080d6958>] process_one_work+0x128/0x2f0
> [ 11.298166] [<ffff0000080d6b74>] worker_thread+0x54/0x440
> [ 11.300006] [<ffff0000080dccac>] kthread+0xe4/0xf8
> [ 11.301642] [<ffff000008082e80>] ret_from_fork+0x10/0x50
> [ 11.303460] Code: b9009fbf f90047a0 d2844c00 8b170000 (c8dffc03)
>
> The call trace may differ slightly beyond n_tty_receive_buf_common();
> the fault comes from accessing an uninitialized or released data
> structure. The serial driver may have a problem of its own, but the tty
> layer can easily handle both oopses by:
>
> 1. Skipping the data transfer for a hung-up tty in flush_to_ldisc().
> 2. Marking the tty_struct created by tty_open() as hung up; the mark is
> cleared at the end of tty_open().
>
> This was tested with linux-4.13.9/Debian on an ARM virtual machine whose
> serial chip is a pl011. There is a small chance of seeing the oops when
> keyboard input keeps arriving during system start or shutdown, and it
> happens 100% of the time if an msleep() is inserted in tty_open() after
> the tty_struct is created by tty_init_dev() and before
> tty->ops->open() is called.
>
> Signed-off-by: Yuhong Tao <taoyuhong@xxxxxxxxxx>
> ---
> drivers/tty/pty.c | 1 +
> drivers/tty/tty_io.c | 3 +++
> drivers/tty/tty_port.c | 5 +++++
> 3 files changed, 9 insertions(+)
>
> diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
> index a6d5164..ad5b075 100644
> --- a/drivers/tty/pty.c
> +++ b/drivers/tty/pty.c
> @@ -853,6 +853,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
>
> tty_debug_hangup(tty, "opening (count=%d)\n", tty->count);
>
> + clear_bit(TTY_HUPPED, &tty->flags);
This feels "odd", are we sure that open really should be clearing this
flag?
> tty_unlock(tty);
> return 0;
> err_release:
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 10c4038..f3abad0 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -1313,6 +1313,9 @@ struct tty_struct *tty_init_dev(struct tty_driver *driver, int idx)
>
> tty->port->itty = tty;
>
> + set_bit(TTY_HUPPED, &tty->flags);
> + barrier();
What is this barrier() call protecting? Please always comment it so
that we know what is going on in 5 years when we next look at this code
:)
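For context on the barrier() question, here is a minimal userspace sketch,
not kernel code. It assumes the GCC/Clang inline-asm form of a compiler-only
barrier and uses C11 atomics as a stand-in for a real cross-CPU ordering
primitive. struct fake_tty, its fields, and the three helper functions are
hypothetical names made up for illustration. The point is only that a
compiler barrier limits compiler reordering and says nothing about what
another CPU observes, so the comment on it needs to state which accesses are
being ordered and against whom.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Compiler-only barrier: stops the compiler from moving memory accesses
 * across this point. It emits no CPU fence instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for a tty; not the kernel's struct tty_struct. */
struct fake_tty {
    int payload;        /* state that must be visible before the flag */
    atomic_bool ready;  /* hypothetical "safe to use" flag */
};

/* Writer using only a compiler barrier: program order is kept in the
 * generated code, but another CPU may still see the stores out of order
 * on a weakly ordered machine. */
static void publish_compiler_only(struct fake_tty *t, int v)
{
    t->payload = v;
    barrier();
    atomic_store_explicit(&t->ready, true, memory_order_relaxed);
}

/* Writer using a release store: the payload write is ordered before the
 * flag for any reader that does an acquire load of the flag. */
static void publish_release(struct fake_tty *t, int v)
{
    t->payload = v;
    atomic_store_explicit(&t->ready, true, memory_order_release);
}

/* Reader pairing with publish_release(). */
static bool consume(struct fake_tty *t, int *out)
{
    if (!atomic_load_explicit(&t->ready, memory_order_acquire))
        return false;
    *out = t->payload;
    return true;
}
```

In a single thread both writers behave the same; the difference only shows
up when a second CPU reads the flag, which is why an uncommented barrier()
is hard to review.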
> +
> /*
> * Structures all installed ... call the ldisc open routines.
> * If we fail here just call release_tty to clean up. No need
> diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
> index 6b13719..e43482f 100644
> --- a/drivers/tty/tty_port.c
> +++ b/drivers/tty/tty_port.c
> @@ -34,6 +34,11 @@ static int tty_port_default_receive_buf(struct tty_port *port,
> if (!disc)
> return 0;
>
> + if (test_bit(TTY_HUPPED, &tty->flags)) {
> + tty_ldisc_deref(disc);
> + return 0;
> + }
> +
What keeps the bit from being set now right after checking it?
thanks,
greg k-h
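
The last question above, about the bit being set right after it is checked,
describes a check-then-act race. The sketch below is a deterministic,
single-threaded model of that interleaving and is not kernel code:
struct fake_tty, its fields, and every function name are hypothetical, and
the "lock" is a plain boolean that only models mutual exclusion. A callback
stands in for another CPU running between the test and the delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tty; not the kernel's struct tty_struct. */
struct fake_tty {
    bool hupped;  /* models the TTY_HUPPED bit */
    bool locked;  /* models a lock shared by receive and hangup */
    int late;     /* deliveries made after the tty was hung up */
};

typedef void (*interleave_fn)(struct fake_tty *);

/* Hangup that takes no lock: it can run at any point. */
static void hangup_unlocked(struct fake_tty *t)
{
    t->hupped = true;
}

/* Hangup that respects the lock. A real lock would make it wait; in this
 * model the attempt is simply skipped while the receiver holds the lock. */
static void hangup_locked(struct fake_tty *t)
{
    if (!t->locked)
        t->hupped = true;
}

/* Unprotected check-then-act. `between` simulates another CPU running in
 * the gap between the flag test and the data delivery. */
static void receive_unlocked(struct fake_tty *t, interleave_fn between)
{
    if (t->hupped)
        return;
    if (between)
        between(t);
    if (t->hupped)
        t->late++;  /* delivered to a tty that is now hung up */
}

/* Same logic, but the check and the delivery sit inside one critical
 * section that the hangup path also has to honour. */
static void receive_locked(struct fake_tty *t, interleave_fn between)
{
    t->locked = true;
    if (!t->hupped) {
        if (between)
            between(t);
        if (t->hupped)
            t->late++;
    }
    t->locked = false;
}
```

Calling receive_unlocked() with hangup_unlocked as the callback records one
late delivery; receive_locked() with hangup_locked records none. The flag
test alone closes nothing unless the path that sets the flag is serialized
against the reader.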