Re: RE: RE: [PATCH][v4] tty: fix race between flush_to_ldisc and tty_open

From: Greg KH
Date: Thu Jan 31 2019 - 01:52:25 EST


On Thu, Jan 31, 2019 at 02:15:35AM +0000, Li,Rongqing wrote:
>
>
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > Sent: January 30, 2019 21:17
> > To: Li,Rongqing <lirongqing@xxxxxxxxx>
> > Cc: jslaby@xxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; gkohli@xxxxxxxxxxxxxx;
> > linux-serial@xxxxxxxxxxxxxxx
> > Subject: Re: RE: [PATCH][v4] tty: fix race between flush_to_ldisc and tty_open
> >
> > On Wed, Jan 30, 2019 at 12:48:42PM +0000, Li,Rongqing wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: linux-kernel-owner@xxxxxxxxxxxxxxx
> > > > [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] on behalf of Greg KH
> > > > Sent: January 30, 2019 18:19
> > > > To: Li,Rongqing <lirongqing@xxxxxxxxx>
> > > > Cc: jslaby@xxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > > gkohli@xxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH][v4] tty: fix race between flush_to_ldisc and
> > > > tty_open
> > > >
> > > > On Fri, Jan 18, 2019 at 05:27:17PM +0800, Li RongQing wrote:
> > > > > There is still a race window after commit b027e2298bd588
> > > > > ("tty: fix data race between tty_init_dev and flush of buf"). We
> > > > > hit a crash when a receive_buf call comes in before tty
> > > > > initialization completes in n_tty_open, while tty->driver_data
> > > > > may still be NULL.
> > > > >
> > > > > CPU0                                  CPU1
> > > > > ----                                  ----
> > > > >                                       n_tty_open
> > > > >                                       tty_init_dev
> > > > >                                       tty_ldisc_unlock
> > > > >                                       schedule
> > > > > flush_to_ldisc
> > > > >  receive_buf
> > > > >   tty_port_default_receive_buf
> > > > >    tty_ldisc_receive_buf
> > > > >     n_tty_receive_buf_common
> > > > >      __receive_buf
> > > > >       uart_flush_chars
> > > > >        uart_start
> > > > >        /* tty->driver_data is NULL */
> > > > >                                       tty->ops->open
> > > > >                                       /* init tty->driver_data */
> > > > >
> > > > > It could be fixed by holding the ldisc semaphore taken in
> > > > > tty_init_dev until driver_data has been fully initialized by
> > > > > tty->ops->open(), but that would mean taking the lock in one
> > > > > function and releasing it in another, which is hard to maintain.
> > > > > So fix this race only by checking tty->driver_data on the receive
> > > > > path and returning early if it is NULL.
> > > > >
> > > > > Signed-off-by: Wang Li <wangli39@xxxxxxxxx>
> > > > > Signed-off-by: Zhang Yu <zhangyu31@xxxxxxxxx>
> > > > > Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
> > > > > ---
> > > > > V4: add version information
> > > > > V3: drop the ldisc semaphore lock; only check tty->driver_data
> > > > > for NULL
> > > > > V2: fix build error by exporting tty_ldisc_unlock with EXPORT_SYMBOL
> > > > > V1: extend the ldisc lock to ensure tty->driver_data is initialized
> > > > >
> > > > > drivers/tty/tty_port.c | 3 +++
> > > > > 1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
> > > > > index 044c3cbdcfa4..86d0bec38322 100644
> > > > > --- a/drivers/tty/tty_port.c
> > > > > +++ b/drivers/tty/tty_port.c
> > > > > @@ -31,6 +31,9 @@ static int tty_port_default_receive_buf(struct tty_port *port,
> > > > >  	if (!tty)
> > > > >  		return 0;
> > > > >
> > > > > +	if (!tty->driver_data)
> > > > > +		return 0;
> > > > > +
> > > >
> > > > How is this working? What is setting driver_data to NULL to "stop"
> > > > this race?
> > > >
> > >
> > >
> > > If tty->driver_data is NULL, tty_port_default_receive_buf returns
> > > early and never reaches uart_start, which dereferences
> > > tty->driver_data and panics if it runs before tty_open has finished.
> > > So the check fixes the system panic.
> > >
> > > > There's no requirement that a tty driver set this field to NULL when
> > > > it is "done" with the tty device, so I think you are just getting
> > > > lucky in that your specific driver happens to be doing this.
> > > >
> > >
> > > When tty_open is running, the tty has just been allocated with kzalloc
> > > in tty_init_dev (called from tty_open_by_driver), so all of its fields,
> > > including driver_data, are zero-initialized.
> > >
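Li's argument (the tty is zero-allocated, so driver_data stays NULL until the driver's open runs) and the v4 check can be modelled in userspace. This is a single-threaded sketch under stated assumptions: `struct model_tty`, `model_init_dev()`, `model_open()` and `model_receive_buf()` are hypothetical stand-ins for tty_struct, tty_init_dev(), tty->ops->open() and tty_port_default_receive_buf(), and calloc() plays the role of kzalloc(). It does not model concurrency, so it says nothing about whether an unlocked NULL check is enough against the real race.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tty_struct: only driver_data is modelled. */
struct model_tty {
	void *driver_data;	/* set by the driver's open(); NULL before that */
};

/* Stand-in for tty_init_dev(): calloc() zero-fills like kzalloc(), which on
 * common platforms leaves driver_data == NULL until the driver opens it. */
static struct model_tty *model_init_dev(void)
{
	return calloc(1, sizeof(struct model_tty));
}

/* Stand-in for tty->ops->open(): the driver installs its private state. */
static void model_open(struct model_tty *tty, void *state)
{
	assert(tty != NULL && state != NULL);
	tty->driver_data = state;
}

/* Stand-in for tty_port_default_receive_buf() with the v4 check.
 * Returns the number of bytes handed on (all or nothing in this model). */
static size_t model_receive_buf(struct model_tty *tty,
				const unsigned char *p, size_t count)
{
	(void)p;		/* payload is not inspected in this model */
	if (!tty)
		return 0;
	if (!tty->driver_data)
		return 0;	/* driver open not finished: consume nothing */
	return count;		/* would be passed on to the line discipline */
}
```

The model only shows why the check happens to work for a driver that leaves driver_data NULL until open; it does not show that every driver behaves that way.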
> > > > What driver are you testing this against?
> > > >
> > >
> > > 8250
> >
> > Ok, as this is specific to the uart core, how about this patch instead:
> >
> > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > index 5c01bb6d1c24..b56a6250df3f 100644
> > --- a/drivers/tty/serial/serial_core.c
> > +++ b/drivers/tty/serial/serial_core.c
> > @@ -130,6 +130,9 @@ static void uart_start(struct tty_struct *tty)
> >  	struct uart_port *port;
> >  	unsigned long flags;
> >
> > +	if (!state)
> > +		return;
> > +
> >  	port = uart_port_lock(state, flags);
> >  	__uart_start(tty);
> >  	uart_port_unlock(port, flags);
>
>
> If the check is moved into uart_start, I am afraid it may not fully fix
> this issue, since n_tty_receive_buf_common may call n_tty_check_throttle()
> or tty_unthrottle_safe(), which may also use tty->driver_data.
>
> If the tty is not fully opened, I see no gain in stepping into more
> functions.

But as I said, the tty core has no knowledge of the "driver_data"
field. It does not know whether a driver even uses that field, so the
field means nothing to the tty core and the core cannot check it. Your
specific tty driver does happen to use it, so the driver can check it.

If you also need to check this in unthrottle, how about this patch too?
Does the combination of these two patches solve the problem for your
systems?

thanks,

greg k-h


diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 5c01bb6d1c24..e33d4c181123 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -727,6 +727,9 @@ static void uart_unthrottle(struct tty_struct *tty)
 	upstat_t mask = UPSTAT_SYNC_FIFO;
 	struct uart_port *port;
 
+	if (!state)
+		return;
+
 	port = uart_port_ref(state);
 	if (!port)
 		return;
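The two serial-core patches put the check in the one layer that knows driver_data holds its uart state, and cover both the start path and the unthrottle path raised in the thread. A minimal userspace sketch of that idea follows; `struct model_uart_state`, `struct model_tty`, `model_uart_start()` and `model_uart_unthrottle()` are hypothetical, and unlike the kernel's uart_start() and uart_unthrottle() (which return void) they return -1 when they bail out so the behaviour can be checked. It is single-threaded and does not capture locking or memory ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-ins; not the kernel's definitions. */
struct model_uart_state {
	int started;
	int unthrottled;
};

struct model_tty {
	void *driver_data;	/* opaque to the tty core */
};

/* Like the proposed uart_start() check: the serial driver knows that
 * driver_data is its uart state, so it can test it before use. */
static int model_uart_start(struct model_tty *tty)
{
	struct model_uart_state *state;

	assert(tty != NULL);
	state = tty->driver_data;
	if (!state)
		return -1;	/* open not finished: nothing to start */
	state->started = 1;
	return 0;
}

/* Same guard on the unthrottle path. */
static int model_uart_unthrottle(struct model_tty *tty)
{
	struct model_uart_state *state;

	assert(tty != NULL);
	state = tty->driver_data;
	if (!state)
		return -1;	/* open not finished: nothing to unthrottle */
	state->unthrottled = 1;
	return 0;
}
```

The generic layer never interprets driver_data here; only the driver-specific functions do, which is the split argued for above.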