Re: [PATCH] tty: fix data race in n_tty_receive_buf_common

From: Alan Cox
Date: Thu Jan 04 2018 - 09:37:40 EST


On Thu, 4 Jan 2018 19:16:46 +0530
"Kohli, Gaurav" <gkohli@xxxxxxxxxxxxxx> wrote:

> > Which tty driver ? serial/msm_serial.c ?
>
> We are using our internal driver, msm_geni_serial.c

Can you make that code available? Otherwise it's impossible to see what
the problem might be.

> >
> > Ok no what I need to see is a trace of what each CPU is doing at the
> > point you detect the problem. That way we can see what the path that
> > races is.
> Below is stack trace running by init in our case on one core
> -006|n_tty_open(
>     |    tty = 0xFFFFFFFF477AC880 -> (
>     |      disc_data = 0xFFFFFF80197AD000,
>
>     |      port = 0xFFFFFFFFEDE40000))
>     |  ldata = 0xFFFFFF80197AD000
>
>     |  trace_printk_fmt = 0xFFFFFF9F275125F8
> -007|tty_ldisc_open.isra.3(
>     |    tty = 0xFFFFFFFF477AC880)
> -008|tty_ldisc_setup(
>
> -009|tty_init_dev(
>     |    driver = 0xFFFFFFFFEDE2A480,
>     |    idx = 0)
>
> -010|tty_open_by_driver(inline)
> -010|tty_open(

So core 1 is opening the tty from user space, and that's a normal-looking
trace for an open of a port that was closed.

>
> Core 2:
> -000|n_tty_receive_buf_common(
>     |    tty = 0xFFFFFFFF477AC880,
>
>     |  ?)
>     |  ldata_=_0x0
>     |  __func__ = (110, 95, 116, 116, 121, 95, 114, 101, 99, 101, 105,
> 118, 101, 95, 98, 117, 102, 95, 99, 111, 109, 109, 111, 110, 0)
>     |  __u = (__val = 7079195495121566464, __c = (0))
>     |  c = 127
>     |  ldata = 0xFFFFFFFFF40DF97C
>
>     |  c = 0
>     |  ldata = 0xFFFFFF9F26F46000
>
> -001|n_tty_receive_buf2(
>     |    tty = 0xFFFFFFFF477AC880,
>
> -002|tty_ldisc_receive_buf(inline)
> -002|receive_buf(inline)
> -002|flush_to_ldisc(

This is probably the important bit. As you say, we are doing a flush to
ldisc for a port even though it is not open.

That's starting to make more sense. Because your driver is the console,
tty_port_shutdown doesn't stop everything (so console printk still
works). That means you can still receive data, and on reopening a tty
that is only in use as a console we have a window where port->tty is
valid but the ldisc is not.
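
To make that window concrete, here is a rough user-space model of it. It
is only a sketch under my own assumptions, not kernel code: the model_*
types and functions are hypothetical, "ldisc_ready" stands in for "the
ldisc is valid", and the ordering (port->tty attached first, ldisc set
up second) is the assumption being illustrated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these are the kernel's types. */
struct model_tty {
    bool ldisc_ready;           /* true once the ldisc has been set up */
};

struct model_port {
    struct model_tty *tty;      /* stands in for port->tty */
    bool is_console;            /* console ports are not fully stopped */
    bool rx_running;            /* receive path may still deliver data */
};

enum rx_result { RX_IDLE, RX_NO_TTY, RX_DELIVERED, RX_LDISC_NOT_READY };

/* Shutdown: a console port keeps running so console output still works. */
static void model_port_shutdown(struct model_port *port)
{
    assert(port != NULL);
    port->tty = NULL;
    if (!port->is_console)
        port->rx_running = false;
}

/* First half of a reopen: the tty becomes visible through the port. */
static void model_reopen_attach(struct model_port *port, struct model_tty *tty)
{
    assert(port != NULL && tty != NULL);
    tty->ldisc_ready = false;
    port->tty = tty;
}

/* Second half of a reopen: the ldisc is usable and receive is (re)started. */
static void model_reopen_finish(struct model_port *port)
{
    assert(port != NULL && port->tty != NULL);
    port->tty->ldisc_ready = true;
    port->rx_running = true;
}

/* What an unsynchronised receive path would observe at this instant. */
static enum rx_result model_flush(const struct model_port *port)
{
    assert(port != NULL);
    if (!port->rx_running)
        return RX_IDLE;
    if (port->tty == NULL)
        return RX_NO_TTY;
    return port->tty->ldisc_ready ? RX_DELIVERED : RX_LDISC_NOT_READY;
}
```

Calling model_flush() between model_reopen_attach() and
model_reopen_finish() gives RX_LDISC_NOT_READY for a console port, while
a non-console port reports RX_IDLE at the same point because its receive
side was stopped at shutdown.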

I wonder what Jiri thinks, but my first thought is that tty_init_dev in
fact needs to do

tty_ldisc_lock(tty, 5 * HZ);
tty_ldisc_setup(tty);
tty_ldisc_unlock(tty);

with the relevant error handling, so that the flush_to_ldisc waits and
either hits 'no ldisc' or 'ldisc valid'.
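
As a rough illustration of the shape I mean, here is a single-threaded
user-space sketch. It is not kernel code and the details are assumptions:
every sketch_* name is hypothetical, a lock that is already held stands in
for the lock attempt timing out, and -EBUSY is just the error value I
picked for that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; this is not the kernel's tty structure. */
struct sketch_tty {
    bool lock_held;         /* models the ldisc lock held for setup */
    bool ldisc_attached;    /* an ldisc is visible to the receive path */
    bool ldisc_open;        /* ldisc open finished, private data valid */
};

enum flush_view {
    FLUSH_MUST_WAIT, FLUSH_NO_LDISC, FLUSH_LDISC_VALID, FLUSH_HALF_SETUP
};

/* Take the lock; an already-held lock stands in for a timeout here. */
static int sketch_ldisc_lock(struct sketch_tty *tty)
{
    assert(tty != NULL);
    if (tty->lock_held)
        return -EBUSY;
    tty->lock_held = true;
    return 0;
}

static void sketch_ldisc_unlock(struct sketch_tty *tty)
{
    assert(tty != NULL && tty->lock_held);
    tty->lock_held = false;
}

/* What a flush would observe if it ran at this instant. */
static enum flush_view sketch_flush(const struct sketch_tty *tty,
                                    bool honour_lock)
{
    assert(tty != NULL);
    if (honour_lock && tty->lock_held)
        return FLUSH_MUST_WAIT;
    if (!tty->ldisc_attached)
        return FLUSH_NO_LDISC;
    return tty->ldisc_open ? FLUSH_LDISC_VALID : FLUSH_HALF_SETUP;
}

/*
 * Locked setup. If mid_view is non-NULL it records what a flush that
 * honours the lock would see while setup is only half done.
 */
static int sketch_init_dev(struct sketch_tty *tty, enum flush_view *mid_view)
{
    int ret = sketch_ldisc_lock(tty);

    if (ret)
        return ret;         /* error path: bail out, state untouched */

    tty->ldisc_attached = true;
    if (mid_view)
        *mid_view = sketch_flush(tty, true);
    tty->ldisc_open = true;

    sketch_ldisc_unlock(tty);
    return 0;
}
```

A flush that honours the lock reports FLUSH_MUST_WAIT in the middle of
setup and only ever sees FLUSH_NO_LDISC or FLUSH_LDISC_VALID outside it.
Passing honour_lock = false on a half-set-up tty gives FLUSH_HALF_SETUP,
which is the state the locking is meant to hide.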

Alan