Re: [PATCH] tty: fix data race in n_tty_receive_buf_common

From: Alan Cox
Date: Thu Jan 04 2018 - 06:10:18 EST


> > What does a full (all CPU) trace of the bug look like and what tty driver
> > are you using when you capture the trace ?

Which tty driver ? serial/msm_serial.c ?

> We are using tty for console logging,
>     |    tty = 0xFFFFFFFF477AC880 -> (
>     |      magic = 21505,
>     |      kref = (refcount = (counter = 2)),
>     |      dev = 0xFFFFFFFFEDE3DA80,
>     |      driver = 0xFFFFFFFFEDE2A480,
>     |      ops = 0xFFFFFF9F26F7D0D0,
>     |      index = 0,
>     |      ldisc_sem = (count = 1, wait_lock = (raw_lock = (owner = 0,
> next = 0), magic = 3735899821, own
>     |      termiox = 0x0,
>     |      name = "ttyMSM0",
>     |      pgrp = 0x0,

OK, no. What I need to see is a trace of what each CPU is doing at the
point you detect the problem. That way we can see which path is
racing.

> We have seen this issue on 4.9 and also one thing i have observed,
> before tty is getting reinit in tty_init_dev(),

When you stop the DMA, is it instantaneous, or does it cause a final
interrupt after you return from stop_rx?

To me it still looks like data is being queued after the port is told to
stop, but that's not a certainty.

> there is console service exited before it in all the dumps.
>  35206.969644:   <2> init: Service 'console' (pid 7440) exited with
> status 130
>  35206.969690:   <2> init: Sending signal 9 to service 'console' (pid
> 7440) process group...
>  35206.970857:   <2> init: kill(7440, 9) failed: No such process.
>
> So how can we stop request of receive buff, if we already have tty_port
> and tty is getting reinitialized in midway like above
> case?

Is the port your console device? If you use a different port as the
console device, does the problem go away? That could be a very important
detail, as the hangup behaviour for the two is quite different.

Alan