Re: Is there any race between flush_to_ldisc() and release_one_tty()?

From: Peter Hurley
Date: Wed Apr 13 2016 - 00:46:28 EST


Hi Prasad,

Thanks for the report.


On 04/12/2016 08:22 AM, Sodagudi Prasad wrote:
>
> Hi All,
>
> It looks like there is race between flush_to_ldisc() and release_one_tty().

Not necessarily.

The driver could have destroyed the port prematurely. Or the driver could have
rescheduled the input kworker after it was cancelled in release_tty().

What driver is this?


> Following crash is observed even after including below change.
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/tty/tty_buffer.c?id=7098296a362a96051fa120abf48f0095818b99cd
> https://lkml.org/lkml/2015/9/1/491
>
>
> [386532.450351] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b93
> [386532.465217] pgd = ffffffc05bea5000
> [386532.467677] [6b6b6b6b6b6b6b93] *pgd=0000000000000000, *pud=0000000000000000
> [386532.474715] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [386532.480350] Modules linked in: wlan(O) [last unloaded: wlan]
> [386532.486085] CPU: 5 PID: 31970 Comm: kworker/5:1 Tainted: G W O 3.18.24-ga9bbc02-00076-g1434803 #1
> [386532.495885] Hardware name: Qualcomm Technologies, Inc. MSM8953 MTP (DT)
> [386532.502581] Workqueue: events flush_to_ldisc
> [386532.506909] task: ffffffc061620000 ti: ffffffc011efc000 task.ti: ffffffc011efc000
> [386532.514465] PC is at ldsem_down_read_trylock+0x0/0x48
> [386532.519583] LR is at tty_ldisc_ref+0x24/0x4c
> ....
> ....
> ....
> [386533.028262] Process kworker/5:1 (pid: 31970, stack limit = 0xffffffc011efc058)
> [386533.035553] Call trace:
> [386533.038080] [<ffffffc0005092a8>] ldsem_down_read_trylock+0x0/0x48
> [386533.044236] [<ffffffc00050817c>] flush_to_ldisc+0x28/0x124
> [386533.049794] [<ffffffc0000ba32c>] process_one_work+0x238/0x3f0
> [386533.055608] [<ffffffc0000bb160>] worker_thread+0x2f8/0x418
> [386533.061163] [<ffffffc0000bf3e0>] kthread+0xec/0xf8


Why is this trace report elided?

Does this happen on a vanilla 3.18.24 kernel? What about a more recent kernel?


> 1) It is not clear how READ_ONCE would fix the race between flush_to_ldisc() and release_one_tty() discussed in https://lkml.org/lkml/2015/9/1/491. could you please provide more information?

It doesn't.

READ_ONCE() only guarantees the compiler won't reload tty from port->itty after
the check for NULL. A non-NULL value in turn guarantees that the kworker hasn't
been cancelled yet (since it's now running), and release_tty() can't advance
until this kworker completes.

IOW, the kworker cancel in release_tty() is what guarantees flush_to_ldisc()
is not concurrent with release_one_tty().
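
To make the ordering concrete, here is a minimal userspace sketch of the same
pattern, not the kernel code. The names (struct port, flush_worker(),
release(), run_demo()) are hypothetical stand-ins; pthread_join() plays the
role of cancelling the work and waiting for it, and atomic_load() plays the
role of READ_ONCE(). It assumes the release path clears the pointer, then
waits for the worker, and only then frees the tty.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct tty {
    int flushed;                    /* counts completed flushes */
};

struct port {
    _Atomic(struct tty *) itty;     /* analog of port->itty */
    pthread_t worker;               /* analog of the buffer work item */
};

/* Analog of flush_to_ldisc(): load the pointer once, use only the local copy. */
static void *flush_worker(void *arg)
{
    struct port *port = arg;
    struct tty *tty = atomic_load(&port->itty);   /* single load */

    if (tty == NULL)
        return NULL;
    tty->flushed++;     /* safe only because release() waits for us first */
    return NULL;
}

/* Analog of the release path: clear the pointer, wait for the worker, free. */
static int release(struct port *port)
{
    struct tty *tty = atomic_load(&port->itty);
    int flushed;

    atomic_store(&port->itty, NULL);
    pthread_join(port->worker, NULL);   /* worker is finished past this point */
    flushed = tty->flushed;
    free(tty);                          /* nothing can still be using tty */
    return flushed;
}

/* Returns how many flushes ran (0 or 1), or -1 on setup failure. */
static int run_demo(void)
{
    struct port port;
    struct tty *tty = calloc(1, sizeof(*tty));

    if (tty == NULL)
        return -1;
    atomic_init(&port.itty, tty);
    if (pthread_create(&port.worker, NULL, flush_worker, &port) != 0) {
        free(tty);
        return -1;
    }
    return release(&port);
}
```

Whether the worker sees NULL or the tty depends on timing, but in neither case
does it touch freed memory. The single load alone would not give that; the
wait in release() does.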

But this doesn't factor in at all in the observed crash.
Although the information provided is quite limited(?!), the tty address itself
is bogus, as opposed to a stale pointer to a valid-but-now-freed tty.

What's most likely happened is that the tty port has been freed while its
buffer work was running, which is almost certainly a driver bug.
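
As a rough check on the faulting address, the arithmetic can be sketched as
below. 0x6b is the byte the slab allocator writes over freed objects when
poisoning is enabled (POISON_FREE in include/linux/poison.h). The helper names
are hypothetical, and reading the remainder as a field offset inside the tty
structure is an assumption about this particular build, not something the
trace states.

```c
#include <stdint.h>

/* Byte pattern written over freed slab objects when poisoning is enabled. */
#define POISON_FREE_BYTE 0x6bu

/* Value a 64-bit pointer field holds once its containing object is poisoned. */
static uint64_t poisoned_pointer(void)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | POISON_FREE_BYTE;
    return v;
}

/* Distance of a faulting address from a fully poisoned base pointer. */
static uint64_t fault_offset(uint64_t fault_addr)
{
    return fault_addr - poisoned_pointer();
}
```

For the address in the report, fault_offset(0x6b6b6b6b6b6b6b93ULL) gives 0x28,
which is what you would see if the value read from port->itty was poison and
the access was to a member a few words into the structure.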


> 2) Is there any chance that, other core could free tty memory between 442 and 445 lines?

Sure, but I don't think that's what's happened here.
As I wrote above, more likely the port has been freed, so the value
retrieved for port->itty was bogus.

FWIW, the ipwireless tty driver will happily free the tty while it's in use.

And the serial core will let you unload the uart driver and free the tty
ports while the tty is open and in use, as well.


> 434 static void flush_to_ldisc(struct work_struct *work)
> 435 {
> ...
> ...
> ...
> 441 tty = READ_ONCE(port->itty);
> 442 if (tty == NULL)
> 443 return;
> 444
> 445 disc = tty_ldisc_ref(tty);
> 446 if (disc == NULL)
> 447 return;