答复: [PATCH] TTY: serial_core, add ->install
From: Li,Rongqing
Date: Wed Apr 17 2019 - 08:06:26 EST
> -----邮件原件-----
> 发件人: linux-kernel-owner@xxxxxxxxxxxxxxx
> [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] 代表 Jiri Slaby
> 发送时间: 2019年4月17日 16:59
> 收件人: gregkh@xxxxxxxxxxxxxxxxxxx
> 抄送: linux-serial@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Jiri Slaby
> <jslaby@xxxxxxx>; Li,Rongqing <lirongqing@xxxxxxxxx>; Wang,Li(ACG Cloud)
> <wangli39@xxxxxxxxx>; Zhang,Yu(ACG Cloud) <zhangyu31@xxxxxxxxx>;
> stable <stable@xxxxxxxxxxxxxxx>
> 主题: [PATCH] TTY: serial_core, add ->install
>
> We need to compute the uart state only on the first open. This is usually what is
> done in the ->install hook. serial_core used to do this in ->open on every open.
> So move it to ->install.
>
> As a side effect, it ensures the state is set properly in the window after
> tty_init_dev is called, but before uart_open. This fixes a bunch of races
> between tty_open and flush_to_ldisc we were dealing with recently.
>
> One of such bugs was attempted to fix in commit fedb5760648a (serial:
> fix race between flush_to_ldisc and tty_open), but it only took care of a couple
> of functions (uart_start and uart_unthrottle). I was able to reproduce the
> crash on a SLE system, but in uart_write_room which is also called from
> flush_to_ldisc via process_echoes. I was *unable* to reproduce the bug locally.
> It is due to having this patch in my queue since 2012!
>
> general protection fault: 0000 [#1] SMP KASAN PTI
> CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G L
> 4.12.14-396-default #1 SLE15-SP1 (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
> Workqueue: events_unbound flush_to_ldisc
> task: ffff8800427d8040 task.stack: ffff8800427f0000
> RIP: 0010:uart_write_room+0xc4/0x590
> RSP: 0018:ffff8800427f7088 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 000000000000002f RSI: 00000000000000ee RDI: ffff88003888bd90
> RBP: ffffffffb9545850 R08: 0000000000000001 R09: 0000000000000400
> R10: ffff8800427d825c R11: 000000000000006e R12: 1ffff100084fee12
> R13: ffffc900004c5000 R14: ffff88003888bb28 R15: 0000000000000178
> FS: 0000000000000000(0000) GS:ffff880043300000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000561da0794148 CR3: 000000000ebf4000 CR4: 00000000000006e0
> Call Trace:
> tty_write_room+0x6d/0xc0
> __process_echoes+0x55/0x870
> n_tty_receive_buf_common+0x105e/0x26d0
> tty_ldisc_receive_buf+0xb7/0x1c0
> tty_port_default_receive_buf+0x107/0x180
> flush_to_ldisc+0x35d/0x5c0
> ...
>
> 0 in rbx means tty->driver_data is NULL in uart_write_room. 0x178 is tried to
> be dereferenced (0x178 >> 3 is 0x2f in rdx) at uart_write_room+0xc4. 0x178 is
> exactly (struct uart_state *)NULL->refcount used in uart_port_lock from
> uart_write_room.
>
> So revert the upstream commit here as my local patch should fix the whole
> family.
>
> Signed-off-by: Jiri Slaby <jslaby@xxxxxxx>
> Cc: Li RongQing <lirongqing@xxxxxxxxx>
> Cc: Wang Li <wangli39@xxxxxxxxx>
> Cc: Zhang Yu <zhangyu31@xxxxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: stable <stable@xxxxxxxxxxxxxxx>
> ---
>
> ============================= NOTE =============================
>
> Could you test your use-case at Baidu, guys, please?
>
Sorry, we have not the environment to test it, it happens when we upgrades BMC
-RongQing