Re: linux-next 20170519 and later - ^S/^Q borkage on ttys.

From: Vegard Nossum
Date: Wed May 31 2017 - 03:28:42 EST


On 05/31/17 05:48, valdis.kletnieks@xxxxxx wrote:
> Pretty drastic. Hit ^S to pause scrolling, and instantly hung terminal.
> Seen on both urxvt and xterm under x11, and on virtual console screens.
>
> This appears in dmesg:
>
> [ 1844.182058] INFO: task kworker/u8:3:129 blocked for more than 120 seconds.
> [ 1844.182073] Tainted: G OE 4.12.0-rc3-next-20170530 #489
> [ 1844.182078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1844.182085] kworker/u8:3 D11008 129 2 0x00000000
> [ 1844.182109] Workqueue: events_unbound flush_to_ldisc
> [ 1844.182118] Call Trace:
> [ 1844.182136] __schedule+0x43e/0x1020
> [ 1844.182147] ? schedule_preempt_disabled+0x27/0xd0
> [ 1844.182156] schedule+0x5d/0x1d0
> [ 1844.182164] ? __mutex_lock+0x4c9/0x11c0
> [ 1844.182172] schedule_preempt_disabled+0x27/0xd0
> [ 1844.182179] __mutex_lock+0x4c9/0x11c0
> [ 1844.182191] ? tty_port_default_receive_buf+0x58/0xc0
> [ 1844.182204] ? ldsem_down_read_trylock+0xc3/0x130
> [ 1844.182215] mutex_lock_nested+0x1b/0x20
> [ 1844.182222] ? mutex_lock_nested+0x1b/0x20
> [ 1844.182230] tty_port_default_receive_buf+0x58/0xc0
> [ 1844.182240] flush_to_ldisc+0xea/0x220
> [ 1844.182249] ? trace_hardirqs_on_caller+0x16/0x290
> [ 1844.182262] process_one_work+0x3d6/0xd00
> [ 1844.182269] ? lock_acquire+0xae/0x2f0
> [ 1844.182284] worker_thread+0x71/0x830
> [ 1844.182297] kthread+0x1a9/0x270
> [ 1844.182304] ? process_one_work+0xd00/0xd00
> [ 1844.182310] ? kthread_create_on_node+0x70/0x70
> [ 1844.182321] ret_from_fork+0x27/0x40
> [ 1844.182608] INFO: lockdep is turned off.
>
> Bisects down to this commit, and things work when it's reverted.
>
> Commit 925bb1ce47f4.
> Author: Vegard Nossum <vegard.nossum@xxxxxxxxxx>
> Date: Thu May 11 12:18:52 2017 +0200
>
> tty: fix port buffer locking
>
> tty_insert_flip_string_fixed_flag() is racy against itself when called
> from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc()
> workqueue path [2].

Gah, if it's that easy to trigger a deadlock (as opposed to just a
lockdep warning), we should revert the patch until I have a better fix.

^S doesn't seem to reproduce it here, though. Too bad your stack trace
doesn't show the process already holding the lock.
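
For anyone who wants to poke at the ioctl(TCXONC, TCION/TCIOFF) path named
in the commit message without relying on ^S at a terminal, here is a minimal
userspace sketch. It assumes a pty pair is a reasonable stand-in for the
affected ttys, and the helper name toggle_input_flow() is made up for
illustration. It is not a confirmed reproducer for the hang reported above,
and on an affected kernel it may simply block.

```python
import os
import termios


def toggle_input_flow(cycles=3):
    """Issue TCIOFF/TCION on the slave side of a fresh pty pair.

    Hypothetical helper for illustration only; a pty pair is an
    assumption, not necessarily the tty type that hung in the report.
    """
    master_fd, slave_fd = os.openpty()
    try:
        for _ in range(cycles):
            # On Linux, tcflow() is carried out via ioctl(fd, TCXONC, action).
            termios.tcflow(slave_fd, termios.TCIOFF)  # request: stop sending input
            termios.tcflow(slave_fd, termios.TCION)   # request: resume sending input
    finally:
        os.close(slave_fd)
        os.close(master_fd)
    return cycles


if __name__ == "__main__":
    print("completed", toggle_input_flow(), "TCIOFF/TCION cycles")
```

Running this in a loop while something else keeps the flush_to_ldisc()
workqueue busy on the same port would be closer to the race described in the
commit message, but that part is left out here to keep the sketch small.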


Vegard