[PATCH] panic: disable optimistic spin after halting CPUs

From: Stephen Brennan
Date: Wed Jan 12 2022 - 19:55:16 EST


A CPU executing with console lock spinning enabled might be halted
during a panic. Before console_flush_on_panic(), it's possible for
console_trylock() to attempt optimistic spinning, deadlocking the panic
CPU:

CPU 0 (panic CPU) CPU 1
----------------- ------
printk() {
vprintk_func() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_lock_spinning_enable();
... printing to console ...
panic() {
crash_smp_send_stop() {
NMI -------------------> HALT
}
atomic_notifier_call_chain() {
printk() {
...
console_trylock_spinnning() {
// optimistic spin infinitely

This hang during panic can be induced when a kdump kernel is loaded, and
crash_kexec_post_notifiers=1 is present on the kernel command line. The
following script which concurrently writes to /dev/kmsg, and triggers a
panic, can result in this hang:

#!/bin/bash
date
# 991 chars (based on log buffer size):
chars="$(printf 'a%.0s' {1..991})"
while :; do
echo $chars > /dev/kmsg
done &
echo c > /proc/sysrq-trigger &
date
exit

Below are the hang rates for the above test case, based on v5.16-rc8
before and after this patch:

before: 197 hangs / 340 trials - 57.9%
after : 0 hangs / 245 trials - 0.0%

Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")

Signed-off-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx>
Reviewed-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
---
include/linux/console.h | 1 +
kernel/panic.c | 7 +++++++
kernel/printk/printk.c | 17 +++++++++++++++++
3 files changed, 25 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index a97f277cfdfa..4eeb46927d96 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -179,6 +179,7 @@ extern void console_unlock(void);
extern void console_conditional_schedule(void);
extern void console_unblank(void);
extern void console_flush_on_panic(enum con_flush_mode mode);
+extern void console_lock_spinning_disable_on_panic(void);
extern struct tty_driver *console_device(int *);
extern void console_stop(struct console *);
extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index cefd7d82366f..a9340e580b20 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -265,6 +265,13 @@ void panic(const char *fmt, ...)
crash_smp_send_stop();
}

+ /*
+ * Now that we've halted other CPUs, disable optimistic spinning in
+ * printk(). This avoids deadlocking in console_trylock(), until we call
+ * console_flush_on_panic().
+ */
+ console_lock_spinning_disable_on_panic();
+
/*
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 57b132b658e1..d824fdf7d3fd 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1823,6 +1823,23 @@ static int console_lock_spinning_disable_and_check(void)
return 1;
}

+/**
+ * console_lock_spinning_disable_on_panic - disable spinning so that
+ * a panic CPU does not enter an infinite loop
+ *
+ * This is called once all CPUs are halted. A CPU halted during a section which
+ * allowed spinning, could trigger an infinite loop in console_trylock. To avoid
+ * this, mark console_owner as NULL.
+ */
+void console_lock_spinning_disable_on_panic(void)
+{
+ WRITE_ONCE(console_owner, NULL);
+ if (raw_spin_is_locked(&console_owner_lock)) {
+ debug_locks_off();
+ raw_spin_lock_init(&console_owner_lock);
+ }
+}
+
/**
* console_trylock_spinning - try to get console_lock by busy waiting
*
--
2.30.2