Re: [PATCH] printk: CPU backtrace not printing on panic

From: John Ogness
Date: Fri Jul 26 2024 - 09:56:56 EST


On 2024-07-26, Petr Mladek <pmladek@xxxxxxxx> wrote:
> I would do it the other way and enable printing from other CPUs only
> when triggring the backtrace. We could do it because
> trigger_all_cpu_backtrace() waits until all backtraces are
> printed.
>
> Something like:
>
> diff --git a/include/linux/panic.h b/include/linux/panic.h
> index 3130e0b5116b..980bacbdfcfc 100644
> --- a/include/linux/panic.h
> +++ b/include/linux/panic.h
> @@ -16,6 +16,7 @@ extern void oops_enter(void);
> extern void oops_exit(void);
> extern bool oops_may_print(void);
>
> +extern int panic_triggering_all_cpu_backtrace;
> extern int panic_timeout;
> extern unsigned long panic_print;
> extern int panic_on_oops;
> diff --git a/kernel/panic.c b/kernel/panic.c
> index f861bedc1925..7e9e97d59b1e 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
> bool panic_on_taint_nousertaint = false;
> static unsigned int warn_limit __read_mostly;
>
> +int panic_triggering_all_cpu_backtrace;
> +
> int panic_timeout = CONFIG_PANIC_TIMEOUT;
> EXPORT_SYMBOL_GPL(panic_timeout);
>
> @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
> */
> static void panic_other_cpus_shutdown(bool crash_kexec)
> {
> - if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> + if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
> + /* Temporary allow printing messages on non-panic CPUs. */
> + panic_triggering_all_cpu_backtrace = true;
> trigger_all_cpu_backtrace();
> + panic_triggering_all_cpu_backtrace = false;

Note, here we should also add

nbcon_atomic_flush_pending();

Your suggestion allows the other CPUs to dump their backtrace into the
ringbuffer, but they are still forbidden from acquiring the nbcon
console contexts for printing. That is a necessary requirement of
nbcon_waiter_matches().

Or since the cpu_sync is held while printing the backtrace, we could
allow the non-panic CPUs to print by modifying the check in
nbcon_context_try_acquire_direct():

----- BEGIN -----
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ef6e76db0f5a..cd8724840edc 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -241,7 +241,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
struct nbcon_state new;

do {
- if (other_cpu_in_panic())
+ if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
return -EPERM;

if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 613644c55e54..f486d0b84a84 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -4569,6 +4569,11 @@ void __printk_cpu_sync_wait(void)
}
EXPORT_SYMBOL(__printk_cpu_sync_wait);

+bool __printk_cpu_sync_owner(void)
+{
+ return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
+}
+
/**
* __printk_cpu_sync_try_get() - Try to acquire the printk cpu-reentrant
* spinning lock.
----- END -----

> + }
>
> /*
> * Note that smp_send_stop() is the usual SMP shutdown function,
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 054c0e7784fd..c22b07049c38 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> * non-panic CPUs are generating any messages, they will be
> * silently dropped.
> */
> - if (other_cpu_in_panic())
> + if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> return 0;

I wonder if it is enough to check if it is holding the cpu_sync. Then we
would not need @panic_triggering_all_cpu_backtrace.

John