Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces

From: Bradley Morgan

Date: Fri Jun 26 2026 - 08:33:01 EST

On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <include@xxxxxxxxx>
wrote:
>On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <pmladek@xxxxxxxx>
>wrote:
>>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>>> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
>>> > other CPUs. Do not ask sys_info() to handle that bit again later in the
>>> > panic path.
>>> >
>>> > Use sys_info_with_filter() so panic_print=all_bt does not request more
>>> > output after the CPUs are stopped.
>>> >
>>> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
>>> > Cc: stable@xxxxxxxxxxxxxxx
>>> > Signed-off-by: Bradley Morgan <include@xxxxxxxxx>
>>> > ---
>>> > kernel/panic.c | 2 +-
>>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>>> >
>>> > diff --git a/kernel/panic.c b/kernel/panic.c
>>> > index 213725b612aa..eb842823df61 100644
>>> > --- a/kernel/panic.c
>>> > +++ b/kernel/panic.c
>>> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>>> > */
>>> > atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>> >
>>> > - sys_info(panic_print);
>>> > + sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
>>>
>>> Hmm, this prevents printing backtraces from all CPUs completely.
>>> But what if they were not printed?
>>>
>>> They might be printed by:
>>>
>>> static void panic_other_cpus_shutdown(bool crash_kexec)
>>> {
>>> if (panic_print & SYS_INFO_ALL_BT)
>>> panic_trigger_all_cpu_backtrace();
>>>
>>> [...]
>>> }
>>>
>>> But it checks only "panic_print" variable. It won't do anything
>>> when (panic_print == 0).
>>>
>>> In this case, we might still want to print the backtraces when
>>> SYS_INFO_ALL_BT is set in kernel_si_info.
>>>
>>> > kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>>>
>>> Of course, we might fix panic_other_cpus_shutdown() to check also
>>> kernel_si_info.
>>>
>>> But it all becomes very hairy. We have several levels:
>>>
>>> + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>>>
>>> + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>>>
>>> + panic-specific si_info: panic_print
>>>
>>> + universal fallback for any layer: kernel_si_info
>>>
>>> Now, we try to check all these variables back and forth to
>>> trigger all backtraces or to avoid triggering them.
>>> And it clearly does not work well and the code is more and more
>>> hairy.
>>>
>>> I think about another approach. The word "waterfall" comes to my mind.
>>> Instead of checking all the settings back and forth, let's process
>>> each setting one by one and just remember what has been done and
>>> skip this in the next level.
>>>
>>> All the si_info actions seem to dump a global system state.
>>> So, it would make sense to remember the state in a global variable
>>> even when it might be modified by more CPUs in parallel.
>>>
>>> I am going to think more about it.
>>
>>I have created a POC using Gemini. I haven't tested it.
>>But it looks acceptable. And the logic seems to be more
>>straightforward.
>>
>>One drawback is that it requires adding the _reset()
>>call for all sys_info() callers. It is fine in principle
>>but it might complicate back-porting because all changes
>>have to be done in one patch.
>>
>>But honestly, this is a nice to have fix. Most people could
>>live happily without it.
>>
>>From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
>>From: Petr Mladek <pmladek@xxxxxxxx>
>>Date: Fri, 26 Jun 2026 13:55:41 +0200
>>Subject: [POC] sys_info: Introduce state-tracking APIs to prevent duplicate
>> backtraces
>>
>>In watchdog, panic, and hung task detection scenarios, sys_info() can
>>be called multiple times or alongside direct backtrace triggers like
>>trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
>>being dumped repeatedly from all CPUs, cluttering the kernel log and
>>delaying or obscuring critical debug details.
>>
>>Introduce a state tracking bitmask and associated helpers:
>>- sys_info_done(mask): Marks specific sys_info bits as already printed.
>>- sys_info_reset(): Resets the tracking state.
>>- sys_info_is_done(mask): Checks if all bits in the mask have been printed.
>>
>>Update sys_info() to automatically filter out already printed bits
>>using this state. Integrate these APIs with the generic hardlockup
>>and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
>>and the panic core. This ensures that each piece of system information
>>and backtrace output is printed at most once per lockup/panic event,
>>and the state is reset cleanly when a lockup does not trigger a panic.
>>
>>Races between sys_info() callers are ignored. It should be acceptable
>>because the output from various watchdogs has never been synchronized.
>>And panic() never returns.
>>
>>Assisted-by: gemini-1.5-flash ?
>
>Why not use gemini 3.5 flash?
>
>I can try if you want.
>
>Could I have the prompt you used? :)
>
>>Signed-off-by: Petr Mladek <pmladek@xxxxxxxx>
>>---
>> arch/powerpc/kernel/watchdog.c | 13 ++++++++++---
>> include/linux/sys_info.h | 3 +++
>> kernel/hung_task.c | 2 ++
>> kernel/panic.c | 4 +++-
>> kernel/watchdog.c | 10 ++++++++--
>> lib/sys_info.c | 30 +++++++++++++++++++++++++++++-
>> 6 files changed, 55 insertions(+), 7 deletions(-)
>>
>>diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>>index c40c69368476..0eab7894b9dc 100644
>>--- a/arch/powerpc/kernel/watchdog.c
>>+++ b/arch/powerpc/kernel/watchdog.c
>>@@ -239,6 +239,7 @@ static void watchdog_smp_panic(int cpu)
>> if (sysctl_hardlockup_all_cpu_backtrace ||
>> (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>> trigger_allbutcpu_cpu_backtrace(cpu);
>>+ sys_info_done(SYS_INFO_ALL_BT);
>> cpumask_clear(&wd_smp_cpus_ipi);
>> } else {
>> /*
>>@@ -251,10 +252,12 @@ static void watchdog_smp_panic(int cpu)
>> }
>> }
>>
>>- sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+ sys_info(hardlockup_si_mask);
>> if (hardlockup_panic)
>> nmi_panic(NULL, "Hard LOCKUP");
>>
>>+ sys_info_reset();
>>+
>> wd_end_reporting();
>>
>> return;
>>@@ -419,13 +422,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>> xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
>>
>> if (sysctl_hardlockup_all_cpu_backtrace ||
>>- (hardlockup_si_mask & SYS_INFO_ALL_BT))
>>+ (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>> trigger_allbutcpu_cpu_backtrace(cpu);
>>+ sys_info_done(SYS_INFO_ALL_BT);
>>+ }
>>
>>- sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+ sys_info(hardlockup_si_mask);
>> if (hardlockup_panic)
>> nmi_panic(regs, "Hard LOCKUP");
>>
>>+ sys_info_reset();
>>+
>> wd_end_reporting();
>> }
>> /*
>>diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
>>index a5bc3ea3d44b..ad43548c75dd 100644
>>--- a/include/linux/sys_info.h
>>+++ b/include/linux/sys_info.h
>>@@ -18,6 +18,9 @@
>> #define SYS_INFO_BLOCKED_TASKS 0x00000080
>>
>> void sys_info(unsigned long si_mask);
>>+void sys_info_done(unsigned long si_mask);
>>+void sys_info_reset(void);
>>+bool sys_info_is_done(unsigned long si_mask);
>> unsigned long sys_info_parse_param(char *str);
>>
>> #ifdef CONFIG_SYSCTL
>>diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>index 6fcc94ce4ca9..dbb6a27770f5 100644
>>--- a/kernel/hung_task.c
>>+++ b/kernel/hung_task.c
>>@@ -354,6 +354,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>>
>> if (hung_task_call_panic)
>> panic("hung_task: blocked tasks");
>>+
>>+ sys_info_reset();
>> }
>>
>> static long hung_timeout_jiffies(unsigned long last_checked,
>>diff --git a/kernel/panic.c b/kernel/panic.c
>>index 213725b612aa..86ce17f03da2 100644
>>--- a/kernel/panic.c
>>+++ b/kernel/panic.c
>>@@ -550,8 +550,10 @@ static void panic_trigger_all_cpu_backtrace(void)
>> */
>> static void panic_other_cpus_shutdown(bool crash_kexec)
>> {
>>- if (panic_print & SYS_INFO_ALL_BT)
>>+ if ((panic_print & SYS_INFO_ALL_BT) && !sys_info_is_done(SYS_INFO_ALL_BT)) {
>> panic_trigger_all_cpu_backtrace();
>>+ sys_info_done(SYS_INFO_ALL_BT);
>>+ }
>>
>> /*
>> * Note that smp_send_stop() is the usual SMP shutdown function,
>>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>index 87dd5e0f6968..f431087c68a7 100644
>>--- a/kernel/watchdog.c
>>+++ b/kernel/watchdog.c
>>@@ -282,14 +282,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>>
>> if (hardlockup_all_cpu_backtrace) {
>> trigger_allbutcpu_cpu_backtrace(cpu);
>>+ sys_info_done(SYS_INFO_ALL_BT);
>> if (!hardlockup_panic)
>> clear_bit_unlock(0, &hard_lockup_nmi_warn);
>> }
>>
>>- sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+ sys_info(hardlockup_si_mask);
>> if (hardlockup_panic)
>> nmi_panic(regs, "Hard LOCKUP");
>>
>>+ sys_info_reset();
>>+
>> per_cpu(watchdog_hardlockup_warned, cpu) = true;
>> }
>>
>>@@ -895,16 +898,19 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>
>> if (softlockup_all_cpu_backtrace) {
>> trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>>+ sys_info_done(SYS_INFO_ALL_BT);
>> if (!softlockup_panic)
>> clear_bit_unlock(0, &soft_lockup_nmi_warn);
>> }
>>
>> add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>>- sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+ sys_info(softlockup_si_mask);
>> thresh_count = duration / get_softlockup_thresh();
>>
>> if (softlockup_panic && thresh_count >= softlockup_panic)
>> panic("softlockup: hung tasks");
>>+
>>+ sys_info_reset();
>> }
>>
>> return HRTIMER_RESTART;
>>diff --git a/lib/sys_info.c b/lib/sys_info.c
>>index f32a06ec9ed4..f8e6176fae75 100644
>>--- a/lib/sys_info.c
>>+++ b/lib/sys_info.c
>>@@ -160,7 +160,35 @@ static void __sys_info(unsigned long si_mask)
>> show_state_filter(TASK_UNINTERRUPTIBLE);
>> }
>>
>>+static unsigned long sys_info_done_mask;
>>+
>>+void sys_info_done(unsigned long si_mask)
>>+{
>>+ sys_info_done_mask |= si_mask;
>>+}
>>+
>>+void sys_info_reset(void)
>>+{
>>+ sys_info_done_mask = 0;
>>+}
>>+
>>+bool sys_info_is_done(unsigned long si_mask)
>>+{
>>+ return (sys_info_done_mask & si_mask) == si_mask;
>>+}
>>+
>> void sys_info(unsigned long si_mask)
>> {
>>- __sys_info(si_mask ? : kernel_si_mask);
>>+ unsigned long mask;
>>+
>>+ if (si_mask)
>>+ mask = si_mask & ~sys_info_done_mask;
>>+ else
>>+ mask = kernel_si_mask & ~sys_info_done_mask;
>>+
>>+ if (!mask)
>>+ return;
>>+
>>+ __sys_info(mask);
>>+ sys_info_done(mask);
>> }
>>
>
>Thanks!

Hmm.. new idea

kernel/dump_filter.c ?

This file could hold a generic lockup state machine, so that any
subsystem can record what it has already dumped.
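
Very roughly, the core of it could be a single "claim" operation that
marks bits as dumped and tells the caller which ones it still has to
print. The following is only an untested userspace sketch using C11
atomics, not kernel code: every dump_filter_*() name and both DF_* bits
are made up for illustration (the real bits would be the SYS_INFO_*
ones), and nothing like this exists in the tree today.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in bits; real code would use the SYS_INFO_* flags. */
#define DF_ALL_BT        0x1UL
#define DF_BLOCKED_TASKS 0x2UL

/* Bits that some reporter has already dumped for the current event. */
static atomic_ulong dump_filter_done_mask;

/*
 * Atomically mark @mask as dumped and return the subset of @mask that
 * was not marked before, i.e. what the caller still has to print.
 */
unsigned long dump_filter_claim(unsigned long mask)
{
	unsigned long prev = atomic_fetch_or(&dump_filter_done_mask, mask);

	return mask & ~prev;
}

/* True only if every bit in @mask has already been claimed. */
bool dump_filter_is_done(unsigned long mask)
{
	return (atomic_load(&dump_filter_done_mask) & mask) == mask;
}

/* Forget everything; meant for when an event ends without a panic. */
void dump_filter_reset(void)
{
	atomic_store(&dump_filter_done_mask, 0);
}
```

A caller would print only what dump_filter_claim() returns. Doing the
test and the mark in one atomic step means two reporters racing on the
same bit cannot both get it back. It still would not serialize the
output itself, and a reset on one CPU can race with a report in flight
on another, same as in the POC.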

I know it may add bloat, but it's better than cramming fixes in.

What do you guys think? Maybe we could start an RFC for this?

Thanks!