Re: NMI watchdog dump does not print on hard lockup

From: Petr Mladek
Date: Fri Oct 13 2017 - 07:14:52 EST

Next message: Arnd Bergmann: "Re: [PATCH] spi-nor: intel-spi: Fix Kconfig dependency to LPC_ICH"
Previous message: Masahiro Yamada: "Re: [PATCH] kbuild: shrink Makefile cache when it exceeds 1000 lines"
In reply to: Peter Zijlstra: "Re: NMI watchdog dump does not print on hard lockup"
Next in thread: Steven Rostedt: "Re: NMI watchdog dump does not print on hard lockup"

On Thu 2017-10-12 12:16:58, Steven Rostedt wrote:
> static void lock_up_cpu(void *data)
> {
> 	unsigned long flags;
>
> 	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
> 	raw_spin_lock(&global_trace.start_lock);
> 	raw_spin_unlock(&global_trace.start_lock);
> 	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
> }
>
> [..]
>
> on_each_cpu(lock_up_cpu, NULL, 1);
>
> This too triggered the warning. But I noticed that the calling function
> didn't hard lockup. (Not all CPUs were hard locked).
>
> Finally I did:
>
> on_each_cpu(lock_up_cpu, NULL, 0);
> lock_up_cpu(tr);
>
> And boom! It locked up (lockdep was enabled, so I could see it showing
> the deadlock), but then it stopped there. No output. The NMI watchdog
> will only detect hard lockups if there is at least one CPU that is
> still active. This could be an issue on non SMP boxes.
>
> We need a way to have NMI flush to consoles when a lockup is detected,
> and not depend on an irq_work to do so.

I thought that enabling CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE
could help. panic() flushes the printk_safe buffers, see
printk_safe_flush_on_panic(). But somehow it does not help.
I need to dig into it more.
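
To make the expectation concrete, here is a rough userspace model of
the two flush paths. It is only a sketch, not kernel code: every
model_* name is hypothetical, the "console" is a plain string, and
the irqs_enabled flag stands in for whether the queued irq_work can
ever run. It shows a message stored from NMI context staying
invisible when interrupts are off, and a panic-style flush draining
the buffer unconditionally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_BUF_SIZE 256

struct model_safe_buf {
	char data[MODEL_BUF_SIZE];
	size_t len;
	bool flush_pending;	/* stands in for a queued irq_work */
};

/* NMI-context print: only store the message and request a later flush. */
static void model_nmi_print(struct model_safe_buf *b, const char *msg)
{
	size_t n = strlen(msg);

	if (n > MODEL_BUF_SIZE - b->len)
		n = MODEL_BUF_SIZE - b->len;	/* truncate when full */
	memcpy(b->data + b->len, msg, n);
	b->len += n;
	b->flush_pending = true;
}

/* Copy the buffered text to the "console" and reset the buffer. */
static size_t model_drain(struct model_safe_buf *b, char *console,
			  size_t console_size)
{
	size_t n = b->len;

	if (console_size == 0)
		return 0;
	if (n > console_size - 1)
		n = console_size - 1;
	memcpy(console, b->data, n);
	console[n] = '\0';
	b->len = 0;
	b->flush_pending = false;
	return n;
}

/* Deferred path: does nothing while the CPU cannot take interrupts. */
static size_t model_irq_work_flush(struct model_safe_buf *b,
				   bool irqs_enabled, char *console,
				   size_t console_size)
{
	if (!irqs_enabled || !b->flush_pending)
		return 0;
	return model_drain(b, console, console_size);
}

/* Panic path: drains the buffer regardless of the interrupt state. */
static size_t model_flush_on_panic(struct model_safe_buf *b, char *console,
				   size_t console_size)
{
	return model_drain(b, console, console_size);
}
```

In this model the panic path always produces output, which is what
I expected from the real thing. The model says nothing about why
the real panic path stays silent here.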

In general, we could improve the detection of situations where
the entire system is locked up. That would justify the risk of
calling the consoles even from NMI context.

Alternatively, we could accept that the "default" printk is not
good for all situations and allow more specialized "debugging" modes:

+ Peter's force_early_printk stuff

+ Allow disabling printk_safe and printk_safe_nmi.
  There would be a risk of a deadlock caused by printk,
  but there would also be a chance to see the messages.
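
The trade-off of the second option can be sketched with a small
userspace model. Again, this is only an illustration under my own
assumptions: the model_* names and the safe_mode_enabled switch are
hypothetical, and the console lock is reduced to a single flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; these names are not kernel APIs. */
enum model_result {
	MODEL_BUFFERED,		/* stored, visible only after a later flush */
	MODEL_PRINTED,		/* reached the console immediately */
	MODEL_WOULD_DEADLOCK,	/* console lock already held on this CPU */
};

struct model_console {
	bool lock_held;		/* the interrupted context owns the lock */
	char out[128];
};

struct model_printk {
	bool safe_mode_enabled;	/* the proposed switch; assumed on by default */
	char deferred[128];
};

/* Print from (modelled) NMI context according to the selected mode. */
static enum model_result model_nmi_printk(struct model_printk *p,
					  struct model_console *con,
					  const char *msg)
{
	if (p->safe_mode_enabled) {
		/* Safe mode: never touch the console, only store the text. */
		snprintf(p->deferred, sizeof(p->deferred), "%s", msg);
		return MODEL_BUFFERED;
	}
	if (con->lock_held) {
		/* A real kernel would spin here forever; the model reports it. */
		return MODEL_WOULD_DEADLOCK;
	}
	snprintf(con->out, sizeof(con->out), "%s", msg);
	return MODEL_PRINTED;
}
```

With the switch off, the message appears immediately unless the NMI
hit the CPU while it was holding the console lock, which is exactly
the deadlock that the safe mode exists to avoid.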

Best Regards,
Petr
