Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

From: Konstantin Khlebnikov
Date: Tue Feb 11 2020 - 06:01:48 EST


On 11/02/2020 01.51, Andrew Morton wrote:
On Mon, 10 Feb 2020 12:48:57 +0300 Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> wrote:

In NMI context printk() could save messages into per-cpu buffers and
schedule flush by irq_work when IRQ are unblocked. This means message
about hardlockup appears in kernel log only when/if lockup is gone.

I think I understand what this means. The hard lockup detector runs at
NMI time but if it detects a lockup within IRQ context it cannot call
printk, because it's within NMI context, where synchronous printk
doesn't work. Yes?

Yes. Printing from hardlockup watchdog is only special case.
Without it irq-work will flush per-cpu buffer right after enabling irq.


Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
thus printk() cannot schedule flush work to another cpu.

This patch adds simple atomic counter of detected hardlockups and
flushes all per-cpu printk buffers in context softlockup watchdog
at any other cpu when it sees changes of this counter.

And I think this works because the softlockup detector runs within irq
context?

Yes. Softlockuo watchdog is a timer. It could use normal printk and
flush per-cpu buffers. Any periodically executed code could do that
but softlockup is most logical place for that.

There is forward signal from softlockup to hardlockup wathdogs in
per-cpu counter hrtimer_interrupts (increment means cpu in't in
hardlockup). This patch adds backward signal from hardlockup to
softlocup detector that some cpus are in hardlockup.



...

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
}
__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
# endif /* CONFIG_SMP */
+
+atomic_t hardlockup_detected = ATOMIC_INIT(0);
+
+static inline void flush_hardlockup_messages(void)

I don't think this needs to be inlined?

+{
+ static atomic_t flushed = ATOMIC_INIT(0);
+
+ /* flush messages from hard lockup detector */
+ if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
+ atomic_set(&flushed, atomic_read(&hardlockup_detected));
+ printk_safe_flush();
+ }
+}

Could we add some explanatory comments here? Explain to the reader why
this code exists, what purpose it serves? Basically a micro version of
the above changelog.

Hmm. This seems obvious from names of variables and called function.
Both watchdogs use same patterns: monotonic counter and side variable
with snapshot to detect changes or stalls.



...