On Mon, 10 Feb 2020 12:48:57 +0300 Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> wrote:
> In NMI context printk() saves messages into per-cpu buffers and
> schedules a flush via irq_work once IRQs are unblocked. This means
> the message about a hardlockup appears in the kernel log only
> when/if the lockup is gone.
I think I understand what this means. The hard lockup detector runs
at NMI time, but when it detects a lockup it cannot call printk()
because it is itself in NMI context, where synchronous printk doesn't
work. Yes?
> The comment in irq_work_queue_on() states that remote IPIs aren't
> NMI-safe, thus printk() cannot schedule the flush work on another
> CPU.
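(Writing it down to check my understanding: from NMI the only safe
move is a self-IPI via irq_work, so the message gets parked in a
per-CPU buffer and flushed later on the same CPU. A simplified sketch
of that idea, with made-up names, not the actual printk_safe code:

#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/irq_work.h>
#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/string.h>

struct parked_msg {
	struct irq_work work;
	char buf[128];		/* last message from NMI context */
};

static DEFINE_PER_CPU(struct parked_msg, parked_msg);

/* Runs later, in IRQ context on the same CPU, where printk() is safe. */
static void parked_msg_flush(struct irq_work *work)
{
	struct parked_msg *m = container_of(work, struct parked_msg, work);

	printk("%s", m->buf);
}

/* Called from NMI context instead of printing directly. */
static void parked_msg_queue(const char *msg)
{
	struct parked_msg *m = this_cpu_ptr(&parked_msg);

	strscpy(m->buf, msg, sizeof(m->buf));
	/*
	 * irq_work_queue() self-IPIs, which is NMI-safe; queueing on
	 * a *remote* CPU with irq_work_queue_on() is not, which is why
	 * the locked-up CPU can't hand the flush to anybody else.
	 */
	irq_work_queue(&m->work);
}

static int __init parked_msg_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		init_irq_work(&per_cpu_ptr(&parked_msg, cpu)->work,
			      parked_msg_flush);
	return 0;
}
early_initcall(parked_msg_init);

And of course if the CPU never leaves the lockup, its irq_work never
runs and the message never appears, which is the problem being fixed.)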
> This patch adds a simple atomic counter of detected hardlockups
> and flushes all per-cpu printk buffers from the softlockup watchdog
> context on any other CPU when it sees the counter change.
And I think this works because the softlockup detector runs in IRQ
context?
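(If that's right, the NMI side only needs an NMI-safe atomic_inc() of
the counter. The actual hunk is snipped below, so this is just my
guess at its shape:

/*
 * Hypothetical sketch of the producer side; the real increment site
 * is in the elided part of the patch.
 */
static void watchdog_overflow_callback(struct perf_event *event,
				       struct perf_sample_data *data,
				       struct pt_regs *regs)
{
	/* ... existing hardlockup detection ... */
	if (is_hardlockup()) {
		/*
		 * atomic_inc() is NMI-safe, so the event can be
		 * published without calling printk() from NMI.
		 */
		atomic_inc(&hardlockup_detected);
	}
	/* ... existing pr_emerg()/backtrace/panic handling ... */
}
)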
...
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  }
>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  # endif /* CONFIG_SMP */
> +
> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> +
> +static inline void flush_hardlockup_messages(void)
I don't think this needs to be inlined?
> +{
> +        static atomic_t flushed = ATOMIC_INIT(0);
> +
> +        /* flush messages from hard lockup detector */
> +        if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> +                atomic_set(&flushed, atomic_read(&hardlockup_detected));
> +                printk_safe_flush();
> +        }
> +}
Could we add some explanatory comments here? Explain to the reader why
this code exists, what purpose it serves? Basically a micro version of
the above changelog.
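Something along these lines, perhaps, just as a sketch of the wording:

/*
 * The hardlockup detector runs in NMI context, where printk() can
 * only park messages in the per-CPU printk_safe buffers: the
 * locked-up CPU cannot flush them itself, and NMI code cannot queue
 * flush work on another CPU.  So the detector just bumps
 * hardlockup_detected, and the softlockup watchdog on the healthy
 * CPUs, running in IRQ context where flushing is safe, notices the
 * change here and flushes the buffers on the victim's behalf.
 */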
...