Hi, Konstantin,
On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
In NMI context printk() could save messages into per-cpu buffers and
schedule flush by irq_work when IRQ are unblocked. This means message
about hardlockup appears in kernel log only when/if lockup is gone.
Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
thus printk() cannot schedule flush work to another cpu.
This patch adds simple atomic counter of detected hardlockups and
flushes all per-cpu printk buffers in context softlockup watchdog
at any other cpu when it sees changes of this counter.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
---
include/linux/nmi.h | 1 +
kernel/watchdog.c | 22 ++++++++++++++++++++++
kernel/watchdog_hld.c | 1 +
3 files changed, 24 insertions(+)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9003e29cde46..8406df72ae5a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
#if defined(CONFIG_HARDLOCKUP_DETECTOR)
extern void hardlockup_detector_disable(void);
extern unsigned int hardlockup_panic;
+extern atomic_t hardlockup_detected;
#else
static inline void hardlockup_detector_disable(void) {}
#endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b6b1f54a7837..9f5c68fababe 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
}
__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
# endif /* CONFIG_SMP */
+
+atomic_t hardlockup_detected = ATOMIC_INIT(0);
+
+static inline void flush_hardlockup_messages(void)
+{
+ static atomic_t flushed = ATOMIC_INIT(0);
+
+ /* flush messages from hard lockup detector */
+ if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
+ atomic_set(&flushed, atomic_read(&hardlockup_detected));
+ printk_safe_flush();
+ }
+}
Do we really need two variables here? They may come into two different
cache lines, and there will be double cache pollution just because of
this simple check. Why not the below?
if (atomic_read(&hardlockup_detected)) {
atomic_set(&hardlockup_detected, 0);
printk_safe_flush();
}
Or even, since atomic is not needed here, as it does not give any ordering guarantees.
static inline void flush_hardlockup_messages(void)
{
if (READ_ONCE(&hardlockup_detected)) {
WRITE_ONCE(&hardlockup_detected, 0);
printk_safe_flush();
}
}
watchdog_timer_fn()
{
...
WRITE_ONCE(&hardlockup_detected, 1);
...
}
Kirill