Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

From: Kirill Tkhai
Date: Tue Feb 11 2020 - 03:15:13 EST


Hi, Konstantin,

On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
> In NMI context printk() could save messages into per-cpu buffers and
> schedule flush by irq_work when IRQ are unblocked. This means message
> about hardlockup appears in kernel log only when/if lockup is gone.
>
> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> thus printk() cannot schedule flush work to another cpu.
>
> This patch adds simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in context softlockup watchdog
> at any other cpu when it sees changes of this counter.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> ---
> include/linux/nmi.h | 1 +
> kernel/watchdog.c | 22 ++++++++++++++++++++++
> kernel/watchdog_hld.c | 1 +
> 3 files changed, 24 insertions(+)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 9003e29cde46..8406df72ae5a 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
> #if defined(CONFIG_HARDLOCKUP_DETECTOR)
> extern void hardlockup_detector_disable(void);
> extern unsigned int hardlockup_panic;
> +extern atomic_t hardlockup_detected;
> #else
> static inline void hardlockup_detector_disable(void) {}
> #endif
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b6b1f54a7837..9f5c68fababe 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> }
> __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
> # endif /* CONFIG_SMP */
> +
> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> +
> +static inline void flush_hardlockup_messages(void)
> +{
> + static atomic_t flushed = ATOMIC_INIT(0);
> +
> + /* flush messages from hard lockup detector */
> + if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> + atomic_set(&flushed, atomic_read(&hardlockup_detected));
> + printk_safe_flush();
> + }
> +}

Do we really need two variables here? They may come into two different
cache lines, and there will be double cache pollution just because of
this simple check. Why not the below?

if (atomic_read(&hardlockup_detected)) {
atomic_set(&hardlockup_detected, 0);
printk_safe_flush();
}

Or even, since atomic is not needed here, as it does not give any ordering guarantees.
static inline void flush_hardlockup_messages(void)
{
if (READ_ONCE(&hardlockup_detected)) {
WRITE_ONCE(&hardlockup_detected, 0);
printk_safe_flush();
}
}

watchdog_timer_fn()
{
...
WRITE_ONCE(&hardlockup_detected, 1);
...
}

Kirill