Re: Removal of printk safe buffers delays NMI context printk

From: Petr Mladek
Date: Fri Nov 05 2021 - 12:01:16 EST


On Fri 2021-11-05 11:26:35, Nicholas Piggin wrote:
> printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
>
> printk from NMI context relies on irq work being raised on the local CPU
> to print to console. This can be a problem if the NMI was raised by a
> lockup detector to print lockup stack and regs, because the CPU may not
> enable irqs (because it is locked up).
>
> Introduce printk_flush() that can be called from non-NMI context on
> another CPU to try to get those messages to the console.
>
> Fixes: 93d102f094be ("printk: remove safe buffers")
> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
> return ret;
> }
>
> +void printk_flush(void)
> +{
> + /*
> + * Disable preemption to avoid being preempted while holding
> + * console_sem which would prevent anyone from printing to
> + * console
> + */
> + preempt_disable();
> + /*
> + * Try to acquire and then immediately release the console
> + * semaphore. The release will print out buffers and wake up
> + * /dev/kmsg and syslog() users.
> + */
> + if (console_trylock_spinning())
> + console_unlock();
> + preempt_enable();
> +
> + wake_up_klogd();
> +}
> +
> asmlinkage int vprintk_emit(int facility, int level,
> const struct dev_printk_info *dev_info,
> const char *fmt, va_list args)
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> touch_softlockup_watchdog();
> }
>
> + /*
> + * Force flush any remote buffers that might be stuck in IRQ context
> + * and therefore could not run their irq_work.
> + */
> + printk_flush();

IMHO, nmi_trigger_cpumask_backtrace() might be called also from NMI
context.

printk_flush() is not safe in NMI context because console drivers have
their own locks. Even conosle_trylock() takes a lock, see
raw_spin_lock_irqsave(&sem->lock, flags) in down_trylock().

Best Regards,
Petr

> +
> clear_bit_unlock(0, &backtrace_flag);
> put_cpu();
> }