Re: Removal of printk safe buffers delays NMI context printk

From: Nicholas Piggin
Date: Fri Nov 05 2021 - 07:43:31 EST


Excerpts from John Ogness's message of November 5, 2021 7:55 pm:
> On 2021-11-05, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>>> We are planning on implementing a pr_flush() that will do something
>>> similar. But I am wondering how you are planning on triggering a CPU
>>> to call that function.
>>
>> Similar way as was removed by commit 93d102f094b ("printk: remove safe
>> buffers") fro nmi_backtrace and powerpc watchdog. The following patch
>> is what I'm using which seems to work for me.
>
> What was removed from 93d102f094b was irq_work triggering on all
> CPUs.

No, it was the caller executing the flush for all remote CPUs itself.
irq work was not involved (and irq work can't be raised in a remote
CPU from NMI context).

> Your patch is attempting to directly print from a non-NMI
> context. This is obviously a better approach if a caller is known to be
> non-NMI.
>
> Also, before 93d102f094b the safe buffer flushing was performed before
> triggering the NMI backtraces (in some cases). So clearly it worked
> before by accident rather than by design. (Not that that really
> matters.)

That's not the case, in the core nmi_backtrace.c, the flush happens
afterward.

In the powerpc code it's not actually the case either but it wasn't
obvious why at least should have been commented. powerpc's
trigger_allbutself_cpu_backtrace() is not a NMI IPI, just a normal
one. So if a CPU takes it then it will flush its own buffer via
irq_work. You only have to do the printk flush for the case of the
true NMI IPI which comes before.

That code does need a bit more work for other reasons, but we do
need that printk flush capability back there and for nmi_backtrace.

>> What would pr_flush do differently and where were you planning to use
>> it?
>
> Currently, and particularly when we move to threaded printers, printk
> callers do not know when their messages are actually visible. However,
> in some cases the printk caller (or some related task) wants to be sure
> the message is visible.
>
> pr_flush() will allow a task to block until all pending messages (at the
> time of the call) are output on all consoles.
>
> Your patch is doing something similar for non-NMI contexts, except that
> it will exit early if it fails to get the console lock. For your case
> this is probably acceptable because you are only concerned that some
> context is printing the messages, even if the latest messages may not
> have been printed yet.

Yep, if something else has the lock hopefully it should release it and
do the flush when it does. We're already 10 seconds delayed here so a
bit longer doesn't matter, just needs to get out at some point.

>> printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
>>
>> printk from NMI context relies on irq work being raised on the local CPU
>> to print to console. This can be a problem if the NMI was raised by a
>> lockup detector to print lockup stack and regs, because the CPU may not
>> enable irqs (because it is locked up).
>>
>> Introduce printk_flush() that can be called from non-NMI context on
>> another CPU to try to get those messages to the console.
>>
>> Fixes: 93d102f094be ("printk: remove safe buffers")
>> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
>>
>> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>> index 5f69ba4de1f3..59ddb24797ae 100644
>> --- a/arch/powerpc/kernel/watchdog.c
>> +++ b/arch/powerpc/kernel/watchdog.c
>> @@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
>> cpumask_clear(&wd_smp_cpus_ipi);
>> }
>>
>> + /*
>> + * Force flush any remote buffers that might be stuck in IRQ context
>> + * and therefore could not run their irq_work.
>> + */
>> + printk_flush();
>> +
>
> nmi_trigger_cpumask_backtrace() tracks and waits (up to 10 seconds) for
> the CPUs to print their backtrace before flushing. Here there is no
> waiting. I suppose in practice the NMI backtrace will be fast enough,
> but I am just wondering if there should be some sort of tracking/waiting
> like in nmi_trigger_cpumask_backtrace().

Yeah it has tended to work, but I have something to fix that up.

>
>> if (hardlockup_panic)
>> nmi_panic(NULL, "Hard LOCKUP");
>>
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index 85b656f82d75..50424ad87fb5 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> @@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
>> void show_regs_print_info(const char *log_lvl);
>> extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
>> extern asmlinkage void dump_stack(void) __cold;
>> +void printk_flush(void);
>> #else
>> static inline __printf(1, 0)
>> int vprintk(const char *s, va_list args)
>> @@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
>> static inline void dump_stack(void)
>> {
>> }
>> +static inline void printk_flush(void)
>> +{
>> +}
>> #endif
>>
>> #ifdef CONFIG_SMP
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index a8d0a58deebc..dd9471bb58c2 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
>> return ret;
>> }
>>
>> +void printk_flush(void)
>> +{
>> + /*
>> + * Disable preemption to avoid being preempted while holding
>> + * console_sem which would prevent anyone from printing to
>> + * console
>> + */
>> + preempt_disable();
>> + /*
>> + * Try to acquire and then immediately release the console
>> + * semaphore. The release will print out buffers and wake up
>> + * /dev/kmsg and syslog() users.
>> + */
>> + if (console_trylock_spinning())
>> + console_unlock();
>> + preempt_enable();
>> +
>> + wake_up_klogd();
>> +}
>
> I have mixed feelings about this printk_flush() because it is exactly
> this code that is about to undergo massive reworking.

Better to fix up the regression before said massive reworking I think.

> If this function
> is accepted, then I wonder if printk_trigger_flush() might be a better
> name since it does not necessarily wait until the messages are
> printed. (There could be another task printing and already a queued
> handover printer. Or this task could handover printing before
> finishing.)

Sure I'm not wedded to the name. I can resend with the name change if
there's no other issues (the powerpc fixups will go as separate series).

Thanks,
Nick