Re: Removal of printk safe buffers delays NMI context printk

From: Nicholas Piggin
Date: Fri Nov 05 2021 - 19:48:33 EST


Excerpts from John Ogness's message of November 5, 2021 11:57 pm:
> On 2021-11-05, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>>> What was removed from 93d102f094b was irq_work triggering on all
>>> CPUs.
>>
>> No, it was the caller executing the flush for all remote CPUs itself.
>> irq work was not involved (and irq work can't be raised in a remote
>> CPU from NMI context).
>
> Maybe I am missing something. In 93d102f094b~1 I see:
>
>   watchdog_smp_panic
>     printk_safe_flush
>       __printk_safe_flush
>         printk_safe_flush_buffer
>           printk_safe_flush_line
>             printk_deferred
>               vprintk_deferred
>                 vprintk_emit (but no direct printing)
>                 defer_console_output
>                   irq_work_queue

Oh, I thought you meant that irq_work triggering on all CPUs (i.e.,
including remote CPUs) was the key.

> AFAICT, using defer_console_output() instead of your new printk_flush()
> should cause the exact behavior as before.

It does.

>> but we do need that printk flush capability back there and for
>> nmi_backtrace.
>
> Agreed. I had not considered this necessary side-effect when I removed
> the NMI safe buffers.
>
> I am just wondering if we should fix the regression by going back to
> using irq_work (such as defer_console_output()) or if we want to
> introduce something new that introduces direct printing.

irq_work works for this situation, so for a minimal fix I think it's
fine. When you do the big rework, it would be okay to print directly
if you have such a facility for other reasons.
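A rough userspace model of the irq_work approach may help illustrate why
it restores the old behaviour: the NMI-side path only stores the message
and marks output as pending (roughly what defer_console_output() and
irq_work_queue() do in the chain quoted above), and the actual console
write happens later from a context where printing is safe. Everything
below is hypothetical (nmi_printk, defer_output, irq_work_flush and the
flat buffer are illustrative stand-ins, not kernel APIs), and it ignores
real NMI concurrency, which the kernel handles with its lockless
ringbuffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not kernel code. */
#define LOG_SIZE 256

static char log_buf[LOG_SIZE];
static size_t log_len;          /* bytes stored in log_buf */
static size_t console_seq;      /* bytes already written to the console */
static atomic_bool output_pending;

/* Stand-in for printk from NMI context: store only, never print. */
static void nmi_printk(const char *msg)
{
    size_t n = strlen(msg);

    if (n > LOG_SIZE - log_len)
        n = LOG_SIZE - log_len; /* truncate rather than overflow */
    memcpy(log_buf + log_len, msg, n);
    log_len += n;
    assert(log_len <= LOG_SIZE);
}

/* Stand-in for defer_console_output(): only mark output as pending. */
static void defer_output(void)
{
    atomic_store(&output_pending, true);
}

/*
 * Stand-in for the irq_work callback: runs later in a safe context,
 * writes any not-yet-printed bytes and returns how many were written.
 */
static size_t irq_work_flush(FILE *console)
{
    size_t written;

    if (!atomic_exchange(&output_pending, false))
        return 0;               /* nothing was deferred */
    written = fwrite(log_buf + console_seq, 1, log_len - console_seq, console);
    console_seq += written;
    return written;
}
```

Storing from the NMI path is cheap and safe; without the deferred flush
the stored messages simply sit in the buffer until something else prints,
which is the delay this thread is about.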

Thanks,
Nick