Re: [PATCH printk v2 2/5] printk: Add NMI safety to console_flush_on_panic() and console_unblank()
From: Petr Mladek
Date: Fri Jul 14 2023 - 05:41:36 EST
On Fri 2023-07-14 13:00:49, Sergey Senozhatsky wrote:
> On (23/07/13 16:43), Petr Mladek wrote:
> >
> > Simple removal of console_trylock() in console_flush_on_panic() would
> > cause that other CPUs might still be able to take it and race.
> > The problem is avoided by checking panic_in_progress() in console_lock()
> > and console_trylock(). They will never succeed on non-panic CPUs.
> >
>
> In theory, we also can have non-panic CPU in console_flush_all(),
> which should let panic CPU to take over the next time it checks
> abandon_console_lock_in_panic() (other_cpu_in_panic() after 5/5),
> but it may not happen immediately. I wonder if we somehow can/want
> to "wait" in console_flush_on_panic() for non-panic CPU handover?
Good point. It might actually be any console_lock() owner,
including printk() on another CPU.
I think that we might need to add some wait() as we did in the last
attempt, see the commit b87f02307d3cfbda76852 ("printk: Wait for
the global console lock when the system is going down").
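To make the idea concrete, here is a rough user-space sketch of a bounded
wait. It is not the code from the commit above and not kernel code. All
model_*() names are made up for illustration, and the lock is modeled by a
plain C11 atomic flag instead of the real console_lock. The point is only
that the waiter polls a limited number of times and then gives up instead
of blocking forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global console lock (assumption). */
static atomic_bool model_console_locked;

/* Try to take the modeled lock once; never blocks. */
static bool model_console_trylock(void)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&model_console_locked,
					      &expected, true);
}

/* Release the modeled lock. */
static void model_console_unlock(void)
{
	atomic_store(&model_console_locked, false);
}

/*
 * Poll for the lock at most max_polls times.
 * Returns true when the lock was taken, false when the budget ran out.
 * A real implementation would delay between polls; this model does not.
 */
static bool model_wait_for_console_lock(unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (model_console_trylock())
			return true;
	}

	return false;
}
```

The caller has to decide what to do when the wait fails. In the panic path
it would most likely continue without the lock.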
Anyway, it will be more important after introducing the kthreads.
There is a non-trivial chance that they would block the lock.
They might be busy handling a message printed right before
panic() was called. At least, this is what I saw in the last
attempt to introduce the kthreads.
But maybe this will be hidden inside the new atomic lock.
The lock might be handed over to a printk context with a higher priority,
and the handover uses some waiting internally, see the waiting in the patch
https://lore.kernel.org/all/20230302195618.156940-7-john.ogness@xxxxxxxxxxxxx/
Anyway, this patch does not make it worse. It just ignores the
potential console_lock owner in console_flush_on_panic() in another
way.
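For reference, the check described in the quoted commit message can be
modeled like this. It is only a user-space sketch under my assumptions.
The sketch_*() names and the CPU numbers are hypothetical, and the real
console_trylock() does much more. It only shows why a non-panic CPU can no
longer take the lock once another CPU has started the panic.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NO_PANIC_CPU	(-1)

/* CPU that called panic(), or SKETCH_NO_PANIC_CPU (assumption). */
static atomic_int sketch_panic_cpu = SKETCH_NO_PANIC_CPU;

/* Hypothetical stand-in for the console lock. */
static atomic_bool sketch_console_locked;

/* True when a CPU other than this_cpu is in panic. */
static bool sketch_other_cpu_in_panic(int this_cpu)
{
	int cpu = atomic_load(&sketch_panic_cpu);

	return cpu != SKETCH_NO_PANIC_CPU && cpu != this_cpu;
}

/* Refuse the lock on non-panic CPUs; otherwise try to take it once. */
static bool sketch_console_trylock(int this_cpu)
{
	bool expected = false;

	if (sketch_other_cpu_in_panic(this_cpu))
		return false;

	return atomic_compare_exchange_strong(&sketch_console_locked,
					      &expected, true);
}

/* Release the modeled lock. */
static void sketch_console_unlock(void)
{
	atomic_store(&sketch_console_locked, false);
}
```

Note that the check stops only new lock attempts. A CPU that already owned
the lock before the panic still owns it, which is the case discussed above.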
Best Regards,
Petr