Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

From: Petr Mladek
Date: Thu Apr 11 2024 - 10:15:48 EST


On Wed 2024-04-03 00:17:19, John Ogness wrote:
> Allow nbcon consoles to print messages in the legacy printk()
> caller context (printing via unlock) by integrating them into
> console_flush_all(). The write_atomic() callback is used for
> printing.

Hmm, this patch tries to flush nbcon console even in context
with NBCON_PRIO_NORMAL. Do we really want this, please?

I would expect that it would do so only when the kthread
is not working.

> Provide nbcon_legacy_emit_next_record(), which acts as the
> nbcon variant of console_emit_next_record(). Call this variant
> within console_flush_all() for nbcon consoles. Since nbcon
> consoles use their own @nbcon_seq variable to track the next
> record to print, this also must be appropriately handled.

I have been a bit confused by all the boolean return values
and what _exactly_ they mean. IMHO, we should make it more
clear how it works when it can't acquire the context.

IMHO, it is is importnat because console_flush_all() interprets
nbcon_legacy_emit_next_record() return value as @progress even when
there is no guaranteed progress. We just expect that
the other context is doing something.

It feels like it might get stuck forewer in some situatuon.
It would be good to understand if it is OK or not.


Later update:

Hmm, console_flush_all() is called from console_unlock().
It might be called in atomic context. But the current
owner might be theoretically scheduled out.

This is from documentation of nbcon_context_try_acquire()

/**
* nbcon_context_try_acquire - Try to acquire nbcon console
* @ctxt: The context of the caller
*
* Context: Any context which could not be migrated to another CPU.


I can't find any situation where nbcon_context_try_acquire() is
currently called in normal (schedulable) context. This is probably
why you did not see any problems with testing.

I see 3 possible solutions:

1. Enforce that nbcon context can be acquired only with preemtion
disabled.

2. Enforce that nbcon context can be acquired only with
interrupts. It would prevent deadlock when some future
code interrupt flush in NBCON_PRIO_EMERGENCY context.
And then a potential nested console_flush_all() won't be
able to takeover the interrupted NBCON_PRIO_CONTEXT
and there will be no progress.

3. console_flush_all() should ignore nbcon console when
it is not able to get the context, aka no progress.


I personally prefer the 3rd solution because I have spent
last 12 years on attempts to move printk into preemtible
context. And it looks wrong to move into atomic context.

Warning: console_flush_all() suddenly won't guarantee flushing
all messages.

I am not completely sure about all the consequences until
I see the rest of the patchset and the kthread intergration.
We will somehow need to guarantee that all messages
are flushed.


I propose the following changes to make console_flush_all()
safe. The patch is made on top of this patchset. Feel free
to squash them into your patch and remove my SoB.