Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

From: Petr Mladek
Date: Thu Apr 11 2024 - 11:46:07 EST


On Thu 2024-04-11 16:14:58, Petr Mladek wrote:
> On Wed 2024-04-03 00:17:19, John Ogness wrote:
> > Allow nbcon consoles to print messages in the legacy printk()
> > caller context (printing via unlock) by integrating them into
> > console_flush_all(). The write_atomic() callback is used for
> > printing.
>
> Hmm, this patch tries to flush nbcon console even in context
> with NBCON_PRIO_NORMAL. Do we really want this, please?
>
> I would expect that it would do so only when the kthread
> is not working.
>
> > Provide nbcon_legacy_emit_next_record(), which acts as the
> > nbcon variant of console_emit_next_record(). Call this variant
> > within console_flush_all() for nbcon consoles. Since nbcon
> > consoles use their own @nbcon_seq variable to track the next
> > record to print, this also must be appropriately handled.
>
> I have been a bit confused by all the boolean return values
> and what _exactly_ they mean. IMHO, we should make it more
> clear how it works when it can't acquire the context.
>
> IMHO, it is important because console_flush_all() interprets
> nbcon_legacy_emit_next_record() return value as @progress even when
> there is no guaranteed progress. We just expect that
> the other context is doing something.
>
> It feels like it might get stuck forever in some situation.
> It would be good to understand if it is OK or not.
>
>
> Later update:
>
> Hmm, console_flush_all() is called from console_unlock().
> It might be called in atomic context. But the current
> owner might be theoretically scheduled out.
>
> This is from documentation of nbcon_context_try_acquire()
>
> /**
> * nbcon_context_try_acquire - Try to acquire nbcon console
> * @ctxt: The context of the caller
> *
> * Context: Any context which could not be migrated to another CPU.
>
>
> I can't find any situation where nbcon_context_try_acquire() is
> currently called in normal (schedulable) context. This is probably
> why you did not see any problems with testing.
>
> I see 3 possible solutions:
>
> 1. Enforce that nbcon context can be acquired only with preemption
> disabled.
>
> 2. Enforce that nbcon context can be acquired only with
> interrupts disabled. It would prevent a deadlock when some future
> code interrupts a flush in NBCON_PRIO_EMERGENCY context.
> And then a potential nested console_flush_all() won't be
> able to take over the interrupted NBCON_PRIO_EMERGENCY context
> and there will be no progress.
>
> 3. console_flush_all() should ignore nbcon console when
> it is not able to get the context, aka no progress.
>
>
> I personally prefer the 3rd solution because I have spent
> the last 12 years on attempts to move printk into preemptible
> context. And it looks wrong to move it into atomic context.
>
> Warning: console_flush_all() suddenly won't guarantee flushing
> all messages.
>
> I am not completely sure about all the consequences until
> I see the rest of the patchset and the kthread integration.
> We will somehow need to guarantee that all messages
> are flushed.

I am trying to get a full picture of when and how the nbcon consoles
will get flushed. My current understanding and view is the following,
starting from the easiest priority:


1. NBCON_PRIO_PANIC messages will be flushed by calling
nbcon_atomic_flush_pending() directly in vprintk_emit().

This will take care of any previously added messages.

Non-panic CPUs are not allowed to add messages anymore
when there is a panic in progress.

[ALL OK]


2. NBCON_PRIO_EMERGENCY messages will be flushed by calling
nbcon_atomic_flush_pending() directly in nbcon_cpu_emergency_exit().

This would cover all previously added messages, including
the ones printed by the code between
nbcon_cpu_emergency_enter()/exit().

This won't cover messages added later, which might be
a problem. Let's look at this more closely. Messages
added later with:

+ NBCON_PRIO_PANIC will be handled in vprintk_emit()
as explained above [OK]

+ NBCON_PRIO_EMERGENCY will be handled in the
related nbcon_cpu_emergency_exit() as described here.
[OK]

+ NBCON_PRIO_NORMAL will be handled, see below. [?]

[ PROBLEM: later added NBCON_PRIO_NORMAL messages, see below. ]


3. NBCON_PRIO_NORMAL messages will be flushed by:

+ the printk kthread when it is available

+ the legacy loop via

+ console_unlock()
+ console_flush_all()
+ nbcon_legacy_emit_next_record() [PROBLEM]


PROBLEM: console_flush_all() does not guarantee progress with
nbcon consoles as explained above (previous mail).
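
To make the concern concrete, here is a minimal userspace model of
the flush loop. It is only a sketch, not kernel code. All names in it
(struct toy_console, enum toy_emit_result, toy_emit_next_record(),
toy_flush_all()) are hypothetical and do not match the real
signatures. The assumption is that the emit function reports three
states instead of a bare bool, so that the loop can tell "a record
was printed" apart from "the context is owned by somebody else":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-state result replacing a bare bool. */
enum toy_emit_result {
	TOY_EMIT_DONE,		/* one record was printed */
	TOY_EMIT_NOTHING,	/* the console is already up to date */
	TOY_EMIT_BUSY,		/* the context is owned by somebody else */
};

/* Toy stand-in for a console; not the kernel's struct console. */
struct toy_console {
	bool owned_by_other;	/* models a failed try-acquire */
	unsigned long seq;	/* next record this console would print */
};

enum toy_emit_result
toy_emit_next_record(struct toy_console *con, unsigned long next_seq)
{
	assert(con != NULL);

	if (con->owned_by_other)
		return TOY_EMIT_BUSY;
	if (con->seq >= next_seq)
		return TOY_EMIT_NOTHING;

	con->seq++;
	return TOY_EMIT_DONE;
}

/*
 * Flush every console that can be acquired. A busy console is
 * skipped and is NOT counted as progress. Counting it as progress
 * would keep this loop spinning for as long as the owner does not
 * release the context.
 *
 * Returns true when all consoles reached @next_seq, false when at
 * least one console had to be skipped.
 */
bool toy_flush_all(struct toy_console *cons, size_t n,
		   unsigned long next_seq)
{
	bool all_flushed;
	bool progress;

	do {
		progress = false;
		all_flushed = true;

		for (size_t i = 0; i < n; i++) {
			switch (toy_emit_next_record(&cons[i], next_seq)) {
			case TOY_EMIT_DONE:
				progress = true;
				break;
			case TOY_EMIT_BUSY:
				all_flushed = false;
				break;
			case TOY_EMIT_NOTHING:
				break;
			}
		}
	} while (progress);

	return all_flushed;
}
```

In this model the loop always terminates, but a false return value
means that somebody else has to finish the job. That is exactly the
guarantee which is missing at the moment.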


My proposal:

1. console_flush_all() will flush nbcon consoles only
in NBCON_PRIO_NORMAL and when the kthreads are not
available.

It will make it clear that this is the flusher in
this situation.


2. Allow skipping nbcon consoles in console_flush_all() when
it can't take the context (as suggested in my previous
reply).

This won't guarantee flushing NORMAL messages added
while nbcon_cpu_emergency_exit() calls
nbcon_atomic_flush_pending().

Solve this problem by introducing[*] nbcon_atomic_flush_all()
which would flush even newly added messages and
call this in nbcon_cpu_emergency_exit() when the printk
kthread does not work. It should bail out when there
is a panic in progress.

Motivation: It does not matter which "atomic" context
flushes NORMAL/EMERGENCY messages when
the printk kthread is not available.

[*] Alternatively we could modify nbcon_atomic_flush_pending()
to flush even newly added messages when the kthread is
not working. But it might create another mess.
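
The difference between flushing only the pending messages and
flushing even the newly added ones can be illustrated with a small
userspace model. Again, this is only a sketch under assumptions and
not the real implementation. The names (struct toy_log,
toy_emit_one(), toy_flush_pending(), toy_flush_everything()) are
hypothetical, and the late_msgs counter only simulates messages
which get added while the flush is running. The model assumes that
the "pending" variant stops at the sequence number seen on entry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the ringbuffer plus one console. */
struct toy_log {
	unsigned long next_seq;	/* seq which the next new message gets */
	unsigned long con_seq;	/* next record the console would print */
	bool panic_in_progress;	/* models a panic on another CPU */
	unsigned long late_msgs;	/* messages arriving during the flush */
};

/* Print one record and simulate one late message arriving meanwhile. */
void toy_emit_one(struct toy_log *log)
{
	assert(log != NULL && log->con_seq < log->next_seq);

	log->con_seq++;
	if (log->late_msgs) {
		log->late_msgs--;
		log->next_seq++;
	}
}

/* Models the "pending" variant: stop at the seq seen on entry. */
void toy_flush_pending(struct toy_log *log)
{
	unsigned long stop_seq = log->next_seq;

	while (log->con_seq < stop_seq)
		toy_emit_one(log);
}

/*
 * Models the proposed "all" variant: re-read next_seq on every
 * iteration so that late messages get flushed as well. Bail out
 * when a panic is in progress.
 */
void toy_flush_everything(struct toy_log *log)
{
	while (log->con_seq < log->next_seq) {
		if (log->panic_in_progress)
			return;
		toy_emit_one(log);
	}
}
```

With 3 pending messages and 2 late ones, toy_flush_pending() leaves
the 2 late messages unprinted while toy_flush_everything() prints
all 5. The price is that the "all" variant is not bounded when
messages keep coming.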

How does it sound, please?
Or am I missing anything?

Best Regards,
Petr