Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

From: John Ogness
Date: Wed Apr 17 2024 - 19:06:14 EST


On 2024-04-11, Petr Mladek <pmladek@xxxxxxxx> wrote:
> I am trying to get a full picture of when and how the nbcon consoles
> will get flushed. My current understanding and view is the following,
> starting from the easiest priority:
>
>
> 1. NBCON_PRIO_PANIC messages will be flushed by calling
> nbcon_atomic_flush_pending() directly in vprintk_emit()
>
> This will take care of any previously added messages.
>
> Non-panic CPUs are not allowed to add messages anymore
> when there is a panic in progress.
>
> [ALL OK]

OK, because the end of panic will perform unsafe takeovers if necessary.

> 2. NBCON_PRIO_EMERGENCY messages will be flushed by calling
> nbcon_atomic_flush_pending() directly in nbcon_cpu_emergency_exit().
>
> This would cover all previously added messages, including
> the ones printed by the code between
> nbcon_cpu_emergency_enter()/exit().

This is best effort. If the console is owned by another context and is
marked unsafe, nbcon_atomic_flush_pending() will do nothing.

[ PROBLEM: In this case, who will flush the emergency messages? ]
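Here is a toy user-space model of the ownership check that makes this failure mode concrete. All names are hypothetical. The real acquire code also supports handover requests with spin-waiting, which this model ignores. An emergency context cannot take over a lower-priority owner that is inside an unsafe section. Only panic can do that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy priorities (hypothetical), ordered like the NBCON_PRIO_* values. */
enum toy_prio {
	TOY_PRIO_NONE = 0,
	TOY_PRIO_NORMAL,
	TOY_PRIO_EMERGENCY,
	TOY_PRIO_PANIC,
};

struct toy_console_state {
	enum toy_prio owner_prio;	/* TOY_PRIO_NONE if unowned */
	bool unsafe;			/* owner is inside an unsafe section */
};

/*
 * Simplified acquire rule (no handover requests, no spin-waiting):
 * a context may take the console if it is free, or if the owner has
 * a lower priority and is not in an unsafe section. Only a panic
 * context may take over an unsafe console.
 */
static bool toy_try_acquire(struct toy_console_state *s, enum toy_prio prio)
{
	bool ok;

	assert(prio != TOY_PRIO_NONE);

	if (s->owner_prio == TOY_PRIO_NONE)
		ok = true;			/* free console */
	else if (prio <= s->owner_prio)
		ok = false;			/* owner has equal or higher prio */
	else
		ok = !s->unsafe || prio == TOY_PRIO_PANIC;

	if (ok)
		s->owner_prio = prio;
	return ok;
}
```

With a NORMAL owner marked unsafe, toy_try_acquire(&s, TOY_PRIO_EMERGENCY) returns false. This corresponds to the case where nbcon_atomic_flush_pending() prints nothing.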

> This won't cover later added messages which might be
> a problem. Let's look at this closer. Later added
> messages with:
>
> + NBCON_PRIO_PANIC will be handled in vprintk_emit()
> as explained above [OK]
>
> + NBCON_PRIO_EMERGENCY will be handled in the
> related nbcon_cpu_emergency_exit() as described here.
> [OK]
>
> + NBCON_PRIO_NORMAL will be handled, see below. [?]
>
> [ PROBLEM: later added NBCON_PRIO_NORMAL messages, see below. ]

Yes, this is also an issue, although the solution may be the same for
this and the above problem.

> 3. NBCON_PRIO_NORMAL messages will be flushed by:
>
> + the printk kthread when it is available
>
> + the legacy loop via
>
>     + console_unlock()
>       + console_flush_all()
>         + console nbcon_legacy_emit_next_record() [PROBLEM]
>
> PROBLEM: console_flush_all() does not guarantee progress with
> nbcon consoles as explained above (previous mail).

Not only that. If no kthread is available, nothing will be printed
until the _next_ printk(), whenever that happens.


Above we have listed three problems:

- emergency messages will not be flushed if the console is owned by
  another context that is in an unsafe section

- normal messages will not be flushed if the console is owned by
  another context

- for both of the above problems, if there is no kthread, nobody will
  flush the messages


My question: Is this really a problem?

The main idea behind the rework is that printing is deferred. The
kthreads exist for this. If the kthreads are not available (early boot
or shutdown) or are not reliable enough (emergency messages), a
best-effort attempt is made to print safely in the caller context. Only
the panic situation is designed to force output (unsafely, if
necessary). Is that not enough?

> My proposal:
>
> 1. console_flush_all() will flush nbcon consoles only
> in NBCON_PRIO_NORMAL and when the kthreads are not
> available.
>
> It will make it clear that this is the flusher in
> this situation.

This is the current PREEMPT_RT implementation.

> 2. Allow skipping nbcon consoles in console_flush_all() when
> it can't take the context (as suggested in my previous
> reply).
>
> This won't guarantee flushing NORMAL messages added
> while nbcon_cpu_emergency_exit() calls
> nbcon_atomic_flush_pending().

This was the previous version. And I agree that we need to go back to
that.
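For reference, combining proposals 1 and 2 in the legacy loop could look roughly like the following toy model. The types are hypothetical and are not struct console. The model prints one record per console per pass, as the legacy loop does. It skips nbcon consoles that it should not or cannot flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy console; not the kernel's struct console. */
struct toy_console {
	uint64_t seq;		/* next record this console will print */
	bool is_nbcon;
	bool busy;		/* nbcon context owned by another context */
};

/*
 * Toy console_flush_all():
 * 1. nbcon consoles are only flushed here at NORMAL priority when no
 *    printer kthread is available;
 * 2. an nbcon console whose context cannot be acquired is skipped
 *    instead of stalling the other consoles.
 */
static void toy_console_flush_all(struct toy_console *cons, size_t n,
				  uint64_t head_seq, bool normal_prio,
				  bool kthreads_available)
{
	bool any_progress;

	do {
		any_progress = false;
		for (size_t i = 0; i < n; i++) {
			struct toy_console *c = &cons[i];

			if (c->seq >= head_seq)
				continue;	/* nothing pending */
			if (c->is_nbcon &&
			    (!normal_prio || kthreads_available || c->busy))
				continue;	/* not our job, or cannot acquire */

			c->seq++;		/* "print" one record */
			any_progress = true;
		}
	} while (any_progress);
}
```

A skipped busy console keeps its pending records. Nothing in this loop guarantees that someone else will print them later.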

> Solve this problem by introducing[*] nbcon_atomic_flush_all()
> which would flush even newly added messages and
> call this in nbcon_cpu_emergency_exit() when the printk
> kthread does not work. It should bail out when there
> is a panic in progress.
>
> Motivation: It does not matter which "atomic" context
> flushes NORMAL/EMERGENCY messages when
> the printk kthread is not available.

I do not think that solves the problem. If the console is in an unsafe
section, nothing can be printed.
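The difference between the current nbcon_atomic_flush_pending(), which only covers records already pending when it is called (see the description above), and the proposed nbcon_atomic_flush_all() can be sketched with a toy model. The names are hypothetical. Concurrent printk() calls are simulated by appending a record after each record printed. Both variants return immediately when the console cannot be acquired, which is the unsafe case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy ringbuffer state. */
struct toy_log {
	uint64_t head;		/* seq of the next record to be stored */
	int pending_adds;	/* records that "arrive" while we print */
};

struct toy_console {
	uint64_t seq;		/* next record to print */
	bool can_acquire;	/* false: owned by another context, unsafe */
};

/* Print one record; simulate a concurrent printk() adding another. */
static void toy_emit_one(struct toy_console *con, struct toy_log *log)
{
	con->seq++;
	if (log->pending_adds > 0) {
		log->pending_adds--;
		log->head++;
	}
}

/* Like the current flush: stop at the head seen on entry. */
static void toy_flush_pending(struct toy_console *con, struct toy_log *log)
{
	uint64_t stop_seq = log->head;

	if (!con->can_acquire)
		return;
	while (con->seq < stop_seq)
		toy_emit_one(con, log);
}

/* Proposed variant: keep going while new records appear. */
static void toy_flush_all(struct toy_console *con, struct toy_log *log)
{
	if (!con->can_acquire)
		return;
	while (con->seq < log->head)
		toy_emit_one(con, log);
}
```

Starting with three records and two more arriving during printing, toy_flush_pending() leaves the console at seq 3 and toy_flush_all() reaches seq 5. Neither function prints anything when can_acquire is false.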

> [*] Alternatively we could modify nbcon_atomic_flush_pending()
> to flush even newly added messages when the kthread is
> not working. But it might create another mess.

This discussion is about the case where kthreads are not available. If
that is a concern, perhaps in this situation an irq_work should be
triggered when the console is released.

For example, something like:

static bool flush_pending(struct console *con)
{
	/* If there is a kthread, let it do the work. */
	if (con->kthread)
		return false;

	/* Make sure a record is pending. */
	if (!prb_read_valid(prb, nbcon_seq_read(con), NULL))
		return false;

	return true;
}

static void nbcon_context_release(struct nbcon_context *ctxt)
{
	struct console *con = ctxt->console;
	...

	/* Trigger irq_work to flush if necessary. */
	if (flush_pending(con))
		defer_console_output();
}
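Here is a user-space toy model of the same idea, built from hypothetical stand-ins. A flag replaces the irq_work queued by defer_console_output(). A plain sequence comparison approximates prb_read_valid(). The kthread pointer is just a void pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ringbuffer and irq_work state. */
struct toy_printk {
	uint64_t next_seq;	/* seq of the next record to be stored */
	bool irq_work_queued;	/* set by the defer_console_output() stand-in */
};

struct toy_console {
	uint64_t seq;		/* next record to print (cf. nbcon_seq_read()) */
	void *kthread;		/* NULL when no printer kthread exists */
};

static bool toy_flush_pending(const struct toy_printk *pk,
			      const struct toy_console *con)
{
	/* If there is a kthread, let it do the work. */
	if (con->kthread)
		return false;

	/* Is a record pending? (approximates prb_read_valid()) */
	return con->seq < pk->next_seq;
}

static void toy_context_release(struct toy_printk *pk, struct toy_console *con)
{
	/* ... the ownership release itself is omitted ... */

	/* Queue the deferred flush if nobody else will print. */
	if (toy_flush_pending(pk, con))
		pk->irq_work_queued = true;
}

/* Runs later, standing in for the irq_work handler. */
static void toy_irq_work_func(struct toy_printk *pk, struct toy_console *con)
{
	if (!pk->irq_work_queued)
		return;
	pk->irq_work_queued = false;
	while (con->seq < pk->next_seq)
		con->seq++;	/* "print" one record */
	assert(!toy_flush_pending(pk, con));
}
```

In this model the release path only queues work. The actual printing happens later in the deferred context, which would then have to acquire the console again.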

John