Re: [PATCH printk v4 15/27] printk: nbcon: Provide function to flush using write_atomic()

From: Petr Mladek
Date: Wed Apr 10 2024 - 10:58:10 EST


On Wed 2024-04-03 00:17:17, John Ogness wrote:
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> Provide nbcon_atomic_flush_pending() to perform flushing of all
> registered nbcon consoles using their write_atomic() callback.
>
> Unlike console_flush_all(), nbcon_atomic_flush_pending() will
> only flush up through the newest record at the time of the
> call. This prevents a CPU from printing unbounded when other
> CPUs are adding records.
>
> Also unlike console_flush_all(), nbcon_atomic_flush_pending()
> will fully flush one console before flushing the next. This
> helps to guarantee that a block of pending records (such as
> a stack trace in an emergency situation) can be printed
> atomically at once before releasing console ownership.
>
> nbcon_atomic_flush_pending() is safe in any context because it
> uses write_atomic() and acquires with unsafe_takeover disabled.
>
> Use it in console_flush_on_panic() before flushing legacy
> consoles. The legacy write() callbacks are not fully safe when
> oops_in_progress is set.
>
> Co-developed-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner (Intel) <tglx@xxxxxxxxxxxxx>

Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>

See few nits below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -937,6 +935,108 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
> return nbcon_context_exit_unsafe(ctxt);
> }
>
> +/**
> + * __nbcon_atomic_flush_pending_con - Flush specified nbcon console using its
> + * write_atomic() callback
> + * @con: The nbcon console to flush
> + * @stop_seq: Flush up until this record
> + *
> + * Return: True if taken over while printing. Otherwise false.
> + *
> + * If flushing up to @stop_seq was not successful, it only makes sense for the
> + * caller to try again when true was returned. When false is returned, either
> + * there are no more records available to read or this context is not allowed
> + * to acquire the console.
> + */
> +static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> +{
> + struct nbcon_write_context wctxt = { };
> + struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +
> + ctxt->console = con;
> + ctxt->spinwait_max_us = 2000;
> + ctxt->prio = NBCON_PRIO_NORMAL;

Nit: It looks strange to harcode NBCON_PRIO_NORMAL and call it from
console_flush_on_panic() in the same patch.

I see. It will get replaced by nbcon_get_default_prio() later.
I guess that it is just a relic from several reworks and
shufling. I know that it is hard.

It might have been better to either add the call in
console_flush_in_panic() later. Or introduce nbcon_get_default_prio()
earlier so that we could use it here.


> +
> + if (!nbcon_context_try_acquire(ctxt))
> + return false;
> +
> + while (nbcon_seq_read(con) < stop_seq) {
> + /*
> + * nbcon_emit_next_record() returns false when the console was
> + * handed over or taken over. In both cases the context is no
> + * longer valid.
> + */
> + if (!nbcon_emit_next_record(&wctxt))
> + return true;
> +
> + if (!ctxt->backlog)
> + break;
> + }
> +
> + nbcon_context_release(ctxt);
> +
> + return false;
> +}
> +
> +/**
> + * __nbcon_atomic_flush_pending - Flush all nbcon consoles using their
> + * write_atomic() callback
> + * @stop_seq: Flush up until this record
> + */
> +static void __nbcon_atomic_flush_pending(u64 stop_seq)
> +{
> + struct console *con;
> + bool should_retry;
> + int cookie;
> +
> + do {
> + should_retry = false;
> +
> + cookie = console_srcu_read_lock();
> + for_each_console_srcu(con) {
> + short flags = console_srcu_read_flags(con);
> + unsigned long irq_flags;
> +
> + if (!(flags & CON_NBCON))
> + continue;
> +
> + if (!console_is_usable(con, flags))
> + continue;
> +
> + if (nbcon_seq_read(con) >= stop_seq)
> + continue;
> +
> + /*
> + * Atomic flushing does not use console driver
> + * synchronization (i.e. it does not hold the port
> + * lock for uart consoles). Therefore IRQs must be
> + * disabled to avoid being interrupted and then
> + * calling into a driver that will deadlock trying
> + * to acquire console ownership.
> + */
> + local_irq_save(irq_flags);
> +
> + should_retry |= __nbcon_atomic_flush_pending_con(con, stop_seq);

Nit: I have to say that this is quite cryptic. The "true" return value
usually means success. But it sets "should_retry" here.

It would mean sligtly more code but it would be much more clear
when __nbcon_atomic_flush_pending_con() returns:

+ 0 on success
+ -EAGAIN when lost the owenership
+ -EPERM when can't get the owner ship like nbcon_context_try_acquire_direct()

and we make the decision here.

> +
> + local_irq_restore(irq_flags);
> + }
> + console_srcu_read_unlock(cookie);
> + } while (should_retry);
> +}

Neither of the nits is a blocker. They are basically just about potential
complications for the future code archaeologists.

Best Regards,
Petr