Re: [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership

From: Petr Mladek
Date: Tue Jul 30 2024 - 05:24:39 EST


On Mon 2024-07-29 10:42:04, John Ogness wrote:
> On 2024-07-26, Petr Mladek <pmladek@xxxxxxxx> wrote:
> > On Mon 2024-07-22 19:25:23, John Ogness wrote:
> >> Since ownership can be lost at any time due to handover or
> >> takeover, a printing context _must_ be prepared to back out
> >> immediately and carefully. However, there are scenarios where
> >> the printing context must reacquire ownership in order to
> >> finalize or revert hardware changes.
> >>
> >> One such example is when interrupts are disabled during
> >> printing. No other context will automagically re-enable the
> >> interrupts. For this case, the disabling context _must_
> >> reacquire nbcon ownership so that it can re-enable the
> >> interrupts.
> >
> > I am still not sure how this is going to be used. It is suspicious.
> > If the context lost the ownership then another context started flushing
> > higher priority messages.
> >
> > Is it really safe to manipulate the HW at this point?
> > Won't it break the higher priority context?
>
> Why would it break anything? It spins until it normally and safely
> acquires ownership again. The commit message provides a simple example
> of why it is necessary. With a threaded printer, this situation happens
> almost every time a warning occurs.

I see. It makes sense now.

> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -911,6 +948,15 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
> >> return false;
> >> }
> >>
> >> + if (!wctxt->outbuf) {
> >
> > This check works only when con->write_atomic() called
> > nbcon_reacquire_nobuf().
>
> Exactly. That is what it is for.
>
> > At least, we should clear the buffer also in nbcon_enter_unsafe() and
> > nbcon_exit_unsafe() when they realize that they do own the context.
>
> OK.
>
> > Even better would be to add a check whether we still own the context.
> > Something like:
> >
> > bool nbcon_can_proceed(struct nbcon_write_context *wctxt)
> > {
> > 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > 	struct nbcon_state cur;
> >
> > 	nbcon_state_read(con, &cur);
> >
> > 	return nbcon_context_can_proceed(ctxt, &cur);
> > }
>
> nbcon_can_proceed() is meant to check ownership. And it only makes sense
> to use it within an unsafe section. Otherwise it is racy.

My idea was: "If we still own the context then we have owned it all
the time and con->write_atomic() succeeded."

The race is not important. If we lose the ownership before updating
nbcon_seq then the line will get written again anyway.
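
To illustrate why the race is harmless, here is a minimal userspace
sketch. It is only a model, not kernel code: struct model_con,
model_emit_once() and the integer context ids are hypothetical names
made up for this example. The only assumption taken from the real code
is that the sequence number is advanced only by a context that still
owns the console.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a console; not the kernel structure. */
struct model_con {
	int owner;			/* id of the owning context */
	unsigned long seq;		/* next record to print, like nbcon_seq */
	unsigned long emitted[8];	/* record numbers sent to the "hardware" */
	int n_emitted;
};

/*
 * Print record con->seq as context @id. When @lose_before_update is set,
 * another context takes over after the write but before the seq update.
 */
static bool model_emit_once(struct model_con *con, int id,
			    bool lose_before_update)
{
	if (con->owner != id)
		return false;

	/* "Write" the record; remember which record number was written. */
	if (con->n_emitted < 8)
		con->emitted[con->n_emitted++] = con->seq;

	if (lose_before_update)
		con->owner = id + 1;	/* takeover by another context */

	/* Only a context that still owns the console advances seq. */
	if (con->owner != id)
		return false;

	con->seq++;
	return true;
}
```

If context 1 loses ownership before the update, seq stays where it was
and the new owner emits the same record again. The cost is a duplicated
line, not a lost one.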

> Once a reacquire has occurred, the driver is allowed to proceed. It just
> is not allowed to print (because its buffer is gone).

I see. My idea does not work because the driver is going to reacquire
the ownership. It means that nbcon_can_proceed() would return true
even when con->write_atomic() failed.

But it is not documented anywhere. And what if the driver has a bug
and does not call reacquire? Or what if the driver does not need
to restore anything?

IMHO, nbcon_emit_next_record() should check both:

	if (use_atomic)
		con->write_atomic(con, wctxt);
	else
		con->write_thread(con, wctxt);

	/* Still owns the console? */
	if (!nbcon_can_proceed(wctxt))
		return false;

	if (!wctxt->outbuf) {
		/*
		 * Ownership was lost and reacquired by the driver.
		 * Handle it as if ownership was lost.
		 */
		nbcon_context_release(ctxt);
		return false;
	}
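
To show which driver behaviour each check catches, here is a rough
userspace model of the two checks. Everything prefixed with model_ is
hypothetical and exists only for this sketch. Ownership is reduced to a
plain integer id, which is an assumption and not how the nbcon state
works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel structures. */
struct model_console {
	int owner;		/* id of the owning context, 0 = unowned */
};

struct model_wctxt {
	struct model_console *con;
	int id;			/* id of this printing context */
	char *outbuf;		/* NULL once the buffer was invalidated */
};

/* Stand-in for nbcon_can_proceed(): does this context own the console? */
static bool model_can_proceed(const struct model_wctxt *w)
{
	return w->con->owner == w->id;
}

/* Stand-in for nbcon_reacquire_nobuf(): own it again, but without buffer. */
static void model_reacquire_nobuf(struct model_wctxt *w)
{
	w->con->owner = w->id;
	w->outbuf = NULL;
}

enum model_driver { MODEL_KEEPS, MODEL_LOSES, MODEL_LOSES_AND_REACQUIRES };

/* Stand-in for con->write_atomic() with three possible behaviours. */
static void model_write(struct model_wctxt *w, enum model_driver d)
{
	if (d == MODEL_KEEPS)
		return;

	w->con->owner = 99;	/* a higher priority context took over */

	if (d == MODEL_LOSES_AND_REACQUIRES)
		model_reacquire_nobuf(w);
}

/* Tail of the emit path with both checks; true means the record is done. */
static bool model_emit(struct model_wctxt *w, enum model_driver d)
{
	assert(model_can_proceed(w));	/* the caller must own the console */

	model_write(w, d);

	/* Still owns the console? Catches a driver that did not reacquire. */
	if (!model_can_proceed(w))
		return false;

	if (!w->outbuf) {
		/* Ownership was lost and reacquired: treat it as lost. */
		w->con->owner = 0;	/* models nbcon_context_release() */
		return false;
	}

	return true;
}
```

In this model, a driver that loses ownership and never reacquires is
caught by the first check. A driver that reacquires is caught by the
second check. Only a driver that kept ownership all the time gets true.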

Best Regards,
Petr