Re: buffer write race: Re: [PATCH printk v1 09/18] printk: nobkl: Add print state functions

From: Petr Mladek
Date: Thu Mar 30 2023 - 07:54:55 EST


On Wed 2023-03-29 16:39:54, John Ogness wrote:
> On 2023-03-29, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >> +/**
> >> + * console_can_proceed - Check whether printing can proceed
> >> + * @wctxt: The write context that was handed to the write function
> >> + *
> >> + * Returns: True if the state is correct. False if a handover
> >> + * has been requested or if the console was taken
> >> + * over.
> >> + *
> >> + * Must be invoked after the record was dumped into the assigned record
> >> + * buffer
> >
> > The word "after" made me think about possible races when the record
> > buffer is being filled. The owner might lose the lock in a hostile
> > way during this action. And we should prevent using the same buffer
> > when the other owner is still modifying the content.
> >
> > It should be safe when the same buffer might be used only by nested
> > contexts. It does not matter if the outer context finishes writing
> > later. The nested context should not need the buffer anymore.
> >
> > But a problem might happen when the same buffer is shared between
> > multiple non-nested contexts. One context might lose the lock in a
> > hostile way. The other context might get the access after the hostile
> > context released the lock.
>
> Hostile takeovers _only occur during panic_.
>
> > NORMAL and PANIC contexts are safe. These priorities have only
> > one context and both have their own buffers.
> >
> > A problem might be with EMERGENCY contexts. Each CPU might have
> > its own EMERGENCY context. We might prevent this problem if
> > we do not allow acquiring the lock in EMERGENCY (and NORMAL)
> > context when panic() is running or after the first hostile
> > takeover.
>
> A hostile takeover means a CPU took ownership with PANIC priority. No
> CPU can steal ownership from the PANIC owner. Once the PANIC owner
> releases ownership, the panic message has been output to the atomic
> consoles. Do we really care what happens after that?

I see. The hostile takeover is allowed only in
cons_atomic_exit(CONS_PRIO_PANIC, prev_prio), which is called at the
very end of panic(), before the infinite blinking.

It is true that we do not care at that point. It is actually called
after "suppress_printk = 1;", so there should not be any
new messages.

Anyway, it would be nice to document this subtle race somewhere.
I could imagine that people would want to risk the hostile
takeover even earlier, and then the race might get introduced.
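
To make the rule discussed above concrete, here is a minimal userspace
model of it: once panic() is running or a hostile takeover has happened,
only a PANIC context may acquire the console, so a second EMERGENCY
context can never pick up a buffer that the hostilely interrupted owner
might still be writing. This is only a sketch under assumptions. All
names (model_console, model_try_acquire(), and so on) are hypothetical
and are not taken from the patch set, and it models the acquire policy
only, not the real state handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical priorities, loosely mirroring the ones discussed above. */
enum model_prio {
	MODEL_PRIO_NONE,	/* console is not owned */
	MODEL_PRIO_NORMAL,
	MODEL_PRIO_EMERGENCY,
	MODEL_PRIO_PANIC,
};

/* Hypothetical per-console state; not the layout used by the patch. */
struct model_console {
	atomic_int owner_prio;		/* MODEL_PRIO_NONE when unlocked */
	atomic_bool hostile_seen;	/* set on first hostile takeover, never cleared */
};

/* Assumption: a global flag that is set when panic() starts. */
static atomic_bool model_panic_in_progress;

static void model_console_init(struct model_console *con)
{
	atomic_init(&con->owner_prio, MODEL_PRIO_NONE);
	atomic_init(&con->hostile_seen, false);
}

/*
 * Friendly acquire. Non-PANIC contexts are refused once panic() is
 * running or after the first hostile takeover, because the previous
 * owner might still be filling the shared record buffer.
 */
static bool model_try_acquire(struct model_console *con, enum model_prio prio)
{
	int expected = MODEL_PRIO_NONE;

	if (prio != MODEL_PRIO_PANIC &&
	    (atomic_load(&model_panic_in_progress) ||
	     atomic_load(&con->hostile_seen)))
		return false;

	return atomic_compare_exchange_strong(&con->owner_prio, &expected, prio);
}

/* Hostile takeover: only PANIC may steal ownership unconditionally. */
static void model_hostile_takeover(struct model_console *con, enum model_prio prio)
{
	assert(prio == MODEL_PRIO_PANIC);
	atomic_store(&con->hostile_seen, true);
	atomic_store(&con->owner_prio, prio);
}

static void model_release(struct model_console *con)
{
	atomic_store(&con->owner_prio, MODEL_PRIO_NONE);
}
```

With the current code the restriction is not strictly needed because the
hostile takeover happens only at the very end of panic(). The model just
shows what would have to be enforced if the takeover were allowed
earlier.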

Best Regards,
Petr