Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

From: Petr Mladek
Date: Wed Apr 17 2024 - 09:03:58 EST


On Tue 2024-04-16 17:07:39, John Ogness wrote:
> On 2024-04-09, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >> Console drivers typically must deal with access to the hardware
> >> via user input/output (such as an interactive login shell) and
> >> output of kernel messages via printk() calls.
> >>
> >> Follow-up commits require that the printk subsystem is able to
> >> synchronize with the driver. Require nbcon consoles to implement
> >> two new callbacks (device_lock(), device_unlock()) that will
> >> use whatever synchronization mechanism the driver is using for
> >> itself (for example, the port lock for uart serial consoles).
> >
> > We should explain here the bigger picture, see my comment
> > of the word "whenever" below.
> >
> > <proposal>
> > Console drivers typically have to deal with access to the hardware
> > via user input/output (such as an interactive login shell) and
> > output of kernel messages via printk() calls.
> >
> > They use some classic locking mechanism in most situations. But
> > console->write_atomic() callbacks, used by nbcon consoles,
> > must be synchronized only by acquiring the console context.
> >
> > The synchronization via the console context ownership is possible
> > only when the console driver is registered. It is when a particular
> > device driver is connected with a particular console driver.
> >
> > The two synchronization mechanisms must be synchronized between
> > each other. It is tricky because the console context ownership
> > is quite special. It might be taken over by a higher priority
> > context. Also CPU migration has to be disabled.
> >
> > The most tricky part is to (dis)connect these two mechanism during
> > the console (un)registration. Let's make it easier by adding two new
> > callbacks: device_lock(), device_unlock(). They would allow
> > to take the device-specific lock while the device is being
> > (un)registered by the related console driver.
> >
> > For example, these callbacks would lock/unlock the port lock
> > for serial port drivers.
> > </proposal>
>
> I do not like this proposal. IMHO it is focussing only on nbcon
> registration details (a corner case) rather than presenting the big
> picture.

Fair enough.

> Yes, we will use these callbacks during registration/unregistration to
> avoid some races, but that is just one usage (and not the primary
> one). The main reason these callbacks exist is to provide driver
> synchronization when printing.
>
> We want to avoid using nbcon console ownership contention whenever
> possible. In fact, there should _never_ be nbcon console owership
> contention except for in emergency or panic situations.
>
> In the normal case, printk will use the driver-specific locking for
> synchronization. Previously this was achieved by implementing the
> lock/unlock within the write() callback. But with nbcon consoles that is
> not possible because the nbcon ownership must be taken outside of the
> write callback:
>
> con->device_lock()
> nbcon_acquire()
> con->write_atomic() or con->write_thread()
> nbcon_release()
> con->device_unlock()

This sounds like a strong requirement. So there should be a strong
reason ;-) I would like to better understand it [*] and
document it a clear way.

In principle, we do not need to take the full con->device_lock()
around nbcon_acquire() in the normal context. By other words,
when nbcon_acquire() is safe enough in emergency context
then it should be safe enough in the normal context either.
Otherwise, we would have a problem.

My understanding is that we want to take con->device_lock()
in the normal context from two reasons:

1. This is historical, king of speculation, and probably
not the real reason.

In the initial RFC, nbcon_try_acquire() allowed to take over
the context with the same priority.

The con->device() taken in the kthread might have prevented
stealing the context by another "random" uart_port_lock() call from
a code which would not continue emitting the messages.

But the current nbcon_try_acquire() implementation does not take
over the context with the same priority. So, this reason
is not longet valid. But it probably has never been the reason.


2. The con->device() defines the context in which nbcon_acquire()
will be taken and con->write_atomic() called to make it
safe against other operations with the device driver.

For example, con->device() from uart serial consoles would
disable interrupts to prevent deadlocks with the serial
port IRQ handlers.

Some other drivers might just need to disable preemption.
And some (future) drivers might even allow to keep
the preemption enabled.


I believe that the 2nd reason is that right one. But it is far
from obvious from the proposed comments.


[*] I am pretty sure that we have had the same discussion during
Plumbers 2023. I am sorry I do not recall all the details now.
Anyway, this should be explained in public anyway.


> How about this for the commit message:
>
> printk: nbcon: Add callbacks to synchronize with driver
>
> Console drivers typically must deal with access to the hardware
> via user input/output (such as an interactive login shell) and
> output of kernel messages via printk() calls. To provide the
> necessary synchronization, usually some driver-specific locking
> mechanism is used (for example, the port spinlock for uart
> serial consoles).
>
> Until now, usage of this driver-specific locking has been hidden
> from the printk-subsystem and implemented within the various
> console callbacks.

So far, so good.

> However, for nbcon consoles, it is necessary
> that the printk-subsystem uses the driver-specific locking so
> that nbcon console ownership can be acquired _after_ the
> driver-specific locking has succeeded. This allows for lock
> contention to exist on the more context-friendly driver-specific
> locking rather than nbcon console ownership (for non-emergency
> and non-panic cases).

I guess that the part:

This allows for lock contention to exist on the more
context-friendly driver-specific locking

tries to explain the 2nd reason that I have described above.
But it looks too cryptic to me. I have got it only because
I knew what I was looking for.

> Require nbcon consoles to implement two new callbacks
> (device_lock(), device_unlock()) that will use whatever
> synchronization mechanism the driver is using for itself.


Honestly, I still think that we need con->device_lock() primary
to synchronize the console registration and unregistration.

In all other cases, we only need to know whether we have
to call nbcon_try_acquire() with interrupts disabled or not[**].
And we need to know this for any nbcon_try_acquire() access,
including the emergency context.

Maybe, we should pass this information another way because
we do not want to call con->device_lock() in
nbcon_cpu_emergency_exit().

[**] According to my last findings, we always need to disable
preemption, see
https://lore.kernel.org/r/Zhj5uQ-JJnlIGUXK@localhost.localdomain


I still have to shake my head around this. But I would first like
to know whether:

+ You agree that nbcon_try_acquire() always have to be called with
preemption disabled.

+ What do you think about explicitly disabling preemption
in nbcon_try_acquire().

+ If it is acceptable for the big picture. It should be fine for
serial consoles. But I think that graphics consoles wanted to
be preemptive when called in the printk kthread.


I am sure that it will be possible to make nbcon_try_acquire()
preemption-safe but it will need some more magic. Maybe, we could
do it later when really needed. The primary target are the slow
serial consoles anyway.


> P.S. I think it makes more sense to use your proposal for the commit
> message of the follow-up patch "printk: nbcon: Use driver
> synchronization while registering".

Yup, feel free to use it.

Best Regards,
Petr