Re: [PATCH printk v4 09/27] printk: nbcon: Implement processing in port->lock wrapper
From: Petr Mladek
Date: Wed Apr 10 2024 - 08:35:34 EST
On Wed 2024-04-03 00:17:11, John Ogness wrote:
> Currently the port->lock wrappers uart_port_lock(),
> uart_port_unlock() (and their variants) only lock/unlock
> the spin_lock.
>
> If the port is an nbcon console, the wrappers must also
> acquire/release the console and mark the region as unsafe. This
> allows general port->lock synchronization to be synchronized
> with the nbcon console ownership.
>
> Introduce a new struct nbcon_drvdata within struct console that
> provides the necessary components for the port lock wrappers to
> acquire the nbcon console and track its ownership.
>
> Also introduce uart_port_set_cons() as a wrapper to set @cons
> of a uart_port. The wrapper sets @cons under the port lock in
> order to prevent @cons from disappearing while another context
> owns the port lock via the port lock wrappers.
>
> Also cleanup the description of the console_srcu_read_flags()
> function. It is used by the port lock wrappers to ensure a
> console cannot be fully unregistered while another context
> owns the port lock via the port lock wrappers.
>
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> --- a/drivers/tty/serial/8250/8250_core.c
> +++ b/drivers/tty/serial/8250/8250_core.c
> @@ -689,7 +689,7 @@ static int univ8250_console_match(struct console *co, char *name, int idx,
> continue;
>
> co->index = i;
> - port->cons = co;
> + uart_port_set_cons(port, co);
> return serial8250_console_setup(port, options, true);
I just noticed that this is a newcon->match() callback. It does:
+ univ8250_console_match()
+ serial8250_console_setup(port, options, true) // @probe == true
+ probe_baud(port)
which manipulates the serial port.
We should call also the con->match() callback under console_lock()
in try_enable_preferred_console() like we do with con->setup,
see the commit 801410b26a0e8 ("serial: Lock console when calling into
driver before registration").
Maybe, we should just take console_lock() in register_console()
around these try_enable_*() calls.
Well, this is for a separated patch or separate patchset.
> }
>
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -282,6 +282,25 @@ struct nbcon_write_context {
> bool unsafe_takeover;
> };
>
> +/**
> + * struct nbcon_drvdata - Data to allow nbcon acquire in non-print context
> + * @ctxt: The core console context
> + * @srcu_cookie: Storage for a console_srcu_lock cookie, if needed
> + * @owner_index: Storage for the owning console index, if needed
> + * @locked: Storage for the locked state, if needed
> + *
> + * All fields (except for @ctxt) are available exclusively to the driver to
> + * use as needed. They are not used by the printk subsystem.
> + */
> +struct nbcon_drvdata {
> + struct nbcon_context __private ctxt;
> +
> + /* reserved for driver use */
> + int srcu_cookie;
> + short owner_index;
> + bool locked;
> +};
> +
> /**
> * struct console - The console descriptor structure
> * @name: The name of the console driver
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -606,6 +609,83 @@ static inline void __uart_port_unlock_irqrestore(struct uart_port *up, unsigned
> spin_unlock_irqrestore(&up->lock, flags);
> }
>
> +/**
> + * uart_port_set_cons - Safely set the @cons field for a uart
> + * @up: The uart port to set
> + * @con: The new console to set to
> + *
> + * This function must be used to set @up->cons. It uses the port lock to
> + * synchronize with the port lock wrappers in order to ensure that the console
> + * cannot change or disappear while another context is holding the port lock.
> + */
> +static inline void uart_port_set_cons(struct uart_port *up, struct console *con)
> +{
> + unsigned long flags;
> +
> + __uart_port_lock_irqsave(up, &flags);
> + up->cons = con;
> + __uart_port_unlock_irqrestore(up, flags);
> +}
> +
> +/* Only for internal port lock wrapper usage. */
> +static inline void __uart_port_nbcon_acquire(struct uart_port *up)
> +{
> + lockdep_assert_held_once(&up->lock);
> +
> + if (likely(!uart_console(up)))
> + return;
> +
> + if (up->cons->nbcon_drvdata) {
> + /*
> + * If @up->cons is registered, prevent it from fully
> + * unregistering until this context releases the nbcon.
> + */
> + int cookie = console_srcu_read_lock();
[ later update: maybe skip 30 lines and read the "Hum, ho" part first]
[ even later update: or skip 60 lines and read "Win win" part first.]
OK, this makes sense. But I feel like we are in a lock cycle.
This code is called under "up->lock()". It will be taken also by
the newcon->device_lock() in register_console() so we would have:
+ register_console()
+ console_list_lock() // serializes SRCU access to console list.
+ con->device_lock()
+spin_lock(&up->lock)
=> console_list_lock -> up->lock
and here
+ uart_port_lock()
+ spin_lock(&up->lock)
+ __uart_port_nbcon_acquire()
+ console_srcu_read_lock() // SRCU read lock serialized by console_list_lock
=> up->lock -> SRCU read lock serialized by console_list_lock
and for completeness:
+ unregister_console()
+ console_list_lock()
+ unregister_console_locked()
+ synchronize_srcu(&console_srcu);
OK, it works because because scru_read_lock() is not blocking.
The synchronize_srcu() is called under console_list_lock(). So that
the only important thing is not taking console_list_lock() under
console_srcu_read_lock().
Hum, ho, it took me some time to create a mental model around this.
It is not that complicated after all:
+ console_list_lock(): serializes the entire (un)register console
console operation. Well, it primary serializes
the console_list manipulation, including up->cons->node
which is tested below.
+ console_lock(): serializes emitting messages on legacy and
boot consoles
+ con->device_lock aka port->lock: serializes more actions:
1. any non-printk related access to the device (HW) like
a generic read/write.
2. Access to the device by con->write() for legacy consoles.
3. console registration, in particular console_list
manipulation, including up->cons->node. It is needed
to avoid races when the non-printk code has to decide
whether it needs to get serialized against nbcon
consoles or not.
For example, it should prevent races in
__uart_port_nbcon_acquire(up) and
__uart_port_nbcon_release(up) which are added in this patch.
But wait, we take con->device_lock() only in register_console().
Is this correct?
IMHO, it is a bug. We should (must) take con->device_lock()
also in unregister_console() when manipulating the
console_list and up->cons->node. Otherwise, uart_console(up)
would provide racy results.
Win win situation:
If we take con->device_lock() in unregister_console() around
console_list manipulation then the console could never
disappear or be assigned to another port when both:
uart_console(up) && hlist_unhashed_lockless(&up->cons->node)
are "true"
=> we would not need to take console_scru_read_lock() here
=> we would not need to store/check up->line
Heh, we even would not need "bool locked" because
uart_console(up) && hlist_unhashed_lockless(&up->cons->node)
would always return the same even in __uart_port_nbcon_release()
=> easier code, straight serialization rules, no races.
> +
> + /* Ensure console is registered and is an nbcon console. */
> + if (!hlist_unhashed_lockless(&up->cons->node) &&
> + (console_srcu_read_flags(up->cons) & CON_NBCON)) {
> + WARN_ON_ONCE(up->cons->nbcon_drvdata->locked);
> +
> + nbcon_driver_acquire(up->cons);
> +
> + /*
> + * Record @up->line to be used during release because
> + * @up->cons->index can change while the port and
> + * nbcon are locked.
> + */
> + up->cons->nbcon_drvdata->owner_index = up->line;
> + up->cons->nbcon_drvdata->srcu_cookie = cookie;
> + up->cons->nbcon_drvdata->locked = true;
> + } else {
> + console_srcu_read_unlock(cookie);
> + }
> + }
> +}
> +
> +/* Only for internal port lock wrapper usage. */
> +static inline void __uart_port_nbcon_release(struct uart_port *up)
> +{
> + lockdep_assert_held_once(&up->lock);
> +
> + /*
> + * uart_console() cannot be used here because @up->cons->index might
> + * have changed. Check against @up->cons->nbcon_drvdata->owner_index
> + * instead.
> + */
> +
> + if (unlikely(up->cons &&
> + up->cons->nbcon_drvdata &&
> + up->cons->nbcon_drvdata->locked &&
> + up->cons->nbcon_drvdata->owner_index == up->line)) {
> + WARN_ON_ONCE(!up->cons->nbcon_drvdata->locked);
The WARN_ON_ONCE() would never trigger because
"up->cons->nbcon_drvdata->locked" is checked by the above
if-condition.
I hope that we would replace this by the same checks as in acquire()
part as proposed above.
> +
> + up->cons->nbcon_drvdata->locked = false;
> + nbcon_driver_release(up->cons);
> + console_srcu_read_unlock(up->cons->nbcon_drvdata->srcu_cookie);
> + }
> +}
> +
> /**
> * uart_port_lock - Lock the UART port
> * @up: Pointer to UART port structure
> @@ -654,7 +741,11 @@ static inline bool uart_port_trylock(struct uart_port *up)
> */
> static inline bool uart_port_trylock_irqsave(struct uart_port *up, unsigned long *flags)
> {
> - return spin_trylock_irqsave(&up->lock, *flags);
> + if (!spin_trylock_irqsave(&up->lock, *flags))
> + return false;
> +
> + __uart_port_nbcon_acquire(up);
I would feel more comfortable if we created
__uart_port_nbcon_try_acquire(up). It would give up when it could
not acquire the context in the given timeout.
It would by similar to acquire(). The only difference would be that
it would return false on failure. And it would call:
/**
* nbcon_driver_try_acquire - Try acquire nbcon console and enter unsafe section
* @con: The nbcon console to acquire
*
* Context: Any context which could not be migrated to another CPU.
*
* Console drivers will usually use their own internal synchronization
* mechanism to synchronize between console printing and non-printing
* activities (such as setting baud rates). However, nbcon console drivers
* supporting atomic consoles may also want to mark unsafe sections when
* performing non-printing activities.
*
* This function tries to acquires the nbcon console using priority
* NBCON_PRIO_NORMAL and marks it unsafe for handover/takeover.
*
* Return: true on success, false when it was not able to acquire the
* console and set it "usafe" for a takeover.
*/
bool nbcon_driver_try_acquire(struct console *con)
{
struct nbcon_context *ctxt = &ACCESS_PRIVATE(con->nbcon_drvdata, ctxt);
cant_migrate();
memset(ctxt, 0, sizeof(*ctxt));
ctxt->console = con;
ctxt->prio = NBCON_PRIO_NORMAL;
if (!nbcon_context_try_acquire(ctxt))
return false;
if (!nbcon_context_enter_unsafe(ctxt))
return false;
}
It is probably not that important because it should not block emitting
the emergency or panic messages. They would use NBCON_PRIO_EMERGENCY
or NBCON_PRIO_PANIC in the important code paths.
But it looks semantically wrong to use a potentially blocking function
in a try_lock() API. IMHO, it would be a call for troubles.
> + return true;
> }
>
> /**
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 2516449f921d..38328cf0fd5c 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -988,3 +991,52 @@ void nbcon_free(struct console *con)
>
> con->pbufs = NULL;
> }
> +
> +/**
> + * nbcon_driver_acquire - Acquire nbcon console and enter unsafe section
> + * @con: The nbcon console to acquire
> + *
> + * Context: Any context which could not be migrated to another CPU.
> + *
> + * Console drivers will usually use their own internal synchronization
> + * mechasism to synchronize between console printing and non-printing
> + * activities (such as setting baud rates). However, nbcon console drivers
> + * supporting atomic consoles may also want to mark unsafe sections when
> + * performing non-printing activities.
> + *
> + * This function acquires the nbcon console using priority NBCON_PRIO_NORMAL
> + * and marks it unsafe for handover/takeover.
> + *
> + * Console drivers using this function must have provided @nbcon_drvdata in
> + * their struct console, which is used to track ownership and state
> + * information.
> + */
> +void nbcon_driver_acquire(struct console *con)
> +{
> + struct nbcon_context *ctxt = &ACCESS_PRIVATE(con->nbcon_drvdata, ctxt);
Hmm, we need to store somewhere the "struct nbcon_context" for this
generic purpose. If we agreed to remove struct nbcon_drvdata then
I would store it in struct console as
struct console {
[...]
/**
* @device_nbcon_context:
*
* nbcon_context used to serialize non-printing operations on
* the same device.
*
* The device drivers synchronize these operations with a driver-specific
* lock, such as port->lock in the serial consoles. When the
* device is registered as a console, they additionally have to acquire
* this nbcon context to get serialized against the atomic_write()
* callback using the same device.
*
* The struct does not require any special initialization.
*/
struct nbcon_context driver_nbcon_context;
[...]
};
It will be unused for legacy consoles. But the plan is convert all
console drivers anyway.
IMHO, passing it via an optional pointer is not worth the complexity.
> +
> + cant_migrate();
> +
> + do {
> + do {
> + memset(ctxt, 0, sizeof(*ctxt));
> + ctxt->console = con;
> + ctxt->prio = NBCON_PRIO_NORMAL;
> + } while (!nbcon_context_try_acquire(ctxt));
> +
> + } while (!nbcon_context_enter_unsafe(ctxt));
> +}
> +EXPORT_SYMBOL_GPL(nbcon_driver_acquire);
Best Regards,
Petr