boot console: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads

From: Petr Mladek
Date: Wed Apr 05 2023 - 06:48:40 EST


On Thu 2023-03-02 21:02:11, John Ogness wrote:
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> Add the infrastructure to create a printer thread per console along
> with the required thread function, which is takeover/handover aware.
>
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -75,6 +77,55 @@ u64 cons_read_seq(struct console *con);
> void cons_nobkl_cleanup(struct console *con);
> bool cons_nobkl_init(struct console *con);
> bool cons_alloc_percpu_data(struct console *con);
> +void cons_kthread_create(struct console *con);
> +
> +/*
> + * Check if the given console is currently capable and allowed to print
> + * records. If the caller only works with certain types of consoles, the
> + * caller is responsible for checking the console type before calling
> + * this function.
> + */
> +static inline bool console_is_usable(struct console *con, short flags)
> +{
> + if (!(flags & CON_ENABLED))
> + return false;
> +
> + if ((flags & CON_SUSPENDED))
> + return false;
> +
> + /*
> + * The usability of a console varies depending on whether
> + * it is a NOBKL console or not.
> + */
> +
> + if (flags & CON_NO_BKL) {
> + if (have_boot_console)
> + return false;

I am not sure if this is the right place to discuss it.
Different patches add pieces that are part of the puzzle.

Anyway, how are the NOBKL consoles supposed to work when a boot console
is still registered, please?

I see that a later patch adds:

asmlinkage int vprintk_emit(int facility, int level,
const struct dev_printk_info *dev_info,
const char *fmt, va_list args)
{
[...]
/*
* Flush the non-BKL consoles. This only leads to direct atomic
* printing for non-BKL consoles that do not have a printer
* thread available. Otherwise the printer thread will perform
* the printing.
*/
cons_atomic_flush(&wctxt, true);
[...]
}

This patch adds cons_kthread_create(). And it refuses to create
the kthread as long as there is a boot console registered.

Finally, a later added cons_atomic_flush() ignores consoles where
console_is_usable() returns false:

void cons_atomic_flush(struct cons_write_context *printk_caller_wctxt, bool skip_unsafe)
{
[...]
for_each_console_srcu(con) {
[...]
if (!console_is_usable(con, flags))
continue;

It looks to me that NOBKL consoles will not show anything as long as
a boot console is registered.

And the boot console might never be removed when "keep_bootcon"
parameter is used.


Sigh, this looks like a non-trivial problem. I see it as a combination
of two things:

+ NOBKL consoles are independent. And this is actually one
big feature.

+ There is no 1:1 relation between boot and real console using
the same output device (serial port). I mean that
register_console() is not able to match which real console
is replacing a particular boot console.

As a result, we could not easily synchronize boot and the related real
console against each other.

I see three possible solutions:

A) Ignore this problem. People are most likely using only one boot
console. And the real console will get enabled "immediately"
after this console gets removed. So that there should not be
any gap.

The only problem is that people use more real consoles. And
also unrelated real consoles will not see anything.

I guess that people might notice that they do not see anything
on ttyX console until ttySX replaces an early serial console.
And they might consider this as a regression.


B) Allow to match boot and real consoles using the same output device
and properly synchronize them against each other.

It might mean:

+ sharing the same atomic lock (atomic_state)
+ sharing the same device (port) lock
+ avoid running both at the same time by a careful
switch during the registration of the real console

, where sharing the same port lock might theoretically be done
without 1:1 matching of the related console drivers. They
would use the same port->lock spin_lock.

This might also fix the ugly race during console registration
when we unregister the boot console too early or too late.
The switch between a boot and the related real console might be
always smooth.

The problem is that it might be pretty complicated to achieve
this.


C) Synchronize also NOBKL consoles using console_sem until
all boot consoles are removed and kthreads started.

I might actually be pretty easy. It might be enough to
move cons_flush_all() from vprintk_emit() into
console_flush_all() that is called under console_lock().

I think that we need to keep cons_flush_all() in vprintk_emit()
to emit the messages directly in EMERGENCY and PANIC modes.
But we do not need or must not call it there when there is
still a boot console. We would know that it will called later
from console_unlock() in this case.


My preferences:

+ A probably is not acceptable. It would make me feel very
uncomfortable, definitely.

+ B looks like the best solution but it might be very hard to achieve.

+ C seems to be good enough for now.

I think that C is the only realistic way to go unless there is another
reasonable solution.

Best Regards,
Petr