Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left

From: Petr Mladek
Date: Mon Jun 20 2022 - 18:25:14 EST


On Mon 2022-06-20 14:10:20, Linus Torvalds wrote:
> On Mon, Jun 20, 2022 at 10:14 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > The console kthreads uncovered several races in console drivers.
>
> I really want to make it clear that this was NOT some kind of "races
> in drivers".
>
> Console drivers may very well have intentionally avoided taking locks
> for console output, since the printk output was supposed to be
> serialized by printk.
>
> Don't try to make this some kind of "buggy drivers" thing. This is on
> printk, not on anything else.

OK, I see that uart_console_write() is used by
early_serial8250_write() without port->lock. It means that it is
racy against serial8250_console_write(). It might
cause problems reported by this thread. And you are
right that it has never been used in parallel before the kthreads.

But I believe that it might cause real problems. serial8250_console_write()
takes port->lock to get serialized against other operations on the
port. And there might be some when the same port is added as
a proper serial console.

Today I found that probe_baud() is called from
serial8250_console_setup() without port->lock. It does reads and
writes. I believe that it might break with the earlycon.

Also the commit 589f892ac8ef244e47c5a ("serial: meson:
acquire port->lock in startup()") fixes a race between
meson_serial_port_write() and meson_uart_startup(), where
meson_serial_port_write() is used by both early and proper
console driver. The problem was there even without kthreads.
They just made it more visible.

My colleagues familiar with ARM told me that they heard about
boot freezes with early consoles before threads. The kthreads
allow to reproduce and fix them. In the end, they make the early
consoles more reliable.


> Assuming this solves all issues, I'm ok with this approach, but I
> really want this to be clearly about printk being buggy, no "blame the
> drivers" garbage.
>
> And if there are other issues coming up, we revert the whole thing entirely.
>
> Because printk is too important to play games with, and too important
> to try to blame drivers.

I take printk() really seriously. And I definitely do not want
to wave out problems as others problem.

I do not want to release 5.19 with broken printk(). But the kthreads
solve real bugs where printk() put the system into knees. I want
to invest much more time on improving them and fixing related
problems. Unfortunately, linux-next was not able to catch
the recently reported problems and we were not able to fix them
in advance.

All the recent fixes were generic and should make printk() with
kthreads much more reliable. I can't be sure if it will be enough.
I could only say that I am going to fix any new ones.

Of course, if people continue reporting problems, we would need
to revert it for 5.19. But I would really like to give it another
chance later.

Best Regards,
Petr