Re: Boot stall regression from "printk for 5.19" merge
From: Petr Mladek
Date: Mon Jun 20 2022 - 11:00:23 EST
On Mon 2022-06-20 08:48:29, Linus Torvalds wrote:
> On Mon, Jun 20, 2022 at 6:44 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > Both early console and proper console driver has its own kthread.
> >
> > > 1.166486] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 22, base_baud = 12500000) is a 16550A
> >
> > The line is malformed. I wonder if both early console and proper
> > console used the same port in parallel.
>
> Honestly, I get the feeling that we need to just revert the whole
> "console from thread" thing.
>
> Because:
>
> > So, it looks like that con->write() code is not correctly serialized
> > between the early and normal console.
> > [ ... ]
> > I am going to check the driver...
>
> We really cannot be in the situation that some random driver that used
> to work no longer does, and causes oopses and/or memory corruption
> just because it's now entered differently from how it traditionally
> has been.
>
> The traditional console write code has always been very careful to get
> exclusive access, and it sounds like that is just plain broken now.
>
> So I don't think this is a "driver is broken".
I see your point. There might be many problems in the drivers
because they were never used this way. It looks like we opened a can
of worms. It is even more problematic because it causes silent boot
crashes that are hard to debug.
I kind of agree with this and I have started looking at a more
generic solution.
All these boot crashes happened in exactly the same situation: the
proper console was initialized and registered while the early
console was still in use. It is a problem because they use the
same port.
The parallel use of different consoles should be much
safer because they are much more independent.
There are the following possibilities:
1. Enable the kthreads later after the early consoles are gone.
This is easy and should fix all known boot problems.
2. Temporarily stop the kthreads and use direct printing when
the proper consoles are registered. Well, this might be
more complicated because the port might be accessed
also before register_console() is called.
3. Another solution would be to use the global console_lock()
also to synchronize the kthreads against each other. But
it would be unfortunate.
I am going to prepare the 1st solution.
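Roughly, the idea behind the 1st solution could be modelled as below.
This is only a userspace sketch under stated assumptions, not the actual
patch and not kernel code: struct model_console,
early_console_registered() and may_use_kthreads() are made-up names
used purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_console {
	const char *name;
	bool is_early;    /* early (boot) console */
	bool registered;  /* currently registered */
};

/* Return true while any early console is still registered. */
static bool early_console_registered(const struct model_console *cons,
				     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cons[i].registered && cons[i].is_early)
			return true;
	}
	return false;
}

/*
 * Option 1 in this model: printing may be offloaded to the per-console
 * kthreads only once no early console is left. Until then the caller
 * is expected to print directly, as it always did.
 */
static bool may_use_kthreads(const struct model_console *cons, size_t n)
{
	return !early_console_registered(cons, n);
}
```

In this model, an early console and the proper console sharing one port
are never driven from two threads at once, because the threaded path
stays disabled until the early console has been unregistered.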
Best Regards,
Petr