Re: [GIT PULL] printk for 6.11

From: Petr Mladek
Date: Thu Jul 25 2024 - 08:52:11 EST


On Wed 2024-07-24 13:33:20, Linus Torvalds wrote:
> On Wed, 24 Jul 2024 at 05:47, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > So.. I've complained about this emergency buffering before. At the very
> > least the atomic consoles should never buffer and immediately print
> > everything. Per their definition they always work.
>
> Yeah, my personal preference would be some variation of this.
>
> And when I say "some variation of this", I do think that having a
> per-console trylock is fine, and buffering *if* the atomic console is
> already busy (presumably with an existing oops, but possibly also for
> "setup issues" - ie things like "serial line is being configured" or
> "VGACON is in the middle of a redraw or console size change
> operation".
>
> And yes, before anybody speaks up, that is kind of the approximation
> of the current console_trylock() logic. I am aware. And I'm also aware
> of how much people have hated it. And I'm not claiming it's perfect.

I am afraid that we have to live with some buffering. Otherwise,
the speed of the system might be limited by the speed of the consoles.
This might be especially noticeable during boot, when a lot of HW
gets initialized and tons of messages are flushed to a slow serial console.
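
To make the trade-off concrete, here is a minimal user-space sketch of
"print directly when the console is free, buffer when it is busy".
It is not the printk code. All names (sketch_console, sketch_emit, ...)
are hypothetical, and C11 atomic flags stand in for the real locking:

```c
#include <stdatomic.h>
#include <stdio.h>

#define SKETCH_BACKLOG_MAX 16
#define SKETCH_MSG_MAX     128

/* Hypothetical stand-in for one console; not a kernel structure. */
struct sketch_console {
	atomic_flag owner;     /* set while some context is printing */
	atomic_flag buf_lock;  /* tiny spinlock protecting the backlog */
	char backlog[SKETCH_BACKLOG_MAX][SKETCH_MSG_MAX];
	int pending;           /* messages stored but not yet printed */
	int printed;           /* messages written to the "device" */
};

#define SKETCH_CONSOLE_INIT { ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, {{0}}, 0, 0 }

/* Returns 1 when the caller became the console owner, 0 when it is busy. */
static int sketch_console_trylock(struct sketch_console *con)
{
	return !atomic_flag_test_and_set(&con->owner);
}

static void sketch_console_unlock(struct sketch_console *con)
{
	atomic_flag_clear(&con->owner);
}

static void sketch_buf_lock(struct sketch_console *con)
{
	while (atomic_flag_test_and_set(&con->buf_lock))
		; /* spin; good enough for a sketch */
}

static void sketch_buf_unlock(struct sketch_console *con)
{
	atomic_flag_clear(&con->buf_lock);
}

/*
 * Store the message, then try to print everything that is pending.
 * Returns the number of messages printed by this call; 0 means the
 * console was busy and the message stays buffered.
 */
static int sketch_emit(struct sketch_console *con, const char *msg)
{
	int i, n;

	/* Always store first; messages are silently dropped when full. */
	sketch_buf_lock(con);
	if (con->pending < SKETCH_BACKLOG_MAX) {
		snprintf(con->backlog[con->pending], SKETCH_MSG_MAX, "%s", msg);
		con->pending++;
	}
	sketch_buf_unlock(con);

	/* Never wait for the console; somebody else owns it right now. */
	if (!sketch_console_trylock(con))
		return 0;

	/* We own the console: flush the backlog in order, ours included. */
	sketch_buf_lock(con);
	n = con->pending;
	for (i = 0; i < n; i++)
		puts(con->backlog[i]);
	con->printed += n;
	con->pending = 0;
	sketch_buf_unlock(con);

	sketch_console_unlock(con);
	return n;
}
```

The caller never waits for a busy console, so a slow device does not
stall it. The price is that a message stored just before the owner
releases the console stays buffered until the next caller comes along,
and that the sketch prints while holding the backlog lock. A real
implementation has to solve both.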

After all, the trylock trick was added back in 2001. That was only
3 years after SMP support (console_lock) was added to consoles in 1998.

> But I do think that the *typically* important case is "something went
> horribly wrong, and the console was *not* busy at the time", and
> that's the case where there is no excuse to not just print out ASAP.

Yup.

Just for the record, the idea of "buffering in emergency" came up
in the opposite scenario:

<flood of messages>

CPU 0                           CPU 1

WARN()
  printk()
    flush_consoles()
      # handling long backlog

                                panic()
                                  printk()
                                    flush_consoles()
                                      # successfully took over the lock
                                      # and continued flushing the backlog


Result: CPU 0 never printed the rest of the WARN()

It looked acceptable because the WARN() code was just printing messages,
was well tested, and should never fail (famous last words).

Another motivation was that the consoles were handled by separate
threads. That might make it possible to see the entire WARN() on fast
consoles before a serial one prints the first line.

Also, there are ways to see the messages without working consoles,
e.g. via crash dump, pstore, or persistent memory. The buffer-first
approach might make even more sense in this case.

> But I really do think that we should never buffer "by default". And
> that's why I kind of hate that whole concept of "oops_begin starts
> buffering". It's exactly the kind of "buffer by default" mental model
> that I was really hoping we'd never have.

I agree that buffering in an emergency is more risky than in a normal
situation. The idea needs more love. Let's continue in a more
conservative way for now.

John is going to rework the series and remove the buffering in
emergency. I am going to send another pull request with
just a few trivial fixes for 6.11.

Best Regards,
Petr