Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

From: Petr Mladek
Date: Fri Apr 19 2024 - 05:59:22 EST


On Thu 2024-04-18 23:51:01, John Ogness wrote:
> On 2024-04-18, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >> > Solve this problem by introducing[*] nbcon_atomic_flush_all()
> >> > which would flush even newly added messages and
> >> > call this in nbcon_cpu_emergency_exit() when the printk
> >> > kthread does not work. It should bail out when there
> >> > is a panic in progress.
> >> >
> >> > Motivation: It does not matter which "atomic" context
> >> > flushes NORMAL/EMERGENCY messages when
> >> > the printk kthread is not available.
> >>
> >> I do not think that solves the problem. If the console is in an unsafe
> >> section, nothing can be printed.
> >
> > IMHO, it solves the problem.
> >
> > The idea is simple:
> >
> > "The current nbcon console owner will be responsible for flushing
> > all messages when the printk kthread does not exist."
>
> Currently this is not valid. It assumes owners are printers. That is not
> always true. The owner might be some task modifying the baud rate and
> has nothing to do with printing.

Ah, I have missed this scenario.

> So what I am proposing is that we add two new normal-flushing sites that
> are only used when the kthread is not available:
>
> 1. after exiting emergency mode: in nbcon_cpu_emergency_exit()
>
> 2. after releasing ownership for non-printing: in nbcon_driver_release()
>
> I think this will close the gap and it does not require irq_work.
>
> > Sigh, all this is so complicated. I wonder how to document
> > this so that other people do not have to discover these
> > dependencies again and again. Is it even possible?
>
> In the end we will have the following 5 scenarios (assuming my
> proposal):
>
> 1. PRIO_NORMAL non-printing activity, always under con->device() lock,
> upon release flushes[*] full backlog
>
> 2. PRIO_NORMAL printing using write_thread(), always called from task
> context and under con->device() lock, always flushes full backlog
>
> 3. PRIO_NORMAL printing using write_atomic(), called with hardware
> interrupts disabled, always flushes full backlog, (only used when the
> kthread is not available)
>
> 4. PRIO_EMERGENCY printing using write_atomic(), called with hardware
> interrupts disabled, flushes through emergency, upon exit flushes[*]
> full backlog
>
> 5. PRIO_PANIC printing using write_atomic(), called with hardware
> interrupts disabled, flushes full backlog
>
> [*] Only when the kthread is not available. And in that case #3 is the
> scenario used for the trailing full flushing by #1 and #4.
>
>
> Full flushing is attempted in all 5 scenarios. A failed attempt means
> there is a new owner, but that owner is also one of the 5 scenarios.
>
> Am I missing something?
>
> IMHO #3 is the only bizarre scenario. It exists only to cover when the
> kthread is not available.

Great summary! I like it.

Let me try another summary:

We basically repeat the same trick as was used in the legacy
console_unlock(). When the kthread is not avaialale then
the current owner is responsible for flushing everything.

The game changer is the kthread. When it is available then
it takes care of "everything" as long as the system is
working normaly.

The system is working normally until suspend, shutdown, or panic().
It might also be sick. In which case, we try to flush the doctor
report immediately (emergency when safe). But we wait for
the entire doctor report before flushing (publishing).


Or another look. We have the following rules:

1. kthread is not avaialbe => owner flushes everything

2. kthread is available:

a) Normal messages are off loaded to kthread (store + kick)

b) Emergency messages (doctor examination) are first stored and then
tried to be flushed immediately (when possible/safe), including
all previous messages (other up-to-date notes).

The emergency messages might also be split in more
parts when the report is too long, for example,
backtraces or lock depedencies of all tasks.
(Reports from more doctor specialists). In this case,
each part (report) is flushed immediately (when possible/safe),
including all previous messages (other up-to-date notes).

When the system tries to continue normaly (no dying)
then any later messages (post doctor exam notes) are
let the kthread (secretary) to flush them.

c) Panic messages are flushed immediately (when possible/safe),
including all previous messages (other up-to-date notes).

d) The final flush in panic() will be done even when not safe
(patient will try to read the diary even when feeling
dizzy and it might be a complete fiasco but it is
the very last chance).


In shorrt, go ahead with your proposal.

Best Regards,
Petr