Re: [PATCH printk v7 30/35] printk: Add helper for flush type logic

From: Petr Mladek
Date: Thu Aug 08 2024 - 05:03:23 EST

Next message: Sebastian Andrzej Siewior: "Re: [PATCH v3 0/3] zram: Replace bit spinlocks with a spinlock_t."
Previous message: Ilpo Järvinen: "Re: [PATCH v15 2/4] PCI/DOE: Rename Discovery Response Data Object Contents to type"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed 2024-08-07 16:17:51, John Ogness wrote:
> On 2024-08-07, Petr Mladek <pmladek@xxxxxxxx> wrote:
> > I would suggest to change the semantic and set the _preferred_
> > flush method instead of an _available_ one.
>
> I will need to evaluate this closely. I worry that the caller needs to
> understands how the helper function is choosing the preference. For
> example, at the end you make a suggestion that is broken with this
> suggested change.

I see. Well, console_flush_on_panic() is special. It is the last
resort. It actually ignores even the "what is available/allowed"
semantic.

> >> + if (ft.nbcon_atomic) {
> >> + stop_seq = prb_next_reserve_seq(prb);
> >> + goto again;
> >> + }
> >
> > BTW: I wonder how this code would look like after adding the printk
> > threads. We should do "goto again" only when ft.nbcon_atomic
> > is the preferred (only available) flush type for nbcon consoles.
>
> if (ft.nbcon_offload) {
> ...
> } else if (ft.nbcon_atomic) {
> ...
> }
>
> > IMHO, it is another reason to change the semantic.
>
> The caller does not need to rely on the helper "choosing" the right
> one. I understand your point that: It is easier for the caller when we
> can blindly rely on the helper to choose for us. But I worry that if we
> ever adjust the helper, we might break various call sites that blindly
> rely on the helper making a certain choice. If the helper's job is only
> to say what is possible, then I would worry less for the future when we
> may need to adjust the helper.

In the ideal world, the helper should tell the caller what has to be
done to flush pending messages on all consoles. The helper makes
the decision using the global variables where the variables define:

+ type of registered consoles
+ NBCON_PRIO
+ deferred context
+ possible or forced offload to kthreads

IMHO, it depends on how many callers are happy with the proposed
solution.

> >> + printk_get_console_flush_type(&ft);
> >
> > It is a nice trick to call printk_get_console_flush_type() this early.
> > I allows to hack the result when processing the hacky LOGLEVEL_SCHED ;-)
> >
> >> +
> >> /* If called from the scheduler, we can not call up(). */
> >> if (level == LOGLEVEL_SCHED) {
> >> level = LOGLEVEL_DEFAULT;
> >> do_trylock_unlock = false;
> >> - defer_legacy = true;
> >> + } else {
> >> + do_trylock_unlock = ft.legacy_direct;
> >> }
> >
> > We could hack the @ft structure directly here:
> >
> > if (level == LOGLEVEL_SCHED) {
> > level = LOGLEVEL_DEFAULT;
> > ft.legacy_offload |= ft.legacy_direct;
> > ft.legacy_direct = false;
> > }
>
> The hack seems a bit complicated to me. Especially when the helper is
> choosing preferred methods. I will think about it.

The hack converts legacy_direct -> legacy_offload.

> >> + if (!cpuhp_tasks_frozen) {
> >> + printk_get_console_flush_type(&ft);
> >> + if (ft.legacy_direct) {
> >> + if (console_trylock())
> >> + console_unlock();
> >
> > Why do we actually call only the legacy loop here?
> > IMHO, we should also do
> >
> > if (ft.nbcon_atomic)
> > nbcon_atomic_flush_pending();
>
> Atomic consoles do not care if a CPU was online or not. I can add this,
> but I expect there is nothing for the atomic console to flush.

console_cpu_notify() has been added by the commit 034260d6779087431
("printk: fix delayed messages from CPU hotplug events"). It
is related to the check

cpu_online(smp_processor_id()) == 0

which is still called in console_is_usable() even for nbcon consoles.

IMHO, it means that nbcon_atomic_flush_pending() might not be able
to flush the messages when called from vprintk_emit() on CPU which
is just being hot-plugged.

> And when
> threading is added, we would need the extra code to avoid atomic
> flushing:
>
> if (!ft.nbcon_offload && ft.nbcon_atomic)
> nbcon_atomic_flush_pending();

This extra change won't be needed when printk_get_console_flush_type(&ft)
uses the "set the preferred flush type" semantic ;-)

> >> @@ -3327,7 +3316,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
> >> if (mode == CONSOLE_REPLAY_ALL)
> >> __console_rewind_all();
> >>
> >> - if (!have_boot_console)
> >> + printk_get_console_flush_type(&ft);
> >> + if (ft.nbcon_atomic)
> >> nbcon_atomic_flush_pending();
> >
> > I would use "ft.legacy_direct" also below for the decision about
> > the legacy loop:
> >
> > - if (legacy_allow_panic_sync)
> > + if (ft.legacy_direct)
> > console_flush_all(false, &next_seq, &handover);
>
> No, because it would mean the console is not flushed if the CPU is in
> the deferred state. That is why I added an extra comment in the helper
> saying that console_flush_on_panic() will _always_ flush directly.
>
> I thought about adding that extra logic into the helper, but it really
> isn't possible. @legacy_allow_panic_sync does not matter if there are no
> nbcon consoles. So somehow the helper would need to know that CPU is in
> the deferred state, but now it is allowed to do direct printing.
>
> So it seemed more straight forward to have console_flush_on_panic() not
> care about what is allowed (for legacy). It is going to flush directly
> no matter what.

Makes sense. console_flush_on_panic() is special. It is the last
resort. And it does things which are not safe/allowed.

> I will reconsider your suggestions about the helper and also compare the
> end result at the call sites (also with threading changes applied) to
> see what looks simpler to maintain.

Sigh, this is a kind of "bike shedding" discussion. It makes some
sense as long as it helps to find bugs and simplify the logic.
Feel free to stop it when you think that it is not longer worth
the effort.

Best Regards,
Petr

Next message: Sebastian Andrzej Siewior: "Re: [PATCH v3 0/3] zram: Replace bit spinlocks with a spinlock_t."
Previous message: Ilpo Järvinen: "Re: [PATCH v15 2/4] PCI/DOE: Rename Discovery Response Data Object Contents to type"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]