Re: guarantee forward progress: was: Re: [PATCH printk v2 11/12] printk: extend console_lock for proper kthread support
From: Petr Mladek
Date: Mon Apr 11 2022 - 06:45:41 EST
On Fri 2022-04-08 22:23:15, John Ogness wrote:
> On 2022-04-08, Petr Mladek <pmladek@xxxxxxxx> wrote:
> > I played a lot with it and it is really hard because:
> >
> > + new messages can appear anytime
> > + direct mode might get requested anytime
> > + only the direct mode knows whether all messages were flushed
> > on all consoles
>
> Yes, and this is why v1 dramatically simplified the picture by making
> kthreads not care about direct mode. In v1 the kthread logic is very
> simple: If there are messages to print, try to print them no matter
> what. We didn't need to worry if someone was printing, because we knew
> that at least the kthread was always printing.
>
> This meant that there would be times when direct mode is active but the
> kthreads are doing the printing. But in my experimenting, that tends to
> be the case anyway, even with this more complex v2 approach. The reason
> is that if some code does:
>
> printk_prefer_direct_enter();
> (100 lines of printk calls)
> printk_prefer_direct_exit();
>
> And directly before that printk_prefer_direct_enter() _any_ kthread was
> already inside call_console_driver(), then _all_ the console_trylock()
> calls of the above 100 printk's will fail. Inserting messages into the
> ringbuffer is fast and any active printer will not have finished
> printing its message before the above code snippet is done.
Good to know.
> In fact, the above snippet will only do direct printing if there were
> previously no unflushed messages. That is true for v1 (by design) and v2
> (by misfortune, because ringbuffer insertion is much faster than a
> single call_console_driver() call).
Yup.
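
The timing argument above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct console_model`, `model_tick()`, `model_trylock()` and `count_direct_prints()` are hypothetical names and not kernel code, time is counted in abstract ticks, each ringbuffer insertion is assumed to cost one tick, and the kthread is assumed to print only the one record it is already emitting.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct console_model {
	bool locked;             /* models ownership of the console lock */
	unsigned int busy_ticks; /* ticks until the kthread finishes its record */
};

/* Advance time by one tick; the kthread unlocks once its record is done. */
static void model_tick(struct console_model *c)
{
	if (c->locked && c->busy_ticks > 0 && --c->busy_ticks == 0)
		c->locked = false;
}

/* Models a trylock: never waits, fails if somebody else is printing. */
static bool model_trylock(struct console_model *c)
{
	if (c->locked)
		return false;
	c->locked = true;
	return true;
}

/*
 * Simulate nr_printk back-to-back printk() calls inside a prefer-direct
 * section. kthread_ticks_left is the time the kthread still needs for the
 * record it was emitting when the section was entered (0 means that no
 * kthread was printing). Returns how many calls managed to print directly.
 */
unsigned int count_direct_prints(unsigned int nr_printk,
				 unsigned int kthread_ticks_left)
{
	struct console_model c = {
		.locked = kthread_ticks_left > 0,
		.busy_ticks = kthread_ticks_left,
	};
	unsigned int direct = 0;
	unsigned int i;

	for (i = 0; i < nr_printk; i++) {
		model_tick(&c);          /* ringbuffer insertion: one tick */
		if (model_trylock(&c)) {
			direct++;        /* caller prints synchronously ... */
			c.locked = false; /* ... and then unlocks */
		}
	}
	return direct;
}
```

In this model, `count_direct_prints(100, 1000)` returns 0 because the slow console keeps the kthread busy for the whole section, while `count_direct_prints(100, 0)` returns 100 because nobody was printing on entry.
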
> This new idea (v2) of trying to stop kthreads in order to "step aside"
> for direct printing is really just adding a lot of complexity, a lot of
> irqwork calls, and a lot of races. And with my experimenting I am not
> seeing any gain, except for new risks of nobody printing.
>
> I understand that when we say printk_prefer_direct_enter() that we
> _really_ want to do direct printing. But we cannot force it if any
> printer is already inside call_console_driver(). In that case, direct
> printing simply will not and cannot happen.

I think that we should split it into situations where we need
direct printing and situations where we only prefer it.

1. Direct printing is needed in situations when the kthreads can't
   work by design, e.g. early boot, panic, suspend, reboot.

   This was the reason why I wanted to increase the chance that
   kthreads would not block it. But as you write above, it does
   not help much. The consoles are so slow that the direct mode
   is not used.

   It is clear that my proposal does not work. And it is not
   reliable anyway. This patchset actually takes care of some
   situations in a much better way, by calling pr_flush() in
   console_stop() or suspend_console().

2. Direct printing is only preferred in some situations where
   the system is in trouble, for example, stall reports.

   The direct mode is preferred because we think that it will be
   more reliable. It is conservative thinking. But in fact,
   the kthreads might provide better results in many situations.

   These stall reports often print a lot of debugging information.
   The direct mode might cause soft-lockups and stalls on its own.
   The kthreads allow showing the messages faster on fast consoles.

   I could imagine that we actually remove the direct mode for
   these stall reports in the future.

   Anyway, it is perfectly fine to print them using kthreads
   when the kthreads are working.

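
The split between "needed" and "preferred" can be written down as a tiny decision function. This is a hedged sketch of the idea only, not code from the patchset: the enums and `choose_printer_model()` are hypothetical names, and the list of states is taken from the examples given above.

```c
#include <stdbool.h>

/* Hypothetical model types; the kernel does not define these names. */
enum sys_state_model {
	MODEL_EARLY_BOOT,
	MODEL_PANIC,
	MODEL_SUSPEND,
	MODEL_REBOOT,
	MODEL_RUNNING,	/* normal operation, including stall reports */
};

enum print_mode_model {
	MODEL_PRINT_DIRECT,
	MODEL_PRINT_KTHREAD,
};

/*
 * Case 1: kthreads cannot work by design, so direct printing is needed.
 * Case 2: direct printing is at most a preference; kthreads are fine
 * as long as they are working.
 */
enum print_mode_model choose_printer_model(enum sys_state_model state,
					   bool kthreads_working)
{
	switch (state) {
	case MODEL_EARLY_BOOT:
	case MODEL_PANIC:
	case MODEL_SUSPEND:
	case MODEL_REBOOT:
		return MODEL_PRINT_DIRECT;
	default:
		break;
	}

	return kthreads_working ? MODEL_PRINT_KTHREAD : MODEL_PRINT_DIRECT;
}
```

With this model, a stall report issued in `MODEL_RUNNING` while the kthreads are working is left to the kthreads, and direct printing is the fallback only when they are not.
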
> For v3 I recommend going back to the v1 model, where kthreads do not
> care if direct mode is preferred. I claim that v2 does not yield any more
> actual direct printing than v1 did.
I agree.
> However, I would keep the v2 change that kthreads go into their
> wait_event check after every message. That at least provides earlier
> responses for kthreads to stop themselves if they are disabled.

I am not sure what you mean. They seem to be completely disabled
only when a newly registered console is not able to create the kthread
or when the kthread is not able to allocate buffers.

I think that there is no hurry to stop the other kthreads in this
case.

Or do you mean panic?

> Once we have atomic consoles, things will look different. Then we
> perform true synchronous direct printing. But without them, the "prefer"
> in printk_prefer_direct_enter() is only a preference that can only be
> satisfied under ideal situations (i.e. no kthread is inside
> call_console_driver()).
OK.
Best Regards,
Petr