Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

From: Sergey Senozhatsky
Date: Sun Jan 28 2018 - 21:29:32 EST


On (01/26/18 16:26), Petr Mladek wrote:
[..]
> First, this delays showing eventually valuable information until
> the preemption is enabled. It might never happen if the system
> is in big troubles. In each case, it might be much longer delay
> than it was before.

If the system is in "big troubles" then what makes irq_work more
possible? Local IRQs can stay disabled, just like preemption. I
guess when the troubles are really big our strategy is the same
for both wq and irq_work solutions - we keep the printk_safe buffer
and wait for panic()->flush.

> Second, it makes printk() dependent on another non-trivial subsystem.
> I mean workqueues.
[..]
> The following, a bit ugly, solution has came to my mind. We could
> think about it like extending the printk_context. It counts
> printks called in this context and does nothing when we reach
> the limit. The difference is that the context is task-specific
> instead of CPU-specific.
[..]
> +int console_recursion_count;
> +int console_recursion_limit = 100;

Hm... I'm not entirely happy with magic constants, to be honest.
Why 100? One of the printk_safe lockdep reports I saw was ~270
lines long: https://marc.info/?l=linux-kernel&m=150659041411473&w=2

`console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
a bit useless and hard to understand - despite its value we will
store only 100 lines.

We probably can replace `console_recursion_limit' with the following:
- in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
chars to be stored in printk-safe buffer and, once we reached the limit,
don't append any new messages until we are out of `console_recursion'
context. Which is somewhat close to wq solution, the difference is that
printk_safe can happen earlier if local IRQs are enabled. But at the
same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to
still cause troubles, just because printk-deadlock errors sound scary
and important enough.

I guess I'm OK with the wq dependency after all, but I may be mistaken.
printk_safe was never about "immediately flush the buffer", it was about
"avoid deadlocks", which was extended to "flush from any context which
will let us to avoid deadlock". It just happened that it inherited
irq_work dependency from printk_nmi.

We also probably can add printk_safe flush to some sysrq handler
may be.

> PS: I answered this mail because the original discussion[1] has
> already been way too long and covered many other issues.

Yep, that's I started it as a new discussion.

-ss