Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup

From: Sergey Senozhatsky
Date: Thu Mar 19 2020 - 02:48:42 EST


Hi,

On (20/03/18 19:05), Eugeniu Rosca wrote:
[..]
> > I'm afraid, as of now, this approach is not going to work the way it's
> > supposed to work in 100% of cases. Because the only thing that printk()
> > call sort of guarantees is that the message will be stored somewhere.
> > Either in the main kernel log buffer, on in one of auxiliary per-CPU
> > log buffers. It does not guarantee, generally speaking, that the message
> > will be printed on the console immediately.
>
> I take this passage as an acknowledgement of the problem being _real_,
> in spite of the fix being not perfect.
>
> One aspect I would like to emphasize is that (please, NAK this
> statement if it's not accurate) the problem reported in this patch is
> not specific to the existing printk mechanism, but also applies to the
> upcoming kthread-based printk. If that's true, then IMHO this is a
> compelling argument to join forces and try to find a working, safe and
> future-proof solution.

Yes, this is a "known issue".

> > Consider the following example:
> >
> > CPU0 CPU1
> > console_lock();
> > schedule();
> >
> > watchdog()
> > console_verbose_start();
> > printk()
> > log_store()
> > if (!console_trylock())
> > return;
> > console_verbose_end();
> >
> > ...
> > console_unlock()
> > print logbuf messages to the consoles
> > we missed the console_verbose_start/end
> > on CPU1
>
> This looks plausible. However, I wonder to which degree the same
> scenario is a concern in the kthread-based approach?

This is a problem for any async/deferred printk() printout. When we
print messages on the consoles we don't have any idea anymore if the
CPU which added those messages to the logbuf had any issues going on.

> My current standpoint is that as long as points [A-D] are met, it
> should do no harm to accept a (partial) fix like seen in my series:
>
> - [A] the patch tackles at least a subset of problematic use-cases
> - [B] the fix is non-intrusive and easy to review
> - [C] there is hope to reuse it in the new lockless buffer based printk
> - [D] there are no regressions employing the major console knobs
> (ignore_loglevel, quiet, loglevel, etc) as it happened in
> a6ae928c25835c ("Revert "printk: make sure to print log on console."")

So the issue is that when we bump `console_loglevel' or `ignore_loglevel'
we lift restrictions globally. For all CPUs, for all contexts which call
printk(). This may have negative impact. Fuzzing is one example. It
usually generates a lot of printk() noise, so lifting global printk()
log_level restrictions does cause problems. I recall that fuzzing people
really disliked the
old_level = console_loglevel
console_loglevel = new_level
....
console_loglevel = old_level

approach. Because if lets all CPUs and tasks to pollute serial logs.
While what we want is to print messages from a particular CPU/task.

So _maybe_ we can explore the per-context printk() flags instead. The idea
is to store a "desired"/emergency ->log_level. When a context sets that
->log_level we need to make any printk() messages issued by that context
at least of that ->log_level. IOW we overwrite messages log_levels, because
printk() is not synchronous by design.

We had similar discussion before [0].
I was misunderstood back then and in order to avoid the same
misunderstanding this time around, let me first state that - *I DO NOT*
propose to disable preemption for printk(). The code in question
was *just a POC snippet*.

To move that POC further - we can have the same per-context printk()
fields in every task_struct struct. Ideally without consuming additional
memory, or consuming as few extra bytes per task_struct as possible.
But even if we will have ->log_level embedded into every task_struct, we
still need to distinguish various cases (contexts):

- when current task wants to declare printk() emergency and sets
current->log_level, only this process is in printk() emergency.
- IRQs and NMIs which interrupt current should not be in printk()
emergency, unless IRQ and/or NMI explicitly declare printk()
emergency.

- when IRQ declares printk() emergency and sets current->log_level, only
this IRQ is in printk() emergency

- when NMI declares printk() emergency and sets current->log_level, only
this NMI is in printk() emergency

Another thing is. Suppose current sets ->log_level, and then encounters
a page_fault, which printk-s from exception handler. We shall distinguish
such cases too, I think.

So something like

printk_emergency_enter(LOG_LEVEL)
current->context[ctx] = LOG_LEVEL;
...
printk()
// overwrite message log_level to current->context[ctx] level
dump_stack()
// same log_level overwrite
...
printk_emergency_exit()
current->context[ctx] = -1

So when we iterate pending logbuf messages (CPU0->console_unlock() in
the example from my first email) messages from CPU1 will have appropriate
"desired" log_level and we will print those messages on the consoles.

Well. Something like this.


[0] https://lore.kernel.org/linuxppc-dev/20191112021747.GA68506@xxxxxxxxxx/

-ss