Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup

From: John Ogness
Date: Thu Mar 19 2020 - 04:20:52 EST


On 2020-03-18, Eugeniu Rosca <erosca@xxxxxxxxxxxxxx> wrote:
> On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote:
>> On (20/03/15 18:09), Eugeniu Rosca wrote:
>>> @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>> }
>>> }
>>>
>>> + console_verbose_start();
>>> +
>>> pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>>> smp_processor_id(), duration,
>>> current->comm, task_pid_nr(current));
>>> @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>> if (softlockup_panic)
>>> panic("softlockup: hung tasks");
>>> __this_cpu_write(soft_watchdog_warn, true);
>>> +
>>> + console_verbose_end();
>>> } else
>>> __this_cpu_write(soft_watchdog_warn, false);
>>
>> I'm afraid, as of now, this approach is not going to work the way it's
>> supposed to work in 100% of cases. Because the only thing that printk()
>> call sort of guarantees is that the message will be stored somewhere.
>> Either in the main kernel log buffer, on in one of auxiliary per-CPU
>> log buffers. It does not guarantee, generally speaking, that the message
>> will be printed on the console immediately.
>
> I take this passage as an acknowledgement of the problem being _real_,
> in spite of the fix being not perfect.
>
> One aspect I would like to emphasize is that (please, NAK this
> statement if it's not accurate) the problem reported in this patch is
> not specific to the existing printk mechanism, but also applies to the
> upcoming kthread-based printk. If that's true, then IMHO this is a
> compelling argument to join forces and try to find a working, safe and
> future-proof solution.

Let me clarify that the upcoming rework is _not_ simply to make a
kthread-based printk. There upcoming rework addresses several issues
(kthreads being only a piece):

1. allow printk-callers to get their messages immediately and locklessly
into the log buffer from any context

2. during normal operation, printk-callers are completely decoupled from
console printing

3. in the case of an emergency, every printk-caller will directly and
synchronously perform console printing of its messages on consoles
that support atomic writing

For the rework we decided that triggering an oops is what irreversibly
transitions the system from "normal operation" to "emergency".

One could interpret the current "console_verbose()" to be some sort of
equivalent to irreversibly transitioning the system to "emergency". But
that is not how it was decided to be interpreted. For the rework,
printk-callers do not have any influence on forcing immediate console
printing (unless they trigger an oops). console_verbose() is still
relevant in the rework. console_verbose() is affecting _what_ is printed
to consoles and _not when_.

Once the printk rework is complete, the mechanisms for atomic and
synchronous console printing will be in place. It would be possible that
these mechanisms could also be used in non-oops scenarios. But that
would need to be discussed in depth and clearly defined with caution for
abuse.

John Ogness