Re: [syzbot? printk?] no WARN_ON() messages printed before "Kernel panic - not syncing: panic_on_warn set ..."

From: Tetsuo Handa
Date: Fri May 10 2019 - 10:55:19 EST

On 2019/05/10 23:12, Petr Mladek wrote:
> On Wed 2019-05-08 19:31:06, Tetsuo Handa wrote:
>> Hello.
>> We are again getting corrupted reports where message from WARN() is missing.
>> For example, was
>> titled as "WARNING in cgroup_exit" because the
>> "WARNING: CPU: 0 PID: 7870 at kernel/cgroup/cgroup.c:6008 cgroup_exit+0x51a/0x5d0"
>> line is there but
>> was titled as "corrupted report (2)" because the
>> "WARNING: CPU: 0 PID: 10223 at kernel/cgroup/cgroup.c:6008 cgroup_exit+0x51a/0x5d0"
>> line is missing. Also, it is unlikely that there was no printk() for a few minutes.
>> Thus, I suspect something is again suppressing console output.
> It is just a wild speculation. It is highly unlikely that the
> console_loglevel was manipulated to hide messages. There are
> very few locations where console_loglevel is manipulated.

But since messages from panic() are printed, messages are suppressed in a way
panic() can restore the setting.

> Anyway, the new "panic_print" feature from Feng Tang might help here.
> It is in mm tree, see
>> Since this problem is happening in 5.1.0-next-20190507, do we want to try below one?
>> kernel/printk/printk.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index e1e8250..f0b9463 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3338,3 +3338,23 @@ void kmsg_dump_rewind(struct kmsg_dumper *dumper)
>> EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
>> #endif
>> +
>> +static int initial_loglevel;
>> +static void check_loglevel(struct timer_list *timer)
>> +{
>> + if (console_loglevel < initial_loglevel)
>> + panic("Console loglevel changed (%d->%d)!", initial_loglevel,
>> + console_loglevel);
> I am not sure why panic() is needed. I would personally start with
> pr_emerg(). Anyway, I somehow doubt that this is the reason.

Since the testcase is printed prior to execution, this panic()
tries to catch it as soon as possible if some testcase is actually
changing loglevel. As I confirmed that nobody changes console loglevel
before starting fuzzing tests, I'd like to try this patch in linux-next.

>> By the way, recently we are hitting false positives caused by "WARNING:"
>> string from not WARN() messages but plain printk() messages (e.g.
>> ) and we need to avoid emitting "WARNING:" string from plain printk() messages
>> during fuzzing testing. I guess we want to add something like
>> CONFIG_DEBUG_AID_FOR_SYZBOT to all kernels in order to mask such string...
> I wonder who catches the string "WARNING" and how the system is
> killed.

syzkaller catches it and treats as if a crash occurred.

> panic_on_warn should cause that WARN() macro calls panic(). Simple
> printk() should not cause this.

Since we can't tell whether a WARNING: string reports a problem enough to
treat as a crash, we need to mask WARNING: string like .