Re: [syzbot? printk?] no WARN_ON() messages printed before "Kernel panic - not syncing: panic_on_warn set ..."

From: Petr Mladek
Date: Fri May 10 2019 - 10:13:54 EST


On Wed 2019-05-08 19:31:06, Tetsuo Handa wrote:
> Hello.
>
> We are again getting corrupted reports where message from WARN() is missing.
> For example, https://syzkaller.appspot.com/text?tag=CrashLog&x=1720cac8a00000 was
> titled as "WARNING in cgroup_exit" because the
> "WARNING: CPU: 0 PID: 7870 at kernel/cgroup/cgroup.c:6008 cgroup_exit+0x51a/0x5d0"
> line is there but https://syzkaller.appspot.com/text?tag=CrashLog&x=1670a602a00000
> was titled as "corrupted report (2)" because the
> "WARNING: CPU: 0 PID: 10223 at kernel/cgroup/cgroup.c:6008 cgroup_exit+0x51a/0x5d0"
> line is missing. Also, it is unlikely that there was no printk() for a few minutes.
> Thus, I suspect something is again suppressing console output.

It is just a wild speculation. It is highly unlikely that the
console_loglevel was manipulated to hide messages. There are
very few locations where console_loglevel is manipulated.

Anyway, the new "panic_print" feature from Feng Tang might help here.
It is in mm tree, see
https://marc.info/?l=linux-mm-commits&m=155614613719648&w=2


> Since this problem is happening in 5.1.0-next-20190507, do we want to try below one?
>
> kernel/printk/printk.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e1e8250..f0b9463 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3338,3 +3338,23 @@ void kmsg_dump_rewind(struct kmsg_dumper *dumper)
> EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
>
> #endif
> +
> +#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
> +static int initial_loglevel;
> +static void check_loglevel(struct timer_list *timer)
> +{
> + if (console_loglevel < initial_loglevel)
> + panic("Console loglevel changed (%d->%d)!", initial_loglevel,
> + console_loglevel);

I am not sure why panic() is needed. I would personally start with
pr_emerg(). Anyway, I somehow doubt that this is the reason.

> + mod_timer(timer, jiffies + HZ);
> +}
> +static int __init loglevelcheck_init(void)
> +{
> + static DEFINE_TIMER(timer, check_loglevel);
> +
> + initial_loglevel = console_loglevel;
> + mod_timer(&timer, jiffies + HZ);
> + return 0;
> +}
> +late_initcall(loglevelcheck_init);
> +#endif
>
>
>
>
>
> By the way, recently we are hitting false positives caused by "WARNING:"
> string from not WARN() messages but plain printk() messages (e.g.
>
> https://syzkaller.appspot.com/bug?id=31bdef63e48688854fde93e6edf390922b70f8a4
> https://syzkaller.appspot.com/bug?id=faae4720a75cadb8cd0dbda5c4d3542228d37340
>
> ) and we need to avoid emitting "WARNING:" string from plain printk() messages
> during fuzzing testing. I guess we want to add something like
> CONFIG_DEBUG_AID_FOR_SYZBOT to all kernels in order to mask such string...

I wonder who catches the string "WARNING" and how the system is
killed.

panic_on_warn should cause that WARN() macro calls panic(). Simple
printk() should not cause this.

Best Regards,
Petr