Distinguishing kernel bugs from invalid inputs

From: Dmitry Vyukov
Date: Thu Oct 19 2017 - 04:25:35 EST


Hello,

As you may know, we are doing automated kernel testing with the
syzkaller fuzzer. For that we need to be able to distinguish kernel
bugs (something to notify kernel mailing lists about) from console
messages provoked by various invalid inputs to the kernel (effectively
EINVAL cases, coming from user-space or from devices, either real or
test). From time to time we have problems with "WARNING:" messages.
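
To make the detection problem concrete, here is a minimal userspace
sketch of the kind of check an automated tester might apply to each
console line. It is an illustration only, not syzkaller's actual
parser: the function name line_reports_kernel_bug() is hypothetical,
and the only marker it looks for is the "WARNING:" string discussed
here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from syzkaller): treat a console line
 * as a kernel bug report if it contains the "WARNING:" marker.
 */
bool line_reports_kernel_bug(const char *line)
{
	if (line == NULL)
		return false;
	return strstr(line, "WARNING:") != NULL;
}
```

A check this simple cannot tell a real bug from a WARNING printed for
bad data coming from a device, which is exactly the ambiguity at issue.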

Most of the time they do mean kernel bugs (just not fatal ones): we
have found 100+ bugs based on WARNING messages, and the kernel mostly
follows this meaning of WARNING. But every now and then they are used
for invalid inputs, and we see some pushback from developers saying
that it's fine to use WARNING for, say, bad data coming from a USB
device. The comments in include/asm-generic/bug.h are not definitive in
this regard.

So I would like the kernel community to define some policy around
console output that allows automatically detecting when there is a bug
in the kernel, and then document it so that we don't need to come back
to this question again and again. I think it will also be useful for
administrators and users staring at dmesg. It also has obvious
implications when panic_on_warn is set (not sure if it's used by
anybody in production, though). I have also heard about an effective
(unintentional) local DoS caused by buggy programs provoking WARNINGs
in a tight loop on systems where serial output is always captured.

I don't have a strong preference as to exactly how it should look.
And to make it clear, printing messages and stacks, if necessary, on
invalid inputs is fine; it just needs to be distinguishable from
kernel bugs. We could use pr_err (not containing "WARNING"!), or there
was a mention of a new macro a la PROBLEM(). Other options?
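
As a rough illustration of the pr_err-style option, the following
userspace sketch models the two kinds of report as plain string
formatting. Both function names are hypothetical, the "invalid input:"
prefix is made up for the example, and none of this is existing kernel
API. The only point is that the prefix used for invalid input carries
no "WARNING" marker.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: format a report for a genuine kernel bug. */
int format_kernel_bug(char *buf, size_t size, const char *what)
{
	return snprintf(buf, size, "WARNING: %s", what);
}

/*
 * Hypothetical: format a report for invalid input. The prefix is an
 * assumption for this sketch; it deliberately avoids "WARNING".
 */
int format_invalid_input(char *buf, size_t size, const char *what)
{
	return snprintf(buf, size, "invalid input: %s", what);
}
```

With output shaped like this, a tester matching on "WARNING:" would
flag only the first kind of message.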

Thoughts?

Thanks