Re: [PATCH] Documentation: coding-style: don't encourage WARN*()

From: Greg KH
Date: Mon Apr 15 2024 - 01:21:50 EST


On Sun, Apr 14, 2024 at 10:48:35PM +0300, Laurent Pinchart wrote:
> Hi Alex,
>
> Thank you for the patch.
>
> On Sun, Apr 14, 2024 at 12:08:50PM -0500, Alex Elder wrote:
> > Several times recently Greg KH has admonished that variants of WARN()
> > should not be used, because when the panic_on_warn kernel option is set,
> > their use can lead to a panic. His reasoning was that the majority of
> > Linux instances (including Android and cloud systems) run with this option
> > enabled. And therefore a condition leading to a warning will frequently
> > cause an undesirable panic.
> >
> > The "coding-style.rst" document says not to worry about this kernel
> > option. Update it to provide a more nuanced explanation.
> >
> > Signed-off-by: Alex Elder <elder@xxxxxxxxxx>
> > ---
> > Documentation/process/coding-style.rst | 21 +++++++++++----------
> > 1 file changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> > index 9c7cf73473943..bce43b01721cb 100644
> > --- a/Documentation/process/coding-style.rst
> > +++ b/Documentation/process/coding-style.rst
> > @@ -1235,17 +1235,18 @@ example. Again: WARN*() must not be used for a condition that is expected
> > to trigger easily, for example, by user space actions. pr_warn_once() is a
> > possible alternative, if you need to notify the user of a problem.
> >
> > -Do not worry about panic_on_warn users
> > -**************************************
> > +The panic_on_warn kernel option
> > +********************************
> >
> > -A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
> > -available kernel option, and that many users set this option. This is why
> > -there is a "Do not WARN lightly" writeup, above. However, the existence of
> > -panic_on_warn users is not a valid reason to avoid the judicious use
> > -WARN*(). That is because, whoever enables panic_on_warn has explicitly
> > -asked the kernel to crash if a WARN*() fires, and such users must be
> > -prepared to deal with the consequences of a system that is somewhat more
> > -likely to crash.
> > +Note that ``panic_on_warn`` is an available kernel option. If it is enabled,
> > +a WARN*() call whose condition holds leads to a kernel panic. Many users
> > +(including Android and many cloud providers) set this option, and this is
> > +why there is a "Do not WARN lightly" writeup, above.
> > +
> > +The existence of this option is not a valid reason to avoid the judicious
> > +use of warnings. There are other options: ``dev_warn*()`` and ``pr_warn*()``
> > +issue warnings but do **not** cause the kernel to crash. Use these if you
> > +want to prevent such panics.
>
> Those options are not equivalent, they print a single message, which is
> much easier to ignore. WARN() is similar to -Werror in some sense, it
> pushes vendors to fix the warnings. I have used WARN() in the past to
> indicate usage of long-deprecated APIs that we were getting close to
> removing for instance. dev_warn() wouldn't have had the same effect.

If you want to reboot a box because someone called an "improper" api,
then sure, use WARN(), but that feels like a really bad idea. Just
remove the api and fix up all in-kernel users instead. Why wait?

If you want to show a traceback, then just print that out, but I've seen
that totally ignored as well, removing the api is usually the only way
to get people to actually notice, as then their builds break.

thanks,

greg k-h