Re: [PATCH] Documentation: coding-style: don't encourage WARN*()

From: Alex Elder
Date: Sun Apr 14 2024 - 16:06:47 EST


On 4/14/24 2:48 PM, Laurent Pinchart wrote:
Hi Alex,

Thank you for the patch.

On Sun, Apr 14, 2024 at 12:08:50PM -0500, Alex Elder wrote:
Several times recently Greg KH has admonished that variants of WARN()
should not be used, because when the panic_on_warn kernel option is set,
their use can lead to a panic. His reasoning was that the majority of
Linux instances (including Android and cloud systems) run with this option
enabled. And therefore a condition leading to a warning will frequently
cause an undesirable panic.

The "coding-style.rst" document says not to worry about this kernel
option. Update it to provide a more nuanced explanation.

Signed-off-by: Alex Elder <elder@xxxxxxxxxx>
---
Documentation/process/coding-style.rst | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index 9c7cf73473943..bce43b01721cb 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -1235,17 +1235,18 @@ example. Again: WARN*() must not be used for a condition that is expected
to trigger easily, for example, by user space actions. pr_warn_once() is a
possible alternative, if you need to notify the user of a problem.
-Do not worry about panic_on_warn users
-**************************************
+The panic_on_warn kernel option
+********************************
-A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
-available kernel option, and that many users set this option. This is why
-there is a "Do not WARN lightly" writeup, above. However, the existence of
-panic_on_warn users is not a valid reason to avoid the judicious use
-WARN*(). That is because, whoever enables panic_on_warn has explicitly
-asked the kernel to crash if a WARN*() fires, and such users must be
-prepared to deal with the consequences of a system that is somewhat more
-likely to crash.
+Note that ``panic_on_warn`` is an available kernel option. If it is enabled,
+a WARN*() call whose condition holds leads to a kernel panic. Many users
+(including Android and many cloud providers) set this option, and this is
+why there is a "Do not WARN lightly" writeup, above.
+
+The existence of this option is not a valid reason to avoid the judicious
+use of warnings. There are other options: ``dev_warn*()`` and ``pr_warn*()``
+issue warnings but do **not** cause the kernel to crash. Use these if you
+want to prevent such panics.

Those options are not equivalent: they print a single message, which is
much easier to ignore. WARN() is similar to -Werror in some sense: it
pushes vendors to fix the warnings. I have used WARN() in the past, for
instance, to flag usage of long-deprecated APIs that we were getting
close to removing. dev_warn() wouldn't have had the same effect.

Honestly, I feel somewhat the same way: WARN() has a use that
differs from dev_warn(), e.g., in places where something "won't
happen" (but conceivably could if someone developing a future
change violated an assumption).
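
To make the distinction concrete, here is a rough userspace sketch.
Everything in it is a stand-in I made up for illustration: the
model_*() helpers and MODEL_WARN_ON() are not the kernel's
definitions, model_set_rate() is a hypothetical driver helper, and a
counter takes the place of a real panic(). It only models the
behavior described above: a WARN-style check panics when
panic_on_warn is set, while a pr_warn-style message never does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for illustration only, NOT kernel definitions. */
static bool panic_on_warn;	/* models the panic_on_warn option */
static int panics;		/* counts simulated panics */
static int messages;		/* counts log lines emitted */

/* Model of pr_warn(): logs one line and never panics. */
#define model_pr_warn(...)				\
	do {						\
		messages++;				\
		fprintf(stderr, __VA_ARGS__);		\
	} while (0)

/*
 * Model of WARN_ON(): returns the condition so it can be used in an
 * if (), logs when the condition holds, and "panics" (here: bumps a
 * counter) when panic_on_warn is set.
 */
static bool model_warn_on(bool cond, const char *expr)
{
	if (cond) {
		messages++;
		fprintf(stderr, "WARNING: %s\n", expr);
		if (panic_on_warn)
			panics++;	/* the real kernel would panic here */
	}
	return cond;
}
#define MODEL_WARN_ON(cond) model_warn_on(!!(cond), #cond)

/* Hypothetical driver helper showing where each kind of check fits. */
static int model_set_rate(int rate)
{
	/* "Won't happen" unless a caller violates an internal assumption. */
	if (MODEL_WARN_ON(rate < 0))
		return -EINVAL;

	/* Easily triggered by bad input: log only, never panic. */
	if (rate > 1000) {
		model_pr_warn("rate %d too high, clamping\n", rate);
		rate = 1000;
	}
	return rate;
}
```

With panic_on_warn set in this model, model_set_rate(5000) logs and
clamps without a panic, while model_set_rate(-1) counts one.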

Nevertheless, if panic_on_warn is used in Android and cloud
scenarios as Greg says, he's right that it affects many, many
systems. Perhaps it's better to more strongly discourage the
use of that option?

I saw this "don't worry about it" message and felt it at least
ought to be toned down. The broader question of whether WARN()
should generally not be used might need some more discussion.

-Alex

Use BUILD_BUG_ON() for compile-time assertions
**********************************************