Re: [PATCH] Introduce the pkill_on_warn boot parameter

From: Petr Mladek
Date: Fri Oct 01 2021 - 08:10:02 EST

On Thu 2021-09-30 12:59:03, Steven Rostedt wrote:
> On Thu, 30 Sep 2021 11:15:41 +0200
> Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> > On Wed 2021-09-29 12:49:24, Paul E. McKenney wrote:
> > > On Wed, Sep 29, 2021 at 10:01:33PM +0300, Alexander Popov wrote:
> > > > On 29.09.2021 21:58, Alexander Popov wrote:
> > > > > Currently, the Linux kernel provides two types of reaction to kernel
> > > > > warnings:
> > > > > 1. Do nothing (by default),
> > > > > 2. Call panic() if panic_on_warn is set. That's a very strong reaction,
> > > > > so panic_on_warn is usually disabled on production systems.
> >
> > Honestly, I am not sure if panic_on_warn() or the new pkill_on_warn()
> > work as expected. I wonder who uses it in practice and what is
> > the experience.
>
> Several people use it, as I see reports all the time when someone can
> trigger a warn on from user space, and it's listed as a DOS of the
> system.

Good to know.

> > The problem is that many developers do not know about this behavior.
> > They use WARN() when they are too lazy to write a more useful message
> > or when they want to see all the provided details: task, registers,
> > backtrace.
>
> WARN() should never be used just because of laziness. If it is, then
> that's a bug. Let's not use that as an excuse to shoot down this
> proposal. WARN() should only be used to test assumptions where you do
> not believe something can happen. I use it all the time when the logic
> prevents some state, and have the WARN() enabled if that state is hit.
> Because to me, it shows something that shouldn't happen happened, and I
> need to fix the code.

I have just double-checked the code that I wrote or reviewed, and it
mostly follows these rules. But that is partly just by chance. I did
not have such clear rules in my head.

But for example, the following older WARN() in format_decode() in
lib/vsprintf.c is questionable:

	WARN_ONCE(1, "Please remove unsupported %%%c in format string\n", *fmt);

I guess that the WARN() was used to easily locate the caller. But that
is definitely not a reason to reboot the system or kill the process.

Maybe we could implement an alternative macro for these situations,
e.g. DEBUG() or warn().
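
To illustrate the idea, here is a rough userspace model, not kernel code.
All names in it (MODEL_WARN_ONCE(), MODEL_NOTE_ONCE(), panic_on_warn_model,
report()) are made up for this sketch and are not existing kernel API. It
only shows the intended difference: both macros report the call site once,
but only the WARN-like one escalates when the panic knob is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the panic_on_warn setting. */
static bool panic_on_warn_model = true;

static int reports;      /* number of messages printed */
static int escalations;  /* number of times the "panic" path was taken */

/* Print the call site and the message; a real kernel would add a backtrace. */
static void report(const char *file, int line, const char *func,
                   const char *msg)
{
        assert(msg != NULL);
        fprintf(stderr, "%s:%d %s(): %s\n", file, line, func, msg);
        reports++;
}

/* Models WARN_ONCE(): report once per call site, then escalate if enabled. */
#define MODEL_WARN_ONCE(msg) do {                               \
        static bool warned_;                                    \
        if (!warned_) {                                         \
                warned_ = true;                                 \
                report(__FILE__, __LINE__, __func__, (msg));    \
                if (panic_on_warn_model)                        \
                        escalations++;                          \
        }                                                       \
} while (0)

/* Hypothetical alternative: same report, but it never escalates. */
#define MODEL_NOTE_ONCE(msg) do {                               \
        static bool noted_;                                     \
        if (!noted_) {                                          \
                noted_ = true;                                  \
                report(__FILE__, __LINE__, __func__, (msg));    \
        }                                                       \
} while (0)

/* A caller that hits a condition that is believed to be impossible. */
static void decode_with_warn(void)
{
        MODEL_WARN_ONCE("state that must never happen");
}

/* A caller that only wants to point at an API misuse. */
static void decode_with_note(void)
{
        MODEL_NOTE_ONCE("please remove unsupported format specifier");
}
```

In this model, calling decode_with_warn() twice prints one message and
counts one escalation, while calling decode_with_note() twice prints one
message and counts none. In the kernel, a similar effect might be achievable
with pr_warn_once() followed by dump_stack(), but I have not tried that for
this particular call site.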

> > Well, this might be different. Developers might learn this the hard
> > way from bug reports. But there will be bug reports only when
> > anyone really enables this behavior. They will enable it only
> > when it works the right way most of the time.
>
> The panic_on_warn() has been used for years now. I do not think this is
> an issue.

If panic_on_warn() is widely used then pkill_on_warn() is fine as well.

Best Regards,
Petr
