Re: [PATCH] debug: Deprecate BUG_ON() use in new code, introduce CRASH_ON()

From: Ingo Molnar
Date: Mon Jun 08 2015 - 04:09:20 EST



* Alexander Holler <holler@xxxxxxxxxxxxx> wrote:

> On 08.06.2015 at 09:12, Ingo Molnar wrote:
> >
> >* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >>Stop with the random BUG_ON() additions.
> >
> > Yeah, so I propose the attached patch which attempts to resist new BUG_ON()
> > additions.
>
> This reminded me of a flame I once received from a maintainer because I wanted
> to avoid a disastrous memory corruption by using a BUG_ON(). Maybe someone
> should mention that a BUG_ON() or now CRASH_ON() should still be preferred over
> some random memory corruption which might lead to worse things. Or what is the
> viewpoint of the kernel masters with regard to memory corruption and the use of
> BUG_ON, WARN_ON or CRASH_ON?

So it depends on the actual change, but there are very few cases where a BUG_ON()
is justified, even if the code detects memory corruption.

Most instances of memory corruption either come from the hardware or come from
some other piece of code, so _your_ code crashing the system will be unexpected,
and in most cases unproductive to finding the cause of the corruption.

The best action is to stop doing whatever your code was doing, and to try to bail
out with as few extra changes done to the system as possible.
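
As a rough illustration of that "warn and bail out" pattern, here is a minimal
userspace sketch. It is not kernel code: WARN_ON_SKETCH() is a hypothetical
stand-in for the kernel's WARN_ON(), and struct ring and ring_push() are made up
for the example. The point is only that the inconsistent state is reported and
the function returns an error without touching anything else.

```c
#include <errno.h>
#include <stdio.h>

#define RING_SLOTS 8

/* Hypothetical userspace stand-in for WARN_ON(): report and keep running. */
static int warn_on_sketch(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}

#define WARN_ON_SKETCH(cond) warn_on_sketch(!!(cond), #cond, __FILE__, __LINE__)

/* Made-up data structure, used only to have some state that can look corrupted. */
struct ring {
	unsigned int head;
	unsigned int size;
	int slots[RING_SLOTS];
};

/*
 * Returns 0 on success. If the ring looks corrupted, warn and return -EINVAL
 * without modifying the ring, instead of taking the whole system down.
 */
static int ring_push(struct ring *r, int value)
{
	if (WARN_ON_SKETCH(r->size != RING_SLOTS || r->head >= r->size))
		return -EINVAL;

	r->slots[r->head] = value;
	r->head = (r->head + 1) % r->size;
	return 0;
}
```

The caller sees an ordinary error code and can propagate it, so whatever
corrupted the structure is not additionally obscured by a crash in this code.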

An example of that is lockdep's asserts. An actual lockdep warning in a
released, production kernel is frequently connected to a real risk of data
corruption - yet what we do is report the bug non-intrusively and turn off
lockdep completely, so that it does not make the situation worse and so that
there is a chance the messages can be saved and reported back to kernel developers.
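
The "report once, then switch the checker off" behaviour can be sketched roughly
as below. This is a simplified, single-threaded userspace illustration and not
lockdep's actual code: checker_enabled, checker_off() and check_lock_order() are
hypothetical names, and the ordering rule (lock classes must be taken in
increasing order) is invented for the example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical global switch for the debugging checker. */
static bool checker_enabled = true;
static unsigned int reports_printed;

/* Turn the checker off; returns true only for the call that disabled it. */
static bool checker_off(void)
{
	if (!checker_enabled)
		return false;
	checker_enabled = false;
	return true;
}

/*
 * Returns true if a violation was detected while the checker was active.
 * The first violation is reported and disables all further checking, so
 * the checker cannot flood the log or make the situation worse.
 */
static bool check_lock_order(int held_class, int wanted_class)
{
	if (!checker_enabled)
		return false;		/* checker already switched off */

	if (wanted_class > held_class)
		return false;		/* ordering is fine */

	if (checker_off()) {
		fprintf(stderr, "checker: bad lock order %d -> %d, turning checker off\n",
			held_class, wanted_class);
		reports_printed++;
	}
	return true;
}
```

After the first report every later call returns early, so the system keeps
running with the checker out of the way and the one message has a chance to
reach the logs.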

The origins of widespread BUG_ON() use are twofold:

 - 20 years ago we didn't have much of any locking in the kernel, so a BUG_ON()
   resulted in essence in a graceful segfault of the application that happened to
   trigger it, in most cases. Kernel logs were still possible to retrieve if the
   bug did not trigger too often - and if not (because for example the crash
   happened in the idle thread) then the backtrace was still visible on the VGA
   text console.

 - in the early days we didn't have WARN_ON(), we only had BUG_ON(), so people
   used that. BUG_ON() used to be the 'graceful' assert, panic() was the
   equivalent of CRASH_ON().

These days a BUG_ON() is almost always fatal due to unreleased locks, plus we
still don't print kernel crashes to the graphical console, so they are silent hard
lockups in 99% of the cases.

Thanks,

Ingo