Re: BUG_ON() in workingset_node_shadows_dec() triggers

From: Linus Torvalds
Date: Wed Oct 05 2016 - 18:30:04 EST


On Wed, Oct 5, 2016 at 3:17 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> With my more paranoid desires, I would prefer to keep "stop kernel
> execution with the state set up by this process", not just "make the
> process never return to user-space".

Quite honestly, I think the answer to that is: "No. Not by default".
So with some kind of kernel command line option, yes, kind of like
"reboot_on_oops" (or whatever it is - I've never used it ;)

>> And *if* we make BUG() actually do something sane (non-trapping), we
>> can easily make it be generic, not arch-specific. In fact, I'd
>> implement it by just adding a "handle_bug()" in kernel/panic.c...
>
> Yeah, I'm not sure what the right next step would be. Do we need a new
> set of functions between WARN and BUG? Or maybe extract the
> process-killing logic on a per-arch level and make it a specific API
> so that it can be explicitly called as part of error-handling? Hmm

So the process-killing logic historically used to just be
"call do_exit()". In fact, that's what most architectures still do in
their error paths. And it's what a lot of people who just want to kill
the current code do.

So calling "do_exit()" is actually perfectly fine. It's just that
calling do_exit() from BUG_ON() is a major pain, because of the
asynchronous nature of BUG_ON(). But if you are in a regular system
call and don't hold any locks, do_exit() is still fine.

In fact, all that x86 really does differently from do_exit() in the
fault path is to reset the stack pointer first, so that you don't get
stack smashers when you have recursive faults (which used to be one
really nasty failure case, not just with BUG_ON(), but with any kernel
oops in general). So on x86, the crash code actually calls a function
called "rewind_stack_do_exit()" instead.

But the name gives it away: it's the exact same thing.

So you can actually do a generic BUG_ON() (even with the current
semantics) pretty much today by just having a config option that the
architecture can set to specify whether you should just call
"do_exit()" or "rewind_stack_do_exit()" to do that final killing
action.
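
To make that concrete, here is a rough userspace sketch of the idea,
not actual kernel code. The symbol CONFIG_BUG_REWIND_STACK, the
bug_kill_current() helper and the two *_stub() functions are all made
up for illustration; the stubs only record which path was taken, where
the real do_exit() and rewind_stack_do_exit() never return.

```c
#include <stdio.h>

/*
 * Hypothetical config symbol, invented for this sketch. An architecture
 * that wants the stack reset before exiting would define it.
 */
#ifdef CONFIG_BUG_REWIND_STACK
#define BUG_REWIND_STACK 1
#else
#define BUG_REWIND_STACK 0
#endif

enum kill_action { KILL_NONE, KILL_DO_EXIT, KILL_REWIND_STACK_DO_EXIT };

/* Records which path ran, so the sketch can be checked in userspace. */
static enum kill_action last_kill_action = KILL_NONE;

/* Stand-in for do_exit(); the real function does not return. */
static void do_exit_stub(long code)
{
	last_kill_action = KILL_DO_EXIT;
	printf("do_exit(%ld)\n", code);
}

/* Stand-in for x86's rewind_stack_do_exit(): reset the stack, then exit. */
static void rewind_stack_do_exit_stub(long code)
{
	last_kill_action = KILL_REWIND_STACK_DO_EXIT;
	printf("rewind_stack_do_exit(%ld)\n", code);
}

/* Generic final step of BUG(): the arch only influences this one choice. */
static void bug_kill_current(long code)
{
	if (BUG_REWIND_STACK)
		rewind_stack_do_exit_stub(code);
	else
		do_exit_stub(code);
}
```

Built without the hypothetical symbol defined, bug_kill_current() takes
the plain do_exit path; building with -DCONFIG_BUG_REWIND_STACK selects
the stack-rewinding variant instead.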

There are a few other possible gotchas (the code is hard to follow
because the normal implementation uses a trapping instruction and
hides the BUG() information in the text, so you get the whole fault
path), but on the whole I think it should be fairly straightforward to
just get rid of all the arch code, and replace it with a generic
function that can then decide internally whether it wants to just
warn, whether it wants to SIGKILL, or whether it wants to do the
traditional thing and just force do_exit(). Or do new things like
reboot or just halt.
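
A minimal userspace sketch of what such a generic function could look
like follows. Everything in it is an assumption made for illustration:
the bug_policy enum, the bug_policy_setting variable (imagine it set
from a kernel command line option) and this handle_bug() signature are
not taken from any existing kernel code, and the actions are only
printed rather than performed.

```c
#include <stdio.h>

/* Hypothetical policies; the names are invented for this sketch. */
enum bug_policy {
	BUG_POLICY_WARN,	/* report and keep going */
	BUG_POLICY_SIGKILL,	/* kill the current process with SIGKILL */
	BUG_POLICY_EXIT,	/* traditional behaviour: force do_exit() */
	BUG_POLICY_REBOOT,	/* reboot the machine */
	BUG_POLICY_HALT		/* stop the machine */
};

/* Default to the traditional behaviour unless told otherwise. */
static enum bug_policy bug_policy_setting = BUG_POLICY_EXIT;

/*
 * Generic, arch-independent BUG handler. Returns 1 if the caller may
 * continue (warn-only policy), 0 if the current context must not run on.
 */
static int handle_bug(const char *file, int line)
{
	fprintf(stderr, "kernel BUG at %s:%d!\n", file, line);

	switch (bug_policy_setting) {
	case BUG_POLICY_WARN:
		fprintf(stderr, "policy: warn only\n");
		return 1;
	case BUG_POLICY_SIGKILL:
		fprintf(stderr, "policy: SIGKILL current process\n");
		return 0;
	case BUG_POLICY_REBOOT:
		fprintf(stderr, "policy: reboot\n");
		return 0;
	case BUG_POLICY_HALT:
		fprintf(stderr, "policy: halt\n");
		return 0;
	case BUG_POLICY_EXIT:
	default:
		fprintf(stderr, "policy: force do_exit()\n");
		return 0;
	}
}
```

The point of the sketch is only that the decision lives in one place:
callers report the file and line, and the policy switch decides what
happens next.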

But it really would be very nice to never have do_exit() have to worry
about odd callers. We've had a *lot* of trouble over the years with
deadlocks on critical locks in do_exit(), for example.

Linus