Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might_sleep

From: Andy Lutomirski
Date: Wed Nov 19 2014 - 19:46:55 EST


On Wed, Nov 19, 2014 at 4:37 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Nov 19, 2014 at 4:13 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> No drugs, just imprecision. This series doesn't change NMI handling
>> at all. It only changes machine_check, int3, debug, and stack_segment.
>> (Why is #SS using IST stacks anyway?)
>
> .. ok, we were talking about adding an explicit preemption count to
> nmi, and then you wanted to make that conditional, that kind of
> freaked me out.

I guess I jumped around in the conversation a bit...

>
>> So my point stands: if machine_check is going to be conditionally
>> atomic, then that condition needs to be expressed somewhere.
>
> I'd still prefer to keep that knowledge in one place, rather than
> adding *another* completely ad-hoc thing in addition to what we
> already have.
>
> Also, I really don't think it should be about the particular stack
> you're using. Sure, if a debug fault happens in user space, the fault
> handler could sleep if it runs on the regular stack, but our
> "might_sleep()" are about catching things that *could* be problematic,
> even if the sleep never happens. And so, might_sleep() _should_
> actually trigger, even if it's not using the IST stack, because *if*
> the debug exception happened in kernel space, then we should warn.
>
> So I'd actually *prefer* to have special hacks that perhaps then
> "undo" the preemption count if the code expressly tests for "did this
> happen in user space, then I know I'm safe". But then it's an
> *explicit* thing, not something that just magically works because
> nobody even thought about it, and the trap happened in user space.
>
> See the argument? I'd *rather* see code like
>
>     /* Magic */
>     if (user_mode(regs)) {
>         .. verify that we're using the normal kernel stack
>         .. enable interrupts, enable preemption
>         .. this is the explicit special case and it is aware
>         .. of being special
>     }
>
> even if on the face of it it looks hacky. But an *explicit* hack is
> preferable to something that just "happens" to work only for the
> user-mode case.

So we'd do, in do_machine_check:

    irq_enter();

    do atomic stuff;

    ist_stop_being_atomic(regs);
    local_irq_enable();
    ...
    local_irq_disable();
    ist_start_being_atomic_again();

    irq_exit();

and we'd have something like:

    void ist_stop_being_atomic(struct pt_regs *regs)
    {
        BUG_ON(!user_mode_vm(regs));
        --irq_count;
    }

I'm very hesitant to use irq_enter for this, though. I think we want
just the irq_count part. Maybe ist_enter() and ist_exit()? I think
that we really don't want to go anywhere near the accounting stuff in
irq_enter from an IST handler if !user_mode_vm(regs). Doing it from
asm is somewhat less error-prone, although I guess we already rely on
the IDT entries themselves being in sync with the paranoid idtentry
setting.
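
To make the ist_enter()/ist_exit() idea concrete, here is a rough
user-space model in C. Everything in it is a stand-in: struct fake_regs,
atomic_count, may_sleep() and handle_machine_check() are hypothetical
names for illustration, not kernel code, and the plain int only models
the irq_count part of the preempt count, with none of the accounting
that irq_enter() does. Treat it as a sketch of the control flow only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pt_regs: only records the origin. */
struct fake_regs {
    bool from_user;     /* models user_mode_vm(regs) */
};

/* Models just the irq_count part of the preempt count (assumption). */
static int atomic_count;

/* Enter/leave the atomic region without any irq_enter() accounting. */
static void ist_enter(void)
{
    ++atomic_count;
}

static void ist_exit(void)
{
    assert(atomic_count > 0);
    --atomic_count;
}

/* The explicit special case: only legal if the trap came from user mode. */
static void ist_stop_being_atomic(const struct fake_regs *regs)
{
    assert(regs->from_user);    /* models BUG_ON(!user_mode_vm(regs)) */
    --atomic_count;
}

static void ist_start_being_atomic_again(void)
{
    ++atomic_count;
}

/* Models what might_sleep() would check: are we outside atomic context? */
static bool may_sleep(void)
{
    return atomic_count == 0;
}

/* Returns true if the handler reached a point where sleeping was allowed. */
static bool handle_machine_check(const struct fake_regs *regs)
{
    bool could_sleep = false;

    ist_enter();

    /* ... atomic stuff would go here ... */

    if (regs->from_user) {
        ist_stop_being_atomic(regs);
        could_sleep = may_sleep();  /* interrupts would be enabled here */
        ist_start_being_atomic_again();
    }

    ist_exit();
    return could_sleep;
}
```

In this model, a handler entered from user mode reaches a window where
may_sleep() is true, while one entered from kernel mode stays atomic
from ist_enter() to ist_exit(), so a might_sleep()-style check would
still fire there.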

--Andy