Re: Dealing with the NMI mess

From: Andy Lutomirski
Date: Fri Jul 24 2015 - 17:09:23 EST


On Fri, Jul 24, 2015 at 1:51 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Jul 24, 2015 at 01:22:11PM -0700, Linus Torvalds wrote:
>> On Fri, Jul 24, 2015 at 12:55 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> > I worry that we'll end up running the do_debug() handlers from effective
>> > NMI context.
>> >
>> > The NMI might have preempted locks which these handlers require etc..
>>
>> If #DB takes any locks like that, then #DB is broken.
>>
>> Pretty much by definition, a data breakpoint can happen on pretty much
>> absolutely any code. This is in no way NMI-specific as far as I can
>> tell.
>>
>> Do we really take locks in the #DB handler?
>
> do_debug()
> send_sigtrap()
> force_sig_info()
> spin_lock_irqsave()
>
> Now, I don't pretend to understand the condition before send_sigtrap(),
> so it _might_ be ok, but it sure as heck could do with a comment.

Let's try to decode it.

user_icebp is set if int $0x01 happens, except it isn't because user
code can't actually do that -- it'll cause #GP instead.

user_icebp is also set if the user has a bloody in-circuit emulator,
given the name. But who on Earth has one of those on a system new
enough to run Linux and, even if they have one, why on Earth are they
using it to send SIGTRAP.

In any event, user_icebp is only set if user_mode(regs), so it's safe
locking-wise. But please let's delete it.

Otherwise, we do send_sigtrap if we got a single-step exception from
user mode (because we suppress single-step exceptions from kernel mode
a couple lines above, but we should really BUG on those except for the
single case of SYSENTER with TF set) or if we get a breakpoint
exception that wasn't eaten by perf.

For *#&!'s sake. we should rewrite this pile of crap.

// before kprobes and notify_die
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
if (!user_mode(regs) && regs->ip == sysenter_target) {
fix it up and return;
}

notify_die, etc.

preempt_conditional_sti(regs);
do_trap(X86_TRAP_DB, SIGTRAP, "debug", regs, error_code, NULL);
preempt_conditional_cli(regs);

except we should do something to disallow fixup_exception here. Or we
could open-code if(user_mode) send_sigtrap() else die() here.

I really don't think that we should be sending signals to userspace
due to user address watchpoints that hit in kernel mode. Or, if we do
think we should send signals for those, then, as Steven said, we
should make that explicit and use IRQ work for that.

As it stands, this is probably an exploitable DoS -- just point a
watchpoint down the stack a little bit from yourself and call raise().

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/