Re: [patch 2/2] x86 NMI-safe INT3 and Page Fault

From: Mathieu Desnoyers
Date: Fri Jul 16 2010 - 10:49:35 EST


* Avi Kivity (avi@xxxxxxxxxx) wrote:
> On 07/14/2010 06:49 PM, Mathieu Desnoyers wrote:
>> Implements an alternative iret with popf and return so trap and exception
>> handlers can return to the NMI handler without issuing iret. iret would cause
>> NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to
>> copy the return instruction pointer to the top of the previous stack, issue a
>> popf, load the previous esp and issue a near return (ret).
>>
>> It allows placing dynamically patched static jumps in asm gotos, which will be
>> used for optimized tracepoints, in NMI code since returning from a breakpoint
>> would be valid. Accessing vmalloc'd memory, which allows executing module code
>> or accessing vmapped or vmalloc'd areas from NMI context, would also be valid.
>> This is very useful to tracers like LTTng.
>>
>> This patch makes all faults, traps and exceptions safe to be called from NMI
>> context *except* single-stepping, which requires iret to restore the TF (trap
>> flag) and jump to the return address in a single instruction. Sorry, no kprobes
>> support in NMI handlers because of this limitation. This cannot be emulated
>> with popf/lret, because lret would be single-stepped. It does not apply to
>> "immediate values" because they do not use single-stepping. This code detects if
>> the TF flag is set and uses the iret path for single-stepping, even if it
>> reactivates NMIs prematurely.
>>
>
> You need to save/restore cr2 in addition, otherwise the following hits you
>
> - page fault
> - processor writes cr2, enters fault handler
> - nmi
> - page fault
> - cr2 overwritten
>
> I guess you would usually not notice the corruption since you'd just see
> a spurious fault on the page the NMI handler touched, but if the first
> fault happened in a kvm guest, then we'd corrupt the guest's cr2.

OK, just to make sure: you mean we'd have to save/restore the cr2 register
at the beginning/end of the NMI handler execution, right? Then shouldn't we
save/restore cr3 too?
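
To check my understanding of the cr2 part, here is a userspace toy model of
the sequence you list above. It is only a sketch, not kernel code: sim_cr2 is
a plain variable standing in for the register, and every sim_* name is made up
for illustration. The "safe" variant does the save on NMI entry and the
restore on NMI exit that I think you are asking for.

```c
#include <assert.h>

/* Hypothetical stand-in for the CR2 register (not a real register access). */
static unsigned long sim_cr2;

/* Model of the processor recording the faulting address on a page fault. */
void sim_page_fault(unsigned long addr)
{
	sim_cr2 = addr;
}

/* NMI handler that takes a page fault and does not preserve cr2. */
void sim_nmi_unsafe(unsigned long nmi_fault_addr)
{
	sim_page_fault(nmi_fault_addr);
}

/* NMI handler that saves cr2 on entry and restores it on exit. */
void sim_nmi_safe(unsigned long nmi_fault_addr)
{
	unsigned long saved_cr2 = sim_cr2;

	sim_page_fault(nmi_fault_addr);	/* nested fault overwrites cr2 */
	sim_cr2 = saved_cr2;		/* put the interrupted value back */
}

/*
 * Replay the scenario: a first fault at 0x1000, then an NMI that faults at
 * 0x2000 before the first fault handler has read cr2. Returns the value the
 * interrupted fault handler would then read.
 */
unsigned long sim_run(void (*nmi)(unsigned long))
{
	sim_page_fault(0x1000UL);
	nmi(0x2000UL);
	return sim_cr2;
}

/* Both outcomes, stated as assertions. */
void sim_selfcheck(void)
{
	assert(sim_run(sim_nmi_unsafe) == 0x2000UL);	/* cr2 corrupted */
	assert(sim_run(sim_nmi_safe) == 0x1000UL);	/* cr2 preserved */
}
```

In this model the unsafe handler leaves the NMI's address (0x2000) for the
interrupted fault handler to read, while the safe one leaves the original
0x1000 in place.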

> But the whole thing strikes me as overkill. If it's 8k per-cpu, what's
> wrong with using a per-cpu pointer to a kmalloc() area?

Well, it seems like all the kernel code calling "vmalloc_sync_all()" (which is
much more than perf) can potentially cause large latencies, which could be
squashed by allowing page faults in NMI handlers. This looks like a stronger
argument to me.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com