Re: [PATCH] x86: Don't clobber top of pt_regs in nested NMI
From: Salman Qazi
Date: Mon Sep 24 2012 - 12:55:29 EST
On Mon, Sep 24, 2012 at 6:27 AM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Tue, Sep 18, 2012 at 06:29:35PM -0700, Salman Qazi wrote:
>> The nested NMI modifies the place (instruction, flags and stack)
>> that the first NMI will iret to. However, the copy of registers
>> modified is exactly the one that is the part of pt_regs in
>> the first NMI. This can change the behaviour of the first NMI.
>>
>> In particular, Google's arch_trigger_all_cpu_backtrace handler
>> also prints regions of memory surrounding addresses appearing in
>> registers. This results in handled exceptions, after which nested NMIs
>> start coming in. These nested NMIs change the value of registers
>> in pt_regs. This can cause the original NMI handler to produce
>> incorrect output.
>
> Hmm, interesting problem.
>
>>
>> We solve this problem by introducing an extra copy of the iret
>> registers that are exclusively a part of pt_regs and are not modified
>> elsewhere.
>
> Yuck, 4 copies of the stack frame?
>
>> The downside is that the do_nmi function can no longer
>> change the control flow, as any values it writes to these five
>> registers will be discarded.
>
> I consider this a feature.
>
>>
>> Signed-off-by: Salman Qazi <sqazi@xxxxxxxxxx>
>> ---
>> arch/x86/kernel/entry_64.S | 20 +++++++++++++++++++-
>> 1 files changed, 19 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 69babd8..40ddb6d 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -1724,6 +1724,18 @@ repeat_nmi:
>> end_repeat_nmi:
>>
>> /*
>> + * We went a running NMI handling routine to have a consistent
>> + * picture of register state. This should hold true even if
>> + * there is a nested NMI. Therefore, we let the nested NMI
>> + * play with the previous copy of these registers and leave this
>> + * new copy unmodified for do_nmi()
>> + */
>> + .rept 5
>> + pushq_cfi 4*8(%rsp)
>> + .endr
>> + CFI_DEF_CFA_OFFSET SS+8-RIP
>
> Hmm, another solution that can be done without an extra copy, is to swap
> the return stack frame with the copy stack frame. This way, the copy is
> seen by the pt_regs and will always be correct. The end would basically
> be the same as you have below, just skip the copy and return.
>
> Now this breaks the idea that anything below the sp pointer is not safe
> to use. But this is the NMI stack in the controlled part of the NMI
> Handler (no breakpoints allowed here). The NMI stack is special, which
> is why we have all this crap in the first place.
>
> It would be safe to save the copy stack below the stack pointer because
> the only thing that could possible interrupt us here is another NMI,
> which would just reset the stack pointer to the top again, and notice
> that this is a nested NMI and return after modifying the return stack.
> Which is OK.
>
> At least this way, we avoid copying the stack frame a third time (two
> was enough).
Yes. This is better. For some reason, I thought about this and
decided against it. But, I can't remember what crossed my mind. I'll
try this.
>
> -- Steve
>
>
>> +
>> + /*
>> * Everything below this point can be preempted by a nested
>> * NMI if the first NMI took an exception and reset our iret stack
>> * so that we repeat another NMI.
>> @@ -1771,7 +1783,13 @@ nmi_swapgs:
>> nmi_restore:
>> RESTORE_ALL 8
>> /* Clear the NMI executing stack variable */
>> - movq $0, 10*8(%rsp)
>> + movq $0, 15*8(%rsp)
>> +
>> + /* Pop the extra copy of iret context that was saved above
>> + * just for do_nmi()
>> + */
>> + addq $5*8, %rsp
>> +
>> jmp irq_return
>> CFI_ENDPROC
>> END(nmi)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/