Re: [RFC 08/17] x86/asm/64: De-Xen-ify our NMI code

From: Juergen Gross
Date: Thu Sep 07 2017 - 05:34:51 EST


This is a multi-part message in MIME format.On 06/09/17 23:36, Andy Lutomirski wrote:
> Xen PV is fundamentally incompatible with our fancy NMI code: it
> doesn't use IST at all, and Xen entries clobber two stack slots
> below the hardware frame.
>
> Drop Xen PV support from our NMI code entirely.
>
> XXX: Juergen: could you write and test the tiny patch needed to
> make Xen PV have a xen_nmi entry that handles NMIs? I don't know
> how to test it.

You mean something like the attached one?

Seems to work at least for the "normal" case of a NMI coming in at
a random point in time.

Regarding testing: in case you have a Xen setup you can easily send
a NMI to a domain from dom0:

xl trigger <domain> nmi


Juergen

>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
> arch/x86/entry/entry_64.S | 41 +++++++++++++++++------------------------
> 1 file changed, 17 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index a9e318f7cc9b..c81e05fb999e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1171,21 +1171,12 @@ ENTRY(error_exit)
> jmp retint_user
> END(error_exit)
>
> -/* Runs on exception stack */
> +/*
> + * Runs on exception stack. Xen PV does not go through this path at all,
> + * so we can use real assembly here.
> + */
> ENTRY(nmi)
> /*
> - * Fix up the exception frame if we're on Xen.
> - * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
> - * one value to the stack on native, so it may clobber the rdx
> - * scratch slot, but it won't clobber any of the important
> - * slots past it.
> - *
> - * Xen is a different story, because the Xen frame itself overlaps
> - * the "NMI executing" variable.
> - */
> - PARAVIRT_ADJUST_EXCEPTION_FRAME
> -
> - /*
> * We allow breakpoints in NMIs. If a breakpoint occurs, then
> * the iretq it performs will take us out of NMI context.
> * This means that we can have nested NMIs where the next
> @@ -1240,7 +1231,7 @@ ENTRY(nmi)
> * stacks lest we corrupt the "NMI executing" variable.
> */
>
> - SWAPGS_UNSAFE_STACK
> + swapgs
> cld
> movq %rsp, %rdx
> movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
> @@ -1402,7 +1393,7 @@ nested_nmi_out:
> popq %rdx
>
> /* We are returning to kernel mode, so this cannot result in a fault. */
> - INTERRUPT_RETURN
> + iretq
>
> first_nmi:
> /* Restore rdx. */
> @@ -1432,7 +1423,7 @@ first_nmi:
> pushfq /* RFLAGS */
> pushq $__KERNEL_CS /* CS */
> pushq $1f /* RIP */
> - INTERRUPT_RETURN /* continues at repeat_nmi below */
> + iretq /* continues at repeat_nmi below */
> 1:
> #endif
>
> @@ -1502,20 +1493,22 @@ nmi_restore:
> /*
> * Clear "NMI executing". Set DF first so that we can easily
> * distinguish the remaining code between here and IRET from
> - * the SYSCALL entry and exit paths. On a native kernel, we
> - * could just inspect RIP, but, on paravirt kernels,
> - * INTERRUPT_RETURN can translate into a jump into a
> - * hypercall page.
> + * the SYSCALL entry and exit paths.
> + *
> + * We arguably should just inspect RIP instead, but I (Andy) wrote
> + * this code when I had the misapprehension that Xen PV supported
> + * NMIs, and Xen PV would break that approach.
> */
> std
> movq $0, 5*8(%rsp) /* clear "NMI executing" */
>
> /*
> - * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
> - * stack in a single instruction. We are returning to kernel
> - * mode, so this cannot result in a fault.
> + * iretq reads the "iret" frame and exits the NMI stack in a
> + * single instruction. We are returning to kernel mode, so this
> + * cannot result in a fault. Similarly, we don't need to worry
> + * about espfix64 on the way back to kernel mode.
> */
> - INTERRUPT_RETURN
> + iretq
> END(nmi)
>
> ENTRY(ignore_sysret)
>