Re: [Xen-devel] [PATCH v1] xen: get rid of paravirt op adjust_exception_frame

From: Andy Lutomirski
Date: Wed Jul 26 2017 - 10:09:31 EST


On Wed, Jul 26, 2017 at 7:01 AM, Andrew Cooper
<andrew.cooper3@xxxxxxxxxx> wrote:
> On 26/07/17 14:48, Andy Lutomirski wrote:
>>
>>> /* Runs on exception stack */
>>> -ENTRY(nmi)
>>> - /*
>>> - * Fix up the exception frame if we're on Xen.
>>> - * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
>>> - * one value to the stack on native, so it may clobber the rdx
>>> - * scratch slot, but it won't clobber any of the important
>>> - * slots past it.
>>> - *
>>> - * Xen is a different story, because the Xen frame itself overlaps
>>> - * the "NMI executing" variable.
>>> - */
>> I would keep this comment. The Xen frame really is in the way AFAICT.
>
> (For reasons best explained by the original authors) there is only ever
> a single stack which a PV guest registers with Xen, which functions
> equivalently to tss.sp0. There is no support for stack switching via
> task switch or IST.
>
> Therefore, nested NMIs won't clobber the top of this stack.
>

Does that mean that nested NMIs on Xen just nest normally without
clobbering each other? If so, that's neat, although I wonder how we
don't get crashes due to this:

/* Normal 64-bit system call target */
ENTRY(xen_syscall_target)
undo_xen_syscall
<-- NMI right here
jmp entry_SYSCALL_64_after_swapgs
ENDPROC(xen_syscall_target)

I think it would be nice if Xen could instead enter the native syscall
path a bit later like this:

ENTRY(entry_SYSCALL_64)
/*
* Interrupts are off on entry.
* We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
* it is too small to ever cause noticeable irq latency.
*/
SWAPGS_UNSAFE_STACK
/*
* A hypervisor implementation might want to use a label
* after the swapgs, so that it can do the swapgs
* for the guest and jump here on syscall.
*/
GLOBAL(entry_SYSCALL_64_after_swapgs)

movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

TRACE_IRQS_OFF

/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
pushq %r11 /* pt_regs->flags */
pushq $__USER_CS /* pt_regs->cs */
pushq %rcx /* pt_regs->ip */

<-- Xen enters here

then we wouldn't have all this funny business. And Xen could
completely skip the nmi nesting code.