Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries
From: Andrew Cooper
Date: Mon Aug 14 2017 - 02:45:35 EST
On 14/08/2017 06:53, Andy Lutomirski wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>> /* Normal 64-bit system call target */
>>> ENTRY(xen_syscall_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_64_after_swapgs
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_64_after_hwframe
>>> ENDPROC(xen_syscall_target)
>>>
>>> #ifdef CONFIG_IA32_EMULATION
>>>
>>> /* 32-bit compat syscall target */
>>> ENTRY(xen_syscall32_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_compat
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_compat_after_hwframe
>>> ENDPROC(xen_syscall32_target)
>>>
>>> /* 32-bit compat sysenter target */
>>> ENTRY(xen_sysenter_target)
>>> - undo_xen_syscall
>>> + mov 0*8(%rsp), %rcx
>>> + mov 1*8(%rsp), %r11
>>> + mov 5*8(%rsp), %rsp
>>> jmp entry_SYSENTER_compat
>>> ENDPROC(xen_sysenter_target)
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler. It just dumps back to the shell. The tests pass
>> after reverting this.
> I can reproduce it if I emulate an AMD machine. I can "fix" it like this:
>
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
> ENTRY(xen_syscall32_target)
> popq %rcx
> popq %r11
> + movq $__USER32_DS, 4*8(%rsp)
> + movq $__USER32_CS, 1*8(%rsp)
> + movq %r11, 2*8(%rsp)
> jmp entry_SYSCALL_compat_after_hwframe
> ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl. Maybe that's
> triggering a preexisting bug?
The reason %rcx/%r11 are extra on the stack is because Xen uses sysret
to get here. This is part of the 64bit PV ABI.
%cs will be 0xe033 (FLAT_CS64) and %ss will be 0xe02b (FLAT_SS64).
I would expect it to kill opportunistic sysret, as Xen's and Linux's
idea of using sysret differ.
~Andrew