Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

From: Andy Lutomirski
Date: Mon Aug 14 2017 - 02:42:36 EST


On Sun, Aug 13, 2017 at 10:53 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>> /* Normal 64-bit system call target */
>>> ENTRY(xen_syscall_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_64_after_swapgs
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_64_after_hwframe
>>> ENDPROC(xen_syscall_target)
>>>
>>> #ifdef CONFIG_IA32_EMULATION
>>>
>>> /* 32-bit compat syscall target */
>>> ENTRY(xen_syscall32_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_compat
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_compat_after_hwframe
>>> ENDPROC(xen_syscall32_target)
>>>
>>> /* 32-bit compat sysenter target */
>>> ENTRY(xen_sysenter_target)
>>> - undo_xen_syscall
>>> + mov 0*8(%rsp), %rcx
>>> + mov 1*8(%rsp), %r11
>>> + mov 5*8(%rsp), %rsp
>>> jmp entry_SYSENTER_compat
>>> ENDPROC(xen_sysenter_target)
>>
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler. It just dumps back to the shell. The tests pass
>> after reverting this.
>
> I can reproduce it if I emulate an AMD machine. I can "fix" it like this:
>
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
> ENTRY(xen_syscall32_target)
> popq %rcx
> popq %r11
> + movq $__USER32_DS, 4*8(%rsp)
> + movq $__USER32_CS, 1*8(%rsp)
> + movq %r11, 2*8(%rsp)
> jmp entry_SYSCALL_compat_after_hwframe
> ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl. Maybe that's
> triggering a preexisting bug?

It is indeed triggering an existing bug, but that bug is not a kernel
bug :) It's this thing:

https://sourceware.org/bugzilla/show_bug.cgi?id=21269

See, we have this old legacy garbage in which, when running with
nonstandard SS, a certain special, otherwise nonsensical input to
sigaction() causes a stack switch. Xen PV runs user code with a
nonstandard SS, and glibc accidentally passes this weird parameter to
sigaction() on a regular basis. With this patch applied, the kernel
suddenly starts to *realize* that ss is weird, and boom. (Or maybe it
increases the chance that SS is actually weird, since I'd expect this
to trip on #GP, not SYSCALL. But I don't care quite enough to dig
further.)
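
For anyone who hasn't run into this before, here is a rough user-space
sketch of the legacy rule as I understand it. It is not the kernel's code:
the function name, the struct, and the two constants are assumptions made
up for illustration, and the real check lives in the compat signal-frame
setup path.

```c
#include <stdint.h>

/* Assumed values for illustration; check them against the kernel headers. */
#define SKETCH_USER32_DS   0x2bu        /* standard 32-bit user data selector */
#define SKETCH_SA_RESTORER 0x04000000u  /* x86 SA_RESTORER flag bit */

/* Hypothetical, trimmed-down view of what the kernel sees at signal delivery. */
struct sketch_sigaction {
    uint32_t sa_flags;
    uint32_t sa_restorer;   /* 32-bit user pointer, 0 if unset */
};

/*
 * Pick the stack pointer for the signal frame.
 *
 * Legacy rule: if SS is not the standard user selector, SA_RESTORER is
 * NOT set, and sa_restorer is nevertheless non-NULL, then sa_restorer is
 * treated as the new stack pointer. Otherwise the current sp is kept.
 */
static uint32_t sketch_pick_sigframe_sp(uint32_t sp, uint16_t ss,
                                        const struct sketch_sigaction *sa)
{
    if (ss != SKETCH_USER32_DS &&
        !(sa->sa_flags & SKETCH_SA_RESTORER) &&
        sa->sa_restorer != 0)
        return sa->sa_restorer;     /* legacy stack switch */

    return sp;                      /* normal case */
}
```

With the standard SS the stray sa_restorer is harmless and sp is left
alone. With a nonstandard SS, as under Xen PV, the same sigaction makes
the sketch return sa_restorer as the stack pointer, which is the kind of
thing that would make signal delivery blow up instead of reaching the
handler.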

Patch coming.