Re: [PATCH 15/43] x86/entry/64: Create a percpu SYSCALL entry trampoline

From: Thomas Gleixner
Date: Fri Nov 24 2017 - 08:52:57 EST


On Fri, 24 Nov 2017, Ingo Molnar wrote:

> From: Andy Lutomirski <luto@xxxxxxxxxx>
>
> Handling SYSCALL is tricky: the SYSCALL handler is entered with every
> single register (except FLAGS), including RSP, live. It somehow needs
> to set RSP to point to a valid stack, which means it needs to save the
> user RSP somewhere and find its own stack pointer. The canonical way
> to do this is with SWAPGS, which lets us access percpu data using the
> %gs prefix.
>
> With KAISER-like pagetable switching, this is problematic. Without a
> scratch register, switching CR3 is impossible, so %gs-based percpu
> memory would need to be mapped in the user pagetables. Doing that
> without information leaks is difficult or impossible.
>
> Instead, use a different sneaky trick. Map a copy of the first part
> of the SYSCALL asm at a different address for each CPU. Now RIP
> varies depending on the CPU, so we can use RIP-relative memory access
> to access percpu memory. By putting the relevant information (one
> scratch slot and the stack address) at a constant offset relative to
> RIP, we can make SYSCALL work without relying on %gs.

Smart!

> A nice thing about this approach is that we can easily switch it on
> and off if we want pagetable switching to be configurable.
>
> The compat variant of SYSCALL doesn't have this problem in the first
> place -- there are plenty of scratch registers, since we don't care
> about preserving r8-r15. This patch therefore doesn't touch SYSCALL32
> at all.
>
> XXX: Whenever we settle how KAISER gets turned on and off, we should do
> the same to this.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>

Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>