[RFC][PATCH 00/24] x86/pti: Defer CR3 switch to C code

From: Alexandre Chartre
Date: Mon Nov 09 2020 - 09:42:47 EST


[Resending without messing up email addresses (hopefully!).
Please reply to this email thread so that the addresses are correct.
Sorry for the noise.]

With Page Table Isolation (PTI), syscalls as well as interrupts and
exceptions occurring in userspace enter the kernel with a user
page-table. The kernel entry code will then switch the page-table
from the user page-table to the kernel page-table by updating the
CR3 control register. This CR3 switch is currently done early in
the kernel entry sequence using assembly code.
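The switch itself is simple bit arithmetic on the CR3 value. Below is a minimal user-space sketch, not the kernel code. It assumes the x86-64 PTI layout, in which the kernel and user PGDs form an 8 KiB-aligned pair (the user PGD has bit 12 set) and, with PCID, the user PCID differs from the kernel PCID by bit 11. The helper names and the `user_flush_pending` parameter are hypothetical. That parameter stands in for the kernel's per-cpu tracking of deferred user TLB flushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring PTI_USER_PGTABLE_MASK, PTI_USER_PCID_MASK and
 * X86_CR3_PCID_NOFLUSH on x86-64: the user PGD is the 4 KiB page right
 * after the kernel PGD, and the user PCID is the kernel PCID with bit 11 set. */
#define USER_PGTABLE_MASK (UINT64_C(1) << 12)
#define USER_PCID_MASK    (UINT64_C(1) << 11)
#define CR3_NOFLUSH       (UINT64_C(1) << 63)

/* Entry (hypothetical helper): derive the kernel CR3 from the user CR3. */
static uint64_t kernel_cr3_from_user_cr3(uint64_t user_cr3, bool pcid)
{
	/* Select the kernel PGD (even page) and the kernel PCID. */
	uint64_t cr3 = user_cr3 & ~(USER_PGTABLE_MASK | USER_PCID_MASK);

	/* With PCID, keep the kernel TLB entries: ask the CPU not to flush. */
	if (pcid)
		cr3 |= CR3_NOFLUSH;
	return cr3;
}

/* Exit (hypothetical helper): derive the user CR3 from the kernel CR3. */
static uint64_t user_cr3_from_kernel_cr3(uint64_t kernel_cr3, bool pcid,
					 bool user_flush_pending)
{
	/* The kernel PGD must be the first (even) page of the 8 KiB pair. */
	assert((kernel_cr3 & USER_PGTABLE_MASK) == 0);

	uint64_t cr3 = kernel_cr3 | USER_PGTABLE_MASK;

	if (pcid) {
		cr3 |= USER_PCID_MASK;
		/* Only skip the flush if no user TLB flush was deferred. */
		if (!user_flush_pending)
			cr3 |= CR3_NOFLUSH;
	}
	return cr3;
}
```

Without PCID, the user PCID bit is never set, so clearing it on entry is harmless.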

This RFC proposes to defer the PTI CR3 switch until we reach C code.
The benefit is that this simplifies the assembly entry code and makes
the PTI CR3 switch code easier to understand. This also paves the way
for further possible projects, such as an easier integration of Address
Space Isolation (ASI), or the possibility of executing some selected
syscall or interrupt handlers without switching to the kernel page-table
(thus avoiding the PTI page-table switch overhead).

Deferring the CR3 switch to C code means that more of the kernel entry
code has to run with the user page-table. To do so, we need to:

- map more syscall, interrupt and exception entry code into the user
page-table (map all noinstr code);

- map additional data used in the entry code (such as the stack canary);

- run more entry code on the trampoline stack (which is mapped both
in the kernel and in the user page-table) until we switch to the
kernel page-table and then switch to the kernel stack;

- have a per-task trampoline stack instead of a per-cpu trampoline
stack, so that a task can be scheduled out before it has switched
to the kernel stack.
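The last two items can be illustrated with a small user-space model. It is only a sketch under stated assumptions: `struct entry_frame`, `struct task` and the helpers are hypothetical, and `move_frame_to_kernel_stack()` merely follows the idea of the existing `sync_regs()` helper, which copies the register frame from the entry stack to the task stack. Because each task owns its trampoline (PTI) stack, a task scheduled out before reaching its kernel stack keeps its frame intact, whereas a single per-cpu trampoline stack would be reused by the next task entering the kernel on that CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified interrupt frame (struct pt_regs has more fields). */
struct entry_frame {
	uint64_t ip, cs, flags, sp, ss;
};

#define STACK_WORDS 512 /* 4 KiB stacks in this sketch */
#define FRAME_WORDS (sizeof(struct entry_frame) / sizeof(uint64_t))

/* Hypothetical per-task state: a PTI stack mapped in both page-tables,
 * plus the regular kernel stack, mapped only in the kernel page-table. */
struct task {
	uint64_t pti_stack[STACK_WORDS];
	uint64_t kernel_stack[STACK_WORDS];
	bool on_kernel_pgtable; /* models "CR3 already switched" */
};

/* Slot where the entry code leaves the frame, at the top of the PTI stack. */
static struct entry_frame *pti_stack_frame(struct task *t)
{
	return (struct entry_frame *)&t->pti_stack[STACK_WORDS - FRAME_WORDS];
}

/* Copy the frame to the top of the kernel stack, in the spirit of
 * sync_regs(). Only allowed once the kernel page-table is loaded,
 * because the kernel stack is not mapped in the user page-table. */
static struct entry_frame *move_frame_to_kernel_stack(struct task *t)
{
	struct entry_frame *dst =
		(struct entry_frame *)&t->kernel_stack[STACK_WORDS - FRAME_WORDS];

	assert(t->on_kernel_pgtable);
	memcpy(dst, pti_stack_frame(t), sizeof(*dst));
	return dst;
}
```

Two tasks entering the kernel on the same CPU each keep their own frame until they move it to their kernel stack.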

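Note that, for now, the CR3 switch can only be deferred for as long as
interrupts remain disabled in the entry code. This is because the decision
to switch CR3 is based on the privilege level of the CS register saved in
the interrupt frame: if interrupts were enabled while still running on the
user page-table, a nested interrupt would find a kernel CS in its frame and
would not switch CR3. I plan to fix this, but it adds some complexity (we
need to track whether the user page-table is in use or not).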
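A minimal sketch of that decision, assuming a simplified frame and ignoring the PCID no-flush handling. `struct irq_frame`, `entry_cr3()` and its `irqs_disabled` parameter are hypothetical; the RPL test mirrors what `user_mode()` does on 64-bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PTI bits, as in the CR3 layout used by x86-64 PTI. */
#define USER_PGTABLE_MASK (UINT64_C(1) << 12)
#define USER_PCID_MASK    (UINT64_C(1) << 11)

/* Hypothetical frame: only the saved code segment selector is modeled. */
struct irq_frame {
	uint64_t cs;
};

/* Same test as user_mode() on x86-64: a non-zero RPL means user mode. */
static bool frame_from_user_mode(const struct irq_frame *f)
{
	return (f->cs & 3) != 0;
}

/* Hypothetical C entry hook returning the CR3 value to run the handler with.
 * The decision relies only on the saved CS, so it is only sound while
 * interrupts are still disabled. */
static uint64_t entry_cr3(const struct irq_frame *f, uint64_t cr3,
			  bool irqs_disabled)
{
	assert(irqs_disabled);
	if (frame_from_user_mode(f))
		return cr3 & ~(USER_PGTABLE_MASK | USER_PCID_MASK);
	return cr3;
}
```

With the x86-64 user code selector (0x33) the handler gets the kernel CR3; with the kernel code selector (0x10) the current CR3 is kept. The last case is exactly the limitation: kernel code still running on the user page-table when interrupted would be left on the user page-table.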

The proposed patchset is in RFC state to get early feedback about this
proposal.

The code survives a kernel build and LTP. Note that the changes are only
for 64-bit at the moment. I haven't looked at 32-bit yet, but I will
definitely check it.

Code is based on v5.10-rc3.

Thanks,

alex.

-----

Alexandre Chartre (24):
x86/syscall: Add wrapper for invoking syscall function
x86/entry: Update asm_call_on_stack to support more function arguments
x86/entry: Consolidate IST entry from userspace
x86/sev-es: Define a setup stack function for the VC idtentry
x86/entry: Implement ret_from_fork body with C code
x86/pti: Provide C variants of PTI switch CR3 macros
x86/entry: Fill ESPFIX stack using C code
x86/entry: Add C version of SWAPGS and SWAPGS_UNSAFE_STACK
x86/entry: Add C version of paranoid_entry/exit
x86/pti: Introduce per-task PTI trampoline stack
x86/pti: Function to clone page-table entries from a specified mm
x86/pti: Function to map per-cpu page-table entry
x86/pti: Extend PTI user mappings
x86/pti: Use PTI stack instead of trampoline stack
x86/pti: Execute syscall functions on the kernel stack
x86/pti: Execute IDT handlers on the kernel stack
x86/pti: Execute IDT handlers with error code on the kernel stack
x86/pti: Execute system vector handlers on the kernel stack
x86/pti: Execute page fault handler on the kernel stack
x86/pti: Execute NMI handler on the kernel stack
x86/entry: Disable stack-protector for IST entry C handlers
x86/entry: Defer paranoid entry/exit to C code
x86/entry: Remove paranoid_entry and paranoid_exit
x86/pti: Defer CR3 switch to C code for non-IST and syscall entries

arch/x86/entry/common.c | 259 ++++++++++++-
arch/x86/entry/entry_64.S | 513 ++++++++------------------
arch/x86/entry/entry_64_compat.S | 22 --
arch/x86/include/asm/entry-common.h | 108 ++++++
arch/x86/include/asm/idtentry.h | 153 +++++++-
arch/x86/include/asm/irq_stack.h | 11 +
arch/x86/include/asm/page_64_types.h | 36 +-
arch/x86/include/asm/paravirt.h | 15 +
arch/x86/include/asm/paravirt_types.h | 17 +-
arch/x86/include/asm/processor.h | 3 +
arch/x86/include/asm/pti.h | 18 +
arch/x86/include/asm/switch_to.h | 7 +-
arch/x86/include/asm/traps.h | 2 +-
arch/x86/kernel/cpu/mce/core.c | 7 +-
arch/x86/kernel/espfix_64.c | 41 ++
arch/x86/kernel/nmi.c | 34 +-
arch/x86/kernel/sev-es.c | 52 +++
arch/x86/kernel/traps.c | 61 +--
arch/x86/mm/fault.c | 11 +-
arch/x86/mm/pti.c | 71 ++--
kernel/fork.c | 22 ++
21 files changed, 1002 insertions(+), 461 deletions(-)

--
2.18.4