Re: [PATCH] [v2] x86/doc: add PTI description
From: Randy Dunlap
Date: Thu Jan 04 2018 - 21:45:56 EST
On 01/04/2018 04:24 PM, Dave Hansen wrote:
> Changes from v1:
> * update kernel-parameters.txt to clarify that the pti= option
> is not just for disabling. Also describe what 'pti=auto' does
> and why
> * Add a note about the presence of NX in the user portion of the
> kernel page tables
> * Clarify _additional_ 4k of PGD space
> * Add a note about the runtime overhead of PCID without INVPCID
>
> ---
>
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> Add some details about how PTI works, what some of the downsides
> are, and how to debug it when things go wrong.
>
> Also document the kernel parameter: 'nopti'.
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Moritz Lipp <moritz.lipp@xxxxxxxxxxxxxx>
> Cc: Daniel Gruss <daniel.gruss@xxxxxxxxxxxxxx>
> Cc: Michael Schwarz <michael.schwarz@xxxxxxxxxxxxxx>
> Cc: Richard Fellner <richard.fellner@xxxxxxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> ---
>
> b/Documentation/admin-guide/kernel-parameters.txt | 22 +-
> b/Documentation/x86/pti.txt | 185 ++++++++++++++++++++++
> 2 files changed, 200 insertions(+), 7 deletions(-)
> diff -puN /dev/null Documentation/x86/pti.txt
> --- /dev/null 2017-12-15 13:48:30.454245127 -0800
> +++ b/Documentation/x86/pti.txt 2018-01-04 16:23:40.870819409 -0800
> @@ -0,0 +1,185 @@
> +The userspace copy is used when running userspace and mirrors the
> +mapping of userspace present in the kernel copy. It maps a only
drop: a
> +the kernel data needed to enter and exit the kernel. This data
> +is entirely contained in the 'struct cpu_entry_area' structure
> +which is placed in the fixmap and thus each CPU's copy of the
> +area has a compile-time-fixed virtual address.
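A toy model of the mirroring may help readers here. This is a user-space sketch, not the kernel's code. It assumes a 512-entry top-level table whose lower half maps userspace, which is the usual x86-64 4-level layout. model_set_pgd() is a hypothetical stand-in for set_pgd() that keeps the userspace copy in sync, in the way item f of the runtime-cost list describes:

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PGD     512
/* Assumption: the lower half of the PGD maps userspace (x86-64, 4-level). */
#define USER_PGD_ENTRIES (PTRS_PER_PGD / 2)

/* Toy model of the two top-level tables PTI keeps for each mm. */
struct pgd_pair {
	uint64_t kernel[PTRS_PER_PGD];	/* used while running in the kernel */
	uint64_t user[PTRS_PER_PGD];	/* used while running userspace */
};

/*
 * Hypothetical stand-in for set_pgd(): every write to a PGD entry that maps
 * userspace is repeated in the userspace copy, so both copies map the same
 * userspace memory. Kernel-half entries are not copied; the real userspace
 * copy maps only the entry/exit data (struct cpu_entry_area).
 */
static void model_set_pgd(struct pgd_pair *p, unsigned int idx, uint64_t val)
{
	assert(idx < PTRS_PER_PGD);
	p->kernel[idx] = val;
	if (idx < USER_PGD_ENTRIES)
		p->user[idx] = val;
}
```

The model leaves out the NX bit that, according to the changelog, is set on userspace entries in the kernel copy.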
> +
> +2. Runtime Cost
> + a. CR3 manipulation to switch between the page table copies
> + must be done at interrupt, syscall, and exception entry
> + and exit (it can be skipped when the kernel is interrupted,
> + though.) Moves to CR3 are on the order of a hundred
> + cycles, and are required every at entry and every at exit.
at every entry and at every exit.
> + d. Global pages are disabled for all kernel structures not
> + mapped in both to kernel and userspace page tables. This
into both kernel and userspace page tables.
> + feature of the MMU allows different processes to share TLB
> + entries mapping the kernel. Losing the feature means more
> + TLB misses after a context switch. The actual loss of
> + performance is very small, however, never exceeding 1%.
> + f. In addition to the fork()-time copying, there must also
> + be an update to the userspace PGD any time a set_pgd() is done
> + on a PGD used to map userspace. This ensures that the kernel
> + and userspace copies always map the same userspace
> + memory.
> + g. On systems without PCID support, each CR3 write flushes
> + the entire TLB. That means that each syscall, interrupt
> + or exception flushes the TLB.
> + h. On systems without INVPCID support, addresses can only be
This is the first mention of INVPCID. Probably needs more info
about what it is.
> + flushed from the TLB for the current PCID. When flushing
> + a kernel address, we need to flush all PCIDs, so a single
> + kernel address flush will require a TLB-flushing CR3 write
> + upon the next use of every PCID.
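On the CR3 switch itself (item a above), here is how I understand the x86-64 layout. The kernel and userspace PGDs are allocated together as an 8k pair, which is the _additional_ 4k mentioned in the changelog. The user copy therefore sits 4k above the kernel copy and is selected by CR3 bit 12. With PCID, the userspace half of each PCID pair is distinguished by bit 11. The following is a minimal sketch under those assumptions. The helper and macro names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: user PGD is 4k above the kernel PGD (CR3 bit 12). */
#define USER_PGD_BIT  (1ULL << 12)
/* Assumed: with PCID, the user PCID is the kernel PCID with bit 11 set. */
#define USER_PCID_BIT (1ULL << 11)

/* Hypothetical helper: CR3 value to load on exit to userspace. */
static uint64_t kernel_to_user_cr3(uint64_t kernel_cr3, bool pcid_enabled)
{
	uint64_t cr3 = kernel_cr3 | USER_PGD_BIT;

	if (pcid_enabled)
		cr3 |= USER_PCID_BIT;
	return cr3;
}

/* Hypothetical helper: CR3 value to load on entry from userspace. */
static uint64_t user_to_kernel_cr3(uint64_t user_cr3)
{
	return user_cr3 & ~(USER_PGD_BIT | USER_PCID_BIT);
}
```

If that layout holds, entry and exit only need a single OR or AND on the CR3 value. The cost described in item a would then be essentially the CR3 write itself.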
> +
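On INVPCID, one possible explanation for item h follows. INVPCID is the x86 instruction that invalidates TLB entries tagged with a PCID other than the current one. It can target a single address, a whole PCID, or all PCIDs. Without it, INVLPG only invalidates non-global entries for the current PCID. A kernel-address flush therefore has to be deferred for every other PCID until that PCID is next loaded with a CR3 write that does not set the no-flush bit (bit 63). The following is a toy model. NR_MODEL_PCIDS and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_MODEL_PCIDS 6		/* hypothetical number of PCIDs in use */
#define CR3_NOFLUSH    (1ULL << 63)	/* MOV to CR3 keeps this PCID's TLB entries */

static bool needs_flush[NR_MODEL_PCIDS];

/*
 * Without INVPCID, only the current PCID can be flushed right away (e.g. via
 * INVLPG), so every other PCID is marked stale.
 */
static void flush_kernel_address(unsigned int cur_pcid)
{
	for (unsigned int i = 0; i < NR_MODEL_PCIDS; i++)
		needs_flush[i] = (i != cur_pcid);
}

/* Returns the CR3 value to write when switching to 'pcid'. */
static uint64_t cr3_for_switch(uint64_t pgd_pa, unsigned int pcid)
{
	uint64_t cr3;

	assert(pcid < NR_MODEL_PCIDS);
	cr3 = pgd_pa | pcid;
	if (needs_flush[pcid])
		needs_flush[pcid] = false;	/* this write flushes the PCID */
	else
		cr3 |= CR3_NOFLUSH;		/* keep existing TLB entries */
	return cr3;
}
```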
> +Possible Future Work
> +====================
> +1. We can be more careful about not actually writing to CR3
> + unless its value is actually changed.
> +2. Allow PTI to enabled/disabled at runtime in addition to the
to be
> + boot-time switching.
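For future-work item 1, here is a minimal sketch of the idea. It assumes a per-CPU cached copy of the last CR3 value. cached_cr3 and the helpers are hypothetical:

```c
#include <stdint.h>

/* Hypothetical per-CPU cache of the last value loaded into CR3. */
static uint64_t cached_cr3;
/* Counts the (expensive) CR3 writes that actually happen. */
static unsigned long nr_cr3_writes;

/* Stand-in for the real MOV to CR3, on the order of a hundred cycles. */
static void write_cr3_model(uint64_t val)
{
	nr_cr3_writes++;
	cached_cr3 = val;
}

/* Only touch CR3 when the value actually changes. */
static void maybe_write_cr3(uint64_t val)
{
	if (val == cached_cr3)
		return;
	write_cr3_model(val);
}
```

One caveat the sketch glosses over: a CR3 write without the no-flush bit also flushes the TLB. Such a write can only be skipped when no flush is wanted.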
> +
> +Testing
> +========
--
~Randy