Re: [PATCH 08/30] x86, kaiser: unmap kernel from userspace page tables (core patch)

From: Dave Hansen
Date: Wed Nov 22 2017 - 18:12:02 EST

On 11/20/2017 09:21 AM, Thomas Gleixner wrote:
>> +KAISER logically keeps a "copy" of the page tables which unmap
>> +the kernel while in userspace. The kernel manages the page
>> +tables as normal, but the "copying" is done with a few tricks
>> +that mean that we do not have to manage two full copies.
>> +The first trick is that for any new kernel mapping, we
>> +presume that we do not want it mapped to userspace. That means
>> +we normally have no copying to do. We only copy the kernel
>> +entries over to the shadow in response to a kaiser_add_*()
>> +call which is rare.
> When KAISER is enabled the kernel manages two page tables for the kernel
> mappings. The regular page table which is used while executing in kernel
> space and a shadow copy which only contains the mapping entries which are
> required for the kernel-userspace transition. These mappings have to be
> copied into the shadow page tables explicitly with the kaiser_add_*()
> functions.

This misses a few important points that I think the original text
touches on. I gave it another go:

> Page Table Management
> =====================
> When KAISER is enabled, the kernel manages two sets of page
> tables. The first copy is very similar to what would be present
> for a kernel without KAISER. This includes a complete mapping of
> userspace that the kernel can use for things like copy_to_user().
> The second (shadow) is used when running userspace and mirrors the
> mapping of userspace present in the kernel copy. It maps only the
> kernel data needed to enter and exit the kernel.
> The shadow is populated by the kaiser_add_*() functions. Only
> kernel data which has been explicitly mapped will appear in the
> shadow copy. These calls are rare at runtime.
> For a new userspace mapping, the kernel makes the entries in its
> page tables like normal. The only difference is when the kernel
> makes entries in the top (PGD) level. In addition to setting the
> entry in the main kernel PGD, a copy of the entry is made in the
> shadow PGD.
> For user space mappings the kernel creates an entry in the kernel
> PGD and the same entry in the shadow PGD, so the underlying page
> table to which the PGD entry points is shared down to the PTE
> level. This leaves a single, shared set of userspace page tables
> to manage. One PTE to lock, one set of accessed bits, dirty
> bits, etc...
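
To make the PGD-level sharing concrete, here is a rough userspace toy
model of it. This is not the kernel code: every name in it (toy_pgd,
toy_set_pgd(), TOY_USER_PGDS, the 8-entry tables, the two-level layout)
is made up for illustration, and it only models the "copy the user PGD
entry, share everything below it" behavior described above. It does not
model what kaiser_add_*() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sizes, chosen for readability and not taken from x86. */
#define TOY_PTRS_PER_PGD 8
#define TOY_PTRS_PER_PTE 8
/* Assumption: slots below this index cover userspace, the rest kernel. */
#define TOY_USER_PGDS    4

/* Lowest-level table: one word per PTE. */
typedef struct {
	uint64_t pte[TOY_PTRS_PER_PTE];
} toy_pte_table;

/* Top-level table: each entry points at a lower-level table. */
typedef struct {
	toy_pte_table *pgd[TOY_PTRS_PER_PGD];
} toy_pgd;

/* One address space carries both the kernel PGD and the shadow PGD. */
struct toy_mm {
	toy_pgd kernel;
	toy_pgd shadow;
};

/*
 * Install a top-level entry. The kernel PGD always gets it. Only
 * userspace slots are mirrored into the shadow PGD, so a new kernel
 * mapping is not visible through the shadow by default.
 */
static void toy_set_pgd(struct toy_mm *mm, unsigned int idx,
			toy_pte_table *table)
{
	assert(idx < TOY_PTRS_PER_PGD);

	mm->kernel.pgd[idx] = table;
	if (idx < TOY_USER_PGDS)
		mm->shadow.pgd[idx] = table;
}
```

Since both PGDs hold the same pointer for a userspace slot, a PTE
written through the kernel PGD is the very same word seen through the
shadow PGD. There is nothing below the top level to keep in sync.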