Re: [RFC v2 00/27] Kernel Address Space Isolation

From: Alexandre Chartre
Date: Fri Jul 12 2019 - 04:11:19 EST



On 7/12/19 12:38 AM, Dave Hansen wrote:
> On 7/11/19 7:25 AM, Alexandre Chartre wrote:
>> - Kernel code mapped to the ASI page-table has been reduced to:
>>   . the entire kernel (I still need to test with only the kernel text)
>>   . the cpu entry area (because we need the GDT to be mapped)
>>   . the cpu ASI session (for managing ASI)
>>   . the current stack
>>
>> - Optionally, an ASI can request the following kernel mapping to be added:
>>   . the stack canary
>>   . the cpu offsets (this_cpu_off)
>>   . the current task
>>   . RCU data (rcu_data)
>>   . CPU HW events (cpu_hw_events).
>
> I don't see the per-cpu areas in here. But, the ASI macros in
> entry_64.S (and asi_start_abort()) use per-cpu data.

We don't map all per-cpu areas, but only the per-cpu variables we need. ASI
code uses the per-cpu cpu_asi_session variable which is mapped when an ASI
is created (see patch 15/26):

+	/*
+	 * Map the percpu ASI sessions. This is used by interrupt handlers
+	 * to figure out if we have entered isolation and switch back to
+	 * the kernel address space.
+	 */
+	err = ASI_MAP_CPUVAR(asi, cpu_asi_session);
+	if (err)
+		return err;


> Also, this stuff seems to do naughty stuff (calling C code, touching
> per-cpu data) before the PTI CR3 writes have been done. But, I don't
> see anything excluding PTI and this code from coexisting.

My understanding is that PTI CR3 writes only happen when switching to/from
userland, while ASI enter/exit/abort happens when we are already in the kernel.
So asi_start_abort() is not called when coming from userland, and it does not
interact with PTI.

For example, if ASI is used during a syscall (e.g. with KVM), we have:

  -> syscall
     - PTI CR3 write (kernel CR3)
     - syscall handler:
         ...
         asi_enter() -> write ASI CR3
         .. code run with ASI ..
         asi_exit() or asi abort -> restore original CR3
         ...
     - PTI CR3 write (userland CR3)
  <- syscall
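
To make the nesting in the sequence above concrete, here is a small user-space
model of it. This is only a sketch and not kernel code: struct cpu, the CR3_*
labels and the model_*/pti_* helpers are hypothetical names made up for the
illustration, and the asserts simply encode my understanding that the ASI CR3
switch always sits inside the PTI kernel/userland switch.

```c
#include <assert.h>

/* Hypothetical labels for the page-tables CR3 can point at in this model. */
enum cr3_target { CR3_USER, CR3_KERNEL, CR3_ASI };

/* Hypothetical per-cpu state; the real code keeps this in cpu_asi_session. */
struct cpu {
	enum cr3_target cr3;       /* page-table currently loaded */
	enum cr3_target saved_cr3; /* CR3 saved on ASI enter, restored on exit */
	int in_asi;                /* non-zero while isolation is active */
};

/* PTI CR3 write on kernel entry: only happens when coming from userland. */
static void pti_enter_kernel(struct cpu *c)
{
	assert(c->cr3 == CR3_USER);
	c->cr3 = CR3_KERNEL;
}

/* PTI CR3 write on return to userland: ASI must already have been left. */
static void pti_exit_to_user(struct cpu *c)
{
	assert(!c->in_asi);
	assert(c->cr3 == CR3_KERNEL);
	c->cr3 = CR3_USER;
}

/* Models asi_enter(): we are already in the kernel, so save the kernel CR3. */
static void model_asi_enter(struct cpu *c)
{
	assert(c->cr3 == CR3_KERNEL);
	c->saved_cr3 = c->cr3;
	c->cr3 = CR3_ASI;
	c->in_asi = 1;
}

/* Models asi_exit() or an ASI abort: restore the original CR3. */
static void model_asi_exit(struct cpu *c)
{
	if (!c->in_asi)
		return;
	c->cr3 = c->saved_cr3;
	c->in_asi = 0;
}

/* One syscall that uses ASI, in the same order as the sequence above. */
static enum cr3_target model_syscall_with_asi(struct cpu *c)
{
	pti_enter_kernel(c);   /* PTI CR3 write (kernel CR3) */
	model_asi_enter(c);    /* write ASI CR3 */
	/* .. code run with ASI .. */
	model_asi_exit(c);     /* restore original CR3 */
	pti_exit_to_user(c);   /* PTI CR3 write (userland CR3) */
	return c->cr3;
}
```

Starting from a struct cpu with cr3 set to CR3_USER, model_syscall_with_asi()
ends on CR3_USER again, and none of the asserts fire because the ASI switch
never overlaps with a PTI switch in this model.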


Thanks,

alex.