Re: [RFC v2 00/27] Kernel Address Space Isolation

From: Alexandre Chartre
Date: Fri Jul 12 2019 - 12:39:41 EST

On 7/12/19 5:16 PM, Thomas Gleixner wrote:
On Fri, 12 Jul 2019, Peter Zijlstra wrote:
On Fri, Jul 12, 2019 at 01:56:44PM +0200, Alexandre Chartre wrote:

I think that's precisely what makes ASI and PTI different and independent.
PTI is just about switching between userland and kernel page-tables, while
ASI is about switching page-tables inside the kernel. You can have ASI without
having PTI. You can also use ASI for kernel threads, i.e. for code that won't
be triggered from userland and so won't involve PTI.

PTI is not mapping kernel space to avoid speculation crap (meltdown).
ASI is not mapping part of kernel space to avoid (different) speculation crap (MDS).

See how very similar they are?

Furthermore, to recover SMT for userspace (under MDS) we not only need
core-scheduling but core-scheduling per address space. And ASI was
specifically designed to help mitigate the trainwreck just described.

By explicitly exposing (hopefully harmless) parts of the kernel to MDS,
we reduce the part that needs core-scheduling and thus reduce the rate
at which the SMT siblings need to sync up/schedule.

But looking at it that way, it makes no sense to retain 3 address
spaces, namely:

user / kernel exposed / kernel private.

Specifically, it makes no sense to expose part of the kernel through MDS
but not through Meltdown. Therefore we can merge the user and kernel
exposed address spaces.

And then we've fully replaced PTI.

So no, they're not orthogonal.

Right. If we decide to expose more parts of the kernel mappings then that's
just adding more stuff to the existing user (PTI) map mechanics.

If we expose more parts of the kernel mapping by adding them to the existing
user (PTI) map, then we only control the mapping of sensitive kernel data but
we don't control the user mappings (with ASI, we exclude all user mappings).

How would you control the mapping of userland sensitive data and exclude them
from the user map? Would you have the application explicitly identify sensitive
data (like Andy suggested with a /dev/xpfo device)?
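To make the disagreement concrete, here is a toy model, a sketch under stated assumptions rather than anything from the RFC: each page table is described only by which classes of memory it maps. The enum, the struct pgtable type, and both tables are hypothetical. The three-space layout keeps a restricted ASI map with no user mappings; the merged two-space layout folds the exposed kernel part into the PTI user map, which is the point being objected to above, because the only map that contains the exposed kernel part then also contains user memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region classes; these are not kernel definitions. */
enum region {
    R_USER           = 1u << 0, /* userspace mappings */
    R_KERNEL_EXPOSED = 1u << 1, /* kernel parts considered harmless to leak */
    R_KERNEL_PRIVATE = 1u << 2, /* the rest of the kernel */
};

#define R_ALL (R_USER | R_KERNEL_EXPOSED | R_KERNEL_PRIVATE)

/* One page table, described only by the region classes it maps. */
struct pgtable {
    const char *name;
    unsigned    regions;
};

/* Three address spaces: PTI user map, ASI restricted map, full kernel map. */
static const struct pgtable three_spaces[] = {
    { "pti-user",       R_USER },           /* entry trampolines ignored */
    { "asi-restricted", R_KERNEL_EXPOSED }, /* ASI excludes all user mappings */
    { "kernel-full",    R_ALL },
};

/* Merged model: the exposed kernel part is added to the PTI user map. */
static const struct pgtable two_spaces[] = {
    { "pti-user+exposed", R_USER | R_KERNEL_EXPOSED },
    { "kernel-full",      R_ALL },
};

/* True if the page table maps every region class in 'mask'. */
static bool maps_all(const struct pgtable *pt, unsigned mask)
{
    assert((mask & ~R_ALL) == 0); /* only known region classes */
    return (pt->regions & mask) == mask;
}

/* True if some page table in 'set' maps the exposed kernel part
 * without also mapping user memory. */
static bool has_user_free_exposed_map(const struct pgtable *set, int n)
{
    for (int i = 0; i < n; i++)
        if (maps_all(&set[i], R_KERNEL_EXPOSED) && !(set[i].regions & R_USER))
            return true;
    return false;
}
```

In this model has_user_free_exposed_map() holds for three_spaces but not for two_spaces, which is exactly the trade-off under discussion: fewer address spaces versus the ability to keep user data out of the exposed map.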

Thanks,

alex.


As a consequence the CR3 switching points become different or can be
consolidated and that can be handled right at those switching points
depending on static keys or alternatives as we do today with PTI and other
mitigations.
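A rough userspace illustration of "handled right at those switching points": the variant used at the exit-to-user point is chosen once at boot, much as alternatives patch a single call site. This is only an analogy under stated assumptions; the kernel would use static keys or alternatives, whereas this sketch uses plain booleans, a function pointer, and a fake_cr3 variable standing in for the CR3 register. All names are hypothetical, and the assumption that the exposed map requires PTI follows the "add it to the existing user (PTI) map" idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins: the kernel would use static keys or alternatives
 * rather than plain booleans and a function pointer. */
static bool pti_enabled; /* hypothetical: Meltdown mitigation active */
static bool asi_enabled; /* hypothetical: exposed kernel parts added to user map */

enum { CR3_KERNEL_FULL = 1, CR3_USER = 2, CR3_USER_EXPOSED = 3 };
static unsigned long fake_cr3 = CR3_KERNEL_FULL; /* models the CR3 register */

/* Variants of the exit-to-user switching point. */
static void exit_to_user_noop(void)    { /* no isolation: keep kernel map */ }
static void exit_to_user_pti(void)     { fake_cr3 = CR3_USER; }
static void exit_to_user_exposed(void) { fake_cr3 = CR3_USER_EXPOSED; }

static void (*exit_to_user_switch)(void) = exit_to_user_noop;

/* Selected once at "boot", the way alternatives patch a single site. */
static void select_switch_points(void)
{
    if (asi_enabled) {
        /* Assumption: the exposed map extends the PTI user map. */
        assert(pti_enabled);
        exit_to_user_switch = exit_to_user_exposed;
    } else if (pti_enabled) {
        exit_to_user_switch = exit_to_user_pti;
    } else {
        exit_to_user_switch = exit_to_user_noop;
    }
}

/* Kernel entry always returns to the full kernel map. */
static void enter_kernel_switch(void) { fake_cr3 = CR3_KERNEL_FULL; }
```

The point of the design is that there is no separate run-time state to track: the only decision is which map each existing switching point loads, made once when the mitigations are selected.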

All of that can do without that obscure "state machine" which is solely
there to duct-tape the complete lack of design. The same applies to that
mapping thing. Just mapping randomly selected parts by sticking them into
an array is a non-maintainable approach. This needs proper separation of
text and data sections, so violations of the mapping constraints can be
statically analyzed. Depending solely on the page fault at run time for
analysis is just bound to lead to hard-to-diagnose failures in the field.
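As a minimal sketch of what "proper separation of data sections" can mean in practice, assuming GCC or Clang with GNU ld or lld on Linux: data meant to be visible in the exposed map is placed in its own named section, and the linker-provided __start_/__stop_ symbols (emitted for sections whose names are valid C identifiers) delimit it. The section name asi_exposed_data, the ASI_EXPOSED macro, and the helper are hypothetical. The run-time check below only demonstrates that the boundary is well defined; a build-time tool could check references against the same section instead of waiting for a page fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for data that may appear in the exposed map.
 * "used" keeps the objects even if they look unreferenced. */
#define ASI_EXPOSED __attribute__((used, section("asi_exposed_data")))

/* Provided by GNU ld / lld because the section name is a C identifier. */
extern char __start_asi_exposed_data[];
extern char __stop_asi_exposed_data[];

/* Data explicitly allowed to be visible in the restricted map. */
static int exposed_counter ASI_EXPOSED = 1;
static int exposed_flags   ASI_EXPOSED = 2;

/* Ordinary data: lands in .data and stays private. */
static int private_secret = 42;

/* True if the object at p (len bytes) lies entirely inside the section. */
static bool in_exposed_section(const void *p, size_t len)
{
    uintptr_t start = (uintptr_t)__start_asi_exposed_data;
    uintptr_t end   = (uintptr_t)__stop_asi_exposed_data;
    uintptr_t a     = (uintptr_t)p;

    return a >= start && a <= end && len <= end - a;
}
```

Because membership is decided by the link layout rather than by a hand-maintained array of addresses, adding a new exposed object is a one-word annotation, and anything not annotated stays private by default.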

TBH we all know already that this can be done and that this will solve some
of the issues caused by the speculation mess, so just writing some hastily
cobbled-together POC code, which explodes just by looking at it, leads to
nothing but wasted time on all ends.

This first needs a clear definition of protection scope. That scope clearly
defines the required mappings and consequently the transition requirements
which provide the necessary transition points for flipping CR3.
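The ordering argued for here (scope, then mappings, then transition points) can be sketched as a small derivation. This is an illustration under assumptions, not a proposed design: the execution contexts, the region bits, and the scope table are hypothetical, and the scope shown is the merged model in which user mode and the exposed kernel context share one map. Once the scope fixes the map each context needs, a CR3 flip is required exactly at transitions between contexts whose maps differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical execution contexts. */
enum ctx { CTX_USER, CTX_KERNEL_EXPOSED, CTX_KERNEL_PRIVATE, CTX_COUNT };

/* Hypothetical region classes: user, exposed kernel, private kernel. */
enum { R_USER = 1u << 0, R_KEXP = 1u << 1, R_KPRIV = 1u << 2 };

/* A protection scope: the mapping each context requires. Here user mode
 * and the exposed kernel context share one map (merged model). */
static const unsigned scope[CTX_COUNT] = {
    [CTX_USER]           = R_USER | R_KEXP,
    [CTX_KERNEL_EXPOSED] = R_USER | R_KEXP,
    [CTX_KERNEL_PRIVATE] = R_USER | R_KEXP | R_KPRIV,
};

/* A CR3 flip is needed exactly when the two contexts need different maps. */
static bool needs_cr3_switch(const unsigned *s, enum ctx from, enum ctx to)
{
    assert(from < CTX_COUNT && to < CTX_COUNT);
    return s[from] != s[to];
}

/* Count the ordered context pairs (from != to) that require a CR3 flip. */
static int count_switch_points(const unsigned *s)
{
    int n = 0;
    for (int f = 0; f < CTX_COUNT; f++)
        for (int t = 0; t < CTX_COUNT; t++)
            if (f != t && needs_cr3_switch(s, (enum ctx)f, (enum ctx)t))
                n++;
    return n;
}
```

With this scope only the four transitions into and out of the private kernel context need a flip. Giving the exposed context its own map without user mappings (the three-space layout) would make all six ordered pairs switch points, which is why the scope has to be agreed on before the switching code is written.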

If we have agreed on that, then we can think about the implementation
details.

Thanks,

tglx