RE: [RFC][PATCH v2 00/21] x86/pti: Defer CR3 switch to C code

From: David Laight
Date: Wed Nov 18 2020 - 08:23:38 EST


From: Alexandre Chartre
> Sent: 18 November 2020 10:30
...
> Correct, this RFC is not changing the overhead. However, it is a step forward
> for being able to execute some selected syscalls or interrupt handlers without
> switching to the kernel page-table. The next step would be to identify and add
> the necessary mapping to the user page-table so that specified syscalls can be
> executed without switching the page-table.

Remember that without PTI user space can read all kernel memory.
(I'm not 100% sure you can force a cache-line read.)
It isn't even that slow.
(Even I can understand how it works.)

So if you are worried about user space doing that you can't really
run anything on the user page tables.
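
For anyone wanting to check whether a given machine is running with PTI
at all, the kernel reports it through sysfs. The following is only a
minimal sketch: it assumes a kernel recent enough to expose
/sys/devices/system/cpu/vulnerabilities/meltdown, and the function names
are hypothetical, made up for illustration, and not part of this patch series.

```python
from pathlib import Path

# sysfs file where the kernel reports its Meltdown/PTI status
# (assumed to be present; older kernels do not have it).
MELTDOWN_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def classify_meltdown_status(text):
    """Map the sysfs status line to a short keyword (hypothetical helper)."""
    status = text.strip()
    if status.startswith("Mitigation: PTI"):
        return "pti"
    if status.startswith("Not affected"):
        return "not_affected"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    # Anything else (e.g. hypervisor-specific messages) is left unclassified.
    return "unknown"


def read_meltdown_status(path=MELTDOWN_SYSFS):
    """Read and classify the status file; 'unknown' if it cannot be read."""
    try:
        return classify_meltdown_status(path.read_text())
    except OSError:
        return "unknown"


if __name__ == "__main__":
    print(read_meltdown_status())
```

A result of "vulnerable" corresponds to the case above: kernel memory
is readable from user space whatever else is mapped.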

System calls like getpid() are irrelevant - they aren't used (much).
Even the time of day ones are implemented in the VDSO without
entering the kernel.
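
The difference between a VDSO call and a real kernel entry can be seen
with a crude timing loop. This is a rough sketch only: it assumes Linux,
that libc services clock_gettime(CLOCK_MONOTONIC) from the VDSO (which
depends on the clocksource in use), and that getppid() is a genuine
system call on every invocation. The interpreter overhead is included
in both figures, so only the difference between them means anything.

```python
import os
import time


def ns_per_call(func, iterations=200_000):
    """Return the mean wall-clock cost of func() in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        func()
    return (time.perf_counter_ns() - start) / iterations


def vdso_clock():
    # Assumption: serviced by the VDSO, so no kernel entry and no CR3 switch.
    return time.clock_gettime(time.CLOCK_MONOTONIC)


def real_syscall():
    # Assumption: not cached by libc, so each call enters the kernel.
    return os.getppid()


if __name__ == "__main__":
    print(f"clock_gettime: {ns_per_call(vdso_clock):.0f} ns/call")
    print(f"getppid:       {ns_per_call(real_syscall):.0f} ns/call")
```

Running it with and without PTI (for example booting with 'nopti')
should only move the second figure, if the assumptions above hold.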

So the overheads come from other system calls that 'do work'
without actually sleeping.
I'm guessing things like read, write, sendmsg, recvmsg.

The only interesting system call I can think of is futex.
Many of those calls return immediately because the mutex was
released while the caller was entering the kernel.
I also suspect that being pre-empted by a different thread (of the
same process) doesn't actually need a CR3 reload (without PTI).

I also suspect that it isn't just the CR3 reload that costs.
There could (depending on the cpu) be associated TLB and/or cache
invalidations that have a much larger effect on programs with
large working sets than on simple benchmark programs.

Now bits of data that you are 'more worried about' could be kept
in physical memory that isn't normally mapped (or referenced by
a TLB) and only mapped when needed.
But that doesn't help the general case.

David
