[PATCH RFC 0/4] Per-task PTI activation

From: Willy Tarreau
Date: Mon Jan 08 2018 - 11:12:52 EST


Hi!

I experimented a bit with the possibility of enabling/disabling PTI per
task. Please keep in mind that this is not my area of expertise at all, but
by doing so I could recover the initial performance without disabling PTI
on the whole system.

This series does the following:
- addition of a new per-task TIF_NOPTI flag. Please note that I'm not
proud of the way I did it, as 32 flags were already taken. The flags
are declared as "long" so there are 32 more flags available on x86_64
but C and asm disagree on the type of 1<<32 so I had to declare the
hex value by hand... By the way I even suspect that _TIF_FSCHECK is
wrong once cast to a long, I think it causes sign extension into the
32 upper bits since it's supposed to be signed.

- addition of a set of arch_prctl() calls (ARCH_GET_NOPTI and
ARCH_SET_NOPTI), to check and change the activation of the
protection. The change requires CAP_SYS_RAWIO and can be done in
a wrapper (that's how I tested)

- the user PGD was marked with _PAGE_NX so that accidentally returning to
userspace without switching CR3 would not go undetected. I obviously had
to disable this since in this case we do want such a user task to run
without switching the PGD. I think this could perhaps be done per task.
Another approach might consist in dealing with 3 PGDs and using a
different one for unprotected tasks, but that really starts to sound
like overkill.

- upon return to userspace, I check if the task's flags contain the
new TIF_NOPTI or not. If it does contain it, then we don't switch
the CR3.

- upon entry into the kernel from userspace, we can't access the task's
flags but we can already check if CR3 points to the kernel or user PGD,
and we refrain from switching if it's already the system one.

By doing so I could recover the initial performance of haproxy in a VM,
going from 12400 connections per second to 21000 once started with this
trivial wrapper :

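#include <asm/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_SET_NOPTI
#define ARCH_SET_NOPTI 0x1022
#endif

int main(int argc, char **argv)
{
	if (argc < 2)
		return 1;
	/* libc headers carry no arch_prctl() prototype, use the raw syscall */
	syscall(SYS_arch_prctl, ARCH_SET_NOPTI, 1);
	argv++;
	/* execvp() only returns on failure */
	return execvp(argv[0], argv);
}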

I have not yet run it on real hardware. Before trying to go a bit further
I'd like to know if such an approach is acceptable or if I'm doing anything
stupid and looking in the wrong direction.

Thanks!
Willy


Willy Tarreau (4):
  x86/thread_info: add TIF_NOPTI to disable PTI per task
  x86/arch_prctl: add ARCH_GET_NOPTI and ARCH_SET_NOPTI to
    enable/disable PTI
  x86/pti: don't mark the user PGD with _PAGE_NX.
  x86/entry/pti: don't switch PGD on tasks holding flag TIF_NOPTI

 arch/x86/entry/calling.h           | 23 +++++++++++++++++++++++
 arch/x86/include/asm/thread_info.h |  8 ++++++++
 arch/x86/include/uapi/asm/prctl.h  |  3 +++
 arch/x86/kernel/process_64.c       | 24 ++++++++++++++++++++++++
 arch/x86/mm/pti.c                  |  2 ++
 5 files changed, 60 insertions(+)

--
1.7.12.1