Re: [PATCH RFC 0/4] Per-task PTI activation

From: Eric W. Biederman
Date: Tue Jan 09 2018 - 10:32:21 EST


Willy Tarreau <w@xxxxxx> writes:

> Hi!
>
> I could experiment a bit with the possibility to enable/disable PTI per
> task. Please keep in mind that it's not my area of expertise at all, but
> doing so I could recover the initial performance without disabling PTI on
> the whole system.
>
> So what I did in this series consists in the following :
> - addition of a new per-task TIF_NOPTI flag. Please note that I'm not
> proud of the way I did it, as 32 flags were already taken. The flags
> are declared as "long" so there are 32 more flags available on x86_64
> but C and asm disagree on the type of 1<<32 so I had to declare the
> hex value by hand... By the way I even suspect that _TIF_FSCHECK is
> wrong once cast to a long, I think it causes sign extension into the
> 32 upper bits since it's supposed to be signed.
>
> - addition of a set of arch_prctl() calls (ARCH_GET_NOPTI and
> ARCH_SET_NOPTI), to check and change the activation of the
> protection. The change requires CAP_SYS_RAWIO and can be done in
> a wrapper (that's how I tested)
>
> - the user PGD was marked with _PAGE_NX to prevent an accidental leak
> of CR3 from not being detected. I obviously had to disable this since
> in this case we do want such a user task to run without switching the
> PGD. I think this could be performed per-task maybe. Another approach
> might consist in dealing with 3 PGDs and using a different one for
> unprotected tasks but that really starts to sound overkill.
>
> - upon return to userspace, I check if the task's flags contain the
> new TIF_NOPTI or not. If it does contain it, then we don't switch
> the CR3.
>
> - upon entry into the kernel from userspace, we can't access the task's
> flags but we can already check if CR3 points to the kernel or user PGD,
> and we refrain from switching if it's already the system one.
>
> By doing so I could recover the initial performance of haproxy in a VM,
> going from 12400 connections per second to 21000 once started with this
> trivial wrapper :
>
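> #include <asm/prctl.h>
> #include <sys/syscall.h>
> #include <unistd.h>
>
> #ifndef ARCH_SET_NOPTI
> #define ARCH_SET_NOPTI 0x1022
> #endif
>
> int main(int argc, char **argv)
> {
>     /* glibc has no arch_prctl() wrapper, so go through syscall() */
>     syscall(SYS_arch_prctl, ARCH_SET_NOPTI, 1);
>     argv++;
>     /* replace this wrapper with the real program */
>     return execvp(argv[0], argv);
> }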
>
> I have not yet run it on real hardware. Before trying to go a bit further
> I'd like to know if such an approach is acceptable or if I'm doing anything
> stupid and looking in the wrong direction.
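
As a side note on the _TIF_FSCHECK suspicion in the first bullet, the
sign extension is easy to show in user space. This is only a sketch: it
assumes an LP64 target such as x86_64 and a flag at bit 31, and DEMO_BIT
and both helper names are made up for the illustration, not taken from
the kernel headers.

```c
#include <limits.h>

/* Hypothetical bit number, assumed for illustration only. */
#define DEMO_BIT 31

/*
 * A flag built as a signed int with only bit 31 set has the value
 * INT_MIN. Widening that int to a 64-bit long copies the sign bit
 * into bits 32..63.
 */
static unsigned long flag_from_int(void)
{
    int v = INT_MIN; /* same bit pattern as a signed 1 << 31 */

    return (unsigned long)(long)v;
}

/* Building the flag from an unsigned long keeps the upper bits clear. */
static unsigned long flag_from_ulong(void)
{
    return 1UL << DEMO_BIT;
}
```

With 64-bit longs the first helper sets all of the upper 32 bits, while
the second sets bit 31 alone.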

Before this goes much farther I want to point something out.

When I have kpti protecting me, it is the applications that connect
to the network I worry about. Until I get to a system with local users
that don't trust each other, I don't have a reason to worry about these
attacks from local applications.

The dangerous scenario is someone exploiting a buffer overflow, or
otherwise getting a network facing application to misbehave, and then
using these new attacks to assist in gaining privilege escalation.


Googling seems to indicate that there is about one issue a year found in
haproxy. So this is not an unrealistic concern for the case you
mention.


So unless I am seeing things wrong, this is a patchset designed to drop
your defenses on the most vulnerable applications.


Disabling protection on the most vulnerable applications is not behavior
I would encourage. It seems better than disabling protection system
wide, but only slightly. I definitely don't think this is something we
want applications disabling themselves.

Certainly this is something that should look at no-new-privs and, if
no-new-privs is set, not allow disabling this protection.
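
To make the no-new-privs idea concrete, here is a rough user-space
sketch of the check. PR_GET_NO_NEW_PRIVS and PR_SET_NO_NEW_PRIVS are the
existing prctl() options; the three helper names, and the policy in
may_request_nopti(), are hypothetical. A real check would live in the
kernel's arch_prctl() path and look at the task directly.

```c
#include <sys/prctl.h>

/* Returns 1 if no_new_privs is set for the calling thread, 0 if it is
 * not, and -1 on error. */
static int no_new_privs_is_set(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/* Sets no_new_privs for the calling thread. This cannot be undone and
 * is inherited across fork and execve. Returns 0 on success. */
static int set_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Hypothetical policy: a task may ask for PTI to be disabled only if
 * no_new_privs is known to be clear. Errors are treated as a refusal. */
static int may_request_nopti(void)
{
    return no_new_privs_is_set() == 0;
}
```

A sandboxed task that has called set_no_new_privs() would then get a
refusal from may_request_nopti(), which is the behavior I am asking for.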

Eric