Re: [PATCH RFC 2/4] x86/arch_prctl: add ARCH_GET_NOPTI and ARCH_SET_NOPTI to enable/disable PTI
From: Willy Tarreau
Date: Mon Jan 08 2018 - 15:36:34 EST
[ increased the CC list this time ]
On Mon, Jan 08, 2018 at 09:54:05AM -0800, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 8:12 AM, Willy Tarreau <w@xxxxxx> wrote:
> > This allows to report the current state of the PTI protection and to
> > enable or disable it for the current task.
>
> So I really think that this needs to be done up-front to avoid a lot
> of complexity. And per mm.
>
> If the process is already threaded (so the mm has multiple users),
> it's too late to start playing games with PTI.
>
> In fact, maybe the whole thing needs to be controlled before "exec"
> happens, so that we have the knowledge as we build up the mm, rather
> than being "runtime" dynamic at all.
>
> But in no case should you even try to handle the multi-threaded case -
> just error out for trying to change the PTI setting.
>
> So make the thing per-mm, and then at task switch time as you switch
> mms, you set the bit in a percpu variable for testing at kernel entry.
So I did something like this (I will have to remerge the awful patches
and remove the printks before resending). In short, here's what it now
does:
- added a new x86 flag: "mm->context.pti_disable", depends on
  CONFIG_PAGE_TABLE_ISOLATION.
- the new prctl() also depends on this config setting.
- prctl() refuses any change if mm->mm_users > 1
- prctl() refuses to set nopti if !CAP_SYS_RAWIO, but clearing it is
  fine without (Ingo's idea)
- __switch_to() sets a new "pti_disable" per-cpu variable to the copy
  of mm->context.pti_disable
- entry code in SWITCH_TO_USER_CR3_NOSTACK now checks
  PER_CPU_VAR(pti_disable)
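
To make the rules in the list above concrete, here is a minimal userspace
model in C. It is only a sketch and not the kernel patch: the struct, the
function names and the -EBUSY return value are assumptions made for
illustration, and only the checks themselves (mm_users > 1, CAP_SYS_RAWIO
required to set but not to clear, per-cpu copy at switch time) come from
the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not kernel code. */
struct mm_model {
	int  mm_users;     /* number of tasks sharing this mm */
	bool pti_disable;  /* stands in for mm->context.pti_disable */
};

/* Stands in for the per-cpu "pti_disable" variable read by the entry code. */
static bool percpu_pti_disable;

/*
 * Model of the "set" side of the prctl().
 * has_rawio stands in for a CAP_SYS_RAWIO check.
 */
static int model_set_nopti(struct mm_model *mm, bool disable, bool has_rawio)
{
	assert(mm != NULL);

	/* Refuse any change once the mm is shared (threaded process). */
	if (mm->mm_users > 1)
		return -EBUSY;	/* error code is an assumption */

	/* Disabling PTI needs the capability; re-enabling it does not. */
	if (disable && !has_rawio)
		return -EPERM;

	mm->pti_disable = disable;
	return 0;
}

/* Model of the context switch: copy the per-mm flag to the per-cpu one. */
static void model_switch_to(const struct mm_model *next)
{
	assert(next != NULL);
	percpu_pti_disable = next->pti_disable;
}

/* Model of the exit path: switch to the user CR3 only if PTI is active. */
static bool model_exit_switches_cr3(void)
{
	return !percpu_pti_disable;
}
```

In this model an unprivileged caller can always go back to the protected
state, which matches the asymmetry described above.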
First tests show that it still works. One main difference I immediately
observed is that the setting is lost at execve(). This means that it will
not be possible to implement a wrapper to enable the bypass, but on the
other hand it guarantees that any execve(), even from a so-called
"trusted" process, doesn't accidentally expose a victim program. So there
are pros and cons here.
I'm personally fine with both the wrapper and the code changes. But I'm
in the easiest situation, working with open source code that I can easily
update to accommodate the changes. Other users might have a different
opinion here.
Another option could be to have a per-task (and really task here) flag
that is only passed to execve() to indicate that the per-mm pti_disable
has to be set in the new mm (and which would clear the task flag). But
this mechanism would always require a wrapper. Or we could have both.
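
The per-task variant described above could look roughly like the
following userspace model. Again this is only a sketch under stated
assumptions: the struct and the names nopti_on_exec and model_execve()
are hypothetical and do not exist in any patch. It only shows the
intended hand-off, where the task flag is consumed by exactly one
execve().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not kernel code. */
struct task_model {
	bool nopti_on_exec;   /* per-task request, set by a wrapper */
	bool mm_pti_disable;  /* stands in for the per-mm pti_disable */
};

/*
 * Model of execve(): the new mm starts with PTI enabled unless the
 * task flag asks otherwise, and the task flag is cleared so that it
 * applies to a single exec only.
 */
static void model_execve(struct task_model *t)
{
	assert(t != NULL);
	t->mm_pti_disable = t->nopti_on_exec;
	t->nopti_on_exec = false;
}
```

With this model a wrapper would set the task flag and exec the target,
and any further execve() from the target would come back up with PTI
enabled.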
I'll clean up my patches tomorrow morning and will post an update. Ideas
and objections welcome in the meantime ;-)
Cheers,
Willy