Re: [PATCH 6/9] x86, pkeys: add pkey set/get syscalls

From: Andy Lutomirski
Date: Tue Jul 12 2016 - 18:56:55 EST

On Tue, Jul 12, 2016 at 10:12 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> On 07/12/2016 09:32 AM, Andy Lutomirski wrote:
>> I think it's more or less impossible to get sensible behavior passing
>> pkey != 0 data to legacy functions. If you call:
>> void frob(struct foo *p);
>> If frob in turn passes p to a thread, what PKRU is it supposed to use?
> The thread inheritance of PKRU can be nice. It actually gives things a
> good chance of working if you can control PKRU before clone(). I'd
> describe the semantics like this:
> PKRU values are inherited at the time of a clone() system
> call. Threads unaware of protection keys may work on
> protection-key-protected data as long as PKRU is set up in
> advance of the clone() and never needs to be changed inside the
> thread.
> If a thread is created before PKRU is set appropriately, the
> thread may not be able to act on protection-key-protected data.

Given the apparent need for seccomp's TSYNC, I'm a bit nervous that
this will be restrictive to a problematic degree.

> Otherwise, the semantics are simpler, but they basically give threads no
> chance of ever working:
> Threads unaware of protection keys and which can not manage
> PKRU may not operate on data where a non-zero key has been
> passed to pkey_mprotect().
> It isn't clear to me that one of these is substantially better than the
> other. It's fairly easy in either case for an app that cares to get the
> behavior of the other.
> But, one is clearly easier to implement in the kernel. :)
>>>> So how is user code supposed lock down all of its threads?
>>>> seccomp has TSYNC for this, but I don't think that PKRU allows
>>>> something like that.
>>> I'm not sure this is possible for PKRU. Think of a simple PKRU
>>> manipulation in userspace:
>>> pkru = rdpkru();
>>> pkru |= PKEY_DENY_ACCESS<<key*2;
>>> wrpkru(pkru);
>>> If we push a PKRU value into a thread between the rdpkru() and wrpkru(),
>>> we'll lose the content of that "push". I'm not sure there's any way to
>>> guarantee this with a user-controlled register.
>> We could try to insist that user code uses some vsyscall helper that
>> tracks which bits are as-yet-unassigned. That's quite messy, though.
> Yeah, doable, but not without some new data going out to userspace, plus
> the vsyscall code itself.
>> We could also arbitrarily partition the key space into
>> initially-wide-open, initially-read-only, and initially-no-access and
>> let pkey_alloc say which kind it wants.
> The point is still that wrpkru destroyed the 'push' operation. You
> always end up with a PKRU that (at least temporarily) ignored the 'push'.

Not with my partitioning proposal. We'd never asynchronously modify
another thread's state -- we'd start start with a mask that gives us a
good chance of having the initial state always be useful. To be
completely precise, the initial state would be something like:

0 = all access, 1 (PROT_EXEC) = deny read and write, 2-11: deny read
and write, 12-21: deny write, 22-31: all access

Then pkru_alloc would take a parameter giving the requested initial
state, and it would only work if a key with that initial state is

If we went with the vdso approach, the API could look like:

pkru_state_t prev = pkru_push(mask, value);


pkru_pop(prev); // or pkru_pop(mask, prev)?

This doesn't fundamentally require the vdso, except that implementing
bitwise operations on PKRU can't be done atomically with RDPKRU /
WRPKRU. Grr. This also falls apart pretty badly when sigreturn
happens, so I don't think I like this approach.