Re: PKU usage improvements for threads

From: Andy Lutomirski
Date: Tue Aug 23 2022 - 15:32:57 EST




On Tue, Aug 23, 2022, at 11:12 AM, Dave Hansen wrote:
> On 8/23/22 04:08, Stephen Röttger wrote:
>> On Mon, Aug 22, 2022 at 11:11 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>>> On 8/22/22 13:40, Kees Cook wrote:
>>>> 1) It appears to be a bug that a thread without the correct PK can modify
>>>> VMAs covered by a separate PK, out from under other threads. (e.g. mmap
>>>> a new mapping to wipe out the defined PK for it.) It seems that PK checks
>>>> should be made when modifying VMAs.
>>>
>>> Could you give an example of this? Is this something along the lines of
>>> a mmap(MAP_FIXED) wiping out an earlier mapping? Or, is it more subtle
>>> than that?
>>
>> Yes, that's one example. And the same applies to other operations on the
>> VMA. E.g. another case we'd like to prevent would be munmap(addr) where
>> addr is covered by a pkey to which the calling thread doesn't have access
>> permissions.
>
> Yeah, that's something for which our defenses are quite weak. But, it
> also calls for a very generic mm/ solution and not something specific at
> all to pkeys.
>
> I assume that you wouldn't want to turn off *all* mmap(), MAP_FIXED or
> munmap() in the process. You just want to make one or more VMAs more or
> less immutable. That _sounds_ like a topic that would have been broached at
> some point in the past, although it doesn't ring any bells.
>
> The concept would make a good lightning talk at Plumbers or LSF/MM.

This kind of thing seems questionable to me. If the attacker controls syscall arguments, they can do almost anything. ISTM a CFI scheme should aim to prevent the bogus call from being made in the first place.

Which makes me think that the actual solution is to have syscall interception support changing PKRU, perhaps via sigaltstack.

>
>>>> 2) It would be very helpful to have a mechanism for the signal stack to
>>>> be PK aware, in the sense that the kernel would switch to a predefined
>>>> PK. i.e. having a new interface to sigaltstack() which includes a PK.
>>>
>>> Are you thinking that when switching to the sigaltstack that it would
>>> also pick up a specific PKRU value? Or, that it would ensure that PKRU
>>> allows access to the sigaltstack's pkey?
>>
>> Either of those would work for us.
>>
>>> Logically something like this:
>>>
>>> stack_t sas = {
>>> 	.ss_sp = stack_ptr,
>>> 	.ss_flags = ... flags,
>>> 	.ss_size = ...,
>>> 	.ss_pkey = 12,
>>> };
>>>
>>> Then the kernel would set up RSP to point to ss_sp, and do (logically):
>>>
>>> pkru &= ~(3<<(12*2)); // clear Write and Access-disable for pkey-12
>>>
>>> before building the signal frame and running the signal handler?
>>
>> Yeah, that would work for our use case.
>> We also have a doc discussing this in more detail :) :
>
> That doesn't seem like it would be too much of a stretch. There's a
> delicate point when building the stack frame that the kernel would need
> to move over to the new PKRU value to build the frame before it writes
> the *OLD* value to the frame. But, it's far from impossible.
>
> I also bet we could do this with minimal new ABI. There's already a
> ->ss_flags field. We could assign a flag to mean that stack_t doesn't
> end at '->ss_size' and that there's a pkey value *after* ss_size. I do
> think having a single pkey that is made accessible before signal entry
> is a more flexible ABI than taking an explicit PKRU value.
>
> I think that would allow just reusing sys_sigaltstack().

sys_sigaltstack() is already pretty much useless with SHSTK, and it’s kinda busted with AVX512. How about we just add a whole new non-kludgy API?
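
For reference, the PKRU update quoted above can be written out as a small standalone helper. This is only a sketch, assuming the x86 PKRU layout of two bits per key (bit 2n is access-disable, bit 2n+1 is write-disable). pkru_allow_pkey() and the PKRU_* macro names are made up for illustration and are not existing kernel or libc interfaces.

```c
#include <stdint.h>

/* Per-key bits in PKRU (x86): two bits per protection key. */
#define PKRU_AD_BIT		0x1u	/* access-disable */
#define PKRU_WD_BIT		0x2u	/* write-disable */
#define PKRU_BITS_PER_PKEY	2

/*
 * Hypothetical helper: return a copy of pkru in which both the
 * access-disable and write-disable bits for 'pkey' are cleared,
 * i.e. the key becomes fully accessible. Other keys are untouched.
 */
static uint32_t pkru_allow_pkey(uint32_t pkru, unsigned int pkey)
{
	uint32_t mask = (PKRU_AD_BIT | PKRU_WD_BIT)
			<< (pkey * PKRU_BITS_PER_PKEY);

	return pkru & ~mask;
}
```

With a starting value of 0x55555554 (every key except pkey 0 access-disabled), pkru_allow_pkey(0x55555554, 12) gives 0x54555554: only bits 24 and 25 change. As Dave notes, the kernel would still have to save the *old* value into the signal frame after switching to the new one.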