Re: PKU usage improvements for threads

From: Stephen Röttger
Date: Wed Aug 24 2022 - 04:52:15 EST


On Tue, Aug 23, 2022 at 8:24 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>
>
> On Tue, Aug 23, 2022, at 11:12 AM, Dave Hansen wrote:
> > On 8/23/22 04:08, Stephen Röttger wrote:
> >> On Mon, Aug 22, 2022 at 11:11 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> >>> On 8/22/22 13:40, Kees Cook wrote:
> >>>> 1) It appears to be a bug that a thread without the correct PK can make
> >>>> VMAs covered by a separate PK, out from under other threads. (e.g. mmap
> >>>> a new mapping to wipe out the defined PK for it.) It seems that PK checks
> >>>> should be made when modifying VMAs.
> >>>
> >>> Could you give an example of this? Is this something along the lines of
> >>> a mmap(MAP_FIXED) wiping out an earlier mapping? Or, is it more subtle
> >>> than that?
> >>
> >> Yes, that's one example. And the same applies to other operations on the
> >> VMA. E.g. another case we'd like to prevent would be munmap(addr) where
> >> addr is covered by a pkey to which the calling thread doesn't have access
> >> permissions to.
> >
> > Yeah, that's something for which our defenses are quite weak. But, it
> > also calls for a very generic mm/ solution and not something specific at
> > all to pkeys.

We were also thinking about if this should be a more generic feature instead of
being tied to pkeys. I.e. the doc above has an alternative proposal to introduce
something like a memory seal/unseal syscall.
I was personally leaning towards using pkeys for this for a few reasons:
* intuitively it would make sense to me to extend PKEY_DISABLE_ACCESS
to also mean disable all changes to the memory, not just the data.
* We couldn't come up with another use case for the more generic API that
doesn't include pkeys.
* Fewer syscalls for our use case, since we already set the pkey on the mappings
we want to become immutable.
That being said, either approach works for us.

> > The concept would make a good lightning talk at Plumbers of LSF/MM.
>
> This kind of thing seems questionable to me. If the attacker controls syscall arguments, they can do almost anything. ISTM a CFI scheme should aim to prevent that bogus call in the first place, e.g. by preventing a problematic call.

I'm not sure that CFI can solve this problem since the attacker
doesn't change the
control flow in this attack. We will definitely have to ensure that
certain syscall
arguments can't be controlled by the attacker, e.g. if the protection
bits of an mmap
syscall are ever spilled to the stack, that would be one way to bypass
the mitigation
and we have to ensure in userspace that this doesn't happen.
However, this seems infeasible for common syscalls like munmap. If any thread
wants to unmap a page, it's likely that the data for that came from a
place that the
attacker might have corrupted.

> Which makes me think that the actual solution is to have syscall interception support changing PKRU, perhaps via sigaltstack.

Can you expand on this? I didn't really understand what you meant.

>
> >
> >>>> 2) It would be very helpful to have a mechanism for the signal stack to
> >>>> be PK aware, in the sense that the kernel would switch to a predefined
> >>>> PK. i.e. having a new interface to sigaltstack() which includes a PK.
> >>>
> >>> Are you thinking that when switching to the sigaltstack that it would
> >>> also pick up a specific PKRU value? Or, that it would ensure that PKRU
> >>> allows access to the sigaltstack's pkey?
> >>
> >> Either of those would work for us.
> >>
> >>> Logically something like this:
> >>>
> >>> stack_t sas = {
> >>> ss_sp = stack_ptr;
> >>> ss_flags = ... flags;
> >>> ss_size = ...;
> >>> ss_pkey = 12;
> >>> };
> >>>
> >>> Then the kernel would set up RSP to point to ss_sp, and do (logically):
> >>>
> >>> pkkru &= ~(3<<(12*2)); // clear Write and Access-disable for pkey-12
> >>>
> >>> before building the signal frame running the signal handler?
> >>
> >> Yeah, that would work for our use case.
> >> We also have a doc discussing this in more detail :) :
> >
> > That doesn't seem like it would be too much of a stretch. There's a
> > delicate point when building the stack frame that the kernel would need
> > to move over to the new PKRU value to build the frame before it writes
> > the *OLD* value to the frame. But, it's far from impossible.
> >
> > I also bet we could do this with minimal new ABI. There's already a
> > ->ss_flags field. We could assign a flag to mean that stack_t doesn't
> > end at '->ss_size' and that there's a pkey value *after* ss_size. I do
> > think having a single pkey that is made accessible before signal entry
> > is a more flexible ABI than taking an explicit PKRU value.
> >
> > I think that would allow just reusing sys_sigaltstack().
>
> sys_sigaltstack() is already pretty much useless with SHSTK, and it’s kinda busted with AVX512. How about we just add a whole new non-kludgy API?

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature