Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

From: Andy Lutomirski
Date: Mon Sep 20 2021 - 12:51:50 EST

On Mon, Sep 13, 2021, at 6:33 PM, Edgecombe, Rick P wrote:
> On Mon, 2020-09-14 at 11:31 -0700, Andy Lutomirski wrote:
> > > On Sep 14, 2020, at 7:50 AM, Dave Hansen <dave.hansen@xxxxxxxxx>
> > > wrote:
> > >
> > > On 9/11/20 3:59 PM, Yu-cheng Yu wrote:
> > > ...
> > > > Here are the changes if we take the mprotect(PROT_SHSTK)
> > > > approach.
> > > > Any comments/suggestions?
> > >
> > > I still don't like it. :)
> > >
> > > I'll also be much happier when there's a proper changelog to
> > > accompany
> > > this which also spells out the alternatives and why they suck so
> > > much.
> > >
> >
> > Let’s take a step back here. Ignoring the precise API, what exactly
> > is
> > a shadow stack from the perspective of a Linux user program?
> >
> > The simplest answer is that it’s just memory that happens to have
> > certain protections. This enables all kinds of shenanigans. A
> > program could map a memfd twice, once as shadow stack and once as
> > non-shadow-stack, and change its control flow. Similarly, a program
> > could mprotect its shadow stack, modify it, and mprotect it back. In
> > some threat models, this could be seen as a WRSS bypass. (Although
> > if an attacker can coerce a process to call mprotect(), the game is
> > likely mostly over anyway.)
> >
> > But we could be more restrictive, or perhaps we could allow user code
> > to opt into more restrictions. For example, we could have shadow
> > stacks be special memory that cannot be written from usermode by any
> > means other than ptrace() and friends, WRSS, and actual shadow stack
> > usage.
> >
> > What is the goal?
> >
> > No matter what we do, the effects of calling vfork() are going to be
> > a
> > bit odd with SHSTK enabled. I suppose we could disallow this, but
> > that seems likely to cause its own issues.
>
> Hi,
>
> Resurrecting this old thread to highlight a consequence of the design
> change that came out of it. I am going to be taking over this series
> from Yu-cheng, and wanted to check if people would be interested in re-
> visiting this interface.
>
> The consequence I wanted to highlight, is that making userspace be
> responsible for mapping memory as shadow stack, also requires moving
> the writing of the restore token to userspace for glibc ucontext
> operations. Since these operations involve creating/pivoting to new
> stacks in userspace, ucontext cet support involves also creating a new
> shadow stack. For normal thread stacks, the kernel has always done the
> shadow stack allocation and so it is never writable (in the normal
> sense) from userspace. But after this change makecontext() now first
> has to mmap() writable memory, then write the restore token, then
> mprotect() it as shadow stack. See the glibc changes to support
> PROT_SHADOW_STACK here[0].
>
> The writable window leaves an opening for an attacker to create an
> arbitrary shadow stack that could be pivoted to later by tweaking the
> ucontext_t structure. To try to see how much this matters, we have done
> a small test that uses this window to ROP from writes in another
> thread during the makecontext()/setcontext() window. (offensive work
> credit to Joao on CC). This would require a real app to already be
> using ucontext in the course of normal runtime.
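
For anyone following along, here is roughly how I read the sequence described above. This is a sketch under assumptions, not the glibc code: alloc_shstk_racy() and make_restore_token() are names I made up, PROT_READ stands in for the proposed shadow stack protection flag (which I can't assume is in anyone's headers), and the token layout (address just above the token, with bit 0 set for 64-bit mode) is my understanding of the format and should be checked against the SDM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Assumed token layout: the token holds the address immediately above
 * itself, with bit 0 set to mark 64-bit mode.
 */
static uint64_t make_restore_token(uint64_t token_addr)
{
	return (token_addr + sizeof(uint64_t)) | 1ULL;
}

/*
 * Hypothetical helper showing the three userspace steps.  Returns a
 * pointer to the token in the top slot of the new region, or NULL.
 */
static uint64_t *alloc_shstk_racy(size_t size)
{
	uint64_t *token;
	void *base;

	if (size < sizeof(uint64_t) || size % sizeof(uint64_t))
		return NULL;

	/* Step 1: plain writable memory.  Any thread can write to it. */
	base = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Step 2: write the restore token while the window is open. */
	token = (uint64_t *)((char *)base + size) - 1;
	assert(((uintptr_t)token & 7) == 0);
	*token = make_restore_token((uint64_t)(uintptr_t)token);

	/*
	 * Step 3: drop write permission.  The proposal would convert the
	 * region to shadow stack here; PROT_READ is only a stand-in.
	 */
	if (mprotect(base, size, PROT_READ) != 0) {
		munmap(base, size);
		return NULL;
	}

	return token;
}
```

Everything between step 1 and step 3 is the window: another thread that can write to the region in that interval can lay down whatever contents it likes before the protection change lands.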

My general opinion here (take this with a grain of salt -- I haven't paged back in every single detail) is that the kernel should make it straightforward for a libc to do the right thing without nasty races, cross-thread coordination, or unnecessary permission to write to the stack. I *also* think that it should be possible for userspace to manage its own shadow stack allocation if it wants to, since I'm sure there will be JIT or green thread or other use cases that want to do crazy things that we fail to anticipate with in-kernel magic.

So perhaps we should keep the explicit allocation and free operations, have a way to opt in to WRSS being flipped on, but also do our best to have an API that handles the known cases well.

Does that make sense? Can we have both approaches work in the same kernel?