Re: [PATCH v6 33/41] x86/shstk: Introduce map_shadow_stack syscall
From: Edgecombe, Rick P
Date: Thu Feb 23 2023 - 18:44:39 EST
On Thu, 2023-02-23 at 13:20 -0800, Deepak Gupta wrote:
> > It doesn't seem like the complexity of the checks is worth saving
> > the
> > tiny syscall. Is there some reason why riscv can't use the same
> > syscall
> > stub? It doesn't need to live forever in x86 code. Not sure what
> > the
> > savings are for riscv of the mmap+checks approach are either...
>
> I don't see a lot of extra complexity here.
> If `mprotect` and friends don't know about `PROT_SHADOWSTACK`,
> they'll
> just fail by default (which is desired)
To me, the options look like: cram it into an existing syscall or
create one that does exactly what is needed.
To replace the new syscall with mmap(), with the existing MM
implementation, I think you would need to add to mmap:
1. Limit PROT_SHADOW_STACK to anonymous, private memory.
2. Limit PROT_SHADOW_STACK to MAP_ABOVE4G (or create a MAP_SHADOW_STACK
that does both). MAP_ABOVE4G protects from using shadow stack in 32 bit
mode, after some ABI issues were raised. So it is supposed to be
enforced.
3. Add additional logic for MAP_ABOVE4G to work with MAP_FIXED as is
required by CRIU.
4. Add another MAP_ flag to specify whether to write the token (AFAIK
the first flag that would do something like that. So then likely have
debates about whether it fits into the flags). Sort out the behavior of
trying to write the token to a non-PROT_SHADOW_STACK mmap call.
5. Add arch breakout for mmap to call into the token writing.
I think you are right that no mprotect() changes would be needed with
the current shadow stack MM implementation (it wasn't the case for the
original PROT_SHADOW_STACK). But I'm still not sure if it is simpler
then the syscall.
I do wonder a little bit about trying to remove some of these
limitations of the existing shadow stack MM implementation, which could
make an mmap based interface a little better fit. Like if someday
shadow stack memory added support for all the options that mmap
supports. But I'm not sure if that would just result in more complexity
in other places (MM, signals) that would barely get used. Like, I'm not
sure how shadow stack permissioned mmap()ed files should work. You
would have to require writable files to map them as shadow stack, but
then that is not as locked down as we have today with the anonymous
memory. And a "shadow stack" file permission would seem a bit
overboard.
Then probably some more dirty bit issues with mmaped files. I'm not
fully sure. That one was definitely an issue when PROT_SHADOW_STACK was
dropped, but David Hildenbrand has now removed at least some of the
issues it hit.
So the optimum solution might depend on if we add more shadow stack MM
support later. But it is always possible to add mmap support later too.
>
> It's only `mmap` that needs to be enlightened. And it can just pass
> `VMA_SHADOW_STACK` to `do_mmap` if input is `PROT_SHADOWSTACK`.
>
> Adding a syscall just for mapping shadow stack is weird when it can
> be
> solved with existing system calls.
> As you say in your response below, it would be good to have such a
> syscall which serve larger purposes (e.g. provisioning special
> security-type memory)
>
> arm64's memory tagging is one such example. Not exactly security-type
> memory (but eventual application is security for this feature) .
> It adds extra meaning to virtual addresses (i.e. an address has
> tags).
> arm64 went about using a protection flag `PROT_MTE` instead of a
> special system call.
It looks like that memory can be written with a special instruction?
And so it doesn't need to be provisioned by the kernel, as is the case
here.
>
> Being said that since this patch has gone through multiple revisions
> and I am new to the party. If others dont have issues on this special
> system call,
> I think it's fine then. In case of riscv I can choose to use this
> mechanism or go via arm's route to define PROT_SHADOWSTACK which is
> arch specific.
Ok, sounds good. The advice I got from maintainers after the many
attempts to cram it into the existing interfaces was: don't be afraid
to add a syscall. And sure enough, when MAP_ABOVE4G came along it
continued to make things simpler.