Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description

From: szabolcs.nagy@xxxxxxx
Date: Fri Mar 03 2023 - 11:31:36 EST


The 03/02/2023 21:17, Edgecombe, Rick P wrote:
> Is the idea that shadow stack would be forced on regardless of if the
> linked libraries support it? In which case it could be allowed to crash
> if they do not?

execute a binary
- with shstk enabled and locked (only if marked?).
- with shstk disabled and locked.
could be managed in userspace, but it is libc dependent then.

> > > > - I think it's better to have a new limit specifically for shadow
> > > > stack size (which by default can be RLIMIT_STACK) so userspace
> > > > can adjust it if needed (another reason is that stack size is
> > > > not always a good indicator of max call depth).
>
> Looking at this again, I'm not sure why a new rlimit is needed. It
> seems many of those points were just formulations of that the clone3
> stack size was not used, but it actually is and just not documented. If
> you disagree perhaps you could elaborate on what the requirements are
> and we can see if it seems tricky to do in a follow up.

- tiny thread stack and deep signal stack.
(note that this does not really work with glibc because it has
implementation internal signals that don't run on alt stack,
cannot be masked and don't fit on a tiny thread stack, but
with other runtimes this can be a valid use-case, e.g. musl
allows tiny thread stacks, < pagesize.)

- thread runtimes with clone (glibc uses clone3 but some dont).

- huge stacks but small call depth (problem if some va limit
is hit or memory overcommit is disabled).

> > "sigaltshstk() is separate from sigaltstack(). You can have one
> > without the other, neither or both together. Because the shadow
> > stack specific state is pushed to the shadow stack, the two
> > features don’t need to know about each other."
...
> > i don't see why automatic alt shadow stack allocation would
> > not work (kernel manages it transparently when an alt stack
> > is installed or disabled).
>
> Ah, I think I see where maybe I can fill you in. Andy Luto had
> discounted this idea out of hand originally, but I didn't see it at
> first. sigaltstack lets you set, retrieve, or disable the shadow stack,
> right... But this doesn't allocate anything, it just sets where the
> next signal will be handled. This is different than things like threads
> where there is a new resources being allocated and it makes coming up
> with logic to guess when to de-allocate the alt shadow stack difficult.
> You probably already know...
>
> But because of this there can be some modes where the shadow stack is
> changed while on it. For one example, SS_AUTODISARM will disable the
> alt shadow stack while switching to it and restore when sigreturning.
> At which point a new altstack can be set. In the non-shadow stack case
> this is nice because future signals won't clobber the alt stack if you
> switch away from it (swapcontext(), etc). But it also means you can
> "change" the alt stack while on it ("change" sort of, the auto disarm
> results in the kernel forgetting it temporarily).

the problem with swapcontext is that it may unmask signals
that run on the alt stack, which means the code cannot jump
back after another signal clobbered the alt stack.

the non-standard SS_AUTODISARM aims to solve this by disabling
alt stack settings on signal entry until the handler returns.

so this use case is not about supporting swapcontext out, but
about jumping back. however that does not work reliably with
this patchset: if swapcontext goes to the thread stack (and
not to another stack e.g. used by makecontext), then jump back
fails. (and if there is a sigaltshstk installed then even jump
out fails.)

assuming
- jump out from alt shadow stack can be made to work.
- alt shadow stack management can be automatic.
then this can be improved so jump back works reliably.

> I hear where you are coming from with the desire to have it "just work"
> with existing code, but I think the resulting ABI around the alt shadow
> stack allocation lifecycle would be way too complicated even if it
> could be made to work. Hence making a new interface. But also, the idea
> was that the x86 signal ABI should support handling alt shadow stacks,
> which is what we have done with this series. If a different interface
> for configuring it is better than the one from the POC, I'm not seeing
> a problem jump out. Is there any specific concern about backwards
> compatibility here?

sigaltstack syscall behaviour may be hard to change later
and currently
- shadow stack overflow cannot be recovered from.
- longjmp out of signal handler fails (with sigaltshstk).
- SS_AUTODISARM does not work (jump back can fail).

> > "Since shadow alt stacks are a new feature, longjmp()ing from an
> > alt shadow stack will simply not be supported. If a libc want’s
> > to support this it will need to enable WRSS and write it’s own
> > restore token."
> >
> > i think longjmp should work without enabling writes to the shadow
> > stack in the libc. this can also affect unwinding across signal
> > handlers (not for c++ but e.g. glibc thread cancellation).
>
> glibc today does not support longjmp()ing from a different stack (for
> example even today after a swapcontext()) when shadow stack is used. If
> glibc used wrss it could be supported maybe, but otherwise I don't see
> how the HW can support it.
>
> HJ and I were actually just discussing this the other day. Are you
> looking at this series with respect to the arm shadow stack feature by
> any chance? I would love if glibc/tools would document what the shadow
> stack limitations are. If the all the arch's have the same or similar
> limitations perhaps this could be one developer guide. For the most
> part though, the limitations I've encountered are in glibc and the
> kernel is more the building blocks.

well we hope that shadow stack behaviour and limitations can
be similar across targets.

longjmp to different stack should work: it can do the same as
setcontext/swapcontext: scan for the pivot token. then only
longjmp out of alt shadow stack fails. (this is non-conforming
longjmp use, but e.g. qemu relies on it.)

for longjmp out of alt shadow stack, the target shadow stack
needs a pivot token, which implies the kernel needs to push that
on signal entry, which can overflow. but i suspect that can be
handled the same way as stackoverflow on signal entry is handled.

> A general comment. Not sure if you are aware, but this shadow stack
> enabling effort is quite old at this point and there have been many
> discussions on these topics stretching back years. The latest
> conversation was around getting this series into linux-next soon to get
> some testing on the MM pieces. I really appreciate getting this ABI
> feedback as it is always tricky to get right, but at this stage I would
> hope to be focusing mostly on concrete problems.
>
> I also expect to have some amount of ABI growth going forward with all
> the normal things that entails. Shadow stack is not special in that it
> can come fully finalized without the need for the real world usage
> iterative feedback process. At some point we need to move forward with
> something, and we have quite a bit of initial changes at this point.
>
> So I would like to minimize the initial implementation unless anyone
> sees any likely problems with future growth. Can you be clear if you
> see any concrete problems at this point or are more looking to evaluate
> the design reasoning? I'm under the assumption there is nothing that
> would prohibit linux-next testing while any ABI shakedown happens
> concurrently at least?

understood.

the points that i think are worth raising:

- shadow stack size logic may need to change later.
(it can be too big, or too small in practice.)
- shadow stack overflow is not recoverable and the
possible fix for that (sigaltshstk) breaks longjmp
out of signal handlers.
- jump back after SS_AUTODISARM swapcontext cannot be
reliable if alt signal uses thread shadow stack.
- the above two concerns may be mitigated by different
sigaltstack behaviour which may be hard to add later.
- end token for backtrace may be useful, if added
later it can be hard to check.

thanks.