Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description
From: Edgecombe, Rick P
Date: Mon Mar 06 2023 - 13:07:11 EST
+Kan for shadow stack perf discussion.
On Mon, 2023-03-06 at 16:20 +0000, szabolcs.nagy@xxxxxxx wrote:
> The 03/03/2023 22:35, Edgecombe, Rick P wrote:
> > I think I slightly prefer the former arch_prctl() based solution
> > for a
> > few reasons:
> > - When you need to find the start or end of the shadow stack can
> > you
> > can just ask for it instead of searching. It can be faster and
> > simpler.
> > - It saves 8 bytes of memory per shadow stack.
> >
> > If this turns out to be wrong and we want to do the marker solution
> > much later at some point, the safest option would probably be to
> > create
> > new flags.
>
> i see two problems with a get bounds syscall:
>
> - syscall overhead.
>
> - discontinous shadow stack (e.g. alt shadow stack ends with a
> pointer to the interrupted thread shadow stack, so stack trace
> can continue there, except you don't know the bounds of that).
>
> > But just discussing this with HJ, can you share more on what the
> > usage
> > is? Like which backtracing operation specifically needs the marker?
> > How
> > much does it care about the ucontext case?
>
> it could be an option for perf or ptracers to sample the stack trace.
>
> in-process collection of stack trace for profiling or crash reporting
> (e.g. when stack is corrupted) or cross checking stack integrity may
> use it too.
>
> sometimes parsing /proc/self/smaps maybe enough, but the idea was to
> enable light-weight backtrace collection in an async-signal-safe way.
>
> syscall overhead in case of frequent stack trace collection can be
> avoided by caching (in tls) when ssp falls within the thread shadow
> stack bounds. otherwise caching does not work as the shadow stack may
> be reused (alt shadow stack or ucontext case).
>
> unfortunately i don't know if syscall overhead is actually a problem
> (probably not) or if backtrace across signal handlers need to work
> with alt shadow stack (i guess it should work for crash reporting).
There was a POC done of perf integration. I'm not too knowledgeable on
perf, but the patch itself didn't need any new shadow stack bounds ABI.
Since it was implemented in the kernel, it could just refer to the
kernel's internal data for the thread's shadow stack bounds.
I asked about ucontext (similar to alt shadow stacks in regards to lack
of bounds ABI), and apparently perf usually focuses on the thread
stacks. Hopefully Kan can lend some more confidence to that assertion.