Re: extending ucontext (Re: [PATCH v26 25/30] x86/cet/shstk: Handle signals for shadow stack)

From: Yu, Yu-cheng
Date: Thu May 06 2021 - 18:05:40 EST


On 5/4/2021 1:49 PM, Yu, Yu-cheng wrote:
On 4/30/2021 11:32 AM, Yu, Yu-cheng wrote:
On 4/30/2021 10:47 AM, Andy Lutomirski wrote:
On Fri, Apr 30, 2021 at 10:00 AM Yu, Yu-cheng <yu-cheng.yu@xxxxxxxxx> wrote:

On 4/28/2021 4:03 PM, Andy Lutomirski wrote:
On Tue, Apr 27, 2021 at 1:44 PM Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx> wrote:

When shadow stack is enabled, a task's shadow stack states must be saved
along with the signal context and later restored in sigreturn. However,
currently there is no systematic facility for extending a signal context.
There is some space left in the ucontext, but changing ucontext is likely
to create compatibility issues and there is not enough space for further
extensions.

Introduce a signal context extension struct 'sc_ext', which is used to save
shadow stack restore token address.  The extension is located above the fpu
states, plus alignment.  The struct can be extended (such as the ibt's
wait_endbr status to be introduced later), and sc_ext.total_size field
keeps track of total size.

I still don't like this.


[...]


That's where we are right now upstream.  The kernel has a parser for
the FPU state that is bugs piled upon bugs and is going to have to be
rewritten sometime soon.  On top of all this, we have two upcoming
features, both of which require different kinds of extensions:

1. AVX-512.  (Yeah, you thought this story was over a few years ago,
but no.  And AMX makes it worse.)  To make a long story short, we
promised user code many years ago that a signal frame fit in 2048
bytes with some room to spare.  With AVX-512 this is false.  With AMX
it's so wrong it's not even funny.  The only way out of the mess
anyone has come up with involves making the length of the FPU state
vary depending on which features are INIT, i.e. making it more compact
than "compact" mode is.  This has a side effect: it's no longer
possible to modify the state in place, because enabling a feature with
no space allocated will make the structure bigger, and the stack won't
have room.  Fortunately, one can relocate the entire FPU state, update
the pointer in mcontext, and the kernel will happily follow the
pointer.  So new code on a new kernel using a super-compact state
could expand the state by allocating new memory (on the heap? very
awkwardly on the stack?) and changing the pointer.  For all we know,
some code already fiddles with the pointer.  This is great, except
that your patch sticks more data at the end of the FPU block that no
one is expecting, and your sigreturn code follows that pointer, and
will read off into lala land.


Then, what about we don't do that at all.  Is it possible from now on we
don't stick more data at the end, and take the relocating-fpu approach?

2. CET.  CET wants us to find a few more bytes somewhere, and those
bytes logically belong in ucontext, and here we are.


Fortunately, we can spare CET the need of ucontext extension.  When the
kernel handles sigreturn, the user-mode shadow stack pointer is right at
the restore token.  There is no need to put that in ucontext.

That seems entirely reasonable.  This might also avoid needing to
teach CRIU about CET at all.


However, the WAIT_ENDBR status needs to be saved/restored for signals.
Since IBT is now dependent on shadow stack, we can use a spare bit of
the shadow stack restore token for that.

That seems like unnecessary ABI coupling.  We have plenty of bits in
uc_flags, and we have an entire reserved word in sigcontext.  How
about just sticking this bit in one of those places?

Yes, I will make it UC_WAIT_ENDBR.

Personally, I think an explicit flag is cleaner than using a reserved word somewhere.  However, there is a small issue: ia32 has no uc_flags.

This series can support legacy apps up to now.  But, instead of creating too many special cases, perhaps we should drop CET support of ia32?

Thoughts?


Once we have UC_WAIT_ENDBR, IBT signal handling becomes quite simple. Like the following:

diff --git a/arch/x86/include/uapi/asm/ucontext.h b/arch/x86/include/uapi/asm/ucontext.h
index 5657b7a49f03..96375d609e11 100644
--- a/arch/x86/include/uapi/asm/ucontext.h
+++ b/arch/x86/include/uapi/asm/ucontext.h
@@ -49,6 +49,11 @@
*/
#define UC_SIGCONTEXT_SS 0x2
#define UC_STRICT_RESTORE_SS 0x4
+
+/*
+ * UC_WAIT_ENDBR indicates the task is in wait-ENDBR status.
+ */
+#define UC_WAIT_ENDBR 0x08
#endif

#include <asm-generic/ucontext.h>
diff --git a/arch/x86/kernel/ibt.c b/arch/x86/kernel/ibt.c
index d2563dd4759f..da804314ddc4 100644
--- a/arch/x86/kernel/ibt.c
+++ b/arch/x86/kernel/ibt.c
@@ -66,3 +66,32 @@ void ibt_disable(void)
ibt_set_clear_msr_bits(0, CET_ENDBR_EN);
current->thread.cet.ibt = 0;
}
+
+int ibt_get_clear_wait_endbr(void)
+{
+ u64 msr_val = 0;
+
+ if (!current->thread.cet.ibt)
+ return 0;
+
+ fpregs_lock();
+
+ if (test_thread_flag(TIF_NEED_FPU_LOAD))
+ __fpregs_load_activate();
+
+ if (!rdmsrl_safe(MSR_IA32_U_CET, &msr_val))
+ wrmsrl(MSR_IA32_U_CET, msr_val & ~CET_WAIT_ENDBR);
+
+ fpregs_unlock();
+
+ return msr_val & CET_WAIT_ENDBR;
+}
+
+int ibt_set_wait_endbr(void)
+{
+ if (!current->thread.cet.ibt)
+ return 0;
+
+
+ return ibt_set_clear_msr_bits(CET_WAIT_ENDBR, 0);
+}
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 66b662e57e19..5afd15419006 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -46,6 +46,7 @@
#include <asm/syscall.h>
#include <asm/sigframe.h>
#include <asm/signal.h>
+#include <asm/cet.h>

#ifdef CONFIG_X86_64
/*
@@ -134,6 +135,9 @@ static int restore_sigcontext(struct pt_regs *regs,
*/
if (unlikely(!(uc_flags & UC_STRICT_RESTORE_SS) && user_64bit_mode(regs)))
force_valid_ss(regs);
+
+ if (uc_flags & UC_WAIT_ENDBR)
+ ibt_set_wait_endbr();
#endif

return fpu__restore_sig((void __user *)sc.fpstate,
@@ -433,6 +437,9 @@ static unsigned long frame_uc_flags(struct pt_regs *regs)
if (likely(user_64bit_mode(regs)))
flags |= UC_STRICT_RESTORE_SS;

+ if (ibt_get_clear_wait_endbr())
+ flags |= UC_WAIT_ENDBR;
+
return flags;
}


However, this cannot handle ia32 with no SA_SIGINFO. For that, can we create a synthetic token on the shadow stack?

- The token points to itself with reserved bit[1] set, and cannot be used for RSTORSSP.
- The token only exists for ia32 with no SA_SIGINFO *AND* when the task is in wait-endbr.

The signal shadow stack will look like this:

--> ssp before signal
synthetic IBT token (for ia32 no SA_SIGINFO)
shadow stack restore token
sigreturn address

The synthetic token is not valid in other situations.
How is that?

Thanks,
Yu-cheng