Re: [PATCH v10 01/26] Documentation/x86: Add CET description

From: Andrew Cooper
Date: Fri May 22 2020 - 13:48:28 EST


On 22/05/2020 17:49, Peter Zijlstra wrote:
> On Sat, May 16, 2020 at 03:09:22PM +0100, Andrew Cooper wrote:
>
>> Sadly, the same is not true for kernel shadow stacks.
>>
>> SSP is 0 after SYSCALL, SYSENTER and CLRSSBSY, and you've got to be
>> careful to re-establish the shadow stack before a CALL, interrupt or
>> exception tries pushing a word onto the shadow stack at 0xfffffffffffffff8.
> Oh man, I can only imagine the joy that brings to #NM and friends :-(

Establishing a supervisor shadow stack for the first time involves a
large leap of faith, even by usual x86 standards.

You need to have prepared MSR_PL0_SSP with correct mappings and
supervisor tokens, such that when you enable CR4.CET and
MSR_S_CET.SHSTK_EN, your SETSSBSY instruction succeeds at its atomic
"check the token and set the busy bit" shadow stack access. Any failure
here tends to be a triple fault, and I didn't get around to figuring out
why #DF wasn't taken cleanly.

You also need to have prepared MSR_IST_SSP beforehand with the IST
shadow stack pointers matching any IST configuration in the IDT, lest a
NMI ruins your day on the instruction boundary before SETSSBSY.

A less obvious side effect of these "windows with an SSP of 0" is that
you're now forced to use IST for all non-maskable interrupts/exceptions,
even if you choose not to use SYSCALL, and you no longer need IST to
remove the risks of a userspace privilege escalation, and would prefer
not to use IST because of its problematic reentrancy characteristics.

For anyone counting the number of IST-necessary vectors across all
potential configurations in modern hardware, its #DB, NMI, #DF, #MC,
#VE, #HV, #VC and #SX, and an architectural limit of 7.

There are several other amusing aspects, such as iret-to-self needing to
use call-oriented-programming to keep itself shadow-stack-safe, or the
fact that IRET to user mode doesn't fault if it fails to clear the
supervisor busy bit, instead leaving you to double fault at some point
in the future at the next syscall/interrupt/exception because the stack
is still busy.

~Andrew

P.S. For anyone interested,
https://lore.kernel.org/xen-devel/20200501225838.9866-1-andrew.cooper3@xxxxxxxxxx/T/#u
for getting supervisor shadow stacks working on Xen, which is far
simpler to manage than Linux. I do not envy whomever has the fun of
trying to make this work for Linux.