Re: [PATCH v6 03/25] x86/fpu/xstate: Add CET supervisor mode state support

From: Edgecombe, Rick P
Date: Thu Sep 14 2023 - 20:06:09 EST


On Thu, 2023-09-14 at 02:33 -0400, Yang Weijiang wrote:
> Add supervisor mode state support within FPU xstate management
> framework.
> Although supervisor shadow stack is not enabled/used today in
> kernel,KVM
^ Nit: needs a space
> requires the support because when KVM advertises shadow stack feature
> to
> guest, architechturally it claims the support for both user and
^ Spelling: "architecturally"
> supervisor
> modes for Linux and non-Linux guest OSes.
>
> With the xstate support, guest supervisor mode shadow stack state can
> be
> properly saved/restored when 1) guest/host FPU context is swapped 
> 2) vCPU
> thread is sched out/in.
(2) is a little bit confusing, because the lazy FPU stuff won't always
save/restore while scheduling. But trying to explain the details in
this commit log is probably unnecessary. Maybe something like?

2) At the proper times while other tasks are scheduled

I think also a key part of this is that XFEATURE_CET_KERNEL is not
*all* of the "guest supervisor mode shadow stack state", at least with
respect to the MSRs. It might be worth calling that out a little more
loudly.
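
To make the gap concrete, here is a rough sketch of how the supervisor
CET MSRs split between what XFEATURE_CET_KERNEL covers and what it does
not. The xfeature bit numbers and MSR indices are written from my
reading of the SDM and are defined locally for illustration; the struct
and helper names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local illustrative definitions; values assumed from the SDM. */
#define SKETCH_XFEATURE_CET_USER	11
#define SKETCH_XFEATURE_CET_KERNEL	12

#define SKETCH_MSR_S_CET		0x6a2
#define SKETCH_MSR_PL0_SSP		0x6a4
#define SKETCH_MSR_PL1_SSP		0x6a5
#define SKETCH_MSR_PL2_SSP		0x6a6
#define SKETCH_MSR_INT_SSP_TAB		0x6a8

/* Hypothetical layout of the CET supervisor xstate component: three SSPs. */
struct sketch_cet_supervisor_state {
	uint64_t pl0_ssp;
	uint64_t pl1_ssp;
	uint64_t pl2_ssp;
};

struct sketch_cet_msr {
	uint32_t index;
	const char *name;
	bool in_xstate;		/* saved/restored via XFEATURE_CET_KERNEL? */
};

static const struct sketch_cet_msr sketch_supervisor_msrs[] = {
	{ SKETCH_MSR_PL0_SSP,     "IA32_PL0_SSP",                  true  },
	{ SKETCH_MSR_PL1_SSP,     "IA32_PL1_SSP",                  true  },
	{ SKETCH_MSR_PL2_SSP,     "IA32_PL2_SSP",                  true  },
	{ SKETCH_MSR_S_CET,       "IA32_S_CET",                    false },
	{ SKETCH_MSR_INT_SSP_TAB, "IA32_INTERRUPT_SSP_TABLE_ADDR", false },
};

/* Count the supervisor CET MSRs that xsave/xrstor does NOT handle. */
static size_t sketch_count_msrs_outside_xstate(void)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(sketch_supervisor_msrs) /
			sizeof(sketch_supervisor_msrs[0]); i++)
		if (!sketch_supervisor_msrs[i].in_xstate)
			n++;
	return n;
}
```

In this sketch the entries marked false are the ones the xstate support
alone does not save or restore, which is the point the commit log could
state more explicitly.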

>
> The alternative is to enable it in KVM domain, but KVM maintainers
> NAKed
> the solution. The external discussion can be found at [*], it ended
> up
> with adding the support in kernel instead of KVM domain.
>
> Note, in KVM case, guest CET supervisor state i.e.,
> IA32_PL{0,1,2}_MSRs,
> are preserved after VM-Exit until host/guest fpstates are swapped,
> but
> since host supervisor shadow stack is disabled, the preserved MSRs
> won't
> hurt host.

It might raise the question of whether this solution will need to be
redone by some future Linux supervisor shadow stack effort. I *think*
the answer is no.

Most of the xsave managed features are restored before returning to
userspace because they have an effect on userspace. XFEATURE_CET_KERNEL
is different: it only affects the kernel. However, the
IA32_PL{0,1,2}_MSRs are used when transitioning to those rings, so for
Linux they would get used when transitioning back from userspace. In
order for it to be used when control transfers back *from* userspace,
it needs to be restored before returning *to* userspace. So despite
being needed only for the kernel, and having no effect on userspace, it
might need to be swapped/restored at the same time as the rest of the
FPU state that only affects userspace.
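
A toy model of that ordering argument, in case it helps. This is not
kernel code: the types and functions are made up for illustration, it
assumes a per-task PL0 SSP value as the paragraph above does, and it
ignores details such as the interrupt SSP table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified CPU and task state. */
struct toy_cpu {
	uint64_t pl0_ssp_msr;	/* stands in for IA32_PL0_SSP */
	uint64_t ssp;		/* current shadow stack pointer */
};

struct toy_task {
	uint64_t saved_pl0_ssp;	/* value held in the task's xstate buffer */
};

/* Leave the kernel; optionally restore the supervisor state first. */
static void toy_return_to_user(struct toy_cpu *cpu,
			       const struct toy_task *task,
			       bool restore_before_exit)
{
	if (restore_before_exit)
		cpu->pl0_ssp_msr = task->saved_pl0_ssp;
	cpu->ssp = 0;		/* user-mode SSP is not modeled */
}

/* Ring 3 -> ring 0 transition: the CPU loads SSP from the MSR. */
static void toy_enter_kernel(struct toy_cpu *cpu)
{
	cpu->ssp = cpu->pl0_ssp_msr;
}

/* True if the kernel ends up on this task's shadow stack after re-entry. */
static bool toy_round_trip_ok(bool restore_before_exit)
{
	/* The MSR starts out holding a stale value from some other task. */
	struct toy_cpu cpu = { .pl0_ssp_msr = 0xdead0000, .ssp = 0 };
	struct toy_task task = { .saved_pl0_ssp = 0x1000 };

	toy_return_to_user(&cpu, &task, restore_before_exit);
	toy_enter_kernel(&cpu);
	return cpu.ssp == task.saved_pl0_ssp;
}
```

In this model toy_round_trip_ok(true) holds and toy_round_trip_ok(false)
does not, since by the time the kernel is running again it is too late
to fix up the MSR that the entry itself consumed.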

Probably supervisor shadow stack for Linux needs much more analysis,
but trying to leave some breadcrumbs on the thinking from internal
reviews. I don't know if it might be good to include some of this
reasoning in the commit log. It's a bit hand wavy.

>
> [*]:
> https://lore.kernel.org/all/806e26c2-8d21-9cc9-a0b7-7787dd231729@xxxxxxxxx/
>
> Signed-off-by: Yang Weijiang <weijiang.yang@xxxxxxxxx>

Otherwise, the code looked good to me.