Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed

From: Sean Christopherson
Date: Fri Aug 04 2023 - 16:45:22 EST


On Fri, Aug 04, 2023, Weijiang Yang wrote:
> On 8/3/2023 7:15 PM, Chao Gao wrote:
> > On Thu, Aug 03, 2023 at 12:27:22AM -0400, Yang Weijiang wrote:
> > > +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> > > +{
> > > + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {

Drop the unlikely, KVM should not speculate on the guest configuration or underlying
hardware.

> > > + rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> > > + rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> > > + rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> > > + /*
> > > + * Omit reset to host PL{1,2}_SSP because Linux will never use
> > > + * these MSRs.
> > > + */
> > > + wrmsrl(MSR_IA32_PL0_SSP, 0);
> > This wrmsrl() can be dropped because host doesn't support SSS yet.
> Frankly speaking, I want to remove this line of code. But that would mess up the MSR
> on host side, i.e., from host perspective, the MSRs could be filled with garbage data,
> and looks awful.

So? :-)

That's the case for all of the MSRs that KVM defers restoring until the host
returns to userspace, i.e. running in the host with bogus values in hardware is
nothing new.

And as I mentioned in the other thread regarding the assertion that SSS isn't
enabled in the host, sanitizing hardware values for something that should never
be consumed is a fools errand.

> Anyway, I can remove it.

Yes please, though it may be a moot point.

> > > + }
> > > +}
> > > +EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
> > > +
> > > +void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> > > +{
> > > + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> > ditto
> Below is to reload guest supervisor SSPs instead of resetting host ones.
> > > + wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> > > + wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> > > + wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);

Pulling back in the justification from v3:

the Pros:
- Super easy to implement for KVM.
- Automatically avoids saving and restoring this data when the vmexit
is handled within KVM.

the Cons:
- Unnecessarily restores XFEATURE_CET_KERNEL when switching to
non-KVM task's userspace.
- Forces allocating space for this state on all tasks, whether or not
they use KVM, and with likely zero users today and the near future.
- Complicates the FPU optimization thinking by including things that
can have no affect on userspace in the FPU

IMO the pros far outweigh the cons. 3x RDMSR and 3x WRMSR when loading host/guest
state is non-trivial overhead. That can be mitigated, e.g. by utilizing the
user return MSR framework, but it's still unpalatable. It's unlikely many guests
will SSS in the *near* future, but I don't want to end up with code that performs
poorly in the future and needs to be rewritten.

Especially because another big negative is that not utilizing XSTATE bleeds into
KVM's ABI. Userspace has to be told to manually save+restore MSRs instead of just
letting KVM_{G,S}ET_XSAVE handle the state. And that will create a bit of a
snafu if Linux does gain support for SSS.

On the other hand, the extra per-task memory is all of 24 bytes. AFAICT, there's
literally zero effect on guest XSTATE allocations because those are vmalloc'd and
thus rounded up to PAGE_SIZE, i.e. the next 4KiB. And XSTATE needs to be 64-byte
aligned, so the 24 bytes is only actually meaningful if the current size is within
24 bytes of the next cahce line. And the "current" size is variable depending on
which features are present and enabled, i.e. it's a roll of the dice as to whether
or not using XSTATE for supervisor CET would actually increase memory usage. And
_if_ it does increase memory consumption, I have a very hard time believing an
extra 64 bytes in the worst case scenario is a dealbreaker.

If the performance is a concern, i.e. we don't want to eat saving/restoring the
MSRs when switching to/from host FPU context, then I *think* that's simply a matter
of keeping guest state resident when loading non-guest FPU state.

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1015af1ae562..8e7599e3b923 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -167,6 +167,16 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
*/
xfd_update_state(fpstate);

+ /*
+ * Leave supervisor CET state as-is when loading host state
+ * (kernel or userspace). Supervisor CET state is managed via
+ * XSTATE for KVM guests, but the host never consumes said
+ * state (doesn't support supervisor shadow stacks), i.e. it's
+ * safe to keep guest state loaded into hardware.
+ */
+ if (!fpstate->is_guest)
+ mask &= ~XFEATURE_MASK_CET_KERNEL;
+
/*
* Restoring state always needs to modify all features
* which are in @mask even if the current task cannot use


So unless I'm missing something, NAK to this approach, at least not without trying
the kernel FPU approach, i.e. I want somelike like to PeterZ or tglx to actually
full on NAK the kernel approach before we consider shoving a hack into KVM.