Re: [PATCH v6 06/25] x86/fpu/xstate: Opt-in kernel dynamic bits when calculate guest xstate size

From: Yang, Weijiang
Date: Wed Oct 25 2023 - 10:50:37 EST


On 10/25/2023 1:07 AM, Sean Christopherson wrote:
On Fri, Sep 15, 2023, Weijiang Yang wrote:
On 9/15/2023 1:40 AM, Dave Hansen wrote:
On 9/13/23 23:33, Yang Weijiang wrote:
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1636,9 +1636,17 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
/* Calculate the resulting kernel state size */
mask = permitted | requested;
- /* Take supervisor states into account on the host */
+ /*
+ * Take supervisor states into account on the host. And add
+ * kernel dynamic xfeatures to guest since guest kernel may
+ * enable corresponding CPU feaures and the xstate registers
+ * need to be saved/restored properly.
+ */
if (!guest)
mask |= xfeatures_mask_supervisor();
+ else
+ mask |= fpu_kernel_dynamic_xfeatures;
This looks wrong. Per commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor
states in XSTATE permissions"), mask at this point only contains user features,
which somewhat unintuitively doesn't include CET_USER (I get that they're MSRs
and thus supervisor state, it's just the name that's odd).

I think the user-only boundary becomes unclear when fpstate_reset() introduce below line:
fpu->perm.__state_perm          = fpu_kernel_cfg.default_features;

Then in xstate_request_perm(), it re-uses above reset value for __xstate_request_perm(),
so in the latter, the mask is already mixed with supervisor xfeatures.

IIUC, the "dynamic" features contains CET_KERNEL, whereas xfeatures_mask_supervisor()
conatins PASID, CET_USER, and CET_KERNEL. PASID isn't virtualized by KVM, but
doesn't that mean CET_USER will get dropped/lost if userspace requests AMX/XTILE
enabling?

Yes, __state_size is correct for guest enabled xfeatures, including CET_USER, and it gets
removed from __state_perm.

IIUC, from current qemu/kernel interaction for guest permission settings, __xstate_request_perm()
is called only _ONCE_ to set AMX/XTILE for every vCPU thread, so the removal of guest supervisor
xfeatures won't hurt guest! ;-/

The existing code also seems odd, but I might be missing something. Won't the
kernel drop PASID if the guest request AMX/XTILE?

Yeah, dropped after the first invocation.

I'm not at all familiar with
what PASID state is managed via XSAVE, so I've no idea if that's an actual problem
or just an oddity.

ksize = xstate_calculate_size(mask, compacted);
Heh, you changed the "guest" naming in "fpu_kernel_dynamic_xfeatures"
but didn't change the logic.

As it's coded at the moment *ALL* "fpu_kernel_dynamic_xfeatures" are
guest xfeatures. So, they're different in name only.
...

Would there ever be any reason for KVM to be on a system which supports a
dynamic kernel feature but where it doesn't get enabled for guest use, or
at least shouldn't have the FPU space allocated?
I haven't heard of that kind of usage for other features so far, CET
supervisor xstate is the only dynamic kernel feature now,  not sure whether
other CPU features having supervisor xstate would share the handling logic
like CET does one day.
There are definitely scenarios where CET will not be exposed to KVM guests, but
I don't see any reason to make the guest FPU space dynamically sized for CET.
It's what, 40 bytes?

Could it also be xsave/xrstor operation efficiency for non-guest threads?

I would much prefer to avoid the whole "dynamic" thing and instead make CET
explicitly guest-only. E.g. fpu_kernel_guest_only_xfeatures? Or even better
if it doesn't cause weirdness elsewhere, a dedicated fpu_guest_cfg. For me at
least, a fpu_guest_cfg would make it easier to understand what all is going on.

Agree,  guess non-kernel-generic designs are not very much welcome for kernel...