Re: [PATCH v6 06/25] x86/fpu/xstate: Opt-in kernel dynamic bits when calculate guest xstate size

From: Yang, Weijiang
Date: Tue Nov 14 2023 - 04:13:53 EST


On 11/8/2023 2:04 AM, Maxim Levitsky wrote:
On Fri, 2023-11-03 at 07:33 -0700, Sean Christopherson wrote:
On Thu, Nov 02, 2023, Maxim Levitsky wrote:
On Wed, 2023-11-01 at 07:16 -0700, Sean Christopherson wrote:
On Tue, Oct 31, 2023, Maxim Levitsky wrote:
On Thu, 2023-10-26 at 10:24 -0700, Sean Christopherson wrote:
--
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Thu, 26 Oct 2023 10:17:33 -0700
Subject: [PATCH] x86/fpu/xstate: Always preserve non-user xfeatures/flags in
__state_perm

Fixes: 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE permissions")
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index ef6906107c54..73f6bc00d178 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1601,16 +1601,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
if ((permitted & requested) == requested)
return 0;
- /* Calculate the resulting kernel state size */
+ /*
+ * Calculate the resulting kernel state size. Note, @permitted also
+ * contains supervisor xfeatures even though supervisor are always
+ * permitted for kernel and guest FPUs, and never permitted for user
+ * FPUs.
+ */
mask = permitted | requested;
- /* Take supervisor states into account on the host */
- if (!guest)
- mask |= xfeatures_mask_supervisor();
ksize = xstate_calculate_size(mask, compacted);
This might not work with kernel dynamic features, because
xfeatures_mask_supervisor() will return all supported supervisor features.
I don't understand what you mean by "This".
Somewhat of a side topic, I feel very strongly that we should use "guest only"
terminology instead of "dynamic". There is nothing dynamic about whether or not
XFEATURE_CET_KERNEL is allowed; there's not even a real "decision" beyond checking
wheter or not CET is supported.
Therefore at least until we have an actual kernel dynamic feature (a feature
used by the host kernel and not KVM, and which has to be dynamic like AMX),
I suggest that KVM stops using the permission API completely for the guest
FPU state, and just gives all the features it wants to enable right to
By "it", I assume you mean userspace?

__fpu_alloc_init_guest_fpstate() (Guest FPU permission API IMHO should be
deprecated and ignored)
KVM allocates guest FPU state during KVM_CREATE_VCPU, so not using prctl() would
either require KVM to defer allocating guest FPU state until KVM_SET_CPUID{,2},
or would require a VM-scoped KVM ioctl() to let userspace opt-in to

Allocating guest FPU state during KVM_SET_CPUID{,2} would get messy,
as KVM allows
multiple calls to KVM_SET_CPUID{,2} so long as the vCPU hasn't done KVM_RUN. E.g.
KVM would need to support actually resizing guest FPU state, which would be extra
complexity without any meaningful benefit.
OK, I understand you now. What you claim is that it is legal to do this:

- KVM_SET_XSAVE
- KVM_SET_CPUID (with AMX enabled)

KVM_SET_CPUID will have to resize the xstate which is already valid.
I was actually talking about

KVM_SET_CPUID2 (with dynamic user feature #1)
KVM_SET_CPUID2 (with dynamic user feature #2)

The second call through __xstate_request_perm() will be done with only user
xfeatures in @permitted and so the kernel will compute the wrong ksize.

Your patch to fix the __xstate_request_perm() does seem to be correct in a
sense that it will preserve the kernel fpu components in the fpu permissions.

However note that kernel fpu permissions come from
'fpu_kernel_cfg.default_features' which don't include the dynamic kernel
xfeatures (added a few patches before this one).
CET_KERNEL isn't dynamic! It's guest-only. There are no runtime decisions as to
whether or not CET_KERNEL is allowed. All guest FPU get CET_KERNEL, no kernel FPUs
get CET_KERNEL.

That matters because I am also proposing that we add a dedicated, defined-at-boot
fpu_guest_cfg instead of bolting on a "dynamic", which is what I meant by this:
Seems fair.

: Or even better if it doesn't cause weirdness elsewhere, a dedicated
: fpu_guest_cfg. For me at least, a fpu_guest_cfg would make it easier to
: understand what all is going on.
This is a very good idea.

That way, initialization of permissions is simply

fpu->guest_perm = fpu_guest_cfg.default_features;

and there's no need to differentiate between guest and kernel FPUs when reallocating
for dynamic user xfeatures because guest_perm.__state_perm already holds the correct
data.

Therefore an attempt to resize the xstate to include a kernel dynamic feature by
__xfd_enable_feature will fail.

If kvm on the other hand includes all the kernel dynamic features in the
initial allocation of FPU state (not optimal but possible),
This is what I am suggesting.
This is a valid solution.

Sorry for delayed response!!

I favor adding new fpu_guest_cfg to make things clearer.
Maybe you're talking about some patch like below: (not tested)

From 19c77aad196efe7eab4a10c5882166453de287b9 Mon Sep 17 00:00:00 2001
From: Yang Weijiang <weijiang.yang@xxxxxxxxx>
Date: Fri, 22 Sep 2023 00:37:20 -0400
Subject: [PATCH] x86/fpu/xstate: Introduce fpu_guest_cfg for guest FPU
 configuration

Signed-off-by: Yang Weijiang <weijiang.yang@xxxxxxxxx>
---
 arch/x86/include/asm/fpu/types.h |  2 +-
 arch/x86/kernel/fpu/core.c       | 14 +++++++++++---
 arch/x86/kernel/fpu/xstate.c     |  9 +++++++++
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c6fd13a17205..306825ad6bc0 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -602,6 +602,6 @@ struct fpu_state_config {
 };

 /* FPU state configuration information */
-extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
+extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg, fpu_guest_cfg;

 #endif /* _ASM_X86_FPU_H */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index a86d37052a64..c70dad9894f0 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -33,9 +33,10 @@ DEFINE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
 DEFINE_PER_CPU(u64, xfd_state);
 #endif
-/* The FPU state configuration data for kernel and user space */
+/* The FPU state configuration data for kernel, user space and guest. */
 struct fpu_state_config        fpu_kernel_cfg __ro_after_init;
 struct fpu_state_config fpu_user_cfg __ro_after_init;
+struct fpu_state_config fpu_guest_cfg __ro_after_init;

 /*
  * Represents the initial FPU state. It's mostly (but not completely) zeroes,
@@ -535,8 +536,15 @@ void fpstate_reset(struct fpu *fpu)
        fpu->perm.__state_perm          = fpu_kernel_cfg.default_features;
        fpu->perm.__state_size          = fpu_kernel_cfg.default_size;
        fpu->perm.__user_state_size     = fpu_user_cfg.default_size;
-       /* Same defaults for guests */
-       fpu->guest_perm = fpu->perm;
+
+       /* Guest permission settings */
+       fpu->guest_perm.__state_perm    = fpu_guest_cfg.default_features;
+       fpu->guest_perm.__state_size    = fpu_guest_cfg.default_size;
+       /*
+        * Set guest's __user_state_size to fpu_user_cfg.default_size so that
+        * existing uAPIs can still work.
+        */
+       fpu->guest_perm.__user_state_size = fpu_user_cfg.default_size;
 }

 static inline void fpu_inherit_perms(struct fpu *dst_fpu)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1b7bc03968c5..bebabace628b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -681,6 +681,7 @@ static int __init init_xstate_size(void)
 {
        /* Recompute the context size for enabled features: */
        unsigned int user_size, kernel_size, kernel_default_size;
+       unsigned int guest_default_size;
        bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);

        /* Uncompacted user space size */
@@ -702,13 +703,18 @@ static int __init init_xstate_size(void)
        kernel_default_size =
xstate_calculate_size(fpu_kernel_cfg.default_features, compacted);

+       guest_default_size =
+ xstate_calculate_size(fpu_guest_cfg.default_features, compacted);
+
        if (!paranoid_xstate_size_valid(kernel_size))
                return -EINVAL;

        fpu_kernel_cfg.max_size = kernel_size;
        fpu_user_cfg.max_size = user_size;
+       fpu_guest_cfg.max_size = kernel_size;

        fpu_kernel_cfg.default_size = kernel_default_size;
+       fpu_guest_cfg.default_size = guest_default_size;
        fpu_user_cfg.default_size =
                xstate_calculate_size(fpu_user_cfg.default_features, false);

@@ -829,6 +835,9 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
        fpu_user_cfg.default_features = fpu_user_cfg.max_features;
        fpu_user_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;

+       fpu_guest_cfg.max_features = fpu_kernel_cfg.max_features;
+       fpu_guest_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+
        /* Store it for paranoia check at the end */
        xfeatures = fpu_kernel_cfg.max_features;

--
2.27.0

[...]