Re: [PATCH 16/19] kvm: x86: Introduce KVM_{G|S}ET_XSAVE2 ioctl

From: Paolo Bonzini
Date: Mon Dec 13 2021 - 05:43:45 EST


On 12/13/21 11:10, Thomas Gleixner wrote:
> On Fri, Dec 10 2021 at 17:30, Paolo Bonzini wrote:
>> On 12/8/21 01:03, Yang Zhong wrote:
>>> +static int kvm_vcpu_ioctl_x86_set_xsave2(struct kvm_vcpu *vcpu, u8 *state)
>>> +{
>>> +	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
>>> +		return 0;
>>> +
>>> +	return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu, state,
>>> +					      supported_xcr0, &vcpu->arch.pkru);
>>> +}
>>> +
>>
>> I think fpu_copy_uabi_to_guest_fpstate (and therefore
>> copy_uabi_from_kernel_to_xstate) needs to check that the size is
>> compatible with the components in the input.
>
> fpu_copy_uabi_to_guest_fpstate() expects that the input buffer is
> correctly sized. We surely can add a size check there.

fpu_copy_guest_fpstate_to_uabi is more problematic because that one
writes memory. For fpu_copy_uabi_to_guest_fpstate, we know the input
buffer size from the components and we can use it to do a properly-sized
memdup_user.
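
As a rough sketch of that idea (not part of this series): the ioctl handler could read the xstate header out of the user buffer first and size the memdup_user() from the feature bits it declares. xstate_uabi_size_from_features() below is a hypothetical helper that the FPU core would have to export, e.g. by walking xstate_offsets[]/xstate_sizes[] for every set bit:

/*
 * Hypothetical sketch, not from the series: size the copy from the
 * feature bits declared in the uABI buffer's own XSAVE header.
 * xstate_uabi_size_from_features() is an assumed helper, not an
 * existing FPU core API.
 */
static void *kvm_dup_xsave_from_user(void __user *argp, unsigned int max_size)
{
	struct xstate_header hdr;
	unsigned int size;

	/* The xstate header sits right after the 512-byte legacy area. */
	if (copy_from_user(&hdr, argp + offsetof(struct xregs_state, header),
			   sizeof(hdr)))
		return ERR_PTR(-EFAULT);

	size = xstate_uabi_size_from_features(hdr.xfeatures);
	if (size > max_size)
		return ERR_PTR(-E2BIG);

	/* Copy exactly as much as the declared features require. */
	return memdup_user(argp, size);
}

Whether that check lives in KVM or inside copy_uabi_from_kernel_to_xstate() is a separate question.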

For fpu_copy_guest_fpstate_to_uabi we can just decide that KVM_GET_XSAVE
will only save up to the first 4K. Something like the following might
actually be good for 5.16-rc; right now, header.xfeatures might lead
userspace into reading uninitialized or unmapped memory:

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d28829403ed0..69609b8c3887 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1138,8 +1138,10 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 	struct xstate_header header;
 	unsigned int zerofrom;
 	u64 mask;
+	u64 size;
 	int i;
 
+	size = to.left;
 	memset(&header, 0, sizeof(header));
 	header.xfeatures = xsave->header.xfeatures;
 
@@ -1186,7 +1188,20 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 	/* Copy xsave->i387.sw_reserved */
 	membuf_write(&to, xstate_fx_sw_bytes, sizeof(xsave->i387.sw_reserved));
 
-	/* Copy the user space relevant state of @xsave->header */
+	/*
+	 * Copy the user space relevant state of @xsave->header.
+	 * If not all features fit in the buffer, drop them from the
+	 * saved state so that userspace does not read uninitialized or
+	 * unmapped memory.
+	 */
+	mask = fpstate->user_xfeatures;
+	for_each_extended_xfeature(i, mask) {
+		if (xstate_offsets[i] + xstate_sizes[i] > size) {
+			header.xfeatures &= BIT(i) - 1;
+			mask &= BIT(i) - 1;
+			break;
+		}
+	}
 	membuf_write(&to, &header, sizeof(header));
 
 	zerofrom = offsetof(struct xregs_state, extended_state_area);
@@ -1197,7 +1212,6 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 	 * but there is no state to copy from in the compacted
 	 * init_fpstate. The gap tracking will zero these states.
 	 */
-	mask = fpstate->user_xfeatures;
 
 	for_each_extended_xfeature(i, mask) {
 		/*
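As a stand-alone illustration of what the hunk above does (not kernel code; the offsets and sizes are example values in the style of a first-generation AMX part, the real ones come from CPUID leaf 0xD): the first component that does not fit in the 4K buffer, and everything above it, is dropped from header.xfeatures, so the copy stops cleanly instead of running past the buffer.

/* Stand-alone illustration of the pruning logic; not kernel code. */
#include <stdint.h>
#include <stdio.h>

#define BIT(n) (1ULL << (n))

int main(void)
{
	/* Example non-compacted offsets/sizes: YMM, XTILECFG, XTILEDATA. */
	unsigned int offsets[] = { [2] = 576, [17] = 2752, [18] = 2816 };
	unsigned int sizes[]   = { [2] = 256, [17] = 64,   [18] = 8192 };
	uint64_t xfeatures = BIT(0) | BIT(1) | BIT(2) | BIT(17) | BIT(18);
	uint64_t size = 4096;		/* legacy KVM_GET_XSAVE buffer */

	for (unsigned int i = 2; i <= 18; i++) {
		if (!(xfeatures & BIT(i)))
			continue;
		if (offsets[i] + sizes[i] > size) {
			/* Drop feature i and everything above it. */
			xfeatures &= BIT(i) - 1;
			break;
		}
	}
	/* Prints 0x20007: XTILEDATA no longer fits, XTILECFG still does. */
	printf("xfeatures after pruning: %#llx\n",
	       (unsigned long long)xfeatures);
	return 0;
}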


>> Also, IIUC the size of the AMX state will vary in different processors.
>> Is this correct? If so, this should be handled already by
>> KVM_GET/SET_XSAVE2 and therefore should be part of the
>> arch/x86/kernel/fpu APIs. In the future we want to support migrating a
>> "small AMX" host to a "large AMX" host; and also migrating from a "large
>> AMX" host to a "small AMX" host if the guest CPUID is compatible with
>> the destination of the migration.
>
> How is that supposed to work? If the AMX state size differs then the
> hosts are not compatible.

I replied with some more questions later. Basically it depends on how
Intel will define palettes that aren't part of the first implementation
of AMX.
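
For reference, the per-component sizes that decide whether two hosts are compatible come from CPUID leaf 0xD; here is a minimal userspace sketch (illustrative only, not part of the series) that dumps what the current CPU reports:

/* Illustrative only: dump the XSTATE component sizes this CPU reports. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	for (unsigned int i = 2; i < 19; i++) {
		if (!__get_cpuid_count(0xd, i, &eax, &ebx, &ecx, &edx))
			break;
		if (!eax)
			continue;	/* component not supported on this CPU */
		if (ecx & 1)
			continue;	/* supervisor state, not in the uABI buffer */
		/* EAX = size in bytes, EBX = offset in the non-compacted format. */
		printf("xstate component %2u: size %5u offset %5u\n", i, eax, ebx);
	}
	return 0;
}

If a future palette makes XTILEDATA larger on some parts, this enumeration is where the source and destination of a migration would start to disagree.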

Paolo