Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

From: Paolo Bonzini
Date: Thu Oct 14 2021 - 11:02:42 EST


On 14/10/21 16:09, Thomas Gleixner wrote:
On Thu, Oct 14 2021 at 11:01, Paolo Bonzini wrote:
On 14/10/21 10:02, Liu, Jing2 wrote:
Based on the input from Andy and Thomas, the new way would be like this:

1) host_fpu must always be checked for reallocation in
kvm_load_guest_fpu (or in the FPU functions that it calls, that depends
on the rest of Thomas's patches). That's because arch_prctl can enable
AMX for QEMU at any point after KVM_CREATE_VCPU.

No.

1) QEMU starts
2) QEMU requests permissions via prctl()
3) QEMU creates vCPU threads

Doing it the other way around makes no sense at all and won't work.

Sure, but KVM needs to do something that makes sense even for userspaces that are not QEMU.

For example, there could be a program that uses AMX *itself* and does not expose it to the guest. In that case, the arch_prctl can come at the point AMX is needed, which can be after the program creates vCPU threads. That's for host_fpu.

For the guest_fpu, I agree that the arch_prctl must come before creating vCPUs.

2) every use of vcpu->arch.guest_supported_xcr0 is changed to only
include those dynamic-feature bits that were enabled via arch_prctl.
That is, something like:

static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu)
{
return vcpu->arch.guest_supported_xcr0 &
(~xfeatures_mask_user_dynamic | \
current->thread.fpu.dynamic_state_perm);

Bah. You can't get enough from poking in internals, right?

vcpu_create()

fpu_init_fpstate_user(guest_fpu, supported_xcr0)

That will (it does not today) do:

guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();

Then you have the information you need right in the guest FPU.

Good, I wasn't aware of the APIs that will be there.
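
As a rough illustration of the masking described above, here is a minimal user-space model in C. It is not kernel code. The struct guest_fpu_model, both helper functions and the feature-bit macros are hypothetical names made up for this sketch; the thread only shows the one-line assignment, not the real fpu_init_fpstate_user() or xstate_get_group_perm() interfaces.

```c
#include <stdint.h>

/* Illustrative feature bits for this sketch only (assumption, not kernel headers). */
#define MODEL_XFEATURE_FP         (1ULL << 0)
#define MODEL_XFEATURE_SSE        (1ULL << 1)
#define MODEL_XFEATURE_YMM        (1ULL << 2)
#define MODEL_XFEATURE_XTILE_DATA (1ULL << 18)

/* Hypothetical stand-in for the guest FPU container. */
struct guest_fpu_model {
    uint64_t state_perm; /* features the guest may ever enable */
};

/*
 * Model of "guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm()".
 * group_perm stands for whatever the process was granted before the vCPU
 * was created.
 */
static void guest_fpu_model_init(struct guest_fpu_model *g,
                                 uint64_t supported_xcr0,
                                 uint64_t group_perm)
{
    g->state_perm = supported_xcr0 & group_perm;
}

/* XCR0 bits the guest may set: hardware/CPUID support limited by permission. */
static uint64_t guest_model_supported_xcr0(const struct guest_fpu_model *g,
                                           uint64_t cpuid_xcr0)
{
    return cpuid_xcr0 & g->state_perm;
}
```

With this model, a process that never requested the tile-data permission ends up with a guest permission mask that lacks bit 18, so the guest-visible XCR0 mask lacks it too, without KVM looking at current->thread.fpu.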

int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
{
u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;

That's not a valid assumption.

u64 enabled_dynamic =
vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;

/* All dynamic features have to be arch_prctl'd first. */
WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);

if (!vcpu->arch.xfd_passthrough) {
/* All dynamic states will #NM? Wait and see. */
if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
return 0;

kvm_x86_ops.enable_xfd_passthrough(vcpu);
}

/* current->thread.fpu was already handled by arch_prctl. */

No. current->thread.fpu has the default buffer unless QEMU used AMX or
something forced it to allocate a larger buffer.

return fpu_alloc_features(vcpu->guest_fpu,
vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);

This unconditionally calls into that allocation for every XCR0/XFD
trap?

Calls into the function, but doesn't necessarily allocate anything. What you wrote below looks correct to me, thanks.

Paolo

Also you really should not wait until _all_ dynamic states are cleared
in guest XFD. Because a guest which has bit 18 and 19 available but only
uses one of them is going to trap on every other context switch due to
XFD writes.
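
To make the mask arithmetic in the XFD discussion concrete, here is a small stand-alone C sketch. It assumes the usual XFD meaning (a set bit disarms the feature, so its first use raises #NM) and reuses bits 18 and 19 from the example above. The macro and function names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the example in the mail; names are made up. */
#define DYN_FEATURE_A (1ULL << 18)
#define DYN_FEATURE_B (1ULL << 19)

/*
 * Dynamic features the guest has enabled in XCR0 and whose XFD bit is
 * clear, i.e. features the guest can touch without taking #NM.
 */
static uint64_t dynamic_features_in_use(uint64_t xcr0, uint64_t dynamic_mask,
                                        uint64_t xfd)
{
    uint64_t enabled_dynamic = xcr0 & dynamic_mask;

    return enabled_dynamic & ~xfd;
}

/*
 * Mirrors the quoted "(enabled_dynamic & ~xfd) == 0" test: true only while
 * every enabled dynamic feature is still armed in XFD.
 */
static bool all_dynamic_features_armed(uint64_t xcr0, uint64_t dynamic_mask,
                                       uint64_t xfd)
{
    return dynamic_features_in_use(xcr0, dynamic_mask, xfd) == 0;
}
```

For a guest with both bits enabled in XCR0 and only bit 19 still set in XFD, dynamic_features_in_use() yields bit 18 alone, which is the "uses one of them" case from the mail.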