Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

From: Paolo Bonzini
Date: Wed Oct 13 2021 - 04:43:01 EST


On 13/10/21 09:46, Liu, Jing2 wrote:

On 13/10/21 08:15, Liu, Jing2 wrote:
After KVM passes XFD through to the guest, when a vmexit opens an IRQ
window and KVM is interrupted, the kernel softirq path can call
kernel_fpu_begin() to touch xsave state. This function does XSAVES. If
guest XFD[18] is 1 and guest AMX state is live in the registers, then
the guest AMX state is lost by XSAVES.

Yes, the host value of XFD (which is zero) has to be restored after vmexit.
See how KVM already handles SPEC_CTRL.

I'm trying to understand why QEMU's XFD is zero once the kernel supports AMX.

There are three copies of XFD:

- the guest value stored in vcpu->arch.

- the "QEMU" value attached to host_fpu. This one only becomes zero if QEMU requires AMX (which shouldn't happen).

- the internal KVM value attached to guest_fpu. When #NM happens, this one becomes zero.


The CPU value is:

- the host_fpu value before kvm_load_guest_fpu and after kvm_put_guest_fpu. This ensures that QEMU context switch is as cheap as possible.

- the guest_fpu value between kvm_load_guest_fpu and kvm_put_guest_fpu. This ensures that no state is lost in the case you are describing.

- the OR of the guest value and the guest_fpu value while the guest runs (using either MSR load/save lists, or manual wrmsr like pt_guest_enter/pt_guest_exit). This ensures that the host has the opportunity to get a #NM exception, and allocate AMX state in the guest_fpu and in current->thread.fpu.
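
To make the three cases concrete, here is a minimal user-space sketch of which value ends up in the CPU's XFD at each point. It is only a model, not kernel code: the struct, the enum and cpu_xfd() are hypothetical names made up for illustration, and the only hardware detail assumed is that bit 18 of XFD covers AMX state, as mentioned above.

```c
#include <stdint.h>

/* Bit 18 of XFD, the AMX bit referred to as XFD[18] above. */
#define XFD_AMX (1ULL << 18)

/* Hypothetical model of the three software copies of XFD. */
struct xfd_copies {
	uint64_t guest;     /* guest value, stored in vcpu->arch */
	uint64_t host_fpu;  /* "QEMU" value attached to host_fpu */
	uint64_t guest_fpu; /* internal KVM value attached to guest_fpu */
};

/* Hypothetical names for the three windows described in the list. */
enum xfd_phase {
	XFD_PHASE_HOST,      /* before kvm_load_guest_fpu, after kvm_put_guest_fpu */
	XFD_PHASE_GUEST_FPU, /* between load and put, guest not running */
	XFD_PHASE_GUEST_RUN, /* while the guest runs */
};

/* Value that should be live in the CPU's XFD for a given window. */
static uint64_t cpu_xfd(const struct xfd_copies *c, enum xfd_phase phase)
{
	switch (phase) {
	case XFD_PHASE_HOST:
		return c->host_fpu;
	case XFD_PHASE_GUEST_FPU:
		return c->guest_fpu;
	case XFD_PHASE_GUEST_RUN:
		/* OR: a bit set in either copy keeps the #NM armed. */
		return c->guest | c->guest_fpu;
	}
	return 0;
}
```

In this model, a guest that has already cleared its own XFD still runs with XFD[18] set as long as the guest_fpu copy has it set, which is what gives the host its chance to see the #NM.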

Yes, passthrough happens in two cases: one is when a guest #NM is trapped;
the other is when the guest clears XFD before it generates #NM (the guest
is allowed to do this), after which we pass XFD through.
In both cases we pass XFD through and allocate buffers for guest_fpu and
current->thread.fpu.

I think it's simpler to always wait for #NM; it will only happen once per vCPU. In other words, even if the guest clears XFD before it generates #NM, the guest_fpu's XFD remains nonzero and an #NM vmexit is possible. After #NM the guest_fpu's XFD is zero; then passthrough can happen and the #NM vmexit trap can be disabled.
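
Roughly, the flow I have in mind looks like the sketch below. It is a user-space model only: struct vcpu_model and all of the function names are hypothetical, it ignores whether the #NM also has to be reflected to the guest, and "allocation" is just a flag standing in for growing guest_fpu and current->thread.fpu.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 18 of XFD, the AMX bit referred to as XFD[18] above. */
#define XFD_AMX (1ULL << 18)

/* Hypothetical per-vCPU state for the "always wait for #NM" scheme. */
struct vcpu_model {
	uint64_t guest_xfd;     /* value the guest last wrote */
	uint64_t guest_fpu_xfd; /* KVM-internal copy, nonzero until #NM */
	bool nm_trap_enabled;   /* #NM causes a vmexit */
	bool xfd_passthrough;   /* guest accesses XFD without exits */
	bool amx_allocated;     /* stand-in for the buffer allocation */
};

static void vcpu_model_init(struct vcpu_model *v)
{
	v->guest_xfd = XFD_AMX;
	v->guest_fpu_xfd = XFD_AMX;
	v->nm_trap_enabled = true;
	v->xfd_passthrough = false;
	v->amx_allocated = false;
}

/* Intercepted guest write: only the guest copy changes. */
static void guest_write_xfd(struct vcpu_model *v, uint64_t val)
{
	v->guest_xfd = val;
}

/* XFD while the guest runs: OR of the guest and guest_fpu copies. */
static uint64_t running_xfd(const struct vcpu_model *v)
{
	return v->guest_xfd | v->guest_fpu_xfd;
}

/* Guest touches AMX state; returns true if this caused a #NM vmexit. */
static bool guest_touches_amx(struct vcpu_model *v)
{
	if (!v->nm_trap_enabled || !(running_xfd(v) & XFD_AMX))
		return false;

	/* First #NM for this vCPU: allocate, then stop trapping. */
	v->amx_allocated = true;
	v->guest_fpu_xfd = 0;
	v->xfd_passthrough = true;
	v->nm_trap_enabled = false;
	return true;
}
```

The point of the model is that guest_write_xfd(v, 0) before the first AMX use changes nothing about the exit: running_xfd() still has bit 18 set, so the first touch still traps, and only after that does passthrough start.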

Paolo