Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

From: Paolo Bonzini
Date: Wed Oct 13 2021 - 08:37:26 EST


On 13/10/21 12:25, Liu, Jing2 wrote:
[...]
- the internal KVM value attached to guest_fpu. When #NM happens, this
one becomes zero.

The CPU value is:

- the guest_fpu value between kvm_load_guest_fpu and kvm_put_guest_fpu.
This ensures that no state is lost in the case you are describing.
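A minimal C sketch of where the XFD values live and which one the CPU sees. Assumptions: `sketch_vcpu` and `sketch_guest_fpu` are hypothetical stand-ins, not KVM's real structures. The OR for the hardware value follows the later statement in this thread that the guest runs with `guest_fpu->xfd | vcpu->arch.xfd`. `XFEATURE_MASK_XTILE_DATA` (bit 18, AMX tile data in Linux's xfeature numbering) is used only as an example bit.

```c
#include <stdint.h>

/* AMX tile data is xfeature 18; XFD bits use the same numbering as XCR0. */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Hypothetical stand-in for guest_fpu: holds KVM's XFD value. */
struct sketch_guest_fpu {
	uint64_t xfd;   /* nonzero until the first #NM, then zero */
};

/* Hypothetical stand-in for the vCPU: holds the guest's XFD value. */
struct sketch_vcpu {
	uint64_t xfd;   /* plays the role of vcpu->arch.xfd */
	struct sketch_guest_fpu *guest_fpu;
};

/*
 * Value in the real IA32_XFD MSR between kvm_load_guest_fpu() and
 * kvm_put_guest_fpu(): a feature stays disabled if either the guest
 * or KVM wants it disabled.
 */
static uint64_t sketch_hw_xfd(const struct sketch_vcpu *vcpu)
{
	return vcpu->guest_fpu->xfd | vcpu->xfd;
}
```

With `guest_fpu->xfd` still set, the guest clearing its own XFD does not clear the bit in hardware; once `guest_fpu->xfd` drops to zero, the hardware value is just the guest's.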


OK, you mean using guest_fpu as the KVM value. Let me describe the
flow to see if anything is missing.

On the #NM trap that enables passthrough, the guest_fpu XFD is set to 0
and stays 0 forever (the hardware XFD is not changed and is still 1).
In the #NM trap, KVM allocates the buffer and re-injects an #NM exception
into the guest so that the guest kernel allocates its thread buffer.
Then on the next vmexit, KVM syncs vcpu->arch.xfd, loads the guest_fpu
value (=0) and updates the XFD in current->thread.fpu to 0 for the
kernel's reference.

In the #NM handler, KVM allocates the buffer and the guest_fpu XFD becomes zero. Also, because the guest_fpu XFD is zero:

- #NM vmexits are disabled. More precisely, trapping #NM is only necessary if guest_fpu->xfd & ~vcpu->arch.xfd & vcpu->arch.xcr0 is nonzero (i.e. only if there is a state that is guest_fpu-disabled, but enabled according to both XFD and XCR0).

- On the next vmentry XFD is set to just vcpu->arch.xfd and the instruction is retried. If the instruction causes an #NM in the guest, it is not trapped and delivered normally to the guest.
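The intercept condition and the #NM exit handling above can be sketched in C. Assumptions: the struct and function names are hypothetical, the fields only mirror `vcpu->arch.xcr0`, `vcpu->arch.xfd` and `guest_fpu->xfd`, and the buffer allocation is left out. The condition itself is the expression from the mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* AMX tile data (xfeature 18), used as the example XFD bit. */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Hypothetical, simplified vCPU state. */
struct sketch_vcpu {
	uint64_t xcr0;          /* mirrors vcpu->arch.xcr0 */
	uint64_t xfd;           /* mirrors vcpu->arch.xfd (guest value) */
	uint64_t guest_fpu_xfd; /* mirrors guest_fpu->xfd (KVM value) */
};

/*
 * #NM must be intercepted only while some state is disabled by
 * guest_fpu but enabled according to both the guest's XFD and XCR0.
 */
static bool sketch_need_nm_intercept(const struct sketch_vcpu *v)
{
	return (v->guest_fpu_xfd & ~v->xfd & v->xcr0) != 0;
}

/* Hypothetical #NM vmexit handler. */
static void sketch_handle_nm_exit(struct sketch_vcpu *v)
{
	/* (Allocation of the larger guest_fpu buffer omitted.) */
	v->guest_fpu_xfd = 0;   /* stays zero from now on */
	/*
	 * Nothing is injected: on vmentry XFD is just v->xfd and the
	 * faulting instruction is retried. A later #NM goes straight to
	 * the guest because the intercept condition is now false.
	 */
}
```

Because `guest_fpu_xfd` only ever goes from nonzero to zero, the intercept can be turned off once and never needs to be re-enabled for that vCPU.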

I think it's simpler to always wait for #NM; it will only happen once
per vCPU. In other words, even if the guest clears XFD before it
generates #NM, the guest_fpu's XFD remains nonzero.

You mean the WRMSR trap doesn't do anything and just returns to the guest?

The guest might run with the same XFD value as before (which is guest_fpu->xfd | vcpu->arch.xfd), but vcpu->arch.xfd is changed. The value in vcpu->arch.xfd will be read back by an RDMSR, because passthrough is not enabled and the RDMSR will cause a vmexit.

Once an #NM is received and guest_fpu->xfd becomes zero, passthrough can be enabled.
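A minimal sketch of the MSR handling described here, assuming hypothetical names throughout (`sketch_wrmsr_xfd`, `sketch_rdmsr_xfd`, `sketch_first_nm`, and a `xfd_passthrough` flag standing in for disabling the IA32_XFD intercept). The intercepted WRMSR does change `vcpu->arch.xfd`, the intercepted RDMSR returns it, and the hardware keeps the guest_fpu bit until the first #NM.

```c
#include <stdbool.h>
#include <stdint.h>

/* AMX tile data (xfeature 18), used as the example XFD bit. */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Hypothetical, simplified vCPU state. */
struct sketch_vcpu {
	uint64_t xfd;           /* mirrors vcpu->arch.xfd: what the guest wrote */
	uint64_t guest_fpu_xfd; /* mirrors guest_fpu->xfd: KVM's value */
	bool xfd_passthrough;   /* true once IA32_XFD is no longer intercepted */
};

/* Intercepted WRMSR to IA32_XFD: only the guest's value changes. */
static void sketch_wrmsr_xfd(struct sketch_vcpu *v, uint64_t data)
{
	v->xfd = data;
}

/* Intercepted RDMSR: the guest reads back exactly what it wrote. */
static uint64_t sketch_rdmsr_xfd(const struct sketch_vcpu *v)
{
	return v->xfd;
}

/* What the guest actually runs with. */
static uint64_t sketch_hw_xfd(const struct sketch_vcpu *v)
{
	return v->guest_fpu_xfd | v->xfd;
}

/* First #NM: KVM's value drops to zero, so passthrough may be enabled. */
static void sketch_first_nm(struct sketch_vcpu *v)
{
	v->guest_fpu_xfd = 0;
	v->xfd_passthrough = true;
}
```

From the guest's point of view the write took effect, since RDMSR returns the new value; the only difference is that its first use of the feature takes one extra #NM vmexit.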

Paolo

In this case, on the next vmentry, the OR of the guest value
(vcpu->arch.xfd) and the guest_fpu value is still 1, so doesn't this
break the guest's assumption about the hardware? (The guest would find
that the WRMSR didn't take effect.)