I think it's simpler to always wait for #NM; it will only happen
once per vCPU. In other words, even if the guest clears XFD before
it generates #NM, the guest_fpu's XFD remains nonzero and an #NM
vmexit is possible. After #NM the guest_fpu's XFD is zero; then
passthrough can happen and the #NM vmexit trap can be disabled.
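
To make the proposed sequence concrete, here is a rough sketch of it as a standalone C model. It is not KVM code: the struct, its fields, and the function names are all made up for illustration, and the behaviour of XFD writes after passthrough is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-vCPU model; every name here is hypothetical, not KVM's. */
struct vcpu_model {
	uint64_t guest_fpu_xfd;  /* XFD value kept in the guest FPU state */
	uint64_t guest_xfd;      /* value the guest last wrote to XFD */
	bool trap_nm;            /* is the #NM vmexit enabled? */
	bool xfd_passthrough;    /* do guest XFD writes go straight through? */
};

static void vcpu_model_init(struct vcpu_model *v, uint64_t xfd_features)
{
	v->guest_fpu_xfd = xfd_features; /* nonzero until the first #NM */
	v->guest_xfd = 0;
	v->trap_nm = true;
	v->xfd_passthrough = false;
}

/* Guest writes XFD. Before the first #NM this does not touch guest_fpu_xfd. */
static void guest_write_xfd(struct vcpu_model *v, uint64_t val)
{
	v->guest_xfd = val;
	if (v->xfd_passthrough)
		v->guest_fpu_xfd = val; /* assumption: guest owns XFD from here on */
}

/* First (and only) #NM vmexit for this vCPU. */
static void handle_nm_vmexit(struct vcpu_model *v)
{
	assert(v->trap_nm);          /* can only be reached while trapping */
	v->guest_fpu_xfd = 0;        /* state is now available to the guest */
	v->xfd_passthrough = true;   /* stop intercepting XFD writes */
	v->trap_nm = false;          /* stop intercepting #NM */
}
```

The point of the model is only the ordering: clearing XFD in the guest before any #NM leaves guest_fpu_xfd nonzero and the trap armed, and the single #NM vmexit is what flips both switches.
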
This will stop being at all optimal when Intel inevitably adds
another feature that uses XFD. In the potentially infinite window in
which the guest manages XFD and #NM on behalf of its userspace and
when the guest allocates the other hypothetical feature, all the #NMs
will have to be trapped by KVM.
Is it really worthwhile for KVM to use XFD at all instead of
preallocating the state and being done with it? KVM would still have
to avoid data loss if the guest sets XFD with non-init state, but #NM
could always pass through.
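
For the data-loss case, a toy way to state the condition, assuming (as I understand the architecture) that an XSAVE-family save does not preserve a component whose XFD bit is set. The helper names are hypothetical and this is plain bit arithmetic, not a KVM interface:

```c
#include <stdint.h>

/*
 * Toy model: 'non_init' is the set of state components holding live
 * (non-init) data, 'xfd' is the XFD value in effect during the save.
 */

/* Components whose live data would be dropped by a save under this XFD. */
static uint64_t components_at_risk(uint64_t non_init, uint64_t xfd)
{
	return non_init & xfd;
}

/* Components whose live data the save would still capture. */
static uint64_t components_preserved(uint64_t non_init, uint64_t xfd)
{
	return non_init & ~xfd;
}
```

With preallocated state, KVM would only have to make sure components_at_risk() is empty whenever it saves guest state, for example by saving with XFD cleared; no #NM interception is involved.
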