Re: [PATCH] KVM: x86: Allow XSAVES on CPUs where host doesn't use it due to an errata

From: Maciej S. Szmigiero
Date: Tue Nov 28 2023 - 13:04:14 EST


On 28.11.2023 17:48, Sean Christopherson wrote:
> On Mon, Nov 27, 2023, Maciej S. Szmigiero wrote:
> > On 27.11.2023 18:24, Sean Christopherson wrote:
> > > On Thu, Nov 23, 2023, Maciej S. Szmigiero wrote:
> > > > From: "Maciej S. Szmigiero" <maciej.szmigiero@xxxxxxxxxx>
> > > >
> > > > Since commit b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
> > > > the kernel unconditionally clears the XSAVES CPU feature bit on Zen1/2 CPUs.
> > > >
> > > > Since KVM CPU caps are initialized from the kernel boot CPU features, this
> > > > makes the XSAVES feature also unavailable for KVM guests in this case, even
> > > > though they might want to decide on their own whether they are affected by
> > > > this erratum.
> > > >
> > > > Allow KVM guests to make such a decision by setting the XSAVES KVM CPU
> > > > capability bit based on the actual CPU capability.
> > >
> > > This is not generally safe, as the guest can make such a decision if and only if
> > > the Family/Model/Stepping information is reasonably accurate.
> >
> > If one lies to the guest about the CPU it is running on then obviously
> > things may work non-optimally.
>
> But this isn't about running optimally, it's about functional correctness. And
> "lying" to the guest about F/M/S is extremely common.
>
> > > > This fixes booting Hyper-V enabled Windows Server 2016 VMs with more than
> > > > one vCPU on Zen1/2 CPUs.
> > >
> > > How/why does lack of XSAVES break a multi-vCPU setup? Is Windows blindly doing
> > > XSAVES based on FMS?
> >
> > The hypercall from L2 Windows to L1 Hyper-V asking to boot the first AP
> > returns HV_STATUS_CPUID_XSAVE_FEATURE_VALIDATION_ERROR.
>
> If it's just about CPUID enumeration, then userspace can simply stuff the XSAVES
> feature flag. This is not something that belongs in KVM, because this is safe if
> and only if F/M/S is accurate and the guest is actually aware of the erratum (or
> will not actually use XSAVES for other reasons), neither of which KVM can guarantee.

In other words, your suggestion is that QEMU (or another VMM), not KVM,
should be the one setting the XSAVES CPUID bit back, correct?
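For illustration, a minimal sketch of what "stuffing" the XSAVES flag in userspace would amount to, assuming the VMM builds a guest CPUID table before handing it to KVM (for example via the KVM_SET_CPUID2 ioctl). XSAVES/XRSTORS support is enumerated in CPUID.(EAX=0DH,ECX=1):EAX bit 3, so the VMM only needs to set that bit in the matching entry. The struct cpuid_entry type and the stuff_xsaves() helper are hypothetical: the struct is a simplified stand-in for struct kvm_cpuid_entry2 from <linux/kvm.h>, and no ioctl is actually issued here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct kvm_cpuid_entry2 from
 * <linux/kvm.h>; the real struct also carries flags and padding fields. */
struct cpuid_entry {
    uint32_t function; /* CPUID leaf (EAX input) */
    uint32_t index;    /* CPUID sub-leaf (ECX input) */
    uint32_t eax, ebx, ecx, edx;
};

/* XSAVES/XRSTORS support: CPUID.(EAX=0DH,ECX=1):EAX[3]. */
#define CPUID_XSAVE_LEAF    0x0Du
#define CPUID_XSAVE_SUBLEAF 1u
#define XSAVES_EAX_BIT      (1u << 3)

/* Hypothetical helper: set the XSAVES bit in the guest CPUID table that a
 * VMM would later pass to KVM. Returns 0 on success, or -1 if the table has
 * no leaf 0xD, sub-leaf 1 entry (nothing is modified in that case). */
static int stuff_xsaves(struct cpuid_entry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (entries[i].function == CPUID_XSAVE_LEAF &&
            entries[i].index == CPUID_XSAVE_SUBLEAF) {
            entries[i].eax |= XSAVES_EAX_BIT;
            return 0;
        }
    }
    return -1;
}
```

This only changes what the guest sees in CPUID; it does not touch any decision KVM makes internally based on its own caps or on host feature bits.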

I don't think this would work with the current KVM code, since it seems
to make various decisions based on the presence of the XSAVES bit in KVM
caps (rather than in the guest CPUID) and on boot_cpu_has(XSAVES); one
such code block was even modified by this patch.

The comment above that code even says that on SVM it is not possible to
disable XSAVES without also disabling all the other XSAVE variants, so
XSAVES has to be treated as enabled whenever the CPU supports it, in order
for the XSS MSR to be switched at guest entry/exit (which looks harmless
here, since Zen1/2 supposedly don't support any supervisor extended states).

So it looks like we would need changes to *both* KVM and QEMU to
restore the XSAVES support this way.

Thanks,
Maciej