Re: [PATCH] KVM: x86: Allow XSAVES on CPUs where host doesn't use it due to an errata
From: Sean Christopherson
Date: Tue Nov 28 2023 - 18:42:48 EST
On Tue, Nov 28, 2023, Maciej S. Szmigiero wrote:
> On 28.11.2023 17:48, Sean Christopherson wrote:
> > On Mon, Nov 27, 2023, Maciej S. Szmigiero wrote:
> > > On 27.11.2023 18:24, Sean Christopherson wrote:
> > > > On Thu, Nov 23, 2023, Maciej S. Szmigiero wrote:
> > > > > From: "Maciej S. Szmigiero" <maciej.szmigiero@xxxxxxxxxx>
> > > > >
> > > > > Since commit b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
> > > > > the kernel unconditionally clears the XSAVES CPU feature bit on Zen1/2 CPUs.
> > > > >
> > > > > Since KVM CPU caps are initialized from the kernel boot CPU features, this
> > > > > makes the XSAVES feature also unavailable for KVM guests in this case, even
> > > > > though they might want to decide on their own whether they are affected by
> > > > > this erratum.
> > > > >
> > > > > Allow KVM guests to make such a decision by setting the XSAVES KVM CPU
> > > > > capability bit based on the actual CPU capability.
> > > >
> > > > This is not generally safe, as the guest can make such a decision if and only if
> > > > the Family/Model/Stepping information is reasonably accurate.
> > >
> > > If one lies to the guest about the CPU it is running on, then obviously
> > > things may work non-optimally.
> >
> > But this isn't about running optimally, it's about functional correctness. And
> > "lying" to the guest about F/M/S is extremely common.
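To illustrate why a guest-side erratum check depends on accurate F/M/S, here is a minimal sketch of how a guest might decode the family from CPUID.(EAX=1):EAX and flag the Zen1/2 (family 0x17) XSAVES erratum. The helper names are hypothetical and this is not the kernel's actual check. The example values are also assumptions: 0x00830F10 stands for a family 0x17 part, and 0x00A00F11 for a VMM-reported family 0x19 signature. If a VMM reports the second value on a Zen2 host, the check silently misses the erratum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the display family from CPUID.(EAX=1):EAX. The extended family
 * field (bits 27:20) is added only when the base family (bits 11:8) is 0xF. */
static unsigned int cpuid_family(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;
	return family;
}

/* Hypothetical guest-side heuristic: assume the XSAVES erratum may apply
 * when the vendor is AMD and the *reported* family is 0x17 (Zen1/2). The
 * result is only as trustworthy as the F/M/S the VMM exposes. */
static bool guest_suspects_xsaves_erratum(bool vendor_is_amd, uint32_t cpuid1_eax)
{
	return vendor_is_amd && cpuid_family(cpuid1_eax) == 0x17;
}
```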
> >
> > > > > This fixes booting Hyper-V enabled Windows Server 2016 VMs with more than
> > > > > one vCPU on Zen1/2 CPUs.
> > > >
> > > > How/why does lack of XSAVES break a multi-vCPU setup? Is Windows blindly doing
> > > > XSAVES based on FMS?
> > >
> > > The hypercall from L2 Windows to L1 Hyper-V asking to boot the first AP
> > > returns HV_STATUS_CPUID_XSAVE_FEATURE_VALIDATION_ERROR.
> >
> > If it's just about CPUID enumeration, then userspace can simply stuff the XSAVES
> > feature flag. This is not something that belongs in KVM, because this is safe if
> > and only if F/M/S is accurate and the guest is actually aware of the erratum (or
> > will not actually use XSAVES for other reasons), neither of which KVM can guarantee.
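A rough sketch of "stuffing" the flag in userspace follows. XSAVES is enumerated in CPUID.(EAX=0xD,ECX=1):EAX bit 3. A VMM could set that bit in its CPUID table before handing the table to KVM via the KVM_SET_CPUID2 ioctl. `struct cpuid_entry` is a simplified, hypothetical stand-in for `struct kvm_cpuid_entry2` from <linux/kvm.h>, and `stuff_xsaves()` is an illustrative helper, not a QEMU function. This only changes what the guest enumerates. Whether KVM then handles XSAVES (and XSS) correctly is a separate question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.(EAX=0xD,ECX=1):EAX[3] enumerates XSAVES/XRSTORS. */
#define CPUID_D_1_EAX_XSAVES (1u << 3)

/* Simplified stand-in for struct kvm_cpuid_entry2 (hypothetical). */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Set the XSAVES bit in the guest's leaf 0xD, subleaf 1 entry.
 * Returns true if that entry was present and updated. */
static bool stuff_xsaves(struct cpuid_entry *entries, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (entries[i].function == 0xd && entries[i].index == 1) {
			entries[i].eax |= CPUID_D_1_EAX_XSAVES;
			return true;
		}
	}
	return false;
}
```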
>
> In other words, your suggestion is that QEMU (or another VMM), not KVM,
> should be the one setting the XSAVES CPUID bit back, correct?
>
> I don't think this would work with the current KVM code, since it seems
> to make various decisions based on the presence of the XSAVES bit in
> KVM caps (rather than in the guest CPUID) and on boot_cpu_has(XSAVES);
> one such code block was even modified by this patch.
>
> The comment above that code even says that on SVM it is not possible
> to disable XSAVES alone without disabling all the other XSAVE variants,
> so XSAVES has to be treated as enabled whenever the CPU supports it, in
> order to switch the XSS MSR at guest entry/exit (in this case that looks
> harmless, since Zen1/2 supposedly don't support any supervisor extended
> states).
>
> So it looks like we would need changes to *both* KVM and QEMU to
> restore the XSAVES support this way.
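The SVM constraint described above can be modeled as a simple predicate. This is a hypothetical sketch of the logic the message describes, not KVM's code; the struct and function names are illustrative. Because SVM cannot disable XSAVES separately from the other XSAVE variants, a guest allowed to use XSAVE on a host with XSAVES can execute XSAVES whatever its CPUID says, so XSS must be switched. The decision is keyed on host feature bits (boot_cpu_has()), which is why a host-cleared XSAVES bit changes the outcome even if guest CPUID is stuffed.

```c
#include <stdbool.h>

/* Inputs to the decision; field names are illustrative only. */
struct xsaves_inputs {
	bool host_xsave;   /* models boot_cpu_has(X86_FEATURE_XSAVE)  */
	bool host_xsaves;  /* models boot_cpu_has(X86_FEATURE_XSAVES) */
	bool guest_xsave;  /* XSAVE enumerated in the guest's CPUID    */
};

/* Returns true if XSS should be context-switched on VM entry/exit under
 * the model above. Note that guest CPUID's XSAVES bit is not an input. */
static bool must_switch_xss(struct xsaves_inputs in)
{
	return in.host_xsave && in.host_xsaves && in.guest_xsave;
}
```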

I'm not suggesting we restore XSAVES support, I'm suggesting that _if_ someone
wants to hack their setup to let the guest use broken hardware, then they should
do that in userspace or in a private kernel, not in upstream KVM.