Re: [PATCH V2 02/12] KVM: x86: Allow the use of kvm_load_host_xsave_state() with guest_state_protected

From: Paolo Bonzini
Date: Mon Mar 10 2025 - 15:08:37 EST


On Sat, Mar 8, 2025 at 12:04 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Mar 06, 2025, Paolo Bonzini wrote:
> I still absolutely detest carrying dedicated code
> for SEV and TDX state management. It's bad enough that figuring out WTF actually
> happens basically requires encyclopedic knowledge of massive specs.
>
> I tried to figure out a way to share code, but everything I can come up with that
> doesn't fake vCPU state makes the non-TDX code a mess. :-(

The only thing worse is requiring encyclopedic knowledge of both the
specs and KVM. :) And yeah, we do require some knowledge of parts of
KVM that *shouldn't* matter for protected-state guests, but that
burden shouldn't be worse than needed.

Because VMX, SVM, SEV-ES+ and TDX each run different microcode or
firmware, the chance of sharing code gets lower and lower as more
functionality moves there, as is the case for SEV-ES/SNP and TDX.
That is why the TDX state management code does its own thing most of
the time anyway; there's no point in sharing a small piece that isn't
even the hardest part.

> > just so that the common code does the right thing for pkru/xcr0/xss,
>
> FWIW, it's not just so that KVM does the right thing for those values, it's a
> defense-in-depth mechanism so that *when*, not if, KVM screws up, the odds of the
> bug being fatal to KVM and/or the guest are reduced.

I would say the reverse is also true: not relying too much on fake
values in vcpu->arch can be more robust.

> Without actual sanity checks and safeguards in the low-level helpers, we absolutely
> are playing a game of whack-a-mole.
>
> E.g. see commit 9b42d1e8e4fe ("KVM: x86: Play nice with protected guests in
> complete_hypercall_exit()").
>
> At a glance, kvm_hv_hypercall() is still broken, because is_protmode() will return
> false incorrectly.

So the fixes are needed anyway and we're playing the game anyway. :(

> > And while the change for XSS (and possibly other MSRs) is actually correct,
> > it should be justified for both SEV-ES/SNP and TDX rather than sneaked into
> > the TDX patches.
> >
> > While there could be other flows that consume guest state, they're
> > just as bound to do the wrong thing if vcpu->arch is only guaranteed
> > to be somehow plausible (think anything that for whatever reason uses
> > cpu_role).
>
> But the MMU code is *already* broken. kvm_init_mmu() => vcpu_to_role_regs(). It
> "works" because the fubar role is never truly consumed. I'm sure there are more
> examples.

Yes, and there should be at least a WARN_ON_ONCE when it is accessed,
even if we don't completely cull the initialization of cpu_role...
Loading the XSAVE state isn't any different.
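A minimal userspace sketch of the "WARN_ON_ONCE when it is accessed" idea, assuming a toy vCPU structure. The names toy_vcpu, toy_warn_on_once() and toy_read_guest_xcr0() are hypothetical and are not KVM code; toy_warn_on_once() only imitates the kernel's WARN_ON_ONCE() semantics (report once per call site, return the condition).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a vCPU; NOT struct kvm_vcpu. */
struct toy_vcpu {
	bool guest_state_protected; /* SEV-ES/SNP or TDX: KVM cannot see guest registers */
	uint64_t xcr0;              /* cached guest XCR0, meaningful only when not protected */
};

/*
 * Userspace stand-in for the kernel's WARN_ON_ONCE(): reports the first time
 * @cond is true for a given @warned flag and returns @cond unchanged.
 */
static bool toy_warn_on_once(bool cond, bool *warned, const char *where)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s: guest state accessed for protected vCPU\n",
			where);
	}
	return cond;
}

/*
 * Hypothetical accessor: copy the cached guest XCR0 into *out. For a
 * protected-state guest the cached value is a placeholder rather than real
 * guest state, so warn once and refuse instead of returning a fake value.
 */
static bool toy_read_guest_xcr0(const struct toy_vcpu *vcpu, uint64_t *out)
{
	static bool warned; /* one flag per call site, like WARN_ON_ONCE() */

	assert(vcpu != NULL && out != NULL);
	if (toy_warn_on_once(vcpu->guest_state_protected, &warned, __func__))
		return false;
	*out = vcpu->xcr0;
	return true;
}
```

The accessor makes misuse loud and returns an explicit failure the caller must handle, rather than letting a plausible-looking value in the vCPU structure flow silently into later code.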

I'm okay with placing some values in CR0/CR4 or even XCR0/XSS, but I
do not wish to use them beyond the absolute minimum necessary. And I
would rather not set more than the bare minimum needed in CR4... Why
set CR4.PKE, for example, if KVM has no business using the guest
PKRU anyway?

Paolo

> > There's no way the existing flows for !guest_state_protected should run _at
> > all_ when the register state is not there. If they do, it's a bug and fixing
> > them is the right thing to do (it may feel like whack-a-mole but isn't)
>
> Eh, it's still whack-a-mole, there just happen to be a finite number of moles :-)