Re: [PATCH V2 02/12] KVM: x86: Allow the use of kvm_load_host_xsave_state() with guest_state_protected
From: Paolo Bonzini
Date: Thu Mar 06 2025 - 17:34:53 EST
Il gio 6 mar 2025, 21:44 Sean Christopherson <seanjc@xxxxxxxxxx> ha scritto:
> > Allowing the use of kvm_load_host_xsave_state() is really ugly, especially
> > since the corresponding code is so simple:
> >
> > if (cpu_feature_enabled(X86_FEATURE_PKU) && vcpu->arch.pkru != 0)
> > wrpkru(vcpu->arch.host_pkru);
>
> It's clearly not "so simple", because this code is buggy.
>
> The justification for using kvm_load_host_xsave_state() is that either KVM gets
> the TDX state model correct and the existing flows Just Work, or we handle all
> that state as one-offs and at best replicate concepts and flows, and at worst
> have bugs that are unique to TDX, e.g. because we get the "simple" code wrong,
> we miss flows that subtly consume state, etc.
A typo doesn't change the fact that kvm_load_host_xsave_state is
optimized with knowledge of the guest CR0 and CR4; faking the values
so that the same field means both "exit value" and "guest value", just
so that the common code does the right thing for pkru/xcr0/xss, is
unmaintainable and conceptually just wrong. And while the change for
XSS (and possibly other MSRs) is actually correct, it should be
justified for both SEV-ES/SNP and TDX rather than sneaked into the TDX
patches.
While there could be other flows that consume guest state, they're
just as bound to do the wrong thing if vcpu->arch is only guaranteed
to be somehow plausible (think anything that for whatever reason uses
cpu_role). There's no way the existing flows for
!guest_state_protected should run _at all_ when the register state is
not there. If they do, it's a bug and fixing them is the right thing
to do (it may feel like whack-a-mole but isn't). See for example the
KVM_CAP_SYNC_REGS calls that Adrian mentioned in the reply to 05/12,
and for which I'll send a patch too: I haven't checked if it's
possible, but I wonder if userspace could misuse sync_regs() to change
guest xcr0 and xss, and trick the TDX *host* into running with some
bits of xcr0 and xss unexpectedly cleared.
TDX provides well known values for the host for all three registers
that are being restored here, so there's no need to reuse code that is
meant to do something completely different. I'm not singling out TDX,
the same would be true for SEV-ES/SNP swap type C.
Paolo
Paolo