Re: [PATCH 1/2] KVM: x86: relax canonical checks for some x86 architectural msrs

From: mlevitsk
Date: Fri Jul 26 2024 - 11:09:28 EST


У пт, 2024-07-26 у 14:50 +0800, Chao Gao пише:
> On Thu, Jul 25, 2024 at 11:01:09AM -0400, Maxim Levitsky wrote:
> > Several architectural msrs (e.g MSR_KERNEL_GS_BASE) must contain
> > a canonical address, and according to Intel PRM, this is enforced
> > by #GP on a MSR write.
> >
> > However with the introduction of the LA57 the definition of
> > what is a canonical address became blurred.
> >
> > Few tests done on Sapphire Rapids CPU and on Zen4 CPU,
> > reveal:
> >
> > 1. These CPUs do allow full 57-bit wide non canonical values
> > to be written to MSR_GS_BASE, MSR_FS_BASE, MSR_KERNEL_GS_BASE,
> > regardless of the state of CR4.LA57.
> > Zen4 in addition to that even allows such writes to
> > MSR_CSTAR and MSR_LSTAR.
>
> This actually is documented/implied at least in ISE [1]. In Chapter 6.4
> "CANONICALITY CHECKING FOR DATA ADDRESSES WRITTEN TO CONTROL REGISTERS AND
> MSRS"
>
>   In Processors that support LAM continue to require the addresses written to
>   control registers or MSRs to be 57-bit canonical if the processor _supports_
>   5-level paging or 48-bit canonical if it supports only 4-level paging
>
> [1]: https://cdrdv2.intel.com/v1/dl/getContent/671368

I haven't found this in the actual PRM, but mine is relatively old,
(from September 2023, I didn't bother to update it because 5 level paging is
quite an old feature)

>
> >
> > 2. These CPUs don't prevent the user from switching back to 4 level
> > paging with values that will be non canonical in 4 level paging,
> > and instead just allow the msrs to contain these values.
> >
> > Since these MSRS are all passed through to the guest, and microcode
> > allows the non canonical values to get into these msrs,
> > KVM has to tolerate such values and avoid crashing the guest.
> >
> > To do so, always allow the host initiated values regardless of
> > the state of CR4.LA57, instead only gate this by the actual hardware
> > support for 5 level paging.
> >
> > To be on the safe side leave the check for guest writes as is.
> >
> > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/x86.c | 31 ++++++++++++++++++++++++++++++-
> > 1 file changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a6968eadd418..c599deff916e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1844,7 +1844,36 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> >         case MSR_KERNEL_GS_BASE:
> >         case MSR_CSTAR:
> >         case MSR_LSTAR:
> > -               if (is_noncanonical_address(data, vcpu))
> > +
> > +               /*
> > +                * Both AMD and Intel cpus tend to allow values which
> > +                * are canonical in the 5 level paging mode but are not
> > +                * canonical in the 4 level paging mode to be written
> > +                * to the above msrs, regardless of the state of the CR4.LA57.
> > +                *
> > +                * Intel CPUs do honour CR4.LA57 for the MSR_CSTAR/MSR_LSTAR,
> > +                * AMD cpus don't even do that.
> > +                *
> > +                * Both CPUs also allow non canonical values to remain in
> > +                * these MSRs if the CPU was in 5 level paging mode and was
> > +                * switched back to 4 level paging, and tolerate these values
> > +                * both in native MSRs and in vmcs/vmcb fields.
> > +                *
> > +                * To avoid crashing a guest, which manages using one of the above
> > +                * tricks to get non canonical value to one of
> > +                * these MSRs, and later migrates, allow the host initiated
> > +                * writes regardless of the state of CR4.LA57.
> > +                *
> > +                * To be on the safe side, don't allow the guest initiated
> > +                * writes to bypass the canonical check (e.g be more strict
> > +                * than what the actual ucode usually does).
>
> I may think guest-initiated writes should be allowed as well because this is
> the architectural behavior.

Note though that for MSR_CSTAR/MSR_LSTAR I did set #GP, depending on CR.LA57.
Ah, I see it, KVM intercepts these msrs on VMX (but not on SVM) and I was under 
the impression that it doesn't, that is why I get #GP depending on CR4.LA57....

I do wonder why we intercept these msrs on VMX and not on SVM.

It all makes sense now, thanks a lot for the explanation!

>
> > +                */
> > +
> > +               if (!host_initiated && is_noncanonical_address(data, vcpu))
> > +                       return 1;
> > +
> > +               if (!__is_canonical_address(data,
> > +                       boot_cpu_has(X86_FEATURE_LA57) ? 57 : 48))
>
> boot_cpu_has(X86_FEATURE_LA57)=1 means LA57 is enabled. Right?
>
> With this change, host-initiated writes must be 48-bit canonical if LA57 isn't
> enabled on the host, even if it is enabled in the guest. (note that KVM can
> expose LA57 to guests even if LA57 is disabled on the host, see
> kvm_set_cpu_caps()).


Sorry about this - we indeed need to use kvm_cpu_cap_has(X86_FEATURE_LA57) because
it is forced based on host raw CPUID.

I remember I wanted to do exactly this but forgot somehow.

Also I need to update this for nested VMX - these msrs are also checked there based
on CR4.LA57.

Thanks for the clarification, and it all makes sense. I'll send v2 with all of this
when I get back from a vacation (next Wednesday).


Best regards,
Maxim Levitsky


>
> >                         return 1;
>