Re: [PATCH] kvm: x86: rewrite kvm_spec_ctrl_valid_bits

From: Maxim Levitsky
Date: Sun Jul 05 2020 - 05:41:06 EST


On Thu, 2020-07-02 at 11:16 -0700, Sean Christopherson wrote:
> On Thu, Jul 02, 2020 at 08:44:55PM +0300, Maxim Levitsky wrote:
> > There are a few cases where this function was creating a bogus #GP condition,
> > for example the case where an AMD host supports STIBP but doesn't support SSBD.
> >
> > Follow the rules for AMD and Intel strictly instead.
>
> Can you elaborate on the conditions that are problematic, e.g. what does
> the guest expect to exist that KVM isn't providing?

Hi Sean Christopherson!
Sorry that I didn't explain the issue here.
I explained it in detail in the bugzilla I opened, but forgot to explain it
in the commit message:
https://bugzilla.redhat.com/show_bug.cgi?id=1853447


The issue is that my CPU (3970X) does not support IBRS,
but it does support STIBP, and thus the guest gets the STIBP CPUID bits enabled
(both the AMD one, and KVM also enables the Intel CPUID bit for this feature).

Then, when the guest actually starts to use STIBP, it gets a #GP because each of these
conditions can disallow setting the STIBP bit when IBRS is not supported:

if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
    !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS))
	bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);
if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
    !boot_cpu_has(X86_FEATURE_AMD_IBRS))
	bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);

Most likely it fails on the second condition, since X86_FEATURE_SPEC_CTRL
is enabled in the guest because the host does support IBPB, which X86_FEATURE_SPEC_CTRL also covers.

But the host supports neither X86_FEATURE_SPEC_CTRL nor X86_FEATURE_AMD_IBRS,
thus the second condition wrongly clears SPEC_CTRL_STIBP.
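
To make that concrete, here is a tiny stand-alone user-space sketch (not KVM code)
that just replays the host-side check above, with the feature bits hard-coded to what
I assume my 3970X reports (no SPEC_CTRL, no AMD_IBRS, but AMD_STIBP present).
It shows STIBP being dropped from the mask even though the host supports it:

#include <stdio.h>
#include <stdint.h>

#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)
#define SPEC_CTRL_SSBD  (1ULL << 2)

int main(void)
{
	/* assumed host feature bits on the 3970X, for illustration only */
	int host_spec_ctrl = 0;		/* X86_FEATURE_SPEC_CTRL (IBRS) */
	int host_amd_ibrs  = 0;		/* X86_FEATURE_AMD_IBRS         */
	int host_amd_stibp = 1;		/* X86_FEATURE_AMD_STIBP        */

	uint64_t bits = SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD;

	/* the second pair of checks quoted above never consults the
	 * STIBP feature bits, so STIBP is dropped together with IBRS */
	if (!host_spec_ctrl && !host_amd_ibrs)
		bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);

	printf("STIBP allowed: %s (host_amd_stibp=%d)\n",
	       (bits & SPEC_CTRL_STIBP) ? "yes" : "no", host_amd_stibp);
	return 0;
}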

In addition to that, I have long known that Win10 guests on my machine started to
crash when qemu added the ability to pass through X86_FEATURE_AMD_IBRS.
I hadn't paid much attention to that other than bisecting it and adding '-amd-stibp' to my cpu flags.

I did notice that I am not the only one to have that issue, for example
https://www.reddit.com/r/VFIO/comments/gf53o8/upgrading_to_qemu_5_broke_my_setup_windows_bsods/
https://forum.level1techs.com/t/amd-fix-for-host-passthrough-on-qemu-5-0-0-kernel-5-6/157710

After I debugged this issue in Linux, it occurred to me that this might be the same issue as in Windows,
and indeed it is. The only difference is that Windows doesn't start to play with STIBP when the Intel-specific
CPUID bit is set on an AMD machine (which KVM has set for a long time), but does start to enable it when the
AMD-specific bit is set, that is X86_FEATURE_AMD_IBRS, the bit that qemu recently started to set; then it gets
the same #GP and crashes.

From the findings on my machine, cross-referenced with the above posts, I assume that many Ryzens
have this configuration of supporting STIBP but not IBRS.
In fact I don't see the kernel use IBRS much (it seems to be used only around firmware calls or so),
so it makes sense that AMD chose not to enable it.

For the fix itself: I could fix this by only changing the above condition (a sketch of that narrower
option is below), but then I read the AMD whitepaper on this, and it mentions that bits in
IA32_SPEC_CTRL don't #GP even if not supported. Implementing that correctly would be too complicated
with the current logic, so I rewrote the logic to be as simple as possible and as close to the
official docs as possible.
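
For reference, the narrower fix I decided against would only split the existing host-side check,
roughly like this (hypothetical, not what the patch does; the guest-side check would need the
same treatment):

	if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
	    !boot_cpu_has(X86_FEATURE_AMD_IBRS))
		bits &= ~SPEC_CTRL_IBRS;
	if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
	    !boot_cpu_has(X86_FEATURE_AMD_IBRS) &&
	    !boot_cpu_has(X86_FEATURE_INTEL_STIBP) &&
	    !boot_cpu_has(X86_FEATURE_AMD_STIBP))
		bits &= ~SPEC_CTRL_STIBP;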


>
> > AMD #GP rules for IA32_SPEC_CTRL can be found here:
> > https://bugzilla.kernel.org/show_bug.cgi?id=199889
> >
> > Fixes: 6441fa6178f5 ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL")
> >
> > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/x86.c | 57 ++++++++++++++++++++++++++++++++++------------
> > 1 file changed, 42 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 00c88c2f34e4..a6bed4670b7f 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10670,27 +10670,54 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
> > }
> > EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
> >
> > -u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
> > +
> > +static u64 kvm_spec_ctrl_valid_bits_host(void)
> > +{
> > + uint64_t bits = 0;
> > +
> > + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> > + bits |= SPEC_CTRL_IBRS;
> > + if (boot_cpu_has(X86_FEATURE_INTEL_STIBP))
> > + bits |= SPEC_CTRL_STIBP;
> > + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
> > + bits |= SPEC_CTRL_SSBD;
> > +
> > + if (boot_cpu_has(X86_FEATURE_AMD_IBRS) || boot_cpu_has(X86_FEATURE_AMD_STIBP))
> > + bits |= SPEC_CTRL_STIBP | SPEC_CTRL_IBRS;
> > +
> > + if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
> > + bits |= SPEC_CTRL_STIBP | SPEC_CTRL_IBRS | SPEC_CTRL_SSBD;
> > +
> > + return bits;
> > +}
>
> Rather than compute the mask every time, it can be computed once on module
> load and stashed in a global. Note, there's a RFC series[*] to support
> reprobing bugs at runtime, but that has bigger issues with existing KVM
> functionality to be addressed, i.e. it's not our problem, yet :-).
>
> [*] https://lkml.kernel.org/r/1593703107-8852-1-git-send-email-mihai.carabas@xxxxxxxxxx

Thanks for the pointer!
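
If we go that way, I imagine something like the following (a sketch only; the init hook name
and the __read_mostly placement are my assumptions, the two helpers are the ones from this patch):

static u64 __read_mostly host_spec_ctrl_valid_bits;

/* hypothetically called once at module load, as you suggest */
static void kvm_init_spec_ctrl_valid_bits(void)
{
	host_spec_ctrl_valid_bits = kvm_spec_ctrl_valid_bits_host();
}

u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
{
	return host_spec_ctrl_valid_bits &
	       kvm_spec_ctrl_valid_bits_guest(vcpu);
}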

Note though that kvm_spec_ctrl_valid_bits() effectively only runs once: after a single successful
(non-#GP) write of a non-zero value, the MSR is cleared in the MSR bitmap for both reads and writes,
on both VMX and SVM.
This is done for performance reasons, which in this case are more important than absolute correctness.
Thus, to some extent, the guest checks above are pointless.
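
To illustrate what I mean, here is a small stand-alone model (not KVM code; it is my paraphrase of
the MSR_IA32_SPEC_CTRL handling in the VMX/SVM set_msr paths, with made-up names): the valid-bits
mask is only consulted while the write is still intercepted, and the first non-zero write drops the
intercept, so later writes go straight to hardware:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool intercepted = true;	/* MSR bitmap still intercepts SPEC_CTRL */
static uint64_t spec_ctrl;	/* emulated value while intercepted */

static void disable_spec_ctrl_intercept(void)
{
	/* stands in for clearing the MSR bitmap bits on VMX/SVM */
	intercepted = false;
}

/* returns 0 on success, 1 to signal a #GP to the guest */
static int set_spec_ctrl(uint64_t data, uint64_t valid_bits)
{
	if (!intercepted)
		return 0;	/* guest now writes the real MSR directly */

	if (data & ~valid_bits)
		return 1;	/* this is where the checks above matter */

	spec_ctrl = data;
	if (data)
		disable_spec_ctrl_intercept();
	return 0;
}

int main(void)
{
	uint64_t valid = 1ULL << 1;			/* pretend only STIBP is valid */
	int ret = set_spec_ctrl(1ULL << 1, valid);	/* first non-zero write */

	printf("ret=%d intercepted=%d\n", ret, intercepted);
	return 0;
}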

If you ask me, I would just remove kvm_spec_ctrl_valid_bits and pass this MSR through to the guest
right away, not only on first access.

I talked with Paolo about this, and his opinion, if I understand correctly, is that the above is
a best effort correctness-wise, since at least we emulate the bits correctly on first access.

>
> > +
> > +static u64 kvm_spec_ctrl_valid_bits_guest(struct kvm_vcpu *vcpu)
> > {
> > - uint64_t bits = SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD;
> > + uint64_t bits = 0;
> >
> > - /* The STIBP bit doesn't fault even if it's not advertised */
> > - if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
> > - !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS))
> > - bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);
> > - if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
> > - !boot_cpu_has(X86_FEATURE_AMD_IBRS))
> > - bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);
> > + if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> > + bits |= SPEC_CTRL_IBRS;
> > + if (guest_cpuid_has(vcpu, X86_FEATURE_INTEL_STIBP))
> > + bits |= SPEC_CTRL_STIBP;
> > + if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL_SSBD))
> > + bits |= SPEC_CTRL_SSBD;
> >
> > - if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL_SSBD) &&
> > - !guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD))
> > - bits &= ~SPEC_CTRL_SSBD;
> > - if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) &&
> > - !boot_cpu_has(X86_FEATURE_AMD_SSBD))
> > - bits &= ~SPEC_CTRL_SSBD;
> > + if (guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
> > + guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP))
>
> Bad indentation.
True.

>
> > + bits |= SPEC_CTRL_STIBP | SPEC_CTRL_IBRS;
> > + if (guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD))
> > + bits |= SPEC_CTRL_STIBP | SPEC_CTRL_IBRS | SPEC_CTRL_SSBD;
>
> Would it be feasible to split into two patches? The first (tagged Fixes:)
> to make the functional changes without inverting the logic or splitting, and
> then do the cleanup? It's really hard to review this patch because I can't
> easily tease out what's different in terms of functionality.

The new logic (hopefully) follows both the Intel and AMD specs.
I will try to split it, though.



>
> > return bits;
> > }
> > +
> > +u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
> > +{
> > + return kvm_spec_ctrl_valid_bits_host() &
> > + kvm_spec_ctrl_valid_bits_guest(vcpu);
> > +}
> > +
> > +
> > EXPORT_SYMBOL_GPL(kvm_spec_ctrl_valid_bits);
> >
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> > --
> > 2.25.4
> >

Thanks for the review,
Best regards,
Maxim Levitsky