Re: [PATCH v2 1/2] kvm/vmx: avoid CPUID faulting leaking to guest

From: Sean Christopherson
Date: Tue Mar 19 2019 - 10:18:40 EST


On Tue, Mar 19, 2019 at 12:26:55PM +0800, Xiaoyao Li wrote:
> Hi, Sean,
>
> On Mon, 2019-03-18 at 09:23 -0700, Sean Christopherson wrote:
> > On Mon, Mar 18, 2019 at 07:43:23PM +0800, Xiaoyao Li wrote:
> > > cpuid faulting is a feature about CPUID instruction. When cpuif faulting
> >
> > ^^^^^
> > cpuid
> > > is enabled, all execution of the CPUID instruction outside system-management
> > > mode (SMM) cause a general-protection (#GP) if the CPL > 0.
> > >
> > > About this feature, detailed information can be found at
> > >
> https://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf
> > >
> > > Current KVM provides software emulation of this feature for guest.
> > > However, because cpuid faulting takes higher priority over CPUID vm exit
> > > (Intel
> > > SDM vol3.25.1.1), there is a risk of leaking cpuid faulting to guest when
> > > host
> > > enables it. If host enables cpuid faulting by setting the bit 0 of
> > > MSR_MISC_FEATURES_ENABLES, it will pass to guest since there is no handling
> > > of
> > > MSR_MISC_FEATURES_ENABLES yet. As a result, when guest calls CPUID
> > > instruction
> > > in CPL > 0, it will generate a #GP instead of CPUID vm eixt.
> > >
> > > This issue will cause guest boot failure when guest uses *modprobe*
> > > to load modules. *modprobe* calls CPUID instruction, thus causing #GP in
> > > guest. Since there is no handling of cpuid faulting in #GP handler, guest
> > > fails boot.
> > >
> > > To fix this issue, we should switch cpuid faulting bit between host and
> > > guest.
> > > Since MSR_MISC_FEATURES_ENABLES is intel-specific, this patch implement the
> > > switching only in vmx. It clears the cpuid faulting bit and save host's
> > > value before switching to guest, and restores the cpuid faulting settings of
> > > host before switching to host.
> > >
> > > Because kvm provides the software emulation of cpuid faulting, we can
> > > just clear the cpuid faulting bit in hardware MSR when switching to
> > > guest.
> > >
> > > Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxxxxxxxx>
> > > ---
> > > Changes in v2:
> > > - move the save/restore of cpuid faulting bit to
> > > vmx_prepare_swich_to_guest/vmx_prepare_swich_to_host to avoid every
> > > vmentry RDMSR, based on Paolo's comment.
> > >
> > > ---
> > > arch/x86/kvm/vmx/vmx.c | 34 ++++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/vmx/vmx.h | 2 ++
> > > 2 files changed, 36 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 541d442edd4e..2c59e0209e36 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1035,6 +1035,23 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> > > wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
> > > }
> > >
> > > +static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> >
> > Maybe vmx_load_guest_misc_features_enables()? See below for reasoning.
> > Alternatively you can drop the helpers altogether.
> >
> > > +{
> > > + u64 host_val;
> > > +
> > > + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > + return;
> > > +
> > > + rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > + vmx->host_msr_misc_features_enables = host_val;
> >
> > There's no need to cache the host value in struct vcpu_vmx, just use the
> > kernel's shadow value.
>
> you're right, thanks for pointing out this.
>
> > > +
> > > + /* clear cpuid fault bit to avoid it leak to guest */
> >
> > Personally I find the comment unnecessary and somewhat misleading.
> >
> > > + if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > + wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > + host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> >
> > I think we can also skip WRMSR if CPUID faulting is also enabled in the
>
> Right, the initial purpose of this patch is to clear hardware cpuid faulting bit
> for guest and just let guest use the kvm emulation of cpuid faulting.
> Using hardware cpuid faulting instead of kvm emulation is what the next patch
> doing.
>
> It's Paolo's suggestion that splits it into 2 patches.

Agreed, two patches is better.

> > guest. It probably doesn't make sense to install the guest's value if
> > CPUID faulting is enabled in the guest and not host since intercepting
> > CPUID likely provides better overall performance than switching the MSR
> > on entry and exit. And last but not least, since there are other bits
>
> Actually, this MSR switching is not happening on every entry and exit, it clears
> host cpuid faulting only when vmx->loaded_cpu_state == NULL, and load host cpuid
> faulting only when vmx->loaded_cpu_state != NULL. Usually, in the vcpu_run loop,
> it switchs for once.

Whoops, yeah, every save/restore cycle.

> However, there indeed is a tradeoff between switching the MSR and intercepting
> CPUID. Using hardware cpuid faulting for guest, it needs to switch related MSR,
> which incurs wrmsr overhead. But using emulation, it has to suffer the overhead
> of vmexit with intercepting CPUID instruction.
> And I don't know which is better.
>
> At this point, I think it makes sense to use 2 patches. One for fixing potential
> leaking issue that just clears cpuid faulting bit for guest, and the other
> writes guest cpuid faulting value to hardware bit to optimize performance (maybe
> it doesn't).
>
> > in the MSR that we don't want to expose to the guest, e.g. RING3MWAIT,
> > checking for a non-zero host value is probably better than checking for
> > individual feature bits. Same reasoning for writing the guest's value
> > instead of clearing just the CPUID faulting bit.
>
> About RING3MWAIT, I just let it go as without this patch.
> Since you pointed out *we* don't want to expose other bits in the MSR to the
> guest, I will clear the both bits in next version.

Clear all bits, i.e. write 0. That way KVM doesn't need to be updated
if/when hardware introduces another bit.

> >
> > So something like:
> >
> > u64 msrval = this_cpu_read(msr_misc_features_shadow);
> >
> > if (!msrval || msrval == vcpu->arch.msr_misc_features_enables)
> > return;
> >
> > wrmsrl(MSR_MISC_FEATURES_ENABLES, vcpu->arch.msr_misc_features_enables);
> >
> >
> > or if you drop the helpers:
> >
> > msrval = this_cpu_read(msr_misc_features_shadow);
> > if (msrval && msrval != vcpu->arch.msr_misc_features_enables)
> > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > vcpu->arch.msr_misc_features_enables);
> >
> > > + }
> > > +}
> > > +
> > > void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > > {
> > > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > @@ -1068,6 +1085,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu
> > > *vcpu)
> > > vmx->loaded_cpu_state = vmx->loaded_vmcs;
> > > host_state = &vmx->loaded_cpu_state->host_state;
> > >
> > > + vmx_save_host_cpuid_fault(vmx);
> > > +
> > > /*
> > > * Set host fs and gs selectors. Unfortunately, 22.2.3 does not
> > > * allow segment selectors with cpl > 0 or ti == 1.
> > > @@ -1124,6 +1143,19 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu
> > > *vcpu)
> > > }
> > > }
> > >
> > > +static void vmx_restore_host_cpuid_fault(struct vcpu_vmx *vmx)
> >
> > If you keep the helpers, maybe vmx_load_host_misc_features_enables()
> > to pair with the new function name for loading guest state?
> >
> > > +{
> > > + u64 msrval;
> > > +
> > > + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > + return;
> > > +
> > > + rdmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> > > + msrval |= vmx->host_msr_misc_features_enables &
> > > + MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
> >
> > Again, there's no need for RDMSR, the host's value can be pulled from
> > msr_misc_features_shadow, and the WRMSR can be skipped if the host and
> > guest have the same value, i.e. we didn't install the guest's value.
> >
> > u64 msrval = this_cpu_read(msr_misc_features_shadow);
> >
> > if (!msrval || msrval == vcpu->arch.msr_misc_features_enables)
> > return;
> >
> > wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> >
> > > + wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> > > +}
> > > +
> > > static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> > > {
> > > struct vmcs_host_state *host_state;
> > > @@ -1137,6 +1169,8 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx
> > > *vmx)
> > > ++vmx->vcpu.stat.host_state_reload;
> > > vmx->loaded_cpu_state = NULL;
> > >
> > > + vmx_restore_host_cpuid_fault(vmx);
> > > +
> > > #ifdef CONFIG_X86_64
> > > rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
> > > #endif
> > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > > index 5df73b36fa49..ba867bbc5676 100644
> > > --- a/arch/x86/kvm/vmx/vmx.h
> > > +++ b/arch/x86/kvm/vmx/vmx.h
> > > @@ -268,6 +268,8 @@ struct vcpu_vmx {
> > > u64 msr_ia32_feature_control_valid_bits;
> > > u64 ept_pointer;
> > >
> > > + u64 host_msr_misc_features_enables;
> >
> > As mentioned above, this isn't needed.
>
> Sure, I will remove this and use msr_misc_features_shadow.
>
> > > +
> > > struct pt_desc pt_desc;
> > > };
> > >
> > > --
> > > 2.19.1
> > >
>