Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

From: Sean Christopherson
Date: Tue Mar 19 2019 - 20:09:46 EST


On Wed, Mar 20, 2019 at 01:51:28AM +0800, Xiaoyao Li wrote:
> On Tue, 2019-03-19 at 07:28 -0700, Sean Christopherson wrote:
> > On Tue, Mar 19, 2019 at 12:37:23PM +0800, Xiaoyao Li wrote:
> > > On Mon, 2019-03-18 at 09:38 -0700, Sean Christopherson wrote:
> > > > On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> > > > > Current cpuid faulting of guest is purely emulated in kvm, which
> > > > > exploits
> > > > > CPUID vm exit to inject #GP to guest. However, if host hardware cpu has
> > > > > X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting for
> > > > > guest to avoid the vm exit overhead.
> > > >
> > > > Heh, I obviously didn't look at this patch before responding to patch 1/2.
> > > >
> > > > > Note: cpuid faulting takes higher priority over CPUID instruction vm
> > > > > exit (Intel SDM vol3.25.1.1).
> > > > >
> > > > > Since cpuid faulting only exists on some Intel's cpu, just apply this
> > > > > optimization to vmx.
> > > > >
> > > > > Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxxxxxxxx>
> > > > > ---
> > > > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > > > arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
> > > > > arch/x86/kvm/x86.c | 15 ++++++++++++---
> > > > > 3 files changed, 29 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > > > b/arch/x86/include/asm/kvm_host.h
> > > > > index ce79d7bfe1fd..14cad587b804 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long
> > > > > msw);
> > > > > void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> > > > > int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> > > > >
> > > > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > > > > data);
> > > > > +
> > > > > int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > > >
> > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > > index 2c59e0209e36..6b413e471dca 100644
> > > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > > @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> > > > >
> > > > > static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > > > > {
> > > > > - u64 host_val;
> > > > > + u64 host_val, guest_val;
> > > > >
> > > > > if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > > > return;
> > > > > @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct
> > > > > vcpu_vmx *vmx)
> > > > > rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > > > vmx->host_msr_misc_features_enables = host_val;
> > > > >
> > > > > - /* clear cpuid fault bit to avoid it leak to guest */
> > > > > - if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > > + guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> > > > > +
> > > > > + /* we can use the hardware cpuid faulting to avoid emulation
> > > > > overhead */
> > > > > + if ((host_val ^ guest_val) &
> > > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > > > - host_val &
> > > > > ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > > + host_val ^
> > > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > > }
> > > > > }
> > > > >
> > > > > @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
> > > > > struct
> > > > > msr_data *msr_info)
> > > > > else
> > > > > vmx->pt_desc.guest.addr_a[index / 2] = data;
> > > > > break;
> > > > > + case MSR_MISC_FEATURES_ENABLES:
> > > > > + if (!kvm_supported_msr_misc_features_enables(vcpu,
> > > > > data))
> > > > > + return 1;
> > > > > + if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> > > > > + if (vmx->loaded_cpu_state)
> > > >
> > > > No need for two separate if statements. And assuming we're checking the
> > > > existing shadow value when loading guest/host state, the WRMSR should
> > > > only be done if the host's value is non-zero.
> > >
> > > I'll combine these two if statements into one.
> > >
> > > I cannot understand why the WRMSR should only be done if the host's value is
> > > non-zero. I think there is no depedency with host's value, if using the
> > > hardware
> > > cpuid faulting. We just need to set the value to real hardware MSR.
> >
> > What I was trying to say in patch 1/2 regarding save/restore, is that I
> > don't think it is worthwhile to voluntarily switch hardware's value. In
> > other words, do the WRMSR if and only if it's absolutely necessary. And
> > that means only installing the guest's value if the host's value is
> > non-zero and host_val != guest_val. If the host's value is zero, then
> > the guest's value is irrelevant as CPUID faulting behavior will naturally
> > be taken care of when intercepting CPUID. And for obvious reasons the
> > WRMSR can be skipped if host and guest want the same value.
>
> The purpose of this patch is always using hardware cpuid fault if hardware cpu
> has this feature. Because emuated cpuid faulting needs entire vmexit process,
> but using hardware cpuid faulting just adds two WRMSR in save/restore cycle only
> when host_val != guest_val.
>
> Also I conclude the handling ou said as below:
> When host's value is zero, we do nothing to this MSR, and let guest ues emulated
> cpuid faulting through CPUID intercepting.
> When host's value is non-zero, we load the guest'value into hardware MSR, which
> means we use hardware cpuid faulting.
>
> So the difference is when host value is zero, you choose to use emulated cpuid
> faulting. What's meaning of chooseing emualtion or hardware feature based on
> host's value?

To save cycles by avoiding WRMSR whenever possible. WRMSR is expensive, even
if it's limited to vcpu_{load,put}(), e.g. a workload that triggers a lot of
exits to userspace isn't going to be thrilled about the extra 250-300 cycles
added to each round trip.

On the other hand, emulating CPUID faulting only adds a VM-Exit to userspace
CPUID instructions that will fault anyways, and faults aren't exactly fast
paths. Most uses of CPUID in userspace are in application startup, e.g. a
hanful of CPUIDs to determine what features can be used. So even if the ~1000
cycles added by the VM-Exit is somehow meaningful to the fault path, its
impact is limited to a tiny number of instructions executed in the guest,
whereas doing WRMSR affects every vcpu_{load,put}.