Re: [PATCH] KVM: vmx: speed up MSR bitmap merge

From: Jim Mattson
Date: Tue Dec 19 2017 - 14:58:24 EST


On Wed, Dec 13, 2017 at 5:30 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> The bulk of the MSR bitmap is either immutable, or can be copied from
> the L1 bitmap. By initializing it at VMXON time, and copying the mutable
> parts one long at a time on vmentry (rather than one bit), about 4000
> clock cycles (30%) can be saved on a nested VMLAUNCH/VMRESUME.
>
> The resulting for loop only has four iterations, so it is cheap enough
> to reinitialize the MSR write bitmaps on every iteration, and it makes
> the code simpler.

Thanks so much for doing this!

> Suggested-by: Jim Mattson <jmattson@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> arch/x86/kvm/vmx.c | 57 ++++++++++++++++++++++++++++--------------------------
> 1 file changed, 30 insertions(+), 27 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1458cb52de68..ee214b4112af 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5217,11 +5217,6 @@ static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1,
> {
> int f = sizeof(unsigned long);
>
> - if (!cpu_has_vmx_msr_bitmap()) {
> - WARN_ON(1);
> - return;
> - }
> -
> /*
> * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
> * have the write-low and read-high bitmap offsets the wrong way round.
> @@ -7493,6 +7488,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
> (unsigned long *)__get_free_page(GFP_KERNEL);
> if (!vmx->nested.msr_bitmap)
> goto out_msr_bitmap;
> + memset(vmx->nested.msr_bitmap, 0xff, PAGE_SIZE);
> }
>
> vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
> @@ -10325,36 +10321,43 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> /* This shortcut is ok because we support only x2APIC MSRs so far. */
> if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
> return false;
> + if (WARN_ON_ONCE(!cpu_has_vmx_msr_bitmap()))
> + return false;
>
> page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
> if (is_error_page(page))
> return false;
> - msr_bitmap_l1 = (unsigned long *)kmap(page);
>
> - memset(msr_bitmap_l0, 0xff, PAGE_SIZE);
> + msr_bitmap_l1 = (unsigned long *)kmap(page);
> + if (nested_cpu_has_apic_reg_virt(vmcs12)) {
> + /* Disable read intercept for all MSRs between 0x800 and 0x8ff. */

Aren't we actually adopting the read intercepts from VMCS12 and
*enabling* the *write* intercepts?

> + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
> + unsigned word = msr / BITS_PER_LONG;
> + msr_bitmap_l0[word] = msr_bitmap_l1[word];
> + msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0;

The indexing above seems a bit obscure, but maybe it will be clear
enough after the above comment is fixed up.
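
For reference, here is how I read the indexing: the 0x800 in the second
assignment is the byte offset of the write-low bitmap within the 4 KiB
page, which happens to coincide with the first x2APIC MSR number. A
minimal userspace sketch, not kernel code (the MSR_BITMAP_* and SKETCH_*
names and all three helpers are made up for illustration):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Byte offsets of the 1 KiB bitmaps inside the 4 KiB MSR-bitmap page. */
#define MSR_BITMAP_READ_LOW   0x000 /* reads of MSRs 0x0000-0x1fff */
#define MSR_BITMAP_WRITE_LOW  0x800 /* writes of MSRs 0x0000-0x1fff */

#define SKETCH_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)

/* Index of the long that holds the read-intercept bit of a low MSR. */
static size_t read_low_word(unsigned int msr)
{
	return MSR_BITMAP_READ_LOW / sizeof(unsigned long) +
	       msr / SKETCH_BITS_PER_LONG;
}

/* Index of the long that holds the write-intercept bit of a low MSR. */
static size_t write_low_word(unsigned int msr)
{
	return MSR_BITMAP_WRITE_LOW / sizeof(unsigned long) +
	       msr / SKETCH_BITS_PER_LONG;
}

/*
 * Adopt L1's read intercepts for 0x800-0x8ff and intercept every write
 * in that range. A set bit means "intercept". Returns the number of
 * loop iterations (four when unsigned long is 64 bits wide).
 */
static unsigned int merge_x2apic_range(const unsigned long *l1,
				       unsigned long *l0)
{
	unsigned int msr, iterations = 0;

	for (msr = 0x800; msr <= 0x8ff; msr += SKETCH_BITS_PER_LONG) {
		l0[read_low_word(msr)] = l1[read_low_word(msr)];
		l0[write_low_word(msr)] = ~0UL;
		iterations++;
	}
	return iterations;
}
```

Spelling out the two offsets separately like this may make it easier to
see that the loop copies the read bits and sets the write bits.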

> + }
> + } else {
> + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
> + unsigned word = msr / BITS_PER_LONG;
> + msr_bitmap_l0[word] = ~0;
> + msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0;
> + }
> + }
>
> - if (nested_cpu_has_virt_x2apic_mode(vmcs12)) {
> - if (nested_cpu_has_apic_reg_virt(vmcs12))
> - for (msr = 0x800; msr <= 0x8ff; msr++)
> - nested_vmx_disable_intercept_for_msr(
> - msr_bitmap_l1, msr_bitmap_l0,
> - msr, MSR_TYPE_R);
> + nested_vmx_disable_intercept_for_msr(
> + msr_bitmap_l1, msr_bitmap_l0,
> + APIC_BASE_MSR + (APIC_TASKPRI >> 4),

Perhaps you could #define X2APIC_MSR(reg) (APIC_BASE_MSR + ((reg) >>
4)) somewhere appropriate (e.g. arch/x86/include/asm/apicdef.h) and
use that here (and below) for brevity?
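
Something along these lines, as a standalone sketch. The register
offsets are repeated here only so that the snippet compiles on its own;
in the kernel they already live in apicdef.h, and the exact placement of
the new macro is just a suggestion:

```c
#include <assert.h>

/* Repeated from arch/x86/include/asm/apicdef.h for a standalone build. */
#define APIC_BASE_MSR	0x800
#define APIC_TASKPRI	0x80
#define APIC_EOI	0xB0
#define APIC_SELF_IPI	0x3F0

/*
 * Suggested helper: map an xAPIC MMIO register offset to its x2APIC
 * MSR number. Registers are 16 bytes apart in MMIO space, hence >> 4.
 */
#define X2APIC_MSR(reg)	(APIC_BASE_MSR + ((reg) >> 4))

/* Wrapper so the mapping can also be exercised as a function. */
static unsigned int x2apic_msr(unsigned int reg)
{
	return X2APIC_MSR(reg);
}
```

The call sites would then read X2APIC_MSR(APIC_TASKPRI),
X2APIC_MSR(APIC_EOI) and X2APIC_MSR(APIC_SELF_IPI).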

> + MSR_TYPE_W);
>
> + if (nested_cpu_has_vid(vmcs12)) {
> nested_vmx_disable_intercept_for_msr(
> - msr_bitmap_l1, msr_bitmap_l0,
> - APIC_BASE_MSR + (APIC_TASKPRI >> 4),
> - MSR_TYPE_R | MSR_TYPE_W);
> -
> - if (nested_cpu_has_vid(vmcs12)) {
> - nested_vmx_disable_intercept_for_msr(
> - msr_bitmap_l1, msr_bitmap_l0,
> - APIC_BASE_MSR + (APIC_EOI >> 4),
> - MSR_TYPE_W);
> - nested_vmx_disable_intercept_for_msr(
> - msr_bitmap_l1, msr_bitmap_l0,
> - APIC_BASE_MSR + (APIC_SELF_IPI >> 4),
> - MSR_TYPE_W);
> - }
> + msr_bitmap_l1, msr_bitmap_l0,
> + APIC_BASE_MSR + (APIC_EOI >> 4),
> + MSR_TYPE_W);
> + nested_vmx_disable_intercept_for_msr(
> + msr_bitmap_l1, msr_bitmap_l0,
> + APIC_BASE_MSR + (APIC_SELF_IPI >> 4),
> + MSR_TYPE_W);
> }
> kunmap(page);
> kvm_release_page_clean(page);
> --
> 1.8.3.1
>

Should we also think about letting L1 control pass-through of some of
the more mundane MSRs, like FS_BASE, GS_BASE, and KERNEL_GS_BASE?
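
A rough userspace model of what I have in mind, following the same rule
as nested_vmx_disable_intercept_for_msr (L0 stops intercepting only
where L1 does not intercept). Everything named sketch_* or SKETCH_* is
hypothetical, and this ignores whether L0 can safely pass these MSRs
through in the first place:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MSR_FS_BASE		0xc0000100u
#define MSR_GS_BASE		0xc0000101u
#define MSR_KERNEL_GS_BASE	0xc0000102u

/* Byte offsets of the high-range (0xc0000000-0xc0001fff) bitmaps. */
#define SKETCH_READ_HIGH	0x400
#define SKETCH_WRITE_HIGH	0xc00

static bool sketch_test_bit(const unsigned char *bitmap,
			    unsigned int base, unsigned int bit)
{
	return bitmap[base + bit / CHAR_BIT] & (1u << (bit % CHAR_BIT));
}

static void sketch_clear_bit(unsigned char *bitmap,
			     unsigned int base, unsigned int bit)
{
	bitmap[base + bit / CHAR_BIT] &= ~(1u << (bit % CHAR_BIT));
}

/*
 * Clear the L0 read/write intercept for a high-range MSR, but only for
 * the access types that L1's bitmap leaves unintercepted.
 */
static void sketch_pass_through_high_msr(const unsigned char *l1,
					 unsigned char *l0,
					 unsigned int msr)
{
	unsigned int bit = msr & 0x1fff;

	if (msr < 0xc0000000u || msr > 0xc0001fffu)
		return;
	if (!sketch_test_bit(l1, SKETCH_READ_HIGH, bit))
		sketch_clear_bit(l0, SKETCH_READ_HIGH, bit);
	if (!sketch_test_bit(l1, SKETCH_WRITE_HIGH, bit))
		sketch_clear_bit(l0, SKETCH_WRITE_HIGH, bit);
}
```

With the L0 bitmap initialized to all ones, calling this for
MSR_FS_BASE, MSR_GS_BASE and MSR_KERNEL_GS_BASE would leave every
intercept in place unless L1 has explicitly cleared it.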

Reviewed-by: Jim Mattson <jmattson@xxxxxxxxxx>