Re: RFC: Getting rid of LTR in VMX

From: Paolo Bonzini
Date: Mon Feb 20 2017 - 06:05:21 EST

On 18/02/2017 04:29, Andy Lutomirski wrote:
> There's no code here because the patch is trivial, but I want to run
> the idea by you all first to see if there are any issues.
> VMX is silly and forces the TSS limit to the minimum on VM exits. KVM
> wastes lots of cycles bumping it back up to accomodate the io bitmap.

Actually looked at the code now...

reload_tss is only invoked for userspace exits, so it is a nice-to-have
but it wouldn't show on most workloads. Still it does save 150-200
clock cycles to remove it (I just commented out reload_tss() from
__vmx_load_host_state to test).

Another 100-150 could be saved if we could just use rdgsbase/wrgsbase,
instead of rdmsr/wrmsr, to read and write the kernel GS. Super hacky
patch after sig.

vmx_save_host_state is really slow... It would be nice if Intel
defined an XSAVES state to do it (just like AMD's vmload/vmsave).


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9856b73a21ad..e76bfec463bc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2138,9 +2138,12 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)

#ifdef CONFIG_X86_64
- rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+ local_irq_disable();
+ asm volatile("swapgs; rdgsbase %0" : "=r" (vmx->msr_host_kernel_gs_base));
if (is_long_mode(&vmx->vcpu))
- wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+ asm volatile("wrgsbase %0" : : "r" (vmx->msr_guest_kernel_gs_base));
+ asm volatile("swapgs");
+ local_irq_enable();
if (boot_cpu_has(X86_FEATURE_MPX))
rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
@@ -2152,14 +2155,20 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)

static void __vmx_load_host_state(struct vcpu_vmx *vmx)
+ unsigned long flags;
if (!vmx->host_state.loaded)

vmx->host_state.loaded = 0;
#ifdef CONFIG_X86_64
+ local_irq_save(flags);
+ asm("swapgs");
if (is_long_mode(&vmx->vcpu))
- rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+ asm volatile("rdgsbase %0" : "=r" (vmx->msr_guest_kernel_gs_base));
+ asm volatile("wrgsbase %0; swapgs" : : "r" (vmx->msr_host_kernel_gs_base));
+ local_irq_restore(flags);
if (vmx->host_state.gs_ldt_reload_needed) {
@@ -2177,10 +2186,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
loadsegment(es, vmx->host_state.es_sel);
- reload_tss();
-#ifdef CONFIG_X86_64
- wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+ //reload_tss();
if (vmx->host_state.msr_host_bndcfgs)
wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
@@ -3469,6 +3475,7 @@ static int hardware_enable(void)
wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
+ cr4_set_bits(X86_CR4_FSGSBASE);

if (vmm_exclusive) {

> I propose that we rework this. Add a percpu variable that indicates
> whether the TSS limit needs to be refreshed. On task switch, if the
> new task has TIF_IO_BITMAP set, then check that flag and, if set,
> refresh TR and clear the flag. On VMX exit, set the flag.
> The TSS limit is (phew!) invisible to userspace, so we don't have ABI
> issues to worry about here. We also shouldn't have security issues
> because a too-low TSS limit just results in unprivileged IO operations
> generating #GP, which is exactly what we want.
> What do you all think? I expect a speedup of a couple hundred cycles
> on each VM exit.