[PATCH 00/25] KVM: x86: Speed up emulation of invalid state

From: Paolo Bonzini
Date: Mon Jun 09 2014 - 08:59:27 EST


This series, done in collaboration with Bandan Das, speeds up
emulation of invalid state by approximately a factor of 4
(as measured by realmode.flat). It brings together patches sent
as RFC in the past 3 months, and adds a few more on top.

Per instruction, the total speedup is around 3x. Some changes shave a constant
number of cycles from all instructions; others only affect more complex
instructions that take more clock cycles to run. Together, these two
different effects make the speedup nicely homogeneous across various kinds
of instructions. Here are rough numbers (expressed in clock cycles on a
Sandy Bridge Xeon machine, with unrestricted_guest=0) at various points
of the series:

jump  move  arith  load  store   RMW
2300  2600   2500  2800   2800  3200
1650  1950   1900  2150   2150  2600  KVM: vmx: speed up emulation of invalid guest state
 900  1250   1050  1350   1300  1700  KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
 900  1050   1050  1350   1300  1700  KVM: emulate: speed up emulated moves
 900  1050   1050  1300   1250  1400  KVM: emulate: extend memory access optimization to stores
 825  1000   1000  1250   1200  1350  KVM: emulate: do not initialize memopp
 750   950    950  1150   1050  1200  KVM: emulate: avoid per-byte copying in instruction fetches
 720   850    850  1075   1000  1100  KVM: x86: use kvm_read_guest_page for emulator accesses

The above only lists the patches where the improvement on kvm-unit-tests
became consistently identifiable and reproducible. Take these with a
grain of salt, since all the rounding here was done by hand, no stddev
is provided, etc.
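
As a quick sanity check on the "around 3x" figure, the ratio between the
first and the last row of the table can be computed per column. This is
only a sketch of the arithmetic; the array and function names are made up
for the example, and the inputs are the hand-rounded numbers from the table.

```c
#include <assert.h>
#include <stddef.h>

/* First and last rows of the table above, in clock cycles.
 * Column order: jump, move, arith, load, store, RMW. */
static const unsigned int cycles_before[6] = { 2300, 2600, 2500, 2800, 2800, 3200 };
static const unsigned int cycles_after[6]  = {  720,  850,  850, 1075, 1000, 1100 };

/* Speedup for one column, scaled by 100 so that everything stays in
 * integer arithmetic (319 means 3.19x).  The result is truncated. */
static unsigned int speedup_x100(size_t col)
{
	assert(col < 6);
	return cycles_before[col] * 100 / cycles_after[col];
}
```

With these inputs the per-column speedup ranges from about 2.6x (load) to
about 3.2x (jump).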

I tried to be quite strict and limited this series to patches that obey
the following criteria:

* either the patch is by itself a measurable improvement
(example: patch 6)

* or the patch is a really really obvious improvement (example:
patch 17); the compiler would have to really screw up for this not
to be the case

* or the patch is just preparatory for a subsequent measurable
improvement.

Quite a few functions disappear from the profile, and others have their
cost cut by a pretty large factor:

before   after  module             symbol
 61643          [kvm_intel]        vmx_segment_access_rights
 47504          [kvm]              vcpu_enter_guest
 34610          [kvm_intel]        rmode_segment_valid
 30312    7119  [kvm_intel]        vmx_get_segment
 27371   23363  [kvm]              x86_decode_insn
 20924   21185  [kernel.kallsyms]  copy_user_generic_string
 18775    3614  [kvm_intel]        vmx_read_guest_seg_selector
 18040    9580  [kvm]              emulator_get_segment
 16061    5791  [kvm]              do_insn_fetch (__do_insn_fetch_bytes after patches)
 15834    5530  [kvm]              kvm_read_guest (kvm_fetch_guest_virt after patches)
 15721          [kernel.kallsyms]  __srcu_read_lock
 15439    4115  [kvm]              init_emulate_ctxt
 14421   11692  [kvm]              x86_emulate_instruction
 12498          [kernel.kallsyms]  __srcu_read_unlock
 12385   11779  [kvm]              __linearize
 12385   13194  [kvm]              decode_operand
  7408    5574  [kvm]              x86_emulate_insn
  6447          [kvm]              kvm_lapic_find_highest_irr
  6390          [kvm_intel]        vmx_handle_exit
  5598    3418  [kvm_intel]        vmx_interrupt_allowed

Honorable mentions among things that I tried and didn't have the effect
I hoped for: using __get_user/__put_user to read memory operands, and
simplifying linearize.


Patches 1-6 are various low-hanging fruit, which alone provide a
2-2.5x speedup (higher on simpler instructions).

Patches 7-12 make the emulator cache the host virtual address of memory
operands, thus avoiding walking the page table twice.
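
To illustrate the idea outside the kernel, here is a minimal userspace
sketch. It is not the code from the series: every toy_* name is
hypothetical, the "guest" is a flat array, and toy_translate() merely
stands in for the expensive address translation. The operand remembers
the host pointer found by the read, so the writeback of a
read-modify-write instruction does not translate again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  4u

/* Hypothetical toy guest: a flat array standing in for guest memory. */
struct toy_guest {
	uint8_t ram[TOY_NR_PAGES * TOY_PAGE_SIZE];
	unsigned int walks;	/* number of simulated translations */
};

/* Stand-in for the expensive guest-address-to-host-pointer translation. */
static uint8_t *toy_translate(struct toy_guest *g, uint32_t gva, size_t len)
{
	g->walks++;
	if (gva >= sizeof(g->ram) || len > sizeof(g->ram) - gva)
		return NULL;
	return &g->ram[gva];
}

/* A memory operand that remembers the host address found when it was read. */
struct toy_mem_operand {
	uint32_t gva;
	size_t len;
	uint8_t *hva;	/* cached host pointer, NULL until the first access */
};

static int toy_operand_read(struct toy_guest *g, struct toy_mem_operand *op,
			    void *val)
{
	op->hva = toy_translate(g, op->gva, op->len);
	if (!op->hva)
		return -1;
	memcpy(val, op->hva, op->len);
	return 0;
}

static int toy_operand_write(struct toy_guest *g, struct toy_mem_operand *op,
			     const void *val)
{
	/* Reuse the pointer cached by the read; translate only if there is none. */
	if (!op->hva)
		op->hva = toy_translate(g, op->gva, op->len);
	if (!op->hva)
		return -1;
	/* Sanity check: the cached pointer must still point into guest RAM. */
	assert(op->hva >= g->ram && op->hva + op->len <= g->ram + sizeof(g->ram));
	memcpy(op->hva, val, op->len);
	return 0;
}
```

A read followed by a write through the same operand bumps the walks
counter only once. The sketch ignores operands that cross a page boundary
and dirty-page tracking, both of which real code has to handle.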

Patches 13-18 avoid wasting time in the memset call of
x86_emulate_ctxt.
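
One common way to get this kind of saving, shown here as an assumption
rather than as a description of what the patches do, is to group the
fields that really must start at zero and clear only that range. The
struct below is hypothetical and much smaller than x86_emulate_ctxt; the
field names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decode context; field names are illustrative only. */
struct toy_ctxt {
	/* Always overwritten by the decoder before use: no need to clear. */
	unsigned char opcode;
	unsigned char op_bytes;
	unsigned char fetch_buf[15];

	/* Must start at zero for every instruction: kept contiguous. */
	unsigned char rex_prefix;
	unsigned char rep_prefix;
	unsigned char modrm;
	unsigned char has_seg_override;

	/* Large scratch state that is also always written before use. */
	unsigned long regs_cache[16];
};

/* Clear only the bytes from rex_prefix up to (not including) regs_cache,
 * instead of the whole struct. */
static void toy_init_decode(struct toy_ctxt *c)
{
	const size_t start = offsetof(struct toy_ctxt, rex_prefix);
	const size_t end = offsetof(struct toy_ctxt, regs_cache);

	assert(start < end);
	memset((unsigned char *)c + start, 0, end - start);
}
```

The price is that the field order becomes part of the contract: a field
added in the wrong group silently stops being cleared.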

Patches 19-22 speed up operand fetching.
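
Two of the patch titles in this series mention avoiding per-byte copying
in instruction fetches and putting pointers in the fetch_cache. The
following userspace sketch shows that general shape; it is not the
emulator code, and all toy_* names are hypothetical. Unconsumed bytes are
tracked with a pair of pointers, and each fetch is one bounds check plus
one memcpy, whatever the operand size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fetch cache: bytes in [ptr, end) are fetched but not yet
 * consumed by the decoder. */
struct toy_fetch {
	uint8_t data[15];	/* an x86 instruction is at most 15 bytes long */
	const uint8_t *ptr;
	const uint8_t *end;
};

/* Copy up to 15 instruction bytes into the cache in one go. */
static void toy_fetch_fill(struct toy_fetch *f, const uint8_t *insn, size_t len)
{
	if (len > sizeof(f->data))
		len = sizeof(f->data);
	memcpy(f->data, insn, len);
	f->ptr = f->data;
	f->end = f->data + len;
}

/* Consume 'size' bytes with one bounds check and one memcpy.
 * Returns 0 on success, -1 if the cache does not hold enough bytes. */
static int toy_fetch_bytes(struct toy_fetch *f, void *dest, size_t size)
{
	assert(f->ptr <= f->end);
	if ((size_t)(f->end - f->ptr) < size)
		return -1;
	memcpy(dest, f->ptr, size);
	f->ptr += size;
	return 0;
}
```

For example, after filling the cache with the five bytes of a
"mov $imm32, %eax" encoding, the decoder would fetch one opcode byte and
then the four immediate bytes with two calls to toy_fetch_bytes().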

Patches 23-25 are the loose ends.

Bandan Das (6):
KVM: emulate: move init_decode_cache to emulate.c
KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks
KVM: emulate: cleanup decode_modrm
KVM: emulate: clean up initializations in init_decode_cache
KVM: emulate: rework seg_override
KVM: emulate: do not initialize memopp

Paolo Bonzini (19):
KVM: vmx: speed up emulation of invalid guest state
KVM: x86: return all bits from get_interrupt_shadow
KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
KVM: emulate: move around some checks
KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())"
KVM: emulate: speed up emulated moves
KVM: emulate: simplify writeback
KVM: emulate: abstract handling of memory operands
KVM: export mark_page_dirty_in_slot
KVM: emulate: introduce memory_prepare callback to speed up memory access
KVM: emulate: activate memory access optimization
KVM: emulate: extend memory access optimization to stores
KVM: emulate: speed up do_insn_fetch
KVM: emulate: avoid repeated calls to do_insn_fetch_bytes
KVM: emulate: avoid per-byte copying in instruction fetches
KVM: emulate: put pointers in the fetch_cache
KVM: x86: use kvm_read_guest_page for emulator accesses
KVM: emulate: simplify BitOp handling
KVM: emulate: fix harmless typo in MMX decoding

 arch/x86/include/asm/kvm_emulate.h |  59 ++++-
 arch/x86/include/asm/kvm_host.h    |   2 +-
 arch/x86/kvm/emulate.c             | 481 ++++++++++++++++++++++---------------
 arch/x86/kvm/svm.c                 |   6 +-
 arch/x86/kvm/trace.h               |   6 +-
 arch/x86/kvm/vmx.c                 |   9 +-
 arch/x86/kvm/x86.c                 | 147 +++++++++---
 include/linux/kvm_host.h           |   6 +
 virt/kvm/kvm_main.c                |  17 +-
 9 files changed, 473 insertions(+), 260 deletions(-)

--
1.8.3.1
