Re: [PATCH] kvm: VMX: do not use vm-exit instruction length for fast MMIO

From: Jason Wang
Date: Fri Aug 18 2017 - 04:48:44 EST




On 08/16/2017 22:10, Michael S. Tsirkin wrote:
On Wed, Aug 16, 2017 at 03:34:54PM +0200, Paolo Bonzini wrote:
Microsoft pointed out privately to me that KVM's handling of
KVM_FAST_MMIO_BUS is invalid: calling skip_emulated_instruction() in the
EPT misconfiguration vmexit handler is wrong, because neither EPT
violations nor EPT misconfigurations are listed in the manual among the
VM exits that set the VM-exit instruction length field.
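
For reference, the VMX skip path advances RIP by whatever the VMCS reports
as the VM-exit instruction length, roughly as in this simplified sketch
(paraphrased from vmx.c, not quoted verbatim):

/*
 * Simplified sketch of the VMX instruction-skip path (paraphrased from
 * vmx.c, not the exact upstream code).  RIP is advanced by the value of
 * the VM-exit instruction length field; on EPT misconfiguration exits
 * that field is not architecturally guaranteed to be valid, so RIP can
 * end up advanced by a garbage amount.
 */
static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
	unsigned long rip = kvm_rip_read(vcpu);

	rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
	kvm_rip_write(vcpu, rip);

	/* skipping the instruction also clears any interrupt shadow */
	vmx_set_interrupt_shadow(vcpu, 0);
}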

While physical processors seem to set the field, this is not architectural
and is just a side effect of the implementation. I couldn't convince
myself of any condition on the exit qualification where VM-exit
instruction length "has" to be defined; there are no trap-like VM-exits
that can be repurposed; and fault-like VM-exits such as descriptor-table
exits provide no decoding information. So I don't really see any way
to keep the full speedup.

What we can do is use EMULTYPE_SKIP; it only saves 200 clock cycles
because computing the physical RIP and reading the instruction is
expensive, but at least the eventfd is signaled before entering the
emulator. This saves on latency. While at it, don't check breakpoints
when skipping the instruction, as presumably any side effect has been
exposed already.
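
Concretely, the change would have roughly the following shape. This is
only a sketch of the approach described above, not the literal patch, but
the helpers it uses (kvm_io_bus_write(), x86_emulate_instruction(),
EMULTYPE_SKIP) are the existing KVM ones:

/*
 * Sketch of the EPT misconfig fast-MMIO path using EMULTYPE_SKIP; an
 * approximation of the approach, not the literal patch.
 */
static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
{
	gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

	/* A nested guest cannot use the fast path: gpa is an nGPA here. */
	if (!is_guest_mode(vcpu) &&
	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
		trace_kvm_fast_mmio(gpa);
		/*
		 * The eventfd has already been signaled by the bus write
		 * above; now decode and skip the instruction in the
		 * emulator instead of trusting VM_EXIT_INSTRUCTION_LEN.
		 */
		return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP,
					       NULL, 0) == EMULATE_DONE;
	}

	/* Slow path: full MMIO emulation. */
	return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
}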

Adding a hypercall or MSR write that does a fast MMIO write to a physical
address would do it, but it would add hypervisor knowledge to virtio,
including CPUID handling, so it would be pretty ugly on the guest side.
Still, if somebody wants to do it and the virtio side of it is acceptable
to the virtio maintainers, I am okay with it.
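
For illustration only, a guest-side notify hook along those lines might
look like the sketch below.  The feature bit and MSR names
(KVM_FEATURE_FAST_NOTIFY, MSR_KVM_FAST_NOTIFY) and the vp_doorbell_gpa()
helper are invented for the example and are not existing ABI; only
kvm_para_has_feature(), wrmsrl() and the virtqueue fields are real:

/*
 * Hypothetical guest-side fast-notify path.  KVM_FEATURE_FAST_NOTIFY,
 * MSR_KVM_FAST_NOTIFY and vp_doorbell_gpa() are made-up names used only
 * to illustrate the "MSR write instead of MMIO write" idea and the CPUID
 * handling it would pull into virtio.
 */
static bool vp_use_fast_notify;	/* filled in once at probe time */

static void vp_probe_fast_notify(void)
{
	/* CPUID-based feature detection the guest would now have to do. */
	vp_use_fast_notify = kvm_para_has_feature(KVM_FEATURE_FAST_NOTIFY);
}

static bool vp_notify(struct virtqueue *vq)
{
	if (vp_use_fast_notify) {
		/*
		 * vp_doorbell_gpa() is a hypothetical helper returning the
		 * guest-physical address of this queue's doorbell.  The
		 * hypervisor matches the GPA against KVM_FAST_MMIO_BUS and
		 * signals the eventfd without decoding any instruction.
		 */
		wrmsrl(MSR_KVM_FAST_NOTIFY, vp_doorbell_gpa(vq));
		return true;
	}

	/* Normal MMIO doorbell write. */
	iowrite16(vq->index, (void __iomem *)vq->priv);
	return true;
}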

Cc: Michael S. Tsirkin <mst@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
Suggested-by: Radim Krčmář <rkrcmar@xxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Jason (cc'd), who worked on the original optimization, said he can work
on testing the performance impact.

I see regressions in both latency and CPU utilization with a netperf TCP_RR test:

pkt_size/sessions/+transaction_rate%/+per_cpu_transaction_rate%
1/ 1/ +0%/ -5%
1/ 25/ -1%/ -2%
1/ 50/ -9%/ -10%
64/ 1/ -3%/ -9%
64/ 25/ 0%/ -2%
64/ 50/ -10%/ -11%
256/ 1/ -10%/ -17%
256/ 25/ -11%/ -12%
256/ 50/ -9%/ -11%

Thanks