[PATCH v2 00/17] KVM: Dirty ring interface

From: Peter Xu
Date: Fri Dec 20 2019 - 16:17:25 EST


Branch is here: https://github.com/xzpeter/linux/tree/kvm-dirty-ring
(based on 5.4.0)

This is v2 of the dirty ring series, and also the first non-RFC
version of it. I didn't include a changelog from v1-rfc because I feel
it would be easier to go through the patchset directly than to read a
lengthy and probably unhelpful changelog. However, I would like to
summarize here what has changed most, along with some conclusions from
the previous v1 discussions.

======================

* Per-vm ring is dropped

For x86 (which is still the main focus for now), we found that kvmgt
is probably the only user that still writes to the guest without a vcpu
context. It would be a pity to keep the per-vm ring only
for kvmgt (which shouldn't write directly to the guest via the kvm api
after all...), so it is removed. Work should be ongoing in parallel to
refactor kvmgt so that it no longer uses kvm apis like kvm_write_guest().

However, I don't want to break kvmgt before it's fixed. So this series
uses an interim solution: writes without a vcpu context fall back to
vcpu0 if it exists. This keeps the interface clean (per-vcpu only)
without breaking the code base. After kvmgt is fixed, we
can probably even drop this special fallback and kvm->dirty_ring_lock.
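
The fallback described above can be sketched roughly as follows. This
is a simplified user-space model, not code from the series: the names
toy_vm, toy_vcpu and pick_ring_owner are made up for illustration, and
the locking needed on the shared path is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VCPUS 4

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_vcpu {
	int id;
};

struct toy_vm {
	struct toy_vcpu *vcpus[TOY_MAX_VCPUS];	/* NULL where no vcpu exists */
};

/*
 * Choose whose per-vcpu ring receives a dirty page.
 *
 * running: the vcpu whose context we are in, or NULL when the write
 *          comes from outside any vcpu context (the kvmgt case).
 *
 * Returns NULL when there is no vcpu context and vcpu0 does not exist,
 * in which case there is no ring to push to.
 */
static struct toy_vcpu *pick_ring_owner(struct toy_vm *vm,
					struct toy_vcpu *running)
{
	assert(vm != NULL);

	if (running)
		return running;

	/*
	 * Interim fallback: borrow vcpu0's ring. This path can race with
	 * vcpu0 itself, so a real implementation would need a lock here
	 * (the cover letter mentions kvm->dirty_ring_lock).
	 */
	return vm->vcpus[0];
}
```

The point of the sketch is that the caller-visible interface stays
per-vcpu: the no-vcpu-context case is resolved internally rather than
exposed as a separate per-vm ring.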

* Waitqueue is still kept (for now)

We did plan to drop the waitqueue. However, with kvmgt we
still have a chance of filling up a ring (and I feel like it'll definitely
happen if we migrate a kvmgt guest). This series only triggers
the waitqueue mechanism in the special case (no vcpu context), and
this naturally avoids another mmu lock deadlock issue I've
encountered, which is good.

For vcpu context writes, the series is now even stricter: we
directly fail the KVM_RUN if the dirty ring is soft full, until
userspace collects the dirty rings first. That guarantees the ring
will never be full. With that, I also dropped KVM_REQ_DIRTY_RING_FULL
because it is no longer needed.
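
To illustrate the soft-full rule, here is a small user-space model. It
is only a sketch under assumptions: the names (toy_ring, toy_vcpu_run,
toy_ring_collect), the sizes, and the idea that one guest entry dirties
at most (size - soft limit) pages before the next check are all
invented for the example and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes chosen for the example only. */
#define TOY_RING_SIZE	8u
#define TOY_SOFT_LIMIT	6u	/* leaves headroom below the hard size */

struct toy_ring {
	uint32_t dirty_index;	/* advanced by the producer (kernel side) */
	uint32_t reset_index;	/* advanced once userspace has collected */
};

static uint32_t toy_ring_used(const struct toy_ring *r)
{
	/* Unsigned subtraction stays correct across index wraparound. */
	return r->dirty_index - r->reset_index;
}

static bool toy_ring_soft_full(const struct toy_ring *r)
{
	return toy_ring_used(r) >= TOY_SOFT_LIMIT;
}

/*
 * Model of the check done before entering the guest: returns -1
 * (exit to userspace) when the ring is soft full, 0 otherwise.
 */
static int toy_vcpu_run(struct toy_ring *r, uint32_t pages_dirtied)
{
	if (toy_ring_soft_full(r))
		return -1;

	/* Assumption: one entry never dirties more than the headroom. */
	assert(pages_dirtied <= TOY_RING_SIZE - TOY_SOFT_LIMIT);
	r->dirty_index += pages_dirtied;

	/* With the check above, the hard limit is never exceeded. */
	assert(toy_ring_used(r) <= TOY_RING_SIZE);
	return 0;
}

/* Userspace has harvested everything pushed so far. */
static void toy_ring_collect(struct toy_ring *r)
{
	r->reset_index = r->dirty_index;
}
```

In this model a caller would loop on toy_vcpu_run(), and on a -1
return call toy_ring_collect() before retrying, which mirrors the
"collect first, then KVM_RUN again" flow described above.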

Potentially the waitqueue could also be used by ARM when there are code
paths that dump the ARM device information to the guest
(e.g. KVM_DEV_ARM_ITS_SAVE_TABLES). We'll see. No matter what, even
if the code is there, x86 (as long as kvmgt is not involved) should never
trigger the waitqueue.

Although the waitqueue is kept, I dropped the waitqueue test entirely,
simply because I can no longer trigger it without kvmgt...

* Why not virtio?

There was already some discussion on the v1 patchset about whether it's
good to use virtio for the data path of delivering dirty pages [1].
I'd confess the only thing that we might consider using is the vring
layout (because virtqueue is tightly bound to devices, while we don't
have a device context here). However, it's a pity that even if we only
use the lowest-level vring api, it'll be at least iov based, which is
already overkill for the dirty ring (which is literally an array of
addresses). So I just kept things simple.
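
For a feel of what "literally an array of addresses" means, here is a
minimal user-space sketch of a fixed-size ring of page entries with a
producer and a consumer index. The entry layout and all names
(toy_dirty_entry, toy_push, toy_collect) are hypothetical and are not
the uapi structure defined by the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: just enough to name one dirty guest page. */
struct toy_dirty_entry {
	uint32_t slot;		/* memory slot identifier */
	uint64_t offset;	/* page offset within that slot */
};

#define TOY_ENTRIES 4u		/* power of two so indices can be masked */

struct toy_dirty_ring {
	struct toy_dirty_entry entries[TOY_ENTRIES];
	uint32_t head;		/* next entry the consumer reads */
	uint32_t tail;		/* next entry the producer writes */
};

/* Producer side: record one dirty page, or return -1 if the ring is full. */
static int toy_push(struct toy_dirty_ring *r, uint32_t slot, uint64_t offset)
{
	struct toy_dirty_entry *e;

	assert(r != NULL);
	if (r->tail - r->head == TOY_ENTRIES)
		return -1;

	e = &r->entries[r->tail & (TOY_ENTRIES - 1)];
	e->slot = slot;
	e->offset = offset;
	r->tail++;
	return 0;
}

/* Consumer side: copy out up to max entries and return how many were read. */
static size_t toy_collect(struct toy_dirty_ring *r,
			  struct toy_dirty_entry *out, size_t max)
{
	size_t n = 0;

	while (n < max && r->head != r->tail) {
		out[n++] = r->entries[r->head & (TOY_ENTRIES - 1)];
		r->head++;
	}
	return n;
}
```

Each entry is a small fixed-size record, so there are no descriptors
or scatter-gather lists to walk, which is the contrast with an
iov-based interface drawn above.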

======================

About the patchset:

Patch 1-5: Mostly cleanups
Patch 6,7: Prepare for the dirty ring interface
Patch 8-10: Dirty ring implementation (mainly patch 8)
Patch 11-17: Test cases update

Please have a look, thanks.

[1] V1 is here: https://lore.kernel.org/kvm/20191129213505.18472-1-peterx@xxxxxxxxxx

Paolo Bonzini (1):
KVM: Move running VCPU from ARM to common code

Peter Xu (16):
KVM: Remove kvm_read_guest_atomic()
KVM: X86: Change parameter for fast_page_fault tracepoint
KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]
KVM: Cache as_id in kvm_memory_slot
KVM: Add build-time error check on kvm_run size
KVM: Pass in kvm pointer into mark_page_dirty_in_slot()
KVM: X86: Implement ring-based dirty memory tracking
KVM: Make dirty ring exclusive to dirty bitmap log
KVM: Don't allocate dirty bitmap if dirty ring is enabled
KVM: selftests: Always clear dirty bitmap after iteration
KVM: selftests: Sync uapi/linux/kvm.h to tools/
KVM: selftests: Use a single binary for dirty/clear log test
KVM: selftests: Introduce after_vcpu_run hook for dirty log test
KVM: selftests: Add dirty ring buffer test
KVM: selftests: Let dirty_log_test async for dirty ring test
KVM: selftests: Add "-c" parameter to dirty log test

Documentation/virt/kvm/api.txt | 96 ++++
arch/arm/include/asm/kvm_host.h | 2 -
arch/arm64/include/asm/kvm_host.h | 2 -
arch/x86/include/asm/kvm_host.h | 3 +
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/Makefile | 3 +-
arch/x86/kvm/mmu.c | 6 +
arch/x86/kvm/mmutrace.h | 9 +-
arch/x86/kvm/vmx/vmx.c | 25 +-
arch/x86/kvm/x86.c | 9 +
include/linux/kvm_dirty_ring.h | 57 +++
include/linux/kvm_host.h | 44 +-
include/trace/events/kvm.h | 78 ++++
include/uapi/linux/kvm.h | 31 ++
tools/include/uapi/linux/kvm.h | 36 ++
tools/testing/selftests/kvm/Makefile | 2 -
.../selftests/kvm/clear_dirty_log_test.c | 2 -
tools/testing/selftests/kvm/dirty_log_test.c | 420 ++++++++++++++++--
.../testing/selftests/kvm/include/kvm_util.h | 4 +
tools/testing/selftests/kvm/lib/kvm_util.c | 64 +++
.../selftests/kvm/lib/kvm_util_internal.h | 3 +
virt/kvm/arm/arch_timer.c | 2 +-
virt/kvm/arm/arm.c | 29 --
virt/kvm/arm/perf.c | 6 +-
virt/kvm/arm/vgic/vgic-mmio.c | 15 +-
virt/kvm/dirty_ring.c | 201 +++++++++
virt/kvm/kvm_main.c | 269 +++++++++--
27 files changed, 1274 insertions(+), 145 deletions(-)
create mode 100644 include/linux/kvm_dirty_ring.h
delete mode 100644 tools/testing/selftests/kvm/clear_dirty_log_test.c
create mode 100644 virt/kvm/dirty_ring.c

--
2.24.1