[PATCH v2 00/13] KVM: Introduce KVM Userfault

From: James Houghton
Date: Thu Jan 09 2025 - 15:50:43 EST


This is a v2 of KVM Userfault, mostly unchanged from v1[5]. Changelog here:

v1->v2:
- For arm64, no longer zap stage 2 when disabling KVM_MEM_USERFAULT
(thanks Oliver).
- Fix the userfault_bitmap validation and casts (thanks kernel test
robot).
- Fix _Atomic cast for the userfault bitmap in the selftest (thanks
kernel test robot).
- Pick up Reviewed-by on doc changes (thanks Bagas).

And here is a trimmed down cover letter from v1, slightly modified
given the small arm64 change:

Please see the RFC[1] for the problem description. In summary,
guest_memfd VMs have no mechanism for doing post-copy live migration.
KVM Userfault provides such a mechanism.

There is a second problem that KVM Userfault solves: userfaultfd-based
post-copy doesn't scale very well. KVM Userfault, when used with
userfaultfd, can scale much better in the common case where most
post-copy demand fetches are a result of vCPU access violations. This is a
continuation of the solution Anish was working on[3]. This aspect of
KVM Userfault is important for userfaultfd-based live migration when
scaling up to hundreds of vCPUs with ~30us network latency for a
PAGE_SIZE demand-fetch.

The implementation in this series differs from the RFC[1]. It adds:
1. a new memslot flag, KVM_MEM_USERFAULT,
2. a new parameter, userfault_bitmap, in struct kvm_memory_slot,
3. a new flag for KVM_RUN memory fault exits:
   KVM_MEMORY_EXIT_FLAG_USERFAULT,
4. a new KVM capability, KVM_CAP_USERFAULT.
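
To make the role of userfault_bitmap more concrete, here is a rough
userspace-side sketch. It assumes one bit per PAGE_SIZE page and that a
set bit means "not yet fetched"; both are assumptions for illustration,
and the helper names (userfault_bitmap_words, userfault_clear_page) are
hypothetical, not part of this series or of any KVM header.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Bits held by one bitmap word. */
#define UF_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: number of unsigned long words needed to hold one
 * bit per page for a memslot of memory_size bytes, rounded up.
 */
static size_t userfault_bitmap_words(uint64_t memory_size, uint64_t page_size)
{
	assert(page_size != 0 && memory_size % page_size == 0);

	uint64_t npages = memory_size / page_size;

	return (size_t)((npages + UF_BITS_PER_WORD - 1) / UF_BITS_PER_WORD);
}

/*
 * Hypothetical helper: atomically clear the bit for page_idx once the
 * page contents have arrived. The bit polarity (set == still missing)
 * is an assumption made for this sketch.
 */
static void userfault_clear_page(_Atomic unsigned long *bitmap,
				 uint64_t page_idx)
{
	unsigned long mask = 1UL << (page_idx % UF_BITS_PER_WORD);

	atomic_fetch_and(&bitmap[page_idx / UF_BITS_PER_WORD], ~mask);
}
```

The atomic read-modify-write matters because several userspace threads
may resolve faults for pages that share a bitmap word.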

KVM Userfault does not attempt to catch KVM's own accesses to guest
memory. That is left up to userfaultfd.

When enabling KVM_MEM_USERFAULT for a memslot, the second-stage mappings
are zapped, and new faults will check `userfault_bitmap` to see if the
fault should exit to userspace.
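
The per-fault check described above can be pictured roughly as follows.
This is a minimal standalone sketch, not the code in this series:
struct slot_sketch and gfn_is_userfault are hypothetical stand-ins, and
the one-bit-per-page layout indexed by (gfn - base_gfn) is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a memslot; not struct kvm_memory_slot. */
struct slot_sketch {
	uint64_t base_gfn;			/* first guest frame in the slot */
	uint64_t npages;			/* number of pages in the slot */
	const unsigned long *userfault_bitmap;	/* assumed: one bit per page */
};

/*
 * Return true if a fault on gfn should exit to userspace, i.e. the
 * page's bit is set in the slot's bitmap.
 */
static bool gfn_is_userfault(const struct slot_sketch *slot, uint64_t gfn)
{
	/* The gfn must fall inside the slot. */
	assert(gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages);

	uint64_t idx = gfn - slot->base_gfn;
	unsigned long word = slot->userfault_bitmap[idx / SKETCH_BITS_PER_LONG];

	return (word >> (idx % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

In this picture, a fault handler would call the check before installing
a mapping and report the fault to userspace when it returns true.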

When KVM_MEM_USERFAULT is enabled, only PAGE_SIZE mappings are
permitted.

When disabling KVM_MEM_USERFAULT, huge mappings are handled the same way
as when dirty logging is disabled: on x86 they are reconstructed, but on
arm64 they are not.

KVM Userfault is not compatible with async page faults. Nikita has
proposed a new, more userspace-driven implementation of async page
faults that *is* compatible with KVM Userfault[4].

See v1 for more performance details[5]. They are unchanged in this v2.

This series is based on the latest kvm/next.

[1]: https://lore.kernel.org/kvm/20240710234222.2333120-1-jthoughton@xxxxxxxxxx/
[2]: https://lpc.events/event/18/contributions/1757/
[3]: https://lore.kernel.org/all/20240215235405.368539-1-amoorthy@xxxxxxxxxx/
[4]: https://lore.kernel.org/kvm/20241118123948.4796-1-kalyazin@xxxxxxxxxx/#t
[5]: https://lore.kernel.org/kvm/20241204191349.1730936-1-jthoughton@xxxxxxxxxx/

James Houghton (13):
KVM: Add KVM_MEM_USERFAULT memslot flag and bitmap
KVM: Add KVM_MEMORY_EXIT_FLAG_USERFAULT
KVM: Allow late setting of KVM_MEM_USERFAULT on guest_memfd memslot
KVM: Advertise KVM_CAP_USERFAULT in KVM_CHECK_EXTENSION
KVM: x86/mmu: Add support for KVM_MEM_USERFAULT
KVM: arm64: Add support for KVM_MEM_USERFAULT
KVM: selftests: Fix vm_mem_region_set_flags docstring
KVM: selftests: Fix prefault_mem logic
KVM: selftests: Add va_start/end into uffd_desc
KVM: selftests: Add KVM Userfault mode to demand_paging_test
KVM: selftests: Inform set_memory_region_test of KVM_MEM_USERFAULT
KVM: selftests: Add KVM_MEM_USERFAULT + guest_memfd toggle tests
KVM: Documentation: Add KVM_CAP_USERFAULT and KVM_MEM_USERFAULT
details

Documentation/virt/kvm/api.rst | 33 +++-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 26 +++-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/mmu/mmu.c | 27 +++-
arch/x86/kvm/mmu/mmu_internal.h | 20 ++-
arch/x86/kvm/x86.c | 36 +++--
include/linux/kvm_host.h | 19 ++-
include/uapi/linux/kvm.h | 6 +-
.../selftests/kvm/demand_paging_test.c | 145 ++++++++++++++++--
.../testing/selftests/kvm/include/kvm_util.h | 5 +
.../selftests/kvm/include/userfaultfd_util.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 42 ++++-
.../selftests/kvm/lib/userfaultfd_util.c | 2 +
.../selftests/kvm/set_memory_region_test.c | 33 ++++
virt/kvm/Kconfig | 3 +
virt/kvm/kvm_main.c | 54 ++++++-
17 files changed, 419 insertions(+), 36 deletions(-)


base-commit: 10b2c8a67c4b8ec15f9d07d177f63b563418e948
--
2.47.1.613.gc27f4b7a9f-goog