> If it would be useful, we could absolutely have a flag to have all
> faults go through the asynchronous mechanism. :) It's meant to just be
> an optimization. For me, it is a necessary optimization.
>
> Userfaultfd doesn't scale particularly well: we have to grab two locks
> to work with the wait_queues. You could create several userfaultfds,
> but the underlying issue is still there. KVM Userfault, if it uses a
> wait_queue for the async fault mechanism, will have the same
> bottleneck. Anish and I worked on making userfaults more scalable for
> KVM [1], and we ended up with a scheme very similar to what we have in
> this KVM Userfault series.

Yes, I see your motivation. Does this approach support async pagefaults
[1]? I.e., would all the guest processes on the vCPU need to stall until
a fault is resolved, or is there a way to let the vCPU run and only
block the faulted process?

> My use case already requires using a reasonably complex API for
> interacting with a separate userland process for fetching memory, and
> it's really fast. I've never tried to hook userfaultfd into this other
> process, but I'm quite certain that [1] + this process's interface
> scale better than userfaultfd does. Perhaps userfaultfd, for
> not-so-scaled-up cases, could be *slightly* faster, but I mostly care
> about what happens when we scale to hundreds of vCPUs.
>
> [1]: https://lore.kernel.org/kvm/20240215235405.368539-1-amoorthy@xxxxxxxxxx/

Do I understand it right that in your setup, when an EPT violation occurs,
>> How do you envision resolving faults in userspace? Copying the page in
>> (provided that userspace mapping of guest_memfd is supported [3]) and
>> clearing the KVM_MEMORY_ATTRIBUTE_USERFAULT alone do not look
>> sufficient to resolve the fault, because an attempt to copy the page
>> directly in userspace will trigger a fault on its own.
>
> This is not true for KVM Userfault, at least for right now. Userspace
> accesses to guest memory will not trigger KVM Userfaults. (I know this
> name is terrible -- regular old userfaultfd() userfaults will indeed
> get triggered, provided you've set things up properly.)
>
> KVM Userfault is merely meant to catch KVM's own accesses to guest
> memory (including vCPU accesses). For non-guest_memfd memslots,
> userspace can totally just write through the VMA it has made (KVM
> Userfault *cannot*, by virtue of being completely divorced from mm,
> intercept this access). For guest_memfd, userspace could write to
> guest memory through a VMA if that's where guest_memfd is headed, but
> perhaps it will rely on exact details of how userspace is meant to
> populate guest_memfd memory.

True, it isn't the case right now. I think I fast-forwarded to a state
where notifications about VMM-triggered faults to the guest_memfd are
also sent asynchronously.
> In case it's interesting or useful at all, we actually use
> UFFDIO_CONTINUE for our live migration use case. We mmap() memory
> twice -- one of them we register with userfaultfd and also give to
> KVM. The other one we use to install memory -- our non-faulting view
> of guest memory!

That is interesting. You're replacing UFFDIO_COPY (vma1) with a memcpy
(vma2) + UFFDIO_CONTINUE (vma1), IIUC. Are both mappings created by the
same process? What benefits does it bring?