Re: [RFC PATCH 14/18] KVM: Add asynchronous userfaults, KVM_READ_USERFAULT

From: Nikita Kalyazin
Date: Fri Jul 26 2024 - 12:50:37 EST


Hi James,

On 11/07/2024 00:42, James Houghton wrote:
> It is possible that KVM wants to access a userfault-enabled GFN in a
> path where it is difficult to return out to userspace with the fault
> information. For these cases, add a mechanism for KVM to wait for a GFN
> to not be userfault-enabled.
In this patch series, the asynchronous notification mechanism is used only in cases "where it is difficult to return out to userspace with the fault information". However, we (AWS) have a use case where we would like to be notified asynchronously about _all_ faults. Firecracker can restore a VM from a memory snapshot where the guest memory is supplied via userfaultfd by a process separate from the VMM itself [1].

While it looks technically possible for the VMM process to handle the exits by forwarding the faults to the other process, that would require building a complex userspace protocol on top and would likely introduce extra latency on the critical path. It also implies that a KVM API (KVM_READ_USERFAULT) is not suitable, because KVM checks that its ioctls are issued specifically by the VMM process [2]:
	if (kvm->mm != current->mm || kvm->vm_dead)
		return -EIO;
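
To make the consequence concrete, here is a toy userspace model of that check. It is not kernel code: toy_mm, toy_kvm and toy_vm_ioctl_check() are hypothetical names used only for illustration, and the caller's address space is passed in explicitly where the kernel would use current->mm.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an address space (the kernel's mm_struct). */
struct toy_mm {
	int id;
};

/* Hypothetical stand-in for struct kvm, keeping only the checked fields. */
struct toy_kvm {
	const struct toy_mm *mm;	/* address space that created the VM */
	bool vm_dead;
};

/*
 * Mirrors the shape of the quoted check: an ioctl is accepted only when
 * the caller's address space is the one that created the VM.
 */
static int toy_vm_ioctl_check(const struct toy_kvm *kvm,
			      const struct toy_mm *caller_mm)
{
	if (kvm->mm != caller_mm || kvm->vm_dead)
		return -EIO;
	return 0;
}
```

In this model, a separate fault-handling process has a different mm from the VMM, so any VM ioctl it issues is rejected with -EIO even if it holds a copy of the VM file descriptor.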

> The implementation of this mechanism is certain to change before KVM
> Userfault could possibly be merged.
How do you envision resolving faults in userspace? Copying the page in (provided that userspace mapping of guest_memfd is supported [3]) and clearing KVM_MEMORY_ATTRIBUTE_USERFAULT do not look sufficient on their own: an attempt to copy the page directly from userspace will itself trigger a fault, which may lead to a deadlock if the original fault was caused by the VMM. An interface similar to UFFDIO_COPY is needed, one that would allocate a page, copy the content in and update the page tables.
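
For reference, this is roughly what the existing UFFDIO_COPY semantics look like with plain userfaultfd on anonymous memory. It is only a sketch of the behaviour I have in mind, not a proposal for the KVM interface: uffd_copy_demo() is a made-up name, the anonymous mapping merely stands in for guest memory, and the function returns 0 when userfaultfd is not usable in the current environment.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if a page was populated with UFFDIO_COPY and reads back
 * correctly, 0 if userfaultfd is unavailable here, -1 on other failures.
 */
static int uffd_copy_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int flags = O_CLOEXEC | O_NONBLOCK;
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg;
	struct uffdio_copy copy;
	unsigned char *dst, *src;
	int uffd, rc = -1;

#ifdef UFFD_USER_MODE_ONLY
	/* Lets unprivileged callers create a userfaultfd on newer kernels. */
	flags |= UFFD_USER_MODE_ONLY;
#endif
	uffd = (int)syscall(SYS_userfaultfd, flags);
	if (uffd < 0)
		return 0;
	if (ioctl(uffd, UFFDIO_API, &api) < 0) {
		close(uffd);
		return 0;
	}

	/* dst stands in for guest memory and is never touched directly. */
	dst = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	src = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED || src == MAP_FAILED)
		goto out;
	memset(src, 0xAB, page);	/* the "snapshot" content */

	memset(&reg, 0, sizeof(reg));
	reg.range.start = (uintptr_t)dst;
	reg.range.len = page;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
		rc = 0;
		goto out;
	}

	/*
	 * One call allocates the page, copies the content in and maps it
	 * at dst, so the resolver never faults on dst itself.
	 */
	memset(&copy, 0, sizeof(copy));
	copy.dst = (uintptr_t)dst;
	copy.src = (uintptr_t)src;
	copy.len = page;
	if (ioctl(uffd, UFFDIO_COPY, &copy) < 0 || copy.copy != (long long)page)
		goto out;

	/* The page is mapped now, so this read does not raise a userfault. */
	rc = (dst[0] == 0xAB && dst[page - 1] == 0xAB) ? 1 : -1;
out:
	if (dst != MAP_FAILED)
		munmap(dst, page);
	if (src != MAP_FAILED)
		munmap(src, page);
	close(uffd);
	return rc;
}
```

The property that matters here is that the resolving process only ever reads from src; the destination is populated by the kernel, which is what avoids the self-fault described above.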

[1] Firecracker snapshot restore via UserfaultFD: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
[2] KVM ioctl check for the address space: https://elixir.bootlin.com/linux/v6.10.1/source/virt/kvm/kvm_main.c#L5083
[3] mmap() of guest_memfd: https://lore.kernel.org/kvm/489d1494-626c-40d9-89ec-4afc4cd0624b@xxxxxxxxxx/T/#mc944a6fdcd20a35f654c2be99f9c91a117c1bed4

Thanks,
Nikita