Re: [RFC PATCH 14/18] KVM: Add asynchronous userfaults, KVM_READ_USERFAULT
From: Nikita Kalyazin
Date: Fri Jul 26 2024 - 12:50:37 EST
Hi James,
On 11/07/2024 00:42, James Houghton wrote:
> It is possible that KVM wants to access a userfault-enabled GFN in a
> path where it is difficult to return out to userspace with the fault
> information. For these cases, add a mechanism for KVM to wait for a GFN
> to not be userfault-enabled.
In this patch series, an asynchronous notification mechanism is used
only in cases "where it is difficult to return out to userspace with the
fault information". However, we (AWS) have a use case where we would
like to be notified asynchronously about _all_ faults. Firecracker can
restore a VM from a memory snapshot where the guest memory is supplied
via a Userfaultfd by a process separate from the VMM itself [1]. While
it looks technically possible for the VMM process to handle exits via
forwarding the faults to the other process, that would require building
a complex userspace protocol on top and likely introduce extra latency
on the critical path. This also implies that a KVM ioctl such as
KVM_READ_USERFAULT is not suitable, because KVM rejects ioctls issued
from any address space other than the one that created the VM [2]:
	if (kvm->mm != current->mm || kvm->vm_dead)
		return -EIO;
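To illustrate the consequence for a separate fault-handling process, here is a minimal userspace model of that condition. It is not kernel code: `struct model_mm`, `struct model_kvm` and `model_vm_ioctl_check()` are hypothetical stand-ins made up for this sketch, and only the condition itself is taken from the snippet above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm_struct (one per address space). */
struct model_mm {
	int id;
};

/* Hypothetical stand-in for struct kvm, reduced to the two fields checked. */
struct model_kvm {
	const struct model_mm *mm;	/* address space that created the VM */
	bool vm_dead;
};

/*
 * Mirrors the quoted condition: the call is rejected with -EIO unless the
 * caller's address space is the one that created the VM and the VM is alive.
 */
int model_vm_ioctl_check(const struct model_kvm *kvm,
			 const struct model_mm *current_mm)
{
	assert(kvm != NULL);

	if (kvm->mm != current_mm || kvm->vm_dead)
		return -EIO;
	return 0;
}
```

In this model, a call made with the VMM's own `model_mm` passes, while the same call made with the `model_mm` of a separate page-fault handler process returns -EIO, which is the situation we would be in with KVM_READ_USERFAULT.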
> The implementation of this mechanism is certain to change before KVM
> Userfault could possibly be merged.
How do you envision resolving faults in userspace? Copying the page in
(provided that userspace mapping of guest_memfd is supported [3]) and
clearing KVM_MEMORY_ATTRIBUTE_USERFAULT do not look sufficient on their
own. An attempt to copy the page directly from userspace will itself
trigger a fault, which may lead to a deadlock in the case where the
original fault was caused by the VMM. An interface similar to
UFFDIO_COPY is needed, one that would allocate a page, copy the content
in and update the page tables.
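For comparison, this is roughly how a fault is resolved with the existing userfaultfd interface today. It is only a sketch of the UFFDIO_COPY semantics I am referring to, not a proposal for the KVM Userfault API: `make_copy_request()` and `resolve_fault()` are hypothetical helper names, and the sketch assumes `uffd` is a userfaultfd descriptor with the faulting range already registered.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Build a UFFDIO_COPY request for the page containing fault_addr.
 * page_size is assumed to be a power of two.
 */
struct uffdio_copy make_copy_request(uint64_t fault_addr,
				     const void *src_page,
				     uint64_t page_size)
{
	struct uffdio_copy copy;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	memset(&copy, 0, sizeof(copy));
	copy.dst = fault_addr & ~(page_size - 1);	/* page-aligned target */
	copy.src = (uint64_t)(uintptr_t)src_page;	/* content to install */
	copy.len = page_size;
	copy.mode = 0;	/* default mode: wake the faulting thread when done */
	return copy;
}

/*
 * Resolve one fault: the kernel allocates the page, copies the content in
 * and maps it, so the handler never touches the faulting address itself.
 * Returns 0 on success or a negative errno value. Retrying on -EAGAIN is
 * left to the caller in this sketch.
 */
int resolve_fault(int uffd, uint64_t fault_addr, const void *src_page,
		  uint64_t page_size)
{
	struct uffdio_copy copy =
		make_copy_request(fault_addr, src_page, page_size);

	if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
		return -errno;
	return 0;
}
```

The important property is that the handler passes a source buffer and a destination address and never writes to the destination mapping directly, so it cannot fault on the page it is trying to populate.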
[1] Firecracker snapshot restore via UserfaultFD:
https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
[2] KVM ioctl check for the address space:
https://elixir.bootlin.com/linux/v6.10.1/source/virt/kvm/kvm_main.c#L5083
[3] mmap() of guest_memfd:
https://lore.kernel.org/kvm/489d1494-626c-40d9-89ec-4afc4cd0624b@xxxxxxxxxx/T/#mc944a6fdcd20a35f654c2be99f9c91a117c1bed4
Thanks,
Nikita