Re: [RFC PATCH kernel] iommufd: Allow mapping from KVM's guest_memfd

From: Jason Gunthorpe

Date: Thu Feb 26 2026 - 14:15:59 EST


On Thu, Feb 26, 2026 at 12:19:52AM -0800, Ackerley Tng wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
> > On Wed, Feb 25, 2026, Alexey Kardashevskiy wrote:
> >> For the new guest_memfd type, no additional reference is taken as
> >> pinning is guaranteed by the KVM guest_memfd library.
> >>
> >> There is no KVM-GMEMFD->IOMMUFD direct notification mechanism as
> >> the assumption is that:
> >> 1) page state change events will be handled by VMM which is going
> >> to call IOMMUFD to remap pages;
> >> 2) shrinking GMEMFD equals to VM memory unplug and VMM is going to
> >> handle it.
> >
> > The VMM is outside of the kernel's effective TCB. Assuming the VMM will always
> > do the right thing is a non-starter.
>
> I think looking up the guest_memfd file from the userspace address
> (uptr) is a good start

Please no, if we need complicated things like notifiers then it is
better to start directly with the struct file interface and get
immediately into some guestmemfd API instead of trying to get there
from a VMA. A VMA doesn't help in any way and just complicates things.

> I didn't think of this before LPC but forcing unmapping during
> truncation (aka shrinking guest_memfd) is probably necessary for overall
> system stability and correctness, so notifying and having guest_memfd
> track where its pages were mapped in the IOMMU is necessary. Whether or
> not to unmap during conversions could be an arch-specific thing, but all
> architectures would want the memory unmapped if the memory is removed
> from guest_memfd ownership.

Things like truncate are a bit easier to handle, you do need a
protective notifier, but if it detects truncate while an iommufd area
still covers the truncated region it can just revoke the whole
area. Userspace made a mistake and gets burned but the kernel is
safe. We don't need something complicated kernel side to automatically
handle removing just the slice of truncated guestmemfd, for example.
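
As a rough illustration of the "revoke the whole area" idea, here is a
userspace model in C. None of these names exist in iommufd or
guest_memfd; struct model_area, model_area_overlaps() and
model_truncate_notify() are hypothetical, and a real implementation
would need locking and an actual IOMMU unmap where the flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an iommufd area backed by a guest_memfd range. */
struct model_area {
	uint64_t start;   /* first byte offset within the guest_memfd */
	uint64_t length;  /* number of bytes covered by the area */
	bool revoked;     /* set once the whole area has been torn down */
};

/* True if the area intersects the truncated range [trunc_start, trunc_end). */
static bool model_area_overlaps(const struct model_area *area,
				uint64_t trunc_start, uint64_t trunc_end)
{
	return area->start < trunc_end &&
	       trunc_start < area->start + area->length;
}

/*
 * Model of the protective truncate notifier: any area that still covers
 * part of the truncated range is revoked in full, never split.
 * Returns the number of areas revoked by this call.
 */
static size_t model_truncate_notify(struct model_area *areas, size_t nr,
				    uint64_t trunc_start, uint64_t trunc_end)
{
	size_t revoked = 0;

	assert(trunc_start <= trunc_end);

	for (size_t i = 0; i < nr; i++) {
		if (areas[i].revoked ||
		    !model_area_overlaps(&areas[i], trunc_start, trunc_end))
			continue;
		/* Assumption: a real kernel would unmap from the IOMMU here. */
		areas[i].revoked = true;
		revoked++;
	}
	return revoked;
}
```

The point of the model is only that the kernel-side bookkeeping stays
trivial: one overlap test per area, no partial unmap.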

If guestmemfd is fully pinned and cannot free memory outside of
truncate that may be good enough (though somehow I think that is not
the case) - and I don't understand what issues Intel has with iommu
access.

Jason