Re: [RFC PATCH kernel] iommufd: Allow mapping from KVM's guest_memfd
From: Xu Yilun
Date: Fri Feb 27 2026 - 06:23:54 EST
On Thu, Feb 26, 2026 at 03:27:00PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 26, 2026 at 05:47:50PM +1100, Alexey Kardashevskiy wrote:
> >
> >
> > On 26/2/26 00:55, Sean Christopherson wrote:
> > > On Wed, Feb 25, 2026, Alexey Kardashevskiy wrote:
> > > > For the new guest_memfd type, no additional reference is taken as
> > > > pinning is guaranteed by the KVM guest_memfd library.
> > > >
> > > > There is no KVM-GMEMFD->IOMMUFD direct notification mechanism as
> > > > the assumption is that:
> > > > 1) page stage change events will be handled by VMM which is going
> > > > to call IOMMUFD to remap pages;
> > > > 2) shrinking GMEMFD equals to VM memory unplug and VMM is going to
> > > > handle it.
> > >
> > > The VMM is outside of the kernel's effective TCB. Assuming the VMM will always
> > > do the right thing is a non-starter.
> >
> > Right.
> >
> > But, say, for 1), if the VMM does not do the right thing and skips the PSC,
> > the AMD host will observe IOMMU fault events - noisy but harmless. I
> > wonder if it is different for others though.
>
> ARM is also supposed to be safe as GPT faults are contained, IIRC.
For Intel TDX this will cause a host machine check and reboot, which is
not contained.
>
> However, it is not like AMD in many important ways here. Critically ARM
> has a split guest physical space where the low addresses are all
> private and the upper addresses are all shared.
This is the same as Intel TDX: the GPA shared bit is used by the IOMMU
to select shared vs. private. You can think of it as 2 IOPTs for T=1,
or as 1 IOPT with all private at the lower addresses & all shared at
the higher addresses.
>
> Thus on Linux the iommu should be programmed with the shared pages
> mapped into the shared address range. It would be wasteful to program
> it with large amounts of IOPTEs that are already known to be private.
For Intel TDX it is not just wasteful: the redundant IOMMU mappings are
dangerous.
>
> I think if you are fully doing in-place conversion then you could
> program the entire shared address range to point to the memory pool
> (eg with 1G huge pages) and rely entirely on the GPT to arbitrate
> access. I don't think that is implemented in Linux though?
>
> While on AMD, IIRC, the iommu should be programmed with both the shared
> and private pages in the respective GPA locations, but due to the RMP
> matching insanity you have to keep restructuring the IOPTEs to exactly
> match the RMP layout.
>
> I have no idea what Intel needs.
The secure part of the IOPT (lower addresses) reuses the KVM MMU
(S-EPT), so it needs no extra update, but it does need a global IOTLB
flush. The shared part of the IOPT for T=1 needs updating based on the
GPA.
>
> Jason