RE: [PATCH] iommu/vt-d: Atomic breakdown of IOPT into finer granularity

From: Tian, Kevin
Date: Tue Aug 15 2023 - 00:07:39 EST


> From: Tian, Kevin
> Sent: Tuesday, August 15, 2023 11:15 AM
>
> > From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> > Sent: Tuesday, August 15, 2023 10:06 AM
> >
> > [Please allow me to include Kevin and Alex in this thread.]
> >
> > On 2023/8/14 20:10, Jie Ji wrote:
> > > With the addition of IOMMU support for IO page fault, it's now
> > > possible to unpin the memory used for DMA remapping. However, the
> > > lack of support for unmapping a subrange of the I/O page table
> > > (IOPT) in IOMMU can lead to some issues.
> >
> > Is this the right contract about how iommu_map/unmap() should be used?
> > If I remember it correctly, IOVA ranges should be mapped in pairs. That
> > means, if a range is mapped by iommu_map(), the same range should be
> > unmapped with iommu_unmap().
> >
> > Any misunderstanding or anything changed?
> >
> > >
> > > For instance, a virtual machine can establish IOPT entries of
> > > 2M/1G for better performance, while the host system enables swap
> > > and attempts to swap out some 4K pages. Unfortunately, unmapping a
> > > subrange of the large-page mapping will drive the IOMMU page walk
> > > to an error level and finally cause a kernel crash.
> >
> > Sorry that I can't fully understand this use case. Are you talking
> > about nested translation, where user space manages its own IO page
> > tables? But how could those pages be swapped out?
> >
>
> It's not related to nested. I think they are interested in I/O page
> fault in stage-2, so there is no need to pin the guest memory.
>
> But I don't think this patch alone makes any sense. It should be part
> of a big series which enables iommufd to support stage-2 page fault,
> e.g. iommufd will register a fault handler on the stage-2 hwpt which
> first calls handle_mm_fault() to fix the CPU page table and then calls
> iommu_map() to set up the iova mapping. Then, upon an mmu notifier
> event for any host mapping change from mm, iommufd calls iommu_unmap()
> or other helpers to adjust the iova mapping accordingly.
>
> the io_pagetable metadata which tracks user requests remains unchanged
> throughout that process.
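
The fault/unmap flow described above might be sketched roughly as below. This is non-compilable pseudocode under my own assumptions: the iommufd_s2_* names, the iommufd_s2_iova_to_hva() and hva_to_page() helpers, and the hwpt structure are all invented for illustration; only handle_mm_fault(), iommu_map() and iommu_unmap() are existing kernel APIs.

```c
/* Hypothetical stage-2 fault handler registered by iommufd (sketch). */
static int iommufd_s2_fault_handler(struct iommufd_hwpt *hwpt,
				    unsigned long iova, bool write)
{
	struct vm_area_struct *vma;
	unsigned long hva;
	struct page *page;

	/*
	 * Resolve the faulting iova back to a host virtual address
	 * (iommufd_s2_iova_to_hva() is a made-up helper).
	 */
	hva = iommufd_s2_iova_to_hva(hwpt, iova, &vma);

	/* First fix the CPU page table, faulting the page in. */
	if (handle_mm_fault(vma, hva, write ? FAULT_FLAG_WRITE : 0,
			    NULL) & VM_FAULT_ERROR)
		return -EFAULT;

	/*
	 * Then set up the iova mapping in the stage-2 page table
	 * (hva_to_page() stands in for the real pfn lookup).
	 */
	page = hva_to_page(vma->vm_mm, hva);
	return iommu_map(hwpt->domain, iova, page_to_phys(page),
			 PAGE_SIZE, IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
}

/* Upon an mmu notifier invalidation from mm, adjust the iova mapping. */
static void iommufd_s2_mn_invalidate(struct iommufd_hwpt *hwpt,
				     unsigned long iova, size_t size)
{
	iommu_unmap(hwpt->domain, iova, size);
}
```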
>
> The vfio driver needs to report to iommufd whether a bound device can
> fully support I/O page fault for all DMA requests (beyond what PCI PRI
> allows).
>
> There is a lot to do before we need to take time to review this
> iommu-driver-specific change.

Another option is to directly use KVM EPT as a special hwpt type; then
most of the io pagetable complexity for paging is already taken care of
by the KVM MMU. The iommu driver just needs to route io page faults to
the mm core and handle iotlb invalidation upon EPT change notifications
from KVM.

From the iommufd p.o.v., this special type is managed by an external
module, so it doesn't support map/unmap via iommufd and the existing
map/unmap path needs no change.
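
The invalidation side of that alternative might look something like the sketch below. Again pseudocode under my own assumptions: the ept_hwpt type and the notification entry point are invented for illustration; only the iommu_iotlb_gather helpers and iommu_iotlb_sync() come from the existing iommu core.

```c
/*
 * Hypothetical callback: KVM notifies the EPT-backed hwpt that a GPA
 * range in the EPT changed, so the stale IOTLB entries must go.
 */
static void ept_hwpt_notify_change(struct ept_hwpt *hwpt,
				   unsigned long gpa, size_t size)
{
	struct iommu_iotlb_gather gather;

	iommu_iotlb_gather_init(&gather);
	/* Record the affected range, then flush it from the IOTLB. */
	iommu_iotlb_gather_add_range(&gather, gpa, size);
	iommu_iotlb_sync(hwpt->domain, &gather);
}
```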