RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Tian, Kevin
Date: Sat May 08 2021 - 03:31:49 EST


> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Saturday, May 8, 2021 1:06 AM
>
> > > Those are the main ones I can think of. It is nice to have a simple
> > > map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't
> > > raise the barrier to entry too high, but the user needs to have the
> > > ability to have more control of their mappings and locked page
> > > accounting should probably be offloaded somewhere. Thanks,
> > >
> >
> > Based on your feedbacks I feel it's probably reasonable to start with
> > a type1v2 semantics for the new interface. Locked accounting could
> > also start with the same VFIO restriction and then improve it
> > incrementally, if a cleaner way is intrusive (if not affecting uAPI).
> > But I didn't get the suggestion on "more control of their mappings".
> > Can you elaborate?
>
> Things like I note above, userspace cannot currently specify mapping
> granularity nor has any visibility to the granularity they get from the
> IOMMU. What actually happens in the IOMMU is pretty opaque to the user
> currently. Thanks,
>

It's much clearer. Based on all the discussions so far I'm thinking about
a staging approach when building the new interface, basically following
the model that Jason pointed out - generic stuff first, then platform
specific extension:

Phase 1: /dev/ioasid with core ingredients and vfio type1v2 semantics
- ioasid is the software handle representing an I/O page table
- uAPI accepts a type1v2 map/unmap semantics per ioasid
- helpers for VFIO/VDPA to bind ioasid_fd and attach ioasids
- multiple ioasids are allowed without nesting (vIOMMU, or devices
w/ incompatible iommu attributes)
- an ioasid disallows any operation before it's attached to a device
- an ioasid inherits iommu attributes from the 1st device attached
to it
- userspace is expected to manage hardware restrictions and the
kernel only returns error when restrictions are broken
* map/unmap on an ioasid will fail before every device in a group
is attached to it
* ioasid attach will fail if the new device has incompatibile iommu
attribute as that of this ioasid
- thus no group semantics in uAPI
- no change to vfio container/group/type1 logic, for running existing
vfio applications
* imply some duplication between vfio type1 and ioasid for some time
- new uAPI in vfio to allow explicit opening of a device and then binding
it to the ioasid_fd
* possibly require each device exposed in /dev/vfio/
- support both pdev and mdev

Phase 2: ioasid nesting
- Allow bind/unbind_pgtable semantics per ioasid
- Allow ioasid nesting
* HW ioasid nesting if supported by platform
* otherwise fall back to SW ioasid nesting (in-kernel shadowing)
- iotlb invalidation per ioasid
- I/O page fault handling per ioasid
- hw_id is not exposed in uAPI. Vendor IOMMU driver decides
when/how hw_id is allocated and programmed properly

Phase3: optimizations and vendor extensions (order undefined, up to
the specific feature owner):
- (Intel) ENQCMD support with hw_id exposure in uAPI
- (ARM/AMD) RID-based pasid table assignment
- (PPC) window-based iova management
- Optimizations:
* replace vfio type1 with a shim driver to use ioasid backend
* mapping granularity
* HW dirty page tracking
* ...

Does above sounds a sensible plan? If yes we'll start working on
phase1 then...

Thanks
Kevin