Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Gunthorpe
Date: Fri May 28 2021 - 16:16:37 EST


On Fri, May 28, 2021 at 06:23:07PM +0200, Jean-Philippe Brucker wrote:

> Regarding the invalidation, I think limiting it to IOASID may work but it
> does bother me that we can't directly forward all invalidations received
> on the vIOMMU: if the guest sends a device-wide invalidation, do we
> iterate over all IOASIDs and issue one ioctl for each? Sure the guest is
> probably sending that because of detaching the PASID table, for which the
> kernel did perform the invalidation, but we can't just assume that and
> ignore the request, there may be a different reason. Iterating is going to
> take a lot of time, whereas with the current API we can send a single request
> and issue a single command to the IOMMU hardware.

I think the invalidation could stand some improvement, but that also
feels basically incremental to the essence of the proposal.

I agree with the general goal that the uAPI should be able to issue
invalidates that directly map to HW invalidations.

> Similarly, if the guest sends an ATC invalidation for a whole device (in
> the SMMU, that's an ATC_INV without SSID), we'll have to transform that
> into multiple IOTLB invalidations? We can't just send it on IOASID #0,
> because it may not have been created by the guest.

For instance, adding device labels allows an "invalidate device"
operation to exist, and the "generic" kernel driver can iterate over
all IOASIDs hooked to the device. The IOMMU driver can override this.
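
As a rough illustration of that fallback, here is a minimal userspace
sketch in C. Every name in it (demo_device, demo_iommu_ops,
demo_invalidate_device, the hw_cmds counter) is hypothetical and not
part of the proposal or of any existing kernel API; it only shows the
shape of "iterate by default, let the driver override":

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_IOASIDS 8

struct demo_device;

/* Hypothetical per-IOMMU-driver hooks; not part of the proposal. */
struct demo_iommu_ops {
	/* Required: invalidate one IOASID for this device. */
	int (*inv_ioasid)(struct demo_device *dev, uint32_t ioasid);
	/* Optional: one HW command covering the whole device. */
	int (*inv_device)(struct demo_device *dev);
};

struct demo_device {
	uint64_t label;                 /* device label chosen by userspace */
	uint32_t ioasids[MAX_IOASIDS];  /* IOASIDs hooked to this device */
	size_t nr_ioasids;
	const struct demo_iommu_ops *ops;
	unsigned int hw_cmds;           /* commands issued, for the demo only */
};

/* Generic path: use the driver override if present, else iterate. */
static int demo_invalidate_device(struct demo_device *dev)
{
	size_t i;
	int ret;

	if (dev->ops->inv_device)
		return dev->ops->inv_device(dev);

	for (i = 0; i < dev->nr_ioasids; i++) {
		ret = dev->ops->inv_ioasid(dev, dev->ioasids[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Stand-ins for real HW commands: they only count what was issued. */
static int demo_hw_inv_ioasid(struct demo_device *dev, uint32_t ioasid)
{
	(void)ioasid;
	dev->hw_cmds++;
	return 0;
}

static int demo_hw_inv_device(struct demo_device *dev)
{
	dev->hw_cmds++;
	return 0;
}
```

With only inv_ioasid set, a device with three IOASIDs costs three
commands; a driver that fills in inv_device gets it down to one, which
is the case Jean-Philippe is asking about.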

> > Notes:
> > - It might be confusing as IOASID is also used in the kernel (drivers/
> > iommu/ioasid.c) to represent PCI PASID or ARM substream ID. We need to
> > find a better name later to differentiate.
>
> Yes this isn't just about allocating PASIDs anymore. /dev/iommu or
> /dev/ioas would make more sense.

Either makes sense to me.

Calling the device /dev/iommu and the internal IOASID objects IOAS (==
iommu_domain) is not bad.

> > * Get information about an I/O address space
> > *
> > * Supported capabilities:
> > * - VFIO type1 map/unmap;
> > * - pgtable/pasid_table binding
> > * - hardware nesting vs. software nesting;
> > * - ...
> > *
> > * Related attributes:
> > * - supported page sizes, reserved IOVA ranges (DMA mapping);
> > * - vendor pgtable formats (pgtable binding);
> > * - number of child IOASIDs (nesting);
> > * - ...
> > *
> > * Above information is available only after one or more devices are
> > * attached to the specified IOASID. Otherwise the IOASID is just a
> > * number w/o any capability or attribute.
> > *
> > * Input parameters:
> > * - u32 ioasid;
> > *
> > * Output parameters:
> > * - many. TBD.
>
> We probably need a capability format similar to PCI and VFIO.

Designing this kind of uAPI where it is half HW and half generic is
really tricky to get right. Probably best to take the detailed design
of the IOCTL structs later.
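
For reference only, the VFIO-style capability format mentioned above
is a chain of small headers inside the info buffer, each carrying an
id, a version and the offset of the next entry. A minimal sketch of
walking such a chain, assuming the same header layout as VFIO's
struct vfio_info_cap_header; the names ioas_cap_header and
ioas_find_cap are hypothetical and say nothing about what the final
IOCTL structs will look like:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header, modelled on VFIO's capability header layout. */
struct ioas_cap_header {
	uint16_t id;       /* capability identifier */
	uint16_t version;  /* version of this capability's payload */
	uint32_t next;     /* offset of next header from buffer start, 0 = end */
};

/*
 * Walk the chain starting at offset 'first' and look for capability 'id'.
 * Returns the offset of the matching header and copies it to *out,
 * or returns -1 if it is absent or the chain is malformed.
 */
static long ioas_find_cap(const uint8_t *buf, size_t len, uint32_t first,
			  uint16_t id, struct ioas_cap_header *out)
{
	uint32_t off = first;
	size_t hops = 0;

	while (off != 0) {
		/* The whole header must lie inside the buffer. */
		if (off > len || len - off < sizeof(*out))
			return -1;
		/* A valid chain cannot have more entries than bytes. */
		if (++hops > len)
			return -1;

		memcpy(out, buf + off, sizeof(*out));
		if (out->id == id)
			return (long)off;
		off = out->next;
	}
	return -1;
}
```

The bounds and hop checks are there because the offsets come from the
buffer itself, so a consumer should not trust them blindly.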

Jason