Plan for /dev/ioasid RFC v2

From: Tian, Kevin
Date: Sun Jun 06 2021 - 22:58:29 EST


Hi, all,

We plan to work on v2 now, given many good comments already received
and substantial changes envisioned. This is a very complex topic with
many sub-threads being discussed. To ensure that I don't miss valuable
suggestions (and to keep everyone on the same page), I'd like to list
the planned changes I have in mind. Please let me know if anything
important is missing. :)

--

(Remaining opens in v1)

- Protocol between kvm/vfio/ioasid for wbinvd/no-snoop. I'll see how
much can be refined based on discussion progress when v2 is out;

- Device-centric (Jason) vs. group-centric (David) uAPI. David is not fully
convinced yet. Based on discussion v2 will continue to have ioasid uAPI
being device-centric (but it's fine for vfio to be group-centric). A new
section will be added to elaborate this part;

- PASID virtualization (section 4) has not been thoroughly discussed yet.
Jason gave some suggestions on how to categorize intended usages.
I will rephrase this section and hope it can be discussed further
in v2;

(Adopted suggestions)

- (Jason) Rename /dev/ioasid to /dev/iommu (and likewise the uAPI, e.g.
IOASID_XXX to IOMMU_XXX). One suggestion (Jason) was to also rename
RID+PASID to SID+SSID. But given the familiarity of the former, I will
still use RID+PASID in v2 to ease the discussion;

- (Jason) v1 prevents one device from binding to multiple ioasid_fd's. This
will be fixed in v2;

- (Jean/Jason) No need to track guest I/O page tables on ARM/AMD. When
a pasid table is bound, it becomes a container for all guest I/O page tables;

- (Jean/Jason) Accordingly a device label is required so iotlb invalidation
and fault handling can both support per-device operation. Per Jean's
suggestion, this label will come from userspace (at
VFIO_BIND_IOASID_FD time);

- (Jason) Addition of device label allows per-device capability/format
check before IOASIDs are created. This leads to another major uAPI
change in v2 - specify format info when creating an IOASID (mapping
protocol, nesting, coherent, etc.). User is expected to check the
per-device format and then create the IOASID with a format that fits
the to-be-attached device;

- (Jason/David) No restriction on map/unmap vs. bind/invalidate. They
can be used in either parent or child;

- (David) Change IOASID_GET_INFO to report permitted range instead of
reserved IOVA ranges. This works better for PPC;

- (Jason) For helper functions, expect to have explicit bus-type wrappers
e.g. ioasid_pci_device_attach;
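
To make the "check per-device format, then create the IOASID with a
matching format" flow above a bit more concrete, here is a minimal
sketch in plain C. It is not the proposed uAPI: every name in it
(ex_dev_format, ex_ioasid_format, the EX_FMT_CAP_* bits, the helper
functions) is hypothetical and exists only for illustration. It models
just the compatibility rule, assuming a format is described by a page
table type plus a set of capability bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; not taken from any real header. */
#define EX_FMT_CAP_NESTING   (1u << 0)
#define EX_FMT_CAP_COHERENT  (1u << 1)

/* Hypothetical per-device format info, keyed by the device label. */
struct ex_dev_format {
    uint64_t label;       /* label supplied by userspace at bind time */
    uint32_t pgtbl_type;  /* I/O page table format behind this device */
    uint32_t caps;        /* EX_FMT_CAP_* bits the device supports */
};

/* Hypothetical format requested when an IOASID is created. */
struct ex_ioasid_format {
    uint32_t pgtbl_type;
    uint32_t required_caps;
};

/* True if an IOASID created with 'fmt' could be attached to 'dev'. */
static bool ex_format_compatible(const struct ex_dev_format *dev,
                                 const struct ex_ioasid_format *fmt)
{
    assert(dev != NULL && fmt != NULL);
    if (dev->pgtbl_type != fmt->pgtbl_type)
        return false;
    /* Every capability the IOASID relies on must be present. */
    return (dev->caps & fmt->required_caps) == fmt->required_caps;
}

/* Check all to-be-attached devices before the IOASID is created. */
static bool ex_format_ok_for_all(const struct ex_dev_format *devs, size_t n,
                                 const struct ex_ioasid_format *fmt)
{
    for (size_t i = 0; i < n; i++) {
        if (!ex_format_compatible(&devs[i], fmt))
            return false;
    }
    return true;
}
```

In this model, userspace would call ex_format_ok_for_all() on the
devices it intends to attach and only then create the IOASID with that
format; a device lacking a required capability makes the check fail up
front rather than at attach time.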

(Not adopted)

- (Parav) Make page pinning a syscall;
- (Jason W./Enrico) One I/O page table per fd;
- (David) Replace IOASID_REGISTER_MEMORY with another level of ioasid
nesting (sort of passthrough mode). Need more thinking. v2 will not
change this part;

Thanks
Kevin