RE: [RFC v2] /dev/iommu uAPI proposal

From: Tian, Kevin
Date: Wed Aug 04 2021 - 20:37:04 EST


> From: Eric Auger <eric.auger@xxxxxxxxxx>
> Sent: Wednesday, August 4, 2021 11:59 PM
>
[...]
> > 1.2. Attach Device to I/O address space
> > +++++++++++++++++++++++++++++++++++++++
> >
> > Device attach/bind is initiated through passthrough framework uAPI.
> >
> > Device attaching is allowed only after a device is successfully bound to
> > the IOMMU fd. User should provide a device cookie when binding the
> > device through VFIO uAPI. This cookie is used when the user queries
> > device capability/format, issues per-device iotlb invalidation and
> > receives per-device I/O page fault data via IOMMU fd.
> >
> > Successful binding puts the device into a security context which isolates
> > its DMA from the rest system. VFIO should not allow user to access the
> s/from the rest system/from the rest of the system
> > device before binding is completed. Similarly, VFIO should prevent the
> > user from unbinding the device before user access is withdrawn.
> With Intel scalable IOV, I understand you could assign an RID/PASID to
> one VM and another one to another VM (which is not the case for ARM). Is
> it a targeted use case? How would it be handled? Is it related to the
> sub-groups evoked hereafter?

Not related to sub-groups. Each mdev is bound to the IOMMU fd individually,
with the defPASID that represents that mdev.

>
> Actually all devices bound to an IOMMU fd should have the same parent
> I/O address space or root address space, am I correct? If so, maybe add
> this comment explicitly?

In most cases yes, but it's not mandatory. Multiple roots are allowed
(e.g. with vIOMMU but no nesting).

[...]
> > The device in the /dev/iommu context always refers to a physical one
> > (pdev) which is identifiable via RID. Physically each pdev can support
> > one default I/O address space (routed via RID) and optionally multiple
> > non-default I/O address spaces (via RID+PASID).
> >
> > The device in VFIO context is a logic concept, being either a physical
> > device (pdev) or mediated device (mdev or subdev). Each vfio device
> > is represented by RID+cookie in IOMMU fd. User is allowed to create
> > one default I/O address space (routed by vRID from user p.o.v) per
> > each vfio_device.
> The concept of default address space is not fully clear for me. I
> currently understand this is a
> root address space (not nesting). Is that correct? This may need
> clarification.

Without PASID there is only one address space (either GPA or GIOVA)
per device. This one is called default. Whether it's a root is orthogonal
to the device's view of this space (e.g. GIOVA could also be nested).

With PASID, additional address spaces can be targeted by the device.
Those are called non-default.

I could also rename default to RID address space and non-default to
RID+PASID address space if doing so makes it clearer.
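
To illustrate the naming from the user's point of view, here is a minimal
sketch in C. The struct, enum and function names are hypothetical and are
not part of the proposed uAPI; the only rule it encodes is the one stated
above (no PASID means default, PASID means non-default).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-view routing tag for one DMA request (illustration only). */
struct user_route {
    uint32_t vrid;       /* virtual RID as seen by the user */
    bool     has_vpasid; /* true when the request carries a PASID */
    uint32_t vpasid;     /* meaningful only when has_vpasid is true */
};

enum space_kind {
    SPACE_DEFAULT,       /* "RID address space" */
    SPACE_NON_DEFAULT    /* "RID+PASID address space" */
};

/* Classify the address space a request targets, per the rule above. */
static enum space_kind classify_space(const struct user_route *r)
{
    return r->has_vpasid ? SPACE_NON_DEFAULT : SPACE_DEFAULT;
}
```

Whether the classified space is a root or a nested one is deliberately not
modelled here, since that is orthogonal to the device's view.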

> > VFIO decides the routing information for this default
> > space based on device type:
> >
> > 1) pdev, routed via RID;
> >
> > 2) mdev/subdev with IOMMU-enforced DMA isolation, routed via
> > the parent's RID plus the PASID marking this mdev;
> >
> > 3) a purely sw-mediated device (sw mdev), no routing required i.e. no
> > need to install the I/O page table in the IOMMU. sw mdev just uses
> > the metadata to assist its internal DMA isolation logic on top of
> > the parent's IOMMU page table;
> Maybe you should introduce this concept of SW mediated device earlier
> because it seems to special case the way the attach behaves. I am
> especially referring to
>
> "Successful attaching activates an I/O address space in the IOMMU, if the
> device is not purely software mediated"

Makes sense.
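
The three routing cases for the default space quoted above can be sketched
roughly as follows. This is an illustration under assumptions, not VFIO
code: the enum, struct and function names are all hypothetical, and the
choice to zero the routing fields for a sw mdev is mine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device kinds matching cases 1) to 3) in the quoted text. */
enum vfio_dev_kind {
    DEV_PDEV,     /* 1) physical device */
    DEV_MDEV_HW,  /* 2) mdev/subdev with IOMMU-enforced DMA isolation */
    DEV_MDEV_SW   /* 3) purely sw-mediated device */
};

/* Hypothetical physical routing info for the default address space. */
struct phys_route {
    bool     needs_iommu; /* false: no I/O page table is installed */
    uint16_t rid;         /* RID of the pdev, or of the mdev's parent */
    bool     has_pasid;
    uint32_t pasid;       /* defPASID marking the mdev, if has_pasid */
};

static struct phys_route default_route(enum vfio_dev_kind kind,
                                       uint16_t rid, uint32_t def_pasid)
{
    struct phys_route r = { false, 0, false, 0 };

    switch (kind) {
    case DEV_PDEV:            /* routed via RID only */
        r.needs_iommu = true;
        r.rid = rid;
        break;
    case DEV_MDEV_HW:         /* routed via parent RID plus defPASID */
        r.needs_iommu = true;
        r.rid = rid;
        r.has_pasid = true;
        r.pasid = def_pasid;
        break;
    case DEV_MDEV_SW:         /* no routing; nothing installed in the IOMMU */
        break;
    }
    return r;
}
```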

>
> >
> > In addition, VFIO may allow user to create additional I/O address spaces
> > on a vfio_device based on the hardware capability. In such case the user
> > has its own view of the virtual routing information (vPASID) when marking
> > these non-default address spaces.
> I do not catch what "marking these non default address space" means.

As explained above, those non-default address spaces are identified/routed
via PASID.

> >
> > 1.3. Group isolation
> > ++++++++++++++++++++
[...]
> >
> > 1) A successful binding call for the first device in the group creates
> > the security context for the entire group, by:
> >
> > * Verifying group viability in a similar way as VFIO does;
> >
> > * Calling IOMMU-API to move the group into a block-dma state,
> > which makes all devices in the group attached to a block-dma
> > domain with an empty I/O page table;
> this block-dma state/domain would deserve to be better defined (I know
> you already evoked it in 1.1 with the dma mapping protocol though)
> activates an empty I/O page table in the IOMMU (if the device is not
> purely SW mediated)?

Sure. Some explanations are scattered in the following paragraphs, but I
can clarify it further.

> How does that relate to the default address space? Is it the same?

Different. This block-dma domain doesn't hold any valid mapping. The
default address space is represented by a normal unmanaged domain.
The ioasid attaching operation will detach the device from the block-dma
domain and then attach it to the target ioasid.
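
As a rough model of that transition, a minimal sketch in C. All names are
hypothetical and this is not the IOMMU-API. It covers only the default
address space, and it assumes that attaching is refused unless the device
currently sits in the block-dma domain; that restriction is my
simplification, not something the proposal states.

```c
/* Hypothetical domain kinds a bound device can be attached to. */
enum domain_kind {
    DOMAIN_NONE,       /* not bound to the IOMMU fd yet */
    DOMAIN_BLOCK_DMA,  /* empty I/O page table, no valid mapping */
    DOMAIN_UNMANAGED   /* normal domain backing an ioasid */
};

struct dev_state {
    enum domain_kind domain;
    int ioasid;        /* -1 while no ioasid is attached */
};

/* Binding puts the device into the block-dma domain. */
static void bind_device(struct dev_state *d)
{
    d->domain = DOMAIN_BLOCK_DMA;
    d->ioasid = -1;
}

/*
 * Attaching detaches the device from the block-dma domain and then
 * attaches it to the target ioasid. Returns 0 on success, -1 if the
 * device is not in the block-dma domain (assumption of this sketch).
 */
static int attach_ioasid(struct dev_state *d, int ioasid)
{
    if (d->domain != DOMAIN_BLOCK_DMA)
        return -1;
    d->domain = DOMAIN_UNMANAGED;
    d->ioasid = ioasid;
    return 0;
}
```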

> >
> > 2. uAPI Proposal
> > ----------------------
[...]
> > /*
> > * Allocate an IOASID.
> > *
> > * IOASID is the FD-local software handle representing an I/O address
> > * space. Each IOASID is associated with a single I/O page table. User
> > * must call this ioctl to get an IOASID for every I/O address space that is
> > * intended to be tracked by the kernel.
> > *
> > * User needs to specify the attributes of the IOASID and associated
> > * I/O page table format information according to one or multiple devices
> > * which will be attached to this IOASID right after. The I/O page table
> > * is activated in the IOMMU when it's attached by a device. Incompatible
>
> .. if not SW mediated
> > * format between device and IOASID will lead to attaching failure.
> > *
> > * The root IOASID should always have a kernel-managed I/O page
> > * table for safety. Locked page accounting is also conducted on the root.
> The definition of root IOASID is not easily found in this spec. Maybe
> this would deserve some clarification.

Makes sense.

And thanks for the other typo-related comments.

Thanks
Kevin