Re: [RFC v2] /dev/iommu uAPI proposal

From: Alex Williamson
Date: Tue Jul 13 2021 - 12:26:25 EST


On Tue, 13 Jul 2021 09:55:03 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Mon, Jul 12, 2021 at 11:56:24PM +0000, Tian, Kevin wrote:
>
> > Maybe I misunderstood your question. Are you specifically worried
> > about establishing the security context for a mdev vs. for its
> > parent?
>
> The way to think about the cookie, and the device bind/attach in
> general, is as taking control of a portion of the IOMMU routing:
>
> - RID
> - RID + PASID
> - "software"
>
> For the first two there can be only one device attachment per value so
> the cookie is unambiguous.
>
> For "software" the iommu layer has little to do with this - everything
> is constructed outside by the mdev. If the mdev wishes to communicate
> on /dev/iommu using the cookie then it has to do so using some iommufd
> API and we can convey the proper device at that point.
>
> Kevin didn't show it, but alongside the PCI attaches:
>
>	struct iommu_attach_data *iommu_pci_device_attach(
>		struct iommu_dev *dev, struct pci_device *pdev,
>		u32 ioasid);
>
> There would also be a software attach for mdev:
>
>	struct iommu_attach_data *iommu_sw_device_attach(
>		struct iommu_dev *dev, struct device *pdev, u32 ioasid);
>
> Which does not connect anything to the iommu layer.
>
> It would have to return something that allows querying the IO page
> table, and the mdev would use that API instead of vfio_pin_pages().


Quoting this proposal again:

> 1) A successful binding call for the first device in the group creates
> the security context for the entire group, by:
>
> * Verifying group viability in a similar way as VFIO does;
>
> * Calling IOMMU-API to move the group into a block-dma state,
> which makes all devices in the group attached to a block-dma
> domain with an empty I/O page table;
>
> VFIO should not allow the user to mmap the MMIO bar of the bound
> device until the binding call succeeds.

The attach step is irrelevant to my question; the bind step is where
the device/group gets into a secure state for device access.

So for IGD we have two scenarios, direct assignment and software mdevs.

AIUI the operation of VFIO_DEVICE_BIND_IOMMU_FD looks like this:

	iommu_ctx = iommu_ctx_fdget(iommu_fd);

	/* For an mdev, register its parent device instead. */
	mdev = mdev_from_dev(vdev->dev);
	dev = mdev ? mdev_parent_dev(mdev) : vdev->dev;

	iommu_dev = iommu_register_device(iommu_ctx, dev, cookie);

Depending on the scenario, this last line registers either the IGD
itself (i.e. the struct device representing PCI device 0000:00:02.0) or
the parent of the GVT-g mdev (i.e. the struct device representing PCI
device 0000:00:02.0). They're the same! AIUI, the cookie is simply an
arbitrary user-generated value which they'll use to refer to this
device via the iommu_fd uAPI.

So what magic is iommu_register_device() doing to infer my intentions
as to whether I'm asking for the IGD RID to be isolated or I'm only
creating a software context for an mdev? Thanks,

Alex