RE: [RFC] /dev/ioasid uAPI proposal

From: Tian, Kevin
Date: Fri Jun 04 2021 - 02:08:38 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, June 3, 2021 8:11 PM
>
> On Thu, Jun 03, 2021 at 03:45:09PM +1000, David Gibson wrote:
> > On Wed, Jun 02, 2021 at 01:58:38PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 02, 2021 at 04:48:35PM +1000, David Gibson wrote:
> > > > > > /* Bind guest I/O page table */
> > > > > > bind_data = {
> > > > > > .ioasid = gva_ioasid;
> > > > > > .addr = gva_pgtable1;
> > > > > > // and format information
> > > > > > };
> > > > > > ioctl(ioasid_fd, IOASID_BIND_PGTABLE, &bind_data);
> > > > >
> > > > > Again I do wonder if this should just be part of alloc_ioasid. Is
> > > > > there any reason to split these things? The only advantage to the
> > > > > split is the device is known, but the device shouldn't impact
> > > > > anything..
> > > >
> > > > I'm pretty sure the device(s) could matter, although they probably
> > > > won't usually.
> > >
> > > It is a bit subtle, but the /dev/iommu fd itself is connected to the
> > > devices first. This prevents wildly incompatible devices from being
> > > joined together, and allows some "get info" to report the capability
> > > union of all devices if we want to do that.
> >
> > Right.. but I've not been convinced that having a /dev/iommu fd
> > instance be the boundary for these types of things actually makes
> > sense. For example if we were doing the preregistration thing
> > (whether by child ASes or otherwise) then that still makes sense
> > across wildly different devices, but we couldn't share that layer if
> > we have to open different instances for each of them.
>
> It is something that still seems up in the air.. What seems clear for
> /dev/iommu is that it
> - holds a bunch of IOASID's organized into a tree
> - holds a bunch of connected devices
> - holds a pinned memory cache
>
> One thing it must do is enforce IOMMU group security. A device cannot
> be attached to an IOASID unless all devices in its IOMMU group are
> part of the same /dev/iommu FD.
>
> The big open question is what parameters govern allowing devices to
> connect to the /dev/iommu:
> - all devices can connect and we model the differences inside the API
> somehow.

I prefer this option if there is no significant blocker ahead.

> - Only sufficiently "similar" devices can be connected
> - The FD's capability is the minimum of all the connected devices
>
> There are some practical problems here, when an IOASID is created the
> kernel does need to allocate a page table for it, and that has to be
> in some definite format.
>
> It may be that we had a false start thinking the FD container should
> be limited. Perhaps creating an IOASID should pass in a list
> of the "device labels" that the IOASID will be used with and that can
> guide the kernel what to do?

In the QEMU case the problem is that it doesn't know the list of devices
that will be attached to an IOASID when the IOASID is created. This is
guest-side knowledge which is conveyed to QEMU one device at a time
through the vIOMMU.

I feel it's fair to say that before the user creates an IOASID, he
should already have checked the format information of the device which
is intended to be attached right afterwards. Then, when creating the
IOASID, the user should specify a format compatible with that device.
There is no format check when the IOASID is created, since its I/O page
table is not installed in the IOMMU yet. Later, when the intended device
is attached to this IOASID, the format is verified and the attach
request fails if it is incompatible.
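
To make the ordering concrete, here is a rough userspace mock of that
flow. It is only a sketch: it assumes that creation merely records the
requested format and that compatibility is checked at attach time. Every
name in it (pgtable_format, device_get_format, ioasid_alloc,
ioasid_attach) is hypothetical and is not part of the proposed uAPI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical I/O page table formats, for illustration only. */
enum pgtable_format { FMT_A = 1, FMT_B = 2 };

/* Mock device: only carries the format its IOMMU can walk. */
struct device {
	const char *name;
	enum pgtable_format fmt;
};

/* Mock IOASID: remembers the format requested at creation. */
struct ioasid {
	int id;
	enum pgtable_format fmt;
};

/* Step 1: the user queries the format of the device it intends to attach. */
static enum pgtable_format device_get_format(const struct device *dev)
{
	assert(dev != NULL);
	return dev->fmt;
}

/*
 * Step 2: create the IOASID with a user-specified format.
 * Nothing is validated here, because no page table is installed
 * in any IOMMU at this point (assumption of this sketch).
 */
static struct ioasid ioasid_alloc(int id, enum pgtable_format fmt)
{
	struct ioasid as = { .id = id, .fmt = fmt };
	return as;
}

/*
 * Step 3: attach a device. This is where the format is verified;
 * an incompatible device is rejected with -EINVAL.
 */
static int ioasid_attach(const struct ioasid *as, const struct device *dev)
{
	assert(as != NULL && dev != NULL);
	if (as->fmt != dev->fmt)
		return -EINVAL;
	return 0;
}
```

In this mock, an IOASID allocated with device_get_format(&dev) attaches
to dev successfully, while attaching a device that reports a different
format fails with -EINVAL.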

Thanks
Kevin