Re: Plan for /dev/ioasid RFC v2
From: Alex Williamson
Date: Mon Jun 28 2021 - 18:31:59 EST
On Mon, 28 Jun 2021 01:09:18 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Friday, June 25, 2021 10:36 PM
> >
> > On Fri, Jun 25, 2021 at 10:27:18AM +0000, Tian, Kevin wrote:
> >
> > > - When receiving the binding call for the 1st device in a group, iommu_fd
> > > calls iommu_group_set_block_dma(group, dev->driver) which does
> > > several things:
> >
> > The whole problem here is trying to match this new world where we want
> > devices to be in charge of their own IOMMU configuration and the old
> > world where groups are in charge.
> >
> > Inserting the group fd and then calling a device-centric
> > VFIO_GROUP_GET_DEVICE_FD_NEW doesn't solve this conflict, and isn't
> > necessary.
>
> No, this was not what I meant. There is no group fd required when
> calling this device-centric interface. I was actually talking about:
>
> iommu_group_set_block_dma(dev->group, dev->driver)
>
> just because current iommu layer API is group-centric. Whether this
> should be improved could be next-level thing. Sorry for not making
> it clear in the first place.
>
> > We can always get the group back from the device at any
> > point in the sequence do to a group wide operation.
>
> yes.
>
> >
> > What I saw as the appeal of the sort of idea was to just completely
> > leave all the difficult multi-device-group scenarios behind on the old
> > group centric API and then we don't have to deal with them at all, or
> > least not right away.
>
> yes, this is the staged approach that we discussed earlier. and
> the reason why I refined this proposal about multi-devices group
> here is because you want to see some confidence along this
> direction. Thus I expanded your idea and hope to achieve consensus
> with Alex/Joerg who obviously have not been convinced yet.
>
> >
> > I'd see some progression where iommu_fd only works with 1:1 groups at
> > the start. Other scenarios continue with the old API.
>
> One uAPI open after completing this new sketch. v1 proposed to
> conduct binding (VFIO_BIND_IOMMU_FD) after device_fd is acquired.
> With this sketch we need a new VFIO_GROUP_GET_DEVICE_FD_NEW
> to complete both in one step. I want to get Alex's confirmation whether
> it sounds good to him, since it's better to unify the uAPI between 1:1
> group and 1:N group even if we don't support 1:N in the start.
I don't like it. It doesn't make sense to me. You have the
group-centric world, which must continue to exist and cannot change
because we cannot break the vfio uapi. We can make extensions, we can
define a new parallel uapi, we can deprecate the uapi, but in the short
term, it can't change.
AIUI, the new device-centric model starts with vfio device files that
can be opened directly. So what then is the purpose of a *GROUP* get
device fd? Why is a vfio uapi involved in setting a device cookie for
another subsystem?
I'd expect that /dev/iommu will be used by multiple subsystems. All
will want to bind devices to address spaces, so shouldn't binding a
device to an iommufd be an ioctl on the iommufd, ie.
IOMMU_BIND_VFIO_DEVICE_FD. Maybe we don't even need "VFIO" in there and
the iommufd code can figure it out internally.
You're essentially trying to reduce vfio to the device interface. That
necessarily implies that ioctls on the container, group, or passed
through the container to the iommu no longer exist. From my
perspective, there should ideally be no new vfio ioctls. The user gets
a limited access vfio device fd and enables full access to the device
by registering it to the iommufd subsystem (100% this needs to be
enforced until close() to avoid revoke issues). The user interacts
exclusively with vfio via the device fd and performs all DMA address
space related operations through the iommufd.
> > Then maybe groups where all devices use the same IOASID.
> >
> > Then 1:N groups if the source device is reliably identifiable, this
> > requires iommu subystem work to attach domains to sub-group objects -
> > not sure it is worthwhile.
> >
> > But at least we can talk about each step with well thought out patches
> >
> > The only thing that needs to be done to get the 1:1 step is to broadly
> > define how the other two cases will work so we don't get into trouble
> > and set some way to exclude the problematic cases from even getting to
> > iommu_fd in the first place.
> >
> > For instance if we go ahead and create /dev/vfio/device nodes we could
> > do this only if the group was 1:1, otherwise the group cdev has to be
> > used, along with its API.
>
> I feel for VFIO possibly we don't need significant change to its uAPI
> sequence, since it anyway needs to support existing semantics for
> backward compatibility. With this sketch we can keep vfio container/
> group by introducing an external iommu type which implies a different
> GET_DEVICE_FD semantics. /dev/iommu can report a fd-wide capability
> for whether 1:N group is supported to vfio user.
Ideally vfio would also at least be able to register a type1 IOMMU
backend through the existing uapi, backed by this iommu code, ie. we'd
create a new "iommufd" (but without the user visible fd), bind all the
group devices to it, generating our own device cookies, create a single
ioasid and attach all the devices to it (all internal). When using the
compatibility mode, userspace doesn't get device cookies, doesn't get
an iommufd, they do mappings through the container, where vfio owns the
cookies and ioasid.
> For new subsystems they can directly create device nodes and rely on
> iommu fd to manage group isolation, without introducing any group
> semantics in its uAPI.
Create device nodes, bind them to iommufd, associate cookies, attach
ioasids, etc. That should be the same for all subsystems, including
vfio, it's just the magic internal handshake between the device
subsystem and the iommufd subsystem that changes.
> > > a) Check group viability. A group is viable only when all devices in
> > > the group are in one of below states:
> > >
> > > * driver-less
> > > * bound to a driver which is same as dev->driver (vfio in this case)
> > > * bound to an otherwise allowed driver (same list as in vfio)
> >
> > This really shouldn't use hardwired driver checks. Attached drivers
> > should generically indicate to the iommu layer that they are safe for
> > iommu_fd usage by calling some function around probe()
>
> good idea.
>
> >
> > Thus a group must contain only iommu_fd safe drivers, or drivers-less
> > devices before any of it can be used. It is the more general
> > refactoring of what VFIO is doing.
> >
> > > c) The iommu layer also verifies group viability on BUS_NOTIFY_
> > > BOUND_DRIVER event. BUG_ON if viability is broken while
> > block_dma
> > > is set.
> >
> > And with this concept of iommu_fd safety being first-class maybe we
> > can somehow eliminate this gross BUG_ON (and the 100's of lines of
> > code that are used to create it) by denying probe to non-iommu-safe
> > drivers, somehow.
>
> yes.
>
> >
> > > - Binding other devices in the group to iommu_fd just succeeds since
> > > the group is already in block_dma.
> >
> > I think the rest of this more or less describes the device centric
> > logic for multi-device groups we've already talked about. I don't
> > think it benifits from having the group fd
> >
>
> sure. All of this new sketch doesn't have group fd in any iommu fd
> API. Just try to elaborate a full sketch to sync the base.
>
> Alex/Joerg, look forward to your thoughts now. 😊
Some provided. Thanks,
Alex