RE: [RFC 03/20] vfio: Add vfio_[un]register_device()

From: Tian, Kevin
Date: Wed Sep 22 2021 - 19:46:52 EST


> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Thursday, September 23, 2021 6:45 AM
>
> On Wed, 22 Sep 2021 22:34:42 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Thursday, September 23, 2021 4:11 AM
> > >
> > > On Wed, 22 Sep 2021 09:22:52 -0300
> > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > >
> > > > On Wed, Sep 22, 2021 at 09:23:34AM +0000, Tian, Kevin wrote:
> > > >
> > > > > > Providing an ioctl to bind to a normal VFIO container or group might
> > > > > > allow a reasonable fallback in userspace..
> > > > >
> > > > > I didn't get this point though. An error in binding already allows the
> > > > > user to fall back to the group path. Why do we need introduce
> another
> > > > > ioctl to explicitly bind to container via the nongroup interface?
> > > >
> > > > New userspace still needs a fallback path if it hits the 'try and
> > > > fail'. Keeping the device FD open and just using a different ioctl to
> > > > bind to a container/group FD, which new userspace can then obtain as
> a
> > > > fallback, might be OK.
> > > >
> > > > Hard to see without going through the qemu parts, so maybe just keep
> > > > it in mind
> > >
> > > If we assume that the container/group/device interface is essentially
> > > deprecated once we have iommufd, it doesn't make a lot of sense to me
> > > to tack on a container/device interface just so userspace can avoid
> > > reverting to the fully legacy interface.
> > >
> > > But why would we create vfio device interface files at all if they
> > > can't work? I'm not really on board with creating a try-and-fail
> > > interface for a mechanism that cannot work for a given device. The
> > > existence of the device interface should indicate that it's supported.
> > > Thanks,
> > >
> >
> > Now it's a try-and-fail model even for devices which support iommufd.
> > Per Jason's suggestion, a device is always opened with a parked fops
> > which supports only bind. Binding serves as the contract for handling
> > exclusive ownership on a device and switching to normal fops if
> > succeed. So the user has to try-and-fail in case multiple threads attempt
> > to open a same device. Device which doesn't support iommufd is not
> > different, except binding request 100% fails (due to missing .bind_iommufd
> > in kernel driver).
>
> That's a rather important difference. I don't really see how that's
> comparable to the mutually exclusive nature of the legacy vs device

I didn't get the 'comparable' part. Can you elaborate?

> interface. We're not going to present a vfio device interface for SW
> mdevs that can't participate in iommufd, right? Thanks,
>

Did you see any problem if exposing sw mdev now? Following above
explanation the try-and-fail model should still work...

btw I realized another related piece regarding to the new layout that
Jason suggested, which have sys device node include a link to the vfio
devnode:

/sys/bus/pci/devices/DDDD:BB:DD.F/vfio/vfioX/dev

This for sure requires specific vfio driver support to get the link established.
if we only do it for vfio-pci in the start, then for other devices which don't
support iommufd there is no way for the user to identify the corresponding
vfio devnode even it's still exposed. Then try-and-fail model may not even
been reached for those devices.

Thanks
Kevin