Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

From: Alex Williamson
Date: Mon Sep 27 2021 - 15:19:58 EST


On Mon, 27 Sep 2021 13:32:34 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> > From: Jason Gunthorpe
> > Sent: Monday, September 27, 2021 9:10 PM
> >
> > On Mon, Sep 27, 2021 at 01:00:08PM +0000, Tian, Kevin wrote:
> >
> > > > I think for such a narrow usage you should not change the struct
> > > > device_driver. Just have pci_stub call a function to flip back to user
> > > > mode.
> > >
> > > Here we want to ensure that kernel dma should be blocked
> > > if the group is already marked for user-dma. If we just blindly
> > > do it for any driver at this point (as you commented earlier):
> > >
> > > + ret = iommu_set_kernel_ownership(dev);
> > > + if (ret)
> > > + return ret;
> > >
> > > how would pci-stub reach its function to indicate that it doesn't
> > > do dma and flip back?
> >
> > > Do you envision a simpler policy that no driver can be bound
> > > to the group if it's already set for user-dma? what about vfio-pci
> > > itself?
> >
> > Yes.. I'm not sure there is a good use case to allow the stub drivers
> > to load/unload while a VFIO is running. At least, not a strong enough
> > one to justify a global change to the driver core..
>
> I'm fine with not loading pci-stub. From the very 1st commit msg
> looks pci-stub was introduced before vfio to prevent host driver
> loading when doing device assignment with KVM. I'm not sure
> whether other usages are built on pci-stub later, but in general it's
> not good to position devices in a same group into different usages.

IIRC, pci-stub was invented for legacy KVM device assignment because
KVM was never an actual device driver, it just latched onto and started
using the device. If there was an existing driver for the device then
KVM would fail to get device resources. Therefore the device needed to
be unbound from its standard host driver, but that left it susceptible
to driver loads usurping the device. Therefore pci-stub came along to
essentially claim the device on behalf of KVM.

With vfio, there are a couple use cases of pci-stub that can be
interesting. The first is that pci-stub is generally built into the
kernel, not as a module, which provides users the ability to specify a
list of ids for pci-stub to claim on the kernel command line with
higher priority than loadable modules. This can prevent default driver
bindings to devices until tools like driverctl or boot time scripting
gets a shot to load the user designated driver for a device.

The other use case, is that if a group is composed of multiple devices
and all those devices are bound to vfio drivers, then the user can gain
direct access to each of those devices. If we wanted to insert a
barrier to restrict user access to certain devices within a group, we'd
suggest binding those devices to pci-stub. Obviously within a group, it
may still be possible to manipulate the device via p2p DMA, but the
barrier is much higher and device, if not platform, specific to
manipulate such devices. An example use case might be a chipset
Ethernet controller grouped among system management function in a
multi-function root complex integrated endpoint.

> but I'm little worried that even vfio-pci itself cannot be bound now,
> which implies that all devices in a group which are intended to be
> used by the user must be bound to vfio-pci in a breath before the
> user attempts to open any of them, i.e. late-binding and device-
> hotplug is disallowed after the initial open. I'm not sure how
> important such an usage would be, but it does cause user-tangible
> semantics change.

Yep, a high potential to break userspace, especially as pci-stub has
been recommended for the cases noted above. I don't expect that tools
like libvirt manage unassigned devices within a group, but that
probably means that there are all sorts of ad-hoc user mechanisms
beyond simply assigning all the devices. Thanks,

Alex