Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
From: Alex Williamson
Date: Tue Sep 28 2021 - 12:27:53 EST
On Tue, 28 Sep 2021 07:43:36 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Tuesday, September 28, 2021 3:20 AM
> >
> > On Mon, 27 Sep 2021 13:32:34 +0000
> > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >
> > > > From: Jason Gunthorpe
> > > > Sent: Monday, September 27, 2021 9:10 PM
> > > >
> > > > On Mon, Sep 27, 2021 at 01:00:08PM +0000, Tian, Kevin wrote:
> > > >
> > > > > > I think for such a narrow usage you should not change the struct
> > > > > > device_driver. Just have pci_stub call a function to flip back to user
> > > > > > mode.
> > > > >
> > > > > Here we want to ensure that kernel dma should be blocked
> > > > > if the group is already marked for user-dma. If we just blindly
> > > > > do it for any driver at this point (as you commented earlier):
> > > > >
> > > > > + ret = iommu_set_kernel_ownership(dev);
> > > > > + if (ret)
> > > > > + return ret;
> > > > >
> > > > > how would pci-stub reach its function to indicate that it doesn't
> > > > > do dma and flip back?
> > > >
> > > > > Do you envision a simpler policy that no driver can be bound
> > > > > to the group if it's already set for user-dma? what about vfio-pci
> > > > > itself?
> > > >
> > > > Yes.. I'm not sure there is a good use case to allow the stub drivers
> > > > to load/unload while a VFIO is running. At least, not a strong enough
> > > > one to justify a global change to the driver core..
> > >
> > > I'm fine with not loading pci-stub. From the very 1st commit msg
> > > looks pci-stub was introduced before vfio to prevent host driver
> > > loading when doing device assignment with KVM. I'm not sure
> > > whether other usages are built on pci-stub later, but in general it's
> > > not good to position devices in a same group into different usages.
> >
> > IIRC, pci-stub was invented for legacy KVM device assignment because
> > KVM was never an actual device driver, it just latched onto and started
> > using the device. If there was an existing driver for the device then
> > KVM would fail to get device resources. Therefore the device needed to
> > be unbound from its standard host driver, but that left it susceptible
> > to driver loads usurping the device. Therefore pci-stub came along to
> > essentially claim the device on behalf of KVM.
> >
> > With vfio, there are a couple use cases of pci-stub that can be
> > interesting. The first is that pci-stub is generally built into the
> > kernel, not as a module, which provides users the ability to specify a
> > list of ids for pci-stub to claim on the kernel command line with
> > higher priority than loadable modules. This can prevent default driver
> > bindings to devices until tools like driverctl or boot time scripting
> > gets a shot to load the user designated driver for a device.
> >
> > The other use case, is that if a group is composed of multiple devices
> > and all those devices are bound to vfio drivers, then the user can gain
> > direct access to each of those devices. If we wanted to insert a
> > barrier to restrict user access to certain devices within a group, we'd
> > suggest binding those devices to pci-stub. Obviously within a group, it
> > may still be possible to manipulate the device via p2p DMA, but the
> > barrier is much higher and device, if not platform, specific to
> > manipulate such devices. An example use case might be a chipset
> > Ethernet controller grouped among system management function in a
> > multi-function root complex integrated endpoint.
>
> Thanks for the background. It perfectly reflects how many tricky things
> that vfio has evolved to deal with and we'll dig them out again in this
> refactoring process with your help. 😊
>
> just a nit on the last example. If a system management function is
> in such group, isn't the right policy is to disallow assigning any device
> in this group? Even the barrier is high, any chance of allowing the guest
> to control a system management function is dangerous...
We can advise that it's a risk, but we generally refrain from making
such policy decisions. Ideally the chipset vendor avoids
configurations that require their users to choose between functionality
and security ;) Thanks,
Alex