RE: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

From: Tian, Kevin
Date: Sun Mar 08 2020 - 21:48:46 EST


> From: Alex Williamson
> Sent: Monday, March 9, 2020 8:46 AM
>
> On Sat, 7 Mar 2020 01:35:23 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Alex Williamson
> > > Sent: Saturday, March 7, 2020 6:18 AM
> > >
> > > On Fri, 6 Mar 2020 07:57:19 +0000
> > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > > Sent: Friday, March 6, 2020 2:23 AM
> > > > >
> > > > > On Tue, 25 Feb 2020 03:08:00 +0000
> > > > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > > > >
> > > > > > > From: Alex Williamson
> > > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > > >
> > > > > > > With the VF Token interface we can now expect that a vfio
> > > > > > > userspace driver must be in collaboration with the PF
> > > > > > > driver, an unwitting userspace driver will not be able to
> > > > > > > get past the GET_DEVICE_FD step in accessing the device.
> > > > > > > We can now move on to actually allowing SR-IOV to be
> > > > > > > enabled by vfio-pci on the PF. Support for this is not
> > > > > > > enabled by default in this commit, but it does provide a
> > > > > > > module option for this to be enabled (enable_sriov=1).
> > > > > > > Enabling VFs is rather straightforward, except we don't
> > > > > > > want to risk that a VF might get autoprobed and bound to
> > > > > > > other drivers, so a bus notifier is used to "capture" VFs
> > > > > > > to vfio-pci using the driver_override support. We assume
> > > > > > > any later action to bind the device to other drivers is
> > > > > > > condoned by the system admin and allow it with a log
> > > > > > > warning.
> > > > > > >
> > > > > > > vfio-pci will disable SR-IOV on a PF before releasing the
> > > > > > > device, allowing a VF driver to be assured other drivers
> > > > > > > cannot take over the PF and that any other userspace
> > > > > > > driver must know the shared VF token. This support also
> > > > > > > does not provide a mechanism for the PF userspace driver
> > > > > > > itself to manipulate SR-IOV through the vfio API. With
> > > > > > > this patch SR-IOV can only be enabled via the host sysfs
> > > > > > > interface and the PF driver user cannot create or remove
> > > > > > > VFs.
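
To make the discussion concrete, the change described above has roughly
the shape of the sketch below. This is an illustration, not the patch
itself: the enable_sriov option and the vfio_pci_sriov_configure() name
come from this thread, while the error codes, permissions and the
notifier wiring are my assumptions.

#include <linux/module.h>
#include <linux/pci.h>

static bool enable_sriov;
module_param(enable_sriov, bool, 0644);
MODULE_PARM_DESC(enable_sriov, "Enable support for SR-IOV configuration");

static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
{
        int ret;

        if (!enable_sriov)
                return -ENODEV;

        if (nr_virtfn == 0) {
                pci_disable_sriov(pdev);
                return 0;
        }

        /*
         * New VFs are "captured" to vfio-pci by a pci_bus_type notifier
         * (not shown) that sets driver_override = "vfio-pci" on
         * BUS_NOTIFY_ADD_DEVICE, so they are not autoprobed by other
         * host drivers.
         */
        ret = pci_enable_sriov(pdev, nr_virtfn);
        return ret ? ret : nr_virtfn;
}

static struct pci_driver vfio_pci_driver = {
        .name            = "vfio-pci",
        /* .id_table, .probe, .remove, etc. as in vfio-pci */
        .sriov_configure = vfio_pci_sriov_configure,
};
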
> > > > > >
> > > > > > I'm not sure how many devices can be properly configured simply
> > > > > > with pci_enable_sriov. It is not unusual to require the PF driver
> > > > > > to prepare something before turning on the PCI SR-IOV capability.
> > > > > > If you look at in-kernel PF drivers, there are only two using the
> > > > > > generic pci_sriov_configure_simple (a simple wrapper around
> > > > > > pci_enable_sriov), while most others implement their own
> > > > > > callback. However, vfio itself has no such knowledge, so I'm not
> > > > > > sure how a user knows whether using this option can actually meet
> > > > > > his purpose. I may be missing something here; possibly using DPDK
> > > > > > as an example will make it clearer.
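
To make the contrast above concrete: the generic helper is just wired in
as the callback when the PF needs no driver-side preparation, while most
drivers provide their own function. The names below other than
pci_sriov_configure_simple(), pci_enable_sriov() and pci_disable_sriov()
are made up for illustration.

/* PF needs no driver-side preparation: reuse the generic helper */
static struct pci_driver simple_pf_driver = {
        .name            = "simple-pf-stub",   /* made-up name */
        .id_table        = simple_pf_ids,      /* made-up ID table */
        .probe           = simple_pf_probe,
        .sriov_configure = pci_sriov_configure_simple,
};

/*
 * Most PF drivers instead point .sriov_configure at their own function,
 * which prepares queues/resources for the VFs before calling
 * pci_enable_sriov() and tears them down around pci_disable_sriov().
 */
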
> > > > >
> > > > > There is still the entire vfio userspace driver interface. Imagine
> > > > > for example that QEMU emulates the SR-IOV capability and makes a
> > > > > call out to libvirt (or maybe runs with privs for the PF SR-IOV
> > > > > sysfs attribs) when the guest enables SR-IOV. Can't we assume that
> > > > > any PF specific support can still be performed in the
> > > > > userspace/guest driver, leaving us with a very simple and generic
> > > > > sriov_configure callback in vfio-pci?
> > > >
> > > > Makes sense. One concern, though, is how a user could be warned if
> > > > he inadvertently uses sysfs to enable SR-IOV on a vfio device whose
> > > > userspace driver is incapable of handling it. Note that any VFIO
> > > > device, if SR-IOV capable, will allow the user to do so once the
> > > > module option is turned on and the callback is registered. I felt
> > > > such uncertainty could be contained by toggling SR-IOV through a
> > > > vfio API, but from your description that is obviously what you want
> > > > to avoid. Is it due to ordering, e.g. that SR-IOV must be enabled
> > > > before the userspace PF driver sets the token?
> > >
> > > As in my other reply, enabling SR-IOV via a vfio API suggests that
> > > we're not only granting the user owning the PF device access to the
> > > device itself, but also the ability to create and remove subordinate
> > > devices on the host. That implies an extended degree of trust in the
> > > user beyond the PF device itself and raises questions about whether a
> > > user who is allowed to create VF devices should automatically be
> > > granted access to those VF devices, what the mechanism would be for
> > > that, and how we might re-assign those devices to other users,
> > > potentially including host kernel usage. What I'm proposing here
> > > doesn't preclude some future extension in that direction, but instead
> > > tries to simplify a first step towards enabling SR-IOV by leaving the
> > > SR-IOV enablement and VF assignment in the realm of a privileged system
> > > entity.
> >
> > the intention is clear to me now.
> >
> > >
> > > So, what I think you're suggesting here is that we should restrict
> > > vfio_pci_sriov_configure() to reject enabling SR-IOV until a user
> > > driver has configured a VF token. That requires both that the
> > > userspace driver has initialized to this point before SR-IOV can be
> > > enabled and that we would be forced to define a termination point for
> > > the user set VF token. Logically, this would need to be when the
> > > userspace driver exits or closes the PF device, which implies that we
> > > need to disable SR-IOV on the PF at this point, or we're left in an
> > > inconsistent state where VFs are enabled but cannot be disabled because
> > > we don't have a valid VF token. Now we're nearly back to a state where
> > > the user has control not over creating devices on the host, but over
> > > removing them by closing the device, which will necessarily require
> > > that any VF driver, whether userspace or kernel, release the device.
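
Spelling out the restriction being weighed (and argued against) here, it
would amount to something like the following at the top of
vfio_pci_sriov_configure(). The drvdata lookup and the vf_token_set
field are made-up stand-ins, not actual vfio-pci internals:

static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
{
        struct vfio_pci_device *vdev = pci_get_drvdata(pdev); /* assumed */
        int ret;

        /*
         * Hypothetical ordering restriction: refuse to create VFs until
         * the PF user has supplied a VF token.
         */
        if (nr_virtfn && !vdev->vf_token_set)
                return -EPERM;

        if (nr_virtfn == 0) {
                pci_disable_sriov(pdev);
                return 0;
        }

        ret = pci_enable_sriov(pdev, nr_virtfn);
        return ret ? ret : nr_virtfn;
}
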
> > >
> > > I'm not sure what we're gaining by doing this though. I agree that
> > > there will be users that enable SR-IOV on a PF and then try to, for
> > > example, assign the PF and all the VFs to a VM. The VFs will fail due
> > > to lacking VF token support, unless they've patched QEMU with my test
> > > code, but depending on the PF driver in the guest, it may work, or
> > > more likely won't. But don't you think the VFs and probably PF not
> > > working is a sufficient clue that the configuration is invalid? OTOH,
> > > from what I've heard of the devices in the ID table of the pci-pf-stub
> > > driver, they might very well be able to work with both PF and VFs in
> > > QEMU using only my test code to set the VF token.
> > >
> > > Therefore, I'm afraid what you're asking for here is to impose a usage
> > > restriction as a sanity test, when we don't really know what might be
> > > sane for this particular piece of hardware or use case. There are
> > > infinite ways that a vfio-based userspace driver can fail to configure
> > > its hardware and make it work correctly, many of them device-specific.
> > > Isn't this just one of those cases? Thanks,
> > >
> >
> > What you said all makes sense, so I withdraw the idea of manipulating
> > SR-IOV through a vfio ioctl. However, I still feel that vfio-pci simply
> > registering an sriov_configure callback somehow violates the typical
> > expectation of the sysfs interface. Before this patch, a successful
> > write of a non-zero value to numvfs implied that the VFs are in a sane
> > state and functionally ready for immediate use. Now the meaning of a
> > successful write becomes undefined for vfio devices, since even
> > vfio-pci itself doesn't know whether the VFs of a random device are
> > functional (it may know for some, e.g. those carrying the same device
> > IDs as pci-pf-stub). It simply relies on a privileged entity who knows
> > exactly the implication of such a write, with no way to warn
> > inadvertent users, which to me is not a good design from a kernel API
> > p.o.v. Of course we could document the restriction, and driver_override
> > may also be an indirect way to warn such a user if he wants to use the
> > VFs for another purpose, but it is still less elegant than reporting it
> > in the first place. Maybe what we really need is a new sysfs attribute
> > purely for enabling the PCI SR-IOV capability, which doesn't imply
> > making the VFs actually functional as the existing numvfs does?
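
For context on what a successful numvfs write actually guarantees: the
PCI core only hands the request to the bound PF driver's callback and
checks that it returned the requested count; it knows nothing about
whether the resulting VFs are usable. A heavily simplified paraphrase of
that sysfs path (not the literal drivers/pci/pci-sysfs.c code, which
also handles locking, the disable case and further checks):

static ssize_t sriov_numvfs_store(struct device *dev,
                                  struct device_attribute *attr,
                                  const char *buf, size_t count)
{
        struct pci_dev *pdev = to_pci_dev(dev);
        int ret, num_vfs;

        ret = kstrtoint(buf, 0, &num_vfs);
        if (ret < 0)
                return ret;

        if (num_vfs < 0 || num_vfs > pci_sriov_get_totalvfs(pdev))
                return -ERANGE;

        if (!pdev->driver || !pdev->driver->sriov_configure)
                return -ENOENT;

        /* "success" only means the driver returned the requested count */
        ret = pdev->driver->sriov_configure(pdev, num_vfs);
        if (ret != num_vfs)
                return ret < 0 ? ret : -EBUSY;

        return count;
}
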
>
> I don't read the same guarantee into the sysfs SR-IOV interface. If
> such a guarantee exists, it's already broken by pci-pf-stub, which,
> like vfio-pci, allows dynamic IDs and driver_override to bind to any PF
> device, allowing (potentially) non-functional VFs to be created. I

I don't know whether others raised a similar concern before, or how it
was addressed for pci-pf-stub. Many places describe numvfs as the
preferred interface to enable/disable VFs, and 'enable' just reads as
'functional' to me.

> think it would be a really bad decision to fork a new sysfs interface
> for this. I've already made SR-IOV support in vfio-pci an opt-in via a
> module option; would it ease your concerns if I elaborated in the text
> for the option that enabling SR-IOV may depend on support provided by a
> vfio-pci userspace driver?

Sure.
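
Even a one-line hint in the option description would help, e.g.
something along these lines (wording is only a suggestion):

MODULE_PARM_DESC(enable_sriov,
                 "Enable support for SR-IOV configuration.  "
                 "Enabling SR-IOV on a PF typically requires support of "
                 "the userspace PF driver, enabling VFs without such "
                 "support may result in non-functional VFs or PF.");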

>
> I think that without absolutely knowing that an operation is incorrect,
> we're just generating noise and confusion by triggering warnings or
> developing alternate interfaces. Unfortunately, we have no generic
> means of knowing that an operation is incorrect, so I assume the best.
> Thanks,
>
> Alex