Re: [PATCH 1/2] PCI: Add VF reset notification to PF's VFIO user mode driver

From: Alex Williamson
Date: Tue Feb 06 2024 - 15:06:46 EST


On Tue, 6 Feb 2024 04:08:18 +0000
"Liu, Monk" <Monk.Liu@xxxxxxx> wrote:

> [AMD Official Use Only - General]
>
> Hi Leon
>
> The thing is when qemu reset a VM it calls vfio’s reset ioctl to the
> given VF device, and in kernel the VFIO-pci module will do the reset
> to that VF device via its PCI config space register, but
> unfortunately our VF GPU isnot designed to support those
> “reset”/”flr” commands … not supported by the VF, (and even many PF
> cannot handle those commands well)

PFs are not required to implement FLR, VFs are.

SR-IOV spec, rev. 1.1:

2.2.2. FLR That Targets a VF

VFs must support Function Level Reset (FLR).

>
> So the idea we can cook up is to move those Vf’s reset notification
> to our PF driver (which is a user mode driver running on PF’s VFIO
> arch), and our user mode driver can program HW and do the reset for
> that VF.

The PF driver being able to arbitrarily reset a VF device provided to
another userspace process doesn't sound like separate, isolated
devices. What else can the PF access?

As noted in my other reply, vf-tokens should not be normalized and
users of VFs provided by third party userspace PF drivers should
consider the device no less tainted than if it were provided by an
out-of-tree kernel driver.

The idea to virtualize FLR on the VF to resolve the hardware defect is a
good one, but it should be done in the context of a vfio-pci variant
driver. Thanks,

Alex


> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Date: Monday, February 5, 2024 at 17:04
> To: Deng, Emily <Emily.Deng@xxxxxxx>
> Cc: bhelgaas@xxxxxxxxxx <bhelgaas@xxxxxxxxxx>,
> alex.williamson@xxxxxxxxxx <alex.williamson@xxxxxxxxxx>,
> linux-pci@xxxxxxxxxxxxxxx <linux-pci@xxxxxxxxxxxxxxx>,
> linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>,
> kvm@xxxxxxxxxxxxxxx <kvm@xxxxxxxxxxxxxxx>, Jiang, Jerry (SW)
> <Jerry.Jiang@xxxxxxx>, Zhang, Andy <Andy.Zhang@xxxxxxx>, Chang,
> HaiJun <HaiJun.Chang@xxxxxxx>, Liu, Monk <Monk.Liu@xxxxxxx>, Chen,
> Horace <Horace.Chen@xxxxxxx>, Yin, ZhenGuo (Chris)
> <ZhenGuo.Yin@xxxxxxx> Subject: Re: [PATCH 1/2] PCI: Add VF reset
> notification to PF's VFIO user mode driver On Mon, Feb 05, 2024 at
> 03:15:37PM +0800, Emily Deng wrote:
> > VF doesn't have the ability to reset itself completely which will
> > cause the hardware in unstable state. So notify PF driver when the
> > VF has been reset to let the PF resets the VF completely, and
> > remove the VF out of schedule.
>
>
> I'm sorry but this explanation is not different from the previous
> version. Please provide a better explanation of the problem, why it is
> needed, which VFs need and can't reset themselves, how and why it
> worked before e.t.c.
>
> In addition, please follow kernel submission guidelines, write
> changelong, add versions, cover letter e.t.c.
>
> Thanks
>
> >
> > How to implement this?
> > Add the reset callback function in pci_driver
> >
> > Implement the callback functin in VFIO_PCI driver.
> >
> > Add the VF RESET IRQ for user mode driver to let the user mode
> > driver know the VF has been reset.
> >
> > Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
> > ---
> > drivers/pci/pci.c | 8 ++++++++
> > include/linux/pci.h | 1 +
> > 2 files changed, 9 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 60230da957e0..aca937b05531 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4780,6 +4780,14 @@ EXPORT_SYMBOL_GPL(pcie_flr);
> > */
> > int pcie_reset_flr(struct pci_dev *dev, bool probe)
> > {
> > + struct pci_dev *pf_dev;
> > +
> > + if (dev->is_virtfn) {
> > + pf_dev = dev->physfn;
> > + if (pf_dev->driver->sriov_vf_reset_notification)
> > +
> > pf_dev->driver->sriov_vf_reset_notification(pf_dev, dev);
> > + }
> > +
> > if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
> > return -ENOTTY;
> >
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index c69a2cc1f412..4fa31d9b0aa7 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -926,6 +926,7 @@ struct pci_driver {
> > int (*sriov_configure)(struct pci_dev *dev, int num_vfs);
> > /* On PF */ int (*sriov_set_msix_vec_count)(struct pci_dev *vf,
> > int msix_vec_count); /* On PF */ u32
> > (*sriov_get_vf_total_msix)(struct pci_dev *pf);
> > + void (*sriov_vf_reset_notification)(struct pci_dev *pf,
> > struct pci_dev *vf); const struct pci_error_handlers *err_handler;
> > const struct attribute_group **groups;
> > const struct attribute_group **dev_groups;
> > --
> > 2.36.1
> >
> >