Re: [PATCH v4 02/10] PCI: Add additional checks for flr reset
From: Farhan Ali
Date: Tue Sep 30 2025 - 13:04:33 EST
On 9/30/2025 3:03 AM, Benjamin Block wrote:
On Wed, Sep 24, 2025 at 10:16:20AM -0700, Farhan Ali wrote:
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.cJust thinking out loud, not sure whether it make sense, but since you already
index a3d93d1baee7..327fefc6a1eb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4576,12 +4576,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
*/
int pcie_reset_flr(struct pci_dev *dev, bool probe)
{
+ u32 reg;
+
if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
return -ENOTTY;
if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
return -ENOTTY;
+ if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, ®)) {
+ pci_warn(dev, "Device unable to do an FLR\n");
+ return -ENOTTY;
+ }
read an up-to-date value from the config space, would it make sense to
pull the check above `dev->devcap & PCI_EXP_DEVCAP_FLR` below this read, and
check on the just read `reg`?
My thinking was we could exit early if the device never had FLR capability (and so was not cached in devcap). This way we avoid an extra PCI read.
I think maybe we could? I don't think this patch fixes anything that's "broken" but rather improves the behavior to escalate to other reset method if the device is already in a bad state. I will cc stable.
Also wondering whether it makes sense to stable-tag this? We've recently seen
"unpleasant" recovery attempts that look like this in the kernel logs:
[ 663.330053] vfio-pci 0007:00:00.1: timed out waiting for pending transaction; performing function level reset anyway
[ 664.730051] vfio-pci 0007:00:00.1: not ready 1023ms after FLR; waiting
[ 665.830023] vfio-pci 0007:00:00.1: not ready 2047ms after FLR; waiting
[ 667.910023] vfio-pci 0007:00:00.1: not ready 4095ms after FLR; waiting
[ 672.070022] vfio-pci 0007:00:00.1: not ready 8191ms after FLR; waiting
[ 680.550025] vfio-pci 0007:00:00.1: not ready 16383ms after FLR; waiting
[ 697.190023] vfio-pci 0007:00:00.1: not ready 32767ms after FLR; waiting
[ 730.470021] vfio-pci 0007:00:00.1: not ready 65535ms after FLR; giving up
The VF here was already dead in the water at that point, so I suspect
`pci_read_config_dword()` might have failed, and so this check would have
failed, and we wouldn't have "wasted" the minute waiting for a FLR that was
never going to happen anyway.
Thanks
Farhan