Re: [PATCH v16 0/9] PCI: Expose and manage PCI device reset

From: Bjorn Helgaas
Date: Wed Aug 18 2021 - 18:46:18 EST


On Tue, Aug 17, 2021 at 11:39:28PM +0530, Amey Narkhede wrote:
> PCI and PCIe devices may support a number of possible reset mechanisms
> for example Function Level Reset (FLR) provided via Advanced Feature or
> PCIe capabilities, Power Management reset, bus reset, or device specific reset.
> Currently the PCI subsystem creates a policy prioritizing these reset methods
> which provides neither visibility nor control to userspace.
>
> Expose the reset methods available per device to userspace, via sysfs
> and allow an administrative user or device owner to have ability to
> manage per device reset method priorities or exclusions.
> This feature aims to allow greater control of a device for use cases
> as device assignment, where specific device or platform issues may
> interact poorly with a given reset method, and for which device specific
> quirks have not been developed.
>
> Changes in v16:
> - Refactor acpi_pci_bridge_d3() in patch 7/9
> - Fixed consistency issues in patch 9/9
>
> Changes in v15:
> - Fix use of uninitialized variable in patch 3/9
>
> Changes in v14:
> - Remove duplicate entries from pdev->reset_methods as per
> Shanker's suggestion
>
> Changes in v13:
> - Added "PCI: Cache PCIe FLR capability"
> - Removed memcpy in pci_init_reset_methods() and reset_method_show
> - Moved reset_method sysfs attribute code from pci-sysfs.c to
> pci.c
>
> Changes in v12:
> - Corrected subject in 0/8 (cover letter).
>
> Changes in v11:
> - Alex's suggestion fallback back to other resets if the ACPI RST
> fails. Fix "s/-EINVAL/-ENOTTY/" in 7/8 patch.
>
> Changes in v10:
> - Fix build error on ppc as reported by build bot
>
> Changes in v9:
> - Renamed has_flr bitfield to has_pcie_flr and restored
> use of PCI_DEV_FLAGS_NO_FLR_RESET in quirk_no_flr()
> - Cleaned up sysfs code
>
> Changes in v8:
> - Added has_flr bitfield to struct pci_dev to cache flr
> capability
> - Updated encoding scheme used in reset_methods array as per
> Bjorn's suggestion
> - Updated Shanker's ACPI patches
>
> Changes in v7:
> - Fix the pci_dev_acpi_reset() prototype mismatch
> in case of CONFIG_ACPI=n
>
> Changes in v6:
> - Address Bjorn's and Krzysztof's review comments
> - Add Shanker's updated patches along with new
> "PCI: Setup ACPI_COMPANION early" patch
>
> Changes in v5:
> - Rebase the series over pci/reset branch of
> Bjorn's pci tree to avoid merge conflicts
> caused by recent changes in existing reset
> sysfs attribute
>
> Changes in v4:
> - Change the order or strlen and strim in reset_method_store
> function to avoid extra strlen call.
> - Use consistent terminology in new
> pci_reset_mode enum and rename the probe argument
> of reset functions.
>
> Changes in v3:
> - Dropped "PCI: merge slot and bus reset implementations" which was
> already accepted separately
> - Grammar fixes
> - Added Shanker's patches which were rebased on v2 of this series
> - Added "PCI: Change the type of probe argument in reset functions"
> and additional user input sanitization code in reset_method_store
> function per review feedback from Krzysztof
>
> Changes in v2:
> - Use byte array instead of bitmap to keep track of
> ordering of reset methods
> - Fix incorrect use of reset_fn field in octeon driver
> - Allow writing comma separated list of names of supported reset
> methods to reset_method sysfs attribute
> - Writing empty string instead of "none" to reset_method attribute
> disables ability of reset the device
>
> Amey Narkhede (6):
> PCI: Cache PCIe FLR capability
> PCI: Add pcie_reset_flr to follow calling convention of other reset
> methods
> PCI: Add new array for keeping track of ordering of reset methods
> PCI: Remove reset_fn field from pci_dev
> PCI: Allow userspace to query and set device reset mechanism
> PCI: Change the type of probe argument in reset functions
>
> Shanker Donthineni (3):
> PCI: Define a function to set ACPI_COMPANION in pci_dev
> PCI: Setup ACPI fwnode early and at the same time with OF
> PCI: Add support for ACPI _RST reset method
>
> Documentation/ABI/testing/sysfs-bus-pci | 19 ++
> drivers/crypto/cavium/nitrox/nitrox_main.c | 4 +-
> .../ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
> drivers/pci/hotplug/pciehp.h | 2 +-
> drivers/pci/hotplug/pciehp_hpc.c | 2 +-
> drivers/pci/hotplug/pnv_php.c | 2 +-
> drivers/pci/pci-acpi.c | 83 +++---
> drivers/pci/pci-sysfs.c | 3 +-
> drivers/pci/pci.c | 279 +++++++++++++-----
> drivers/pci/pci.h | 24 +-
> drivers/pci/pcie/aer.c | 12 +-
> drivers/pci/probe.c | 16 +-
> drivers/pci/quirks.c | 25 +-
> drivers/pci/remove.c | 1 -
> include/linux/pci.h | 14 +-
> include/linux/pci_hotplug.h | 2 +-
> 16 files changed, 346 insertions(+), 144 deletions(-)

I applied these to pci/reset for v5.15, thanks!

Of course, I made some edits, mostly trivial, but not all, so please
take a look and see if I broke something.

The biggest changes are to reset_method_store(), where I made it
return -EINVAL if the user-supplied string contains an invalid method,
a method whose probe call says it's unsupported, or too many methods.
In all these cases, the previous reset_methods[] array is unchanged.

I think this is basically what you originally proposed, Amey, and I
thought it was too complicated. But Krzysztof convinced me that
silently ignoring bad data from the user makes the interface hard to
use.

Below is the whole diff from the v16 you posted to what's on the
pci/reset branch. I'm happy to update that branch before it gets
merged into v5.15.

Bjorn

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index aefb25e7c8d0..d4ae03296861 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -122,23 +122,21 @@ Description:
from this part of the device tree.

What: /sys/bus/pci/devices/.../reset_method
-Date: March 2021
+Date: August 2021
Contact: Amey Narkhede <ameynarkhede03@xxxxxxxxx>
Description:
Some devices allow an individual function to be reset
without affecting other functions in the same slot.

For devices that have this support, a file named
- reset_method will be present in sysfs. Initially reading
- this file will give names of the device supported reset
- methods and their ordering. After write, this file will
- give names and ordering of currently enabled reset methods.
- Writing the name or space separated list of names of any of
- the device supported reset methods to this file will set
- the reset methods and their ordering to be used when
- resetting the device. Writing empty string to this file
- will disable ability to reset the device and writing
- "default" will return to the original value.
+ reset_method is present in sysfs. Reading this file
+ gives names of the supported and enabled reset methods and
+ their ordering. Writing a space-separated list of names of
+ reset methods sets the reset methods and ordering to be
+ used when resetting the device. Writing an empty string
+ disables the ability to reset the device. Writing
+ "default" enables all supported reset methods in the
+ default ordering.

What: /sys/bus/pci/devices/.../reset
Date: July 2009
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 4c17a5dc26cf..f4c2e6e01be0 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -537,7 +537,7 @@ static int pnv_php_reset_slot(struct hotplug_slot *slot, bool probe)
* which don't have a bridge. Only claim to support
* reset_slot() if we have a bridge device (for now...)
*/
- if (probe == PCI_RESET_PROBE)
+ if (probe)
return !bridge;

/* mask our interrupt while resetting the bridge */
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 968bf8aa5f15..fe286c861187 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -944,8 +944,7 @@ void pci_set_acpi_fwnode(struct pci_dev *dev)
/**
* pci_dev_acpi_reset - do a function level reset using _RST method
* @dev: device to reset
- * @probe: If PCI_RESET_PROBE, check whether _RST method is included
- * in the acpi_device context.
+ * @probe: if true, return 0 if device supports _RST
*/
int pci_dev_acpi_reset(struct pci_dev *dev, bool probe)
{
@@ -968,7 +967,10 @@ int pci_dev_acpi_reset(struct pci_dev *dev, bool probe)
static bool acpi_pci_power_manageable(struct pci_dev *dev)
{
struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
- return adev ? acpi_device_power_manageable(adev) : false;
+
+ if (!adev)
+ return false;
+ return acpi_device_power_manageable(adev);
}

static bool acpi_pci_bridge_d3(struct pci_dev *dev)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c76451bfeb89..b87bac5e4572 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4627,21 +4627,6 @@ int pci_wait_for_pending_transaction(struct pci_dev *dev)
}
EXPORT_SYMBOL(pci_wait_for_pending_transaction);

-/**
- * pcie_has_flr - check if a device supports function level resets
- * @dev: device to check
- *
- * Returns true if the device advertises support for PCIe function level
- * resets.
- */
-static bool pcie_has_flr(struct pci_dev *dev)
-{
- if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
- return false;
-
- return FIELD_GET(PCI_EXP_DEVCAP_FLR, dev->devcap) == 1;
-}
-
/**
* pcie_flr - initiate a PCIe function level reset
* @dev: device to reset
@@ -4673,13 +4658,16 @@ EXPORT_SYMBOL_GPL(pcie_flr);
/**
* pcie_reset_flr - initiate a PCIe function level reset
* @dev: device to reset
- * @probe: If PCI_RESET_PROBE, only check if the device can be reset this way.
+ * @probe: if true, return 0 if device can be reset this way
*
* Initiate a function level reset on @dev.
*/
int pcie_reset_flr(struct pci_dev *dev, bool probe)
{
- if (!pcie_has_flr(dev))
+ if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
+ return -ENOTTY;
+
+ if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
return -ENOTTY;

if (probe)
@@ -4736,7 +4724,7 @@ static int pci_af_flr(struct pci_dev *dev, bool probe)
/**
* pci_pm_reset - Put device into PCI_D3 and back into PCI_D0.
* @dev: Device to reset.
- * @probe: If PCI_RESET_PROBE, only check if the device can be reset this way.
+ * @probe: if true, return 0 if the device can be reset this way.
*
* If @dev supports native PCI PM and its PCI_PM_CTRL_NO_SOFT_RESET flag is
* unset, it will be reinitialized internally when going from PCI_D3hot to
@@ -4759,7 +4747,7 @@ static int pci_pm_reset(struct pci_dev *dev, bool probe)
if (csr & PCI_PM_CTRL_NO_SOFT_RESET)
return -ENOTTY;

- if (probe == PCI_RESET_PROBE)
+ if (probe)
return 0;

if (dev->current_state != PCI_D0)
@@ -5167,19 +5155,31 @@ static ssize_t reset_method_show(struct device *dev,
return len;
}

+static int reset_method_lookup(const char *name)
+{
+ int m;
+
+ for (m = 1; m < PCI_NUM_RESET_METHODS; m++) {
+ if (sysfs_streq(name, pci_reset_fn_methods[m].name))
+ return m;
+ }
+
+ return 0; /* not found */
+}
+
static ssize_t reset_method_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
- int i, m = 0, n = 0;
- char *name, *options;
-
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
+ char *options, *name;
+ int m, n;
+ u8 reset_methods[PCI_NUM_RESET_METHODS] = { 0 };

if (sysfs_streq(buf, "")) {
- goto exit;
+ pdev->reset_methods[0] = 0;
+ pci_warn(pdev, "All device reset methods disabled by user");
+ return count;
}

if (sysfs_streq(buf, "default")) {
@@ -5191,53 +5191,46 @@ static ssize_t reset_method_store(struct device *dev,
if (!options)
return -ENOMEM;

+ n = 0;
while ((name = strsep(&options, " ")) != NULL) {
if (sysfs_streq(name, ""))
continue;

name = strim(name);

- for (m = 1; m < PCI_NUM_RESET_METHODS; m++) {
- if (sysfs_streq(name, pci_reset_fn_methods[m].name))
- break;
+ m = reset_method_lookup(name);
+ if (!m) {
+ pci_err(pdev, "Invalid reset method '%s'", name);
+ goto error;
}

- if (m == PCI_NUM_RESET_METHODS) {
- pci_warn(pdev, "Skip invalid reset method '%s'", name);
- continue;
- }
-
- for (i = 0; i < n; i++) {
- if (pdev->reset_methods[i] == m)
- break;
- }
-
- if (i < n)
- continue;
-
if (pci_reset_fn_methods[m].reset_fn(pdev, PCI_RESET_PROBE)) {
- pci_warn(pdev, "Unsupported reset method '%s'", name);
- continue;
+ pci_err(pdev, "Unsupported reset method '%s'", name);
+ goto error;
}

- pdev->reset_methods[n++] = m;
- BUG_ON(n == PCI_NUM_RESET_METHODS);
+ if (n == PCI_NUM_RESET_METHODS - 1) {
+ pci_err(pdev, "Too many reset methods\n");
+ goto error;
+ }
+
+ reset_methods[n++] = m;
}

+ reset_methods[n] = 0;
+
+ /* Warn if dev-specific supported but not highest priority */
+ if (pci_reset_fn_methods[1].reset_fn(pdev, PCI_RESET_PROBE) == 0 &&
+ reset_methods[0] != 1)
+ pci_warn(pdev, "Device-specific reset disabled/de-prioritized by user");
+ memcpy(pdev->reset_methods, reset_methods, sizeof(pdev->reset_methods));
kfree(options);
-
-exit:
- /* All the reset methods are invalid */
- if (n == 0 && m == PCI_NUM_RESET_METHODS)
- return -EINVAL;
- pdev->reset_methods[n] = 0;
- if (pdev->reset_methods[0] == 0) {
- pci_warn(pdev, "All device reset methods disabled by user");
- } else if ((pdev->reset_methods[0] != 1) &&
- !pci_reset_fn_methods[1].reset_fn(pdev, PCI_RESET_PROBE)) {
- pci_warn(pdev, "Device specific reset disabled/de-prioritized by user");
- }
return count;
+
+error:
+ /* Leave previous methods unchanged */
+ kfree(options);
+ return -EINVAL;
}
static DEVICE_ATTR_RW(reset_method);

@@ -5296,7 +5289,7 @@ int __pci_reset_function_locked(struct pci_dev *dev)
* error, we're also finished: this indicates that further reset
* mechanisms might be broken on the device.
*/
- for (i = 0; i < PCI_NUM_RESET_METHODS; i++) {
+ for (i = 0; i < PCI_NUM_RESET_METHODS; i++) {
m = dev->reset_methods[i];
if (!m)
return -ENOTTY;
@@ -5333,7 +5326,6 @@ void pci_init_reset_methods(struct pci_dev *dev)
might_sleep();

i = 0;
-
for (m = 1; m < PCI_NUM_RESET_METHODS; m++) {
rc = pci_reset_fn_methods[m].reset_fn(dev, PCI_RESET_PROBE);
if (!rc)
@@ -5659,14 +5651,14 @@ static int pci_slot_reset(struct pci_slot *slot, bool probe)
if (!slot || !pci_slot_resetable(slot))
return -ENOTTY;

- if (probe != PCI_RESET_PROBE)
+ if (!probe)
pci_slot_lock(slot);

might_sleep();

rc = pci_reset_hotplug_slot(slot->hotplug, probe);

- if (probe != PCI_RESET_PROBE)
+ if (!probe)
pci_slot_unlock(slot);

return rc;
@@ -5726,7 +5718,7 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
if (!bus->self || !pci_bus_resetable(bus))
return -ENOTTY;

- if (probe == PCI_RESET_PROBE)
+ if (probe)
return 0;

pci_bus_lock(bus);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 38db12d05ca0..a46363f29b68 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -52,7 +52,7 @@
/* Number of reset methods used in pci_reset_fn_methods array in pci.c */
#define PCI_NUM_RESET_METHODS 7

-#define PCI_RESET_PROBE true
+#define PCI_RESET_PROBE true
#define PCI_RESET_DO_RESET false

/*
@@ -339,7 +339,7 @@ struct pci_dev {
struct rcec_ea *rcec_ea; /* RCEC cached endpoint association */
struct pci_dev *rcec; /* Associated RCEC device */
#endif
- u32 devcap; /* PCIe device capabilities */
+ u32 devcap; /* PCIe Device Capabilities */
u8 pcie_cap; /* PCIe capability offset */
u8 msi_cap; /* MSI capability offset */
u8 msix_cap; /* MSI-X capability offset */
@@ -511,10 +511,9 @@ struct pci_dev {
char *driver_override; /* Driver name to force a match */

unsigned long priv_flags; /* Private flags for the PCI driver */
- /*
- * See pci_reset_fn_methods array in pci.c for ordering.
- */
- u8 reset_methods[PCI_NUM_RESET_METHODS]; /* Reset methods ordered by priority */
+
+ /* These methods index pci_reset_fn_methods[] */
+ u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
};

static inline struct pci_dev *pci_physfn(struct pci_dev *dev)