[PATCH v2] PCI: Prevent workqueue code nesting in local_pci_probe()

From: Waiman Long

Date: Tue May 26 2026 - 09:42:25 EST


local_pci_probe() and hence pci_call_probe() can be called
recursively. If the recursive calls are done indirectly via workqueue
kworker, a lockdep recursive warning can be produced.

There are older commits that tries to prevent that. One example is
commit 12c3156f10c5 ("PCI: Avoid unnecessary CPU switch when calling
driver .probe() method") which prevents work_on_cpu() recursion when
the current device is a virtual function and the physical device has
been probed. However, there are still other cases where workqueue code
nesting is possible leading to a lockdep recursive locking warning like
the following stack trace on a 4-socket x86-64 Skylake server.

<TASK>
:
start_flush_work+0x40b/0x9b0
__flush_work+0xbd/0x1a0
pci_call_probe+0x510/0x700
pci_device_probe+0x17c/0x270
call_driver_probe+0x68/0x1f0
really_probe+0x197/0x7b0
__driver_probe_device+0x32d/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x162/0x290
bus_for_each_drv+0x109/0x190
__device_attach+0x1a2/0x3f0
device_initial_probe+0x7d/0xa0
pci_bus_add_device+0x93/0xe0
pci_bus_add_devices+0x83/0x190
vmd_enable_domain+0x11fb/0x1b80
vmd_probe+0x34c/0x4b0
local_pci_probe+0xdf/0x190
local_pci_probe_callback+0x35/0x80
process_one_work+0x919/0x1af0
worker_thread+0x5a6/0xd10
:
</TASK>

Fix that by adding a new wq_kworker() helper to check if the current
task is a workqueue kworker. If so, call local_pci_probe() directly
instead of calling into workqueue code recursively. However, this patch
will increase the level of local_pci_probe() nesting that is possible.

For this particular system with a patched kernel built with a RHEL
based kernel config file, the local_pci_probe() nesting is only 2
levels deep. The additional stack usage from one local_pci_probe()
to the next is measured to be 792 bytes. With a 16k kernel stack,
it can sustain up to 20 local_pci_probe() nesting loop. The mileage
may vary depending on what kernel options are enabled and future code
changes in the PCI driver code base.

It is unlikely that we will see a real system with a very deeply nested
PCIe topology requiring more levels of nesting than is supportable by
a 16k kernel stack. To be cautious, a comment is added to mention that
kernel stack exhaustion can be a possibility if a very deeply nested
PCIe topology is to be supported as further change to how PCI device
probing works may be needed.

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
drivers/pci/pci-driver.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

[v2] Directly check PF_WQ_WORKER as suggested by Sashiko and document
the kernel stack exhaustion issue with deeply nested PCIe topology.

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e3f59001785a..6d9944b42c3c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -370,6 +370,14 @@ static bool pci_physfn_is_probed(struct pci_dev *dev)
#endif
}

+/*
+ * Return true if current task is a workqueue kworker
+ */
+static bool wq_kworker(void)
+{
+ return current->flags & PF_WQ_WORKER;
+}
+
static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
const struct pci_device_id *id)
{
@@ -387,10 +395,17 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
cpu_hotplug_disable();
/*
* Prevent nesting work_on_cpu() for the case where a Virtual Function
- * device is probed from work_on_cpu() of the Physical device.
+ * device is probed from work_on_cpu() of the Physical device or when
+ * the current task is a workqueue kworker.
+ *
+ * TODO: With a deeply nested PCIe topology, pci_call_probe() can be
+ * recursively called multiple times. If the nesting is deep enough,
+ * it may cause exhaustion of the kernel stack. So some additional
+ * changes will be needed if such a deeply nested topology is to be
+ * supported.
*/
if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
- pci_physfn_is_probed(dev)) {
+ pci_physfn_is_probed(dev) || wq_kworker()) {
error = local_pci_probe(&ddi);
} else {
struct pci_probe_arg arg = { .ddi = &ddi };
--
2.54.0