Re: [PATCH] PCI: Fix AB-BA deadlock between aer_isr() and device_shutdown()

From: duziming

Date: Wed Jan 14 2026 - 21:50:33 EST



On 2026/1/14 2:51, Bjorn Helgaas wrote:
On Fri, Jan 09, 2026 at 05:56:03PM +0800, Ziming Du wrote:
During system shutdown, a deadlock may occur between the AER recovery
process and device shutdown as follows:

The device_shutdown path holds the device_lock throughout the entire
process and waits for the irq handlers to complete when releasing nodes:

device_shutdown
  device_lock                              # A hold device_lock
  pci_device_shutdown
    pcie_port_device_remove
      remove_iter
        device_unregister
          device_del
            bus_remove_device
              device_release_driver
                devres_release_all
                  release_nodes            # B wait for irq handlers
Can you add the wait location to these examples? release_nodes()
doesn't wait itself, so I guess it must be in a dr->node.release()
function?

And I guess it must be related to something in the IRQ path that is
held while aer_isr() runs?

When the interrupt resources are released, the process eventually calls
free_irq(), which in turn calls __synchronize_irq() to wait until all irq
handlers have finished.

The aer_isr path will acquire device_lock in pci_bus_reset():

aer_isr                                    # B execute irq process
  aer_isr_one_error
    aer_process_err_devices
      handle_error_source
        pcie_do_recovery
          aer_root_reset
            pci_bus_error_reset
              pci_bus_reset                # A acquire device_lock

The circular dependency causes a system hang. Fix it by using
pci_bus_trylock() instead of pci_bus_lock() in pci_bus_reset(). When the
lock is unavailable, return -EAGAIN, as in similar cases.
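
As a rough user-space analogy only (this is not kernel code), the effect of
switching to a trylock can be sketched with pthreads. All names below
(dev_lock, bus_reset_sketch, handler_sketch, shutdown_sketch) are
hypothetical, and pthread_join() merely stands in for the wait on irq
handlers. Note that pthread_mutex_trylock() returns 0 on success, whereas
pci_bus_trylock() returns nonzero on success.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the device lock ("A" in the traces). */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of pci_bus_reset() with the trylock change applied. */
static int bus_reset_sketch(void)
{
	/* pthread_mutex_trylock() returns 0 on success, EBUSY if held. */
	if (pthread_mutex_trylock(&dev_lock) != 0)
		return -EAGAIN;

	/* ... the actual reset would happen here ... */
	pthread_mutex_unlock(&dev_lock);
	return 0;
}

/* Analogue of the aer_isr() recovery path ("B"). */
static void *handler_sketch(void *arg)
{
	*(int *)arg = bus_reset_sketch();
	return NULL;
}

/* Analogue of device_shutdown(): hold A, then wait for the handler. */
static int shutdown_sketch(void)
{
	pthread_t handler;
	int result = 0;
	int rc;

	pthread_mutex_lock(&dev_lock);		/* A is held from here on */
	rc = pthread_create(&handler, NULL, handler_sketch, &result);
	assert(rc == 0);
	(void)rc;

	/*
	 * Stand-in for waiting on irq handlers. If the handler blocked
	 * on dev_lock instead of using trylock, this would never return.
	 */
	pthread_join(handler, NULL);
	pthread_mutex_unlock(&dev_lock);

	return result;	/* what the handler's reset attempt returned */
}
```

In this sketch the handler always runs while the lock is held, so it gives
up with -EAGAIN instead of blocking, and the waiting side can make progress.
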
pci_bus_error_reset() may use either pci_slot_reset() or
pci_bus_reset(), and this patch addresses only pci_bus_reset(). Is
the same deadlock possible in the pci_slot_reset() path?

Looking at the code flow, I agree that there is likely a potential issue
here. Unfortunately, my current test environment does not support
slot_reset, so I haven't been able to reproduce this specific scenario
locally. It would be incredibly helpful if someone with a compatible setup
could help verify or reproduce this behavior.

Fixes: c4eed62a2143 ("PCI/ERR: Use slot reset if available")
Signed-off-by: Ziming Du <duziming2@xxxxxxxxxx>
---
drivers/pci/pci.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..7471bfa6f32e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5515,15 +5515,22 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
 	if (probe)
 		return 0;
-	pci_bus_lock(bus);
+	/*
+	 * Replace blocking lock with trylock to prevent deadlock during bus reset.
+	 * Same as above except return -EAGAIN if the bus cannot be locked.
Wrap this to fit in 80 columns like the rest of the file.

+	 */
+	if (pci_bus_trylock(bus)) {
-	might_sleep();
+		might_sleep();
-	ret = pci_bridge_secondary_bus_reset(bus->self);
+		ret = pci_bridge_secondary_bus_reset(bus->self);
-	pci_bus_unlock(bus);
+		pci_bus_unlock(bus);
-	return ret;
+		return ret;
+	}
+
+	return -EAGAIN;
 }
 /**
--
2.43.0