Re: [PATCH] PCI: Fix AB-BA deadlock between aer_isr() and device_shutdown()

From: Lukas Wunner

Date: Tue Feb 24 2026 - 01:40:55 EST


On Fri, Jan 09, 2026 at 05:56:03PM +0800, Ziming Du wrote:
> During system shutdown, a deadlock may occur between AER recovery process
> and device shutdown as follows:

The subject is slightly misleading: this isn't an AB-BA deadlock, which
requires two locks acquired in opposite order. It's a deadlock involving
a single lock (device_lock), where one task (shutdown) acquires the lock
and then waits for the AER interrupt thread to finish, while that thread
is waiting on the lock.
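To make the single-lock pattern concrete, here is a rough userspace model
using POSIX threads. It is only an analogy, not kernel code: dev_lock
stands in for device_lock, and aer_irq_thread_model() and
simulate_shutdown_vs_aer() are hypothetical names. The modelled IRQ
thread uses pthread_mutex_timedlock() with a 200 ms timeout instead of
blocking forever, so the model reports the deadlock (ETIMEDOUT) rather
than hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Stand-in for device_lock in this hypothetical userspace model. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static int irq_thread_rc;

/* Models the AER IRQ thread reaching the bus reset, which needs dev_lock.
 * A 200 ms timed lock replaces an unbounded wait so the model terminates. */
static void *aer_irq_thread_model(void *arg)
{
	struct timespec deadline;

	(void)arg;
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 200L * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}
	irq_thread_rc = pthread_mutex_timedlock(&dev_lock, &deadline);
	if (irq_thread_rc == 0)
		pthread_mutex_unlock(&dev_lock);
	return NULL;
}

/* Models device_shutdown(): optionally take dev_lock, then wait for the
 * IRQ thread to finish. Returns 1 if the IRQ thread could not get the
 * lock (an indefinite hang in the real kernel), 0 otherwise. */
int simulate_shutdown_vs_aer(int shutdown_holds_lock)
{
	pthread_t irq;

	if (shutdown_holds_lock)
		pthread_mutex_lock(&dev_lock);
	pthread_create(&irq, NULL, aer_irq_thread_model, NULL);
	/* Wait for the IRQ thread (with dev_lock held, if taken above). */
	pthread_join(irq, NULL);
	if (shutdown_holds_lock)
		pthread_mutex_unlock(&dev_lock);
	return irq_thread_rc == ETIMEDOUT;
}
```

Calling simulate_shutdown_vs_aer(1) models the reported scenario and
should return 1; simulate_shutdown_vs_aer(0) shows that the IRQ thread
completes normally when shutdown does not hold the lock while waiting.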

device_shutdown() acquires the device_lock to avoid invoking a driver's
->shutdown() callback while its ->probe() callback is still running or
while the driver is being removed, cf. d1c6c030fcec. That seems
reasonable.

It's unclear why pci_bus_reset() needs to acquire device_lock. This was
introduced by 090a3c5322e9. I'm adding Alex (the author) to cc.

Another question to ask is whether it makes sense at all to attempt
error recovery when the system is shutting down. Maybe we should log
the errors, but no longer try to recover from them?

It's possible to determine whether shutdown is in progress by querying
system_state (set by kernel/reboot.c). However, we can't simply skip
calling pci_bus_error_reset() in aer_root_reset() if system_state
indicates shutdown, because that would still be racy: shutdown could
begin right after the check. The only race-free solution would be to
register a notifier with reboot_notifier_list which sets a flag that
shutdown is in progress and waits for the interrupt thread to finish.
That's quite a complicated solution just to work around a deadlock, so
I suggest first looking into removing the device_lock acquisition from
pci_bus_reset().
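For comparison, a rough userspace model of the notifier-plus-flag
approach, again with hypothetical names (recovery_begin(),
recovery_end(), shutdown_notifier_model()). The point is that checking
the flag and registering as "in flight" happen under one lock, so either
the notifier sees the recovery and waits for it, or the recovery sees
the flag and backs off. A bare system_state check lacks that
registration step. In the kernel, the notifier would be registered with
register_reboot_notifier(); if I read kernel/reboot.c correctly,
reboot_notifier_list is called before device_shutdown(), so the wait
would complete before any device_lock is held.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model state, all protected by state_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t idle_cond = PTHREAD_COND_INITIALIZER;
static bool shutting_down;
static int recoveries_in_flight;

/* Models the AER thread deciding whether to attempt a reset. The flag
 * check and the in-flight registration are one atomic step. */
bool recovery_begin(void)
{
	bool ok;

	pthread_mutex_lock(&state_lock);
	ok = !shutting_down;
	if (ok)
		recoveries_in_flight++;
	pthread_mutex_unlock(&state_lock);
	return ok;
}

/* Called once the reset is done; wakes the notifier if it is waiting. */
void recovery_end(void)
{
	pthread_mutex_lock(&state_lock);
	if (--recoveries_in_flight == 0)
		pthread_cond_broadcast(&idle_cond);
	pthread_mutex_unlock(&state_lock);
}

/* Models the reboot notifier: set the flag, then drain in-flight
 * recoveries before shutdown proceeds to take any device lock. */
void shutdown_notifier_model(void)
{
	pthread_mutex_lock(&state_lock);
	shutting_down = true;
	while (recoveries_in_flight > 0)
		pthread_cond_wait(&idle_cond, &state_lock);
	pthread_mutex_unlock(&state_lock);
}
```

The extra state, wake-up and ordering argument needed even in this toy
version illustrate why it seems like a lot of machinery just to avoid
the deadlock.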

Simply using trylock doesn't seem bullet-proof.

Thanks,

Lukas