Re: [PATCH v2 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
From: Narayana Murty N
Date: Wed Dec 17 2025 - 00:02:43 EST
On 11/12/25 9:15 PM, Timothy Pearson wrote:
Conceptually the patch sounds OK, but given the complexity of these subsystems it's difficult to foresee all interactions. Was the patch verified not to break NVMe hotplug on PowerNV systems using actual hardware? If not, I will need to do so before sending an ack. Thanks!
It has not been specifically tested for NVMe hotplug on PowerNV hardware.
----- Original Message -----
From: "Narayana Murty N" <nnmlinux@xxxxxxxxxxxxx>
To: "mahesh" <mahesh@xxxxxxxxxxxxx>, "Oliver" <oohall@xxxxxxxxx>, "Madhavan Srinivasan" <maddy@xxxxxxxxxxxxx>, "Michael
Ellerman" <mpe@xxxxxxxxxxxxxx>, "npiggin" <npiggin@xxxxxxxxx>, "christophe leroy" <christophe.leroy@xxxxxxxxxx>
Cc: "Bjorn Helgaas" <bhelgaas@xxxxxxxxxx>, "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>, "linuxppc-dev"
<linuxppc-dev@xxxxxxxxxxxxxxxx>, "linux-kernel" <linux-kernel@xxxxxxxxxxxxxxx>, "vaibhav" <vaibhav@xxxxxxxxxxxxx>,
"Shivaprasad G Bhat" <sbhat@xxxxxxxxxxxxx>, ganeshgr@xxxxxxxxxxxxx
Sent: Wednesday, December 10, 2025 8:25:59 AM
Subject: [PATCH v2 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device
hotplug safe") restructured the EEH driver to improve synchronization
with the PCI hotplug layer.
However, it inadvertently moved pci_lock_rescan_remove() outside its
intended scope in eeh_handle_normal_event(), leading to broken PCI
error reporting and improper EEH event triggering. Specifically,
eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
acquire the same lock internally, causing nested locking and disrupting
normal EEH event handling paths.
This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
with two public wrappers:
    eeh_pe_bus_get() with locking enabled.
    eeh_pe_bus_get_nolock() that skips locking.
Callers that already hold pci_lock_rescan_remove() now use
eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.
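For readers following along, the locked/unlocked wrapper pattern described above can be sketched in plain userspace C. This is only an illustration, not the patch itself: every demo_* name is hypothetical, demo_pe_bus_get_impl() plays the role of _eeh_pe_bus_get(), and the lock is a toy single-threaded flag standing in for pci_lock_rescan_remove(), which asserts on nested acquisition instead of deadlocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct pci_bus and struct eeh_pe. */
struct demo_bus { int number; };
struct demo_pe  { struct demo_bus *bus; };

/* Toy, single-threaded model of a non-recursive lock (assumption:
 * it only mimics the "no nested acquire" rule of the real mutex). */
static bool demo_lock_held;

static void demo_lock(void)
{
	assert(!demo_lock_held);	/* a nested acquire would trip here */
	demo_lock_held = true;
}

static void demo_unlock(void)
{
	assert(demo_lock_held);
	demo_lock_held = false;
}

/* Common helper: takes the lock only when do_lock is true. */
static struct demo_bus *demo_pe_bus_get_impl(struct demo_pe *pe, bool do_lock)
{
	struct demo_bus *bus;

	if (do_lock)
		demo_lock();
	else
		assert(demo_lock_held);	/* caller must already hold the lock */

	bus = pe->bus;

	if (do_lock)
		demo_unlock();
	return bus;
}

/* Public wrapper for callers that do not hold the lock. */
struct demo_bus *demo_pe_bus_get(struct demo_pe *pe)
{
	return demo_pe_bus_get_impl(pe, true);
}

/* Public wrapper for callers that already hold the lock. */
struct demo_bus *demo_pe_bus_get_nolock(struct demo_pe *pe)
{
	return demo_pe_bus_get_impl(pe, false);
}

/* A caller that holds the lock across the lookup uses the nolock variant. */
struct demo_bus *demo_handle_event(struct demo_pe *pe)
{
	struct demo_bus *bus;

	demo_lock();
	bus = demo_pe_bus_get_nolock(pe);
	demo_unlock();
	return bus;
}
```

Calling demo_pe_bus_get() from inside demo_handle_event() while the lock is held would trip the assert in demo_lock(), which is the toy equivalent of the nested acquisition described in the commit message.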
Additionally, pci_lock_rescan_remove() calls are restored to the correct
position—after eeh_pe_bus_get() and immediately before iterating affected
PEs and devices. This ensures EEH-triggered PCI removes occur under proper
bus rescan locking without recursive lock contention.
The eeh_pe_loc_get() function has been split into two functions:
    eeh_pe_loc_get(struct eeh_pe *pe), which retrieves the location code
    for the given PE.
    eeh_pe_loc_get_bus(struct pci_bus *bus), which retrieves the location
    code for the given bus.
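As a rough illustration of that split, the sketch below shows one way a PE-based lookup can delegate to a bus-based one. It is an assumption-laden userspace model, not the kernel code: the demo_* types, the loc_code field, the walk up the parent chain, and the "N/A" fallback are all simplifications chosen for the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_bus and struct eeh_pe. */
struct demo_loc_bus {
	const char *loc_code;		/* NULL when this bus has no location code */
	struct demo_loc_bus *parent;	/* NULL at the root bus */
};

struct demo_loc_pe {
	struct demo_loc_bus *bus;
};

/* Bus-based lookup: walk towards the root until a location code is found. */
const char *demo_pe_loc_get_bus(struct demo_loc_bus *bus)
{
	while (bus) {
		if (bus->loc_code)
			return bus->loc_code;
		bus = bus->parent;
	}
	return "N/A";	/* assumed fallback when nothing is found */
}

/* PE-based lookup: resolve the PE's bus, then reuse the bus-based helper. */
const char *demo_pe_loc_get(struct demo_loc_pe *pe)
{
	return demo_pe_loc_get_bus(pe ? pe->bus : NULL);
}
```

The point of the split is that a caller which has already looked up the bus can query the location code directly, without going back through the PE-to-bus lookup.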
This change does not remove or relax any of the existing locking
around EEH handling, so the NVMe hotplug paths should continue to see
the same serialization as before.
If you have a convenient setup for NVMe hotplug on PowerNV, additional testing
there would definitely be helpful before merging.
Thanks,
Narayana Murty