Re: [PATCH v2 3/3] s390/pci: Fix circular/recursive deadlocks in PCI-bus and -device release

From: Benjamin Block

Date: Fri Mar 13 2026 - 10:08:15 EST


On Thu, Mar 12, 2026 at 10:02:59PM +0100, Niklas Schnelle wrote:
> On Wed, 2026-03-11 at 14:27 +0100, Benjamin Block wrote:
> > When removing PCI device or PCI bus objects there are a couple of
> > call-chains where it is possible that the kernel runs into a circular or
> > recursive deadlock involving taking the central `pci_rescan_remove_lock`.
>
> Now that this depends on Ionut's patch at[0] and exploits the fact that
> the rescan/remove lock can be taken recursively we may want to adjust
> and thereby shorten this description to take this into account dropping
> the recursive locking as deadlock reason.

Yeah, fair, with the dependency, case (B) and (C) don't exit anymore.

> >
> > For example:
> >
> > (A) Thread α receives a CRW event notifying the kernel that a PCI
>
> Nit: Other PCI related code calls these "PCI events" even though
> technically they use the Channel Report Word (CRW) mechanism and thus
> are CRW events.

Eh, sure, I can change that. I guess it heavily depends on where you are
coming from. For someone that mainly dealt with "classic CIO" devices talking
about CRW events that concern PCI devices immediately told me where I have to
look in the architecture and such to figure out what's going on. "PCI event"
not so much.

> > Fixes: 05bc1be6db4b2 ("s390/pci: create zPCI bus")
> > Fixes: ab909509850b2 ("PCI: s390: Fix use-after-free of PCI resources with per-function hotplug")
> > Signed-off-by: Benjamin Block <bblock@xxxxxxxxxxxxx>
> > ---
> > arch/s390/pci/pci.c | 11 ++++++++---
> > arch/s390/pci/pci_bus.c | 15 ++++++++-------
> > arch/s390/pci/pci_event.c | 28 +++++++++++++++++++---------
> > arch/s390/pci/pci_iov.c | 3 +--
> > arch/s390/pci/pci_sysfs.c | 9 +++------
> > 5 files changed, 39 insertions(+), 27 deletions(-)
> >
>
> I had not previously looked at your commit message from the point of
> view when it is ordered aferr Ionut's patch[0], which it has to for the
> release case. So I ended up with a few comments now even though I
> previously read it multiple times and was just very happy with the nice
> and readable call chain and locking order explanations.
>
> Either way the code itself looks good to me and I also did quite a bit
> of testing with this and the previous versions. Trying hot(un)plug and
> SR-IOV with various PCI devices and also just ran this and the previous
> version on several test systems for a few days doing other testing and
> development work. 
>
> I also put quite a bit of effort into trying to modify this and/or
> other s390 PCI code to not require the reentrant behavior of the
> pci_rescan_remove_lock but failed. The really tricky part to me is how
> e.g. a driver can hold on to a struct pci_dev even after it has been
> removed while we have to keep both the struct zpci_dev and struct
> zpci_bus and associated struct pci_bus exactly until the last
> pci_dev_put(). And then needing to remove the PCI bus via
> zpci_bus_release() we need to have the pci_rescan_remove lock but now
> this can happen, in principle, at any pci_dev_put() and that of course
> sometimes happens when we already have the pci_rescan_remove_lock and
> sometimes without it.
>
> So feel free to add my:
>
> Reviewed-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> Tested-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>

Thanks.

I'll hold of at re-rolling an other version since we don't have code changes
so far. Hoping for more input. Once I do, I'll adapt the description of this
patch.

--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt / Geschäftsführung: David Faller
Sitz der Ges.: Ehningen / Registergericht: AmtsG Stuttgart, HRB 243294