Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing

From: Lukas Wunner
Date: Wed Jan 15 2025 - 05:26:50 EST


On Tue, Jan 14, 2025 at 08:25:04PM +0200, Ilpo Järvinen wrote:
> On Tue, 14 Jan 2025, Jiwei wrote:
> > [ 539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
> > [ 539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>
> DLLLA=0
>
> But LBMS did not get reset.
>
> So is this perhaps because hotplug cannot keep up with the rapid
> remove/add going on, and thus will not always call the remove_board()
> even if the device went away?
>
> Lukas, do you know if there's a good way to resolve this within hotplug
> side?
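
For reference, here is a small sketch that decodes the two
link_status values in the log above. The mask values are those
of the PCI_EXP_LNKSTA_* definitions in
include/uapi/linux/pci_regs.h; the struct and function names
are made up for illustration and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register fields (values as in pci_regs.h) */
#define PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
#define PCI_EXP_LNKSTA_NLW	0x03f0	/* Negotiated Link Width */
#define PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
#define PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */

/* Hypothetical helper type, for illustration only */
struct lnksta_fields {
	unsigned int speed;	/* 1 = 2.5 GT/s, 2 = 5 GT/s, ... */
	unsigned int width;	/* number of lanes */
	bool lt;
	bool dllla;
	bool lbms;
};

/* Split a raw Link Status value into the fields of interest */
struct lnksta_fields decode_lnksta(uint16_t lnksta)
{
	struct lnksta_fields f;

	f.speed = lnksta & PCI_EXP_LNKSTA_CLS;
	f.width = (lnksta & PCI_EXP_LNKSTA_NLW) >> 4;
	f.lt    = lnksta & PCI_EXP_LNKSTA_LT;
	f.dllla = lnksta & PCI_EXP_LNKSTA_DLLLA;
	f.lbms  = lnksta & PCI_EXP_LNKSTA_LBMS;
	return f;
}
```

With this decoding, 0x7841 has LT, DLLLA and LBMS set, while
0x1041 has all three clear. Both values report 2.5 GT/s at
x4 width.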

I believe the pciehp code is fine and suspect this is an issue
in the quirk. We've been dealing with rapid add/remove in pciehp
for years without issues.

I don't understand the quirk well enough to guess what's
going wrong, but I wonder whether there could be a race
in accessing lbms_count?

Maybe if lbms_count is replaced by a flag in pci_dev->priv_flags
as we've discussed, with proper memory barriers where necessary,
this problem will solve itself?
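
To illustrate the idea only, here is a minimal userspace sketch
of a "seen" flag replacing a counter. It uses C11 atomics so it
can be compiled standalone; in the kernel this would presumably
use the usual bitops on pci_dev->priv_flags instead. The struct,
the bit name and the three helpers are hypothetical and not
taken from any patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, standing in for a new priv_flags bit */
#define DEV_LBMS_SEEN	0
#define DEV_LBMS_MASK	(1UL << DEV_LBMS_SEEN)

/* Hypothetical stand-in for struct pci_dev; only the flags word matters */
struct fake_dev {
	atomic_ulong priv_flags;
};

/* IRQ handler side: record that LBMS was observed (idempotent) */
void lbms_mark_seen(struct fake_dev *dev)
{
	atomic_fetch_or_explicit(&dev->priv_flags, DEV_LBMS_MASK,
				 memory_order_release);
}

/* Quirk side: was LBMS observed since the last reset? */
bool lbms_seen(struct fake_dev *dev)
{
	return atomic_load_explicit(&dev->priv_flags,
				    memory_order_acquire) & DEV_LBMS_MASK;
}

/* Removal side: clear the flag; returns whether it had been set */
bool lbms_reset(struct fake_dev *dev)
{
	unsigned long old;

	old = atomic_fetch_and_explicit(&dev->priv_flags, ~DEV_LBMS_MASK,
					memory_order_acq_rel);
	return old & DEV_LBMS_MASK;
}
```

Unlike a counter, setting the flag twice is harmless, and the
reset is a single atomic read-modify-write, so a concurrent
set and clear cannot leave a stale intermediate value behind.
Whether that is enough to fix the problem reported here is
exactly the open question.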

Thanks,

Lukas