Re: [PATCH v3] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status

From: Lukas Wunner

Date: Wed Mar 25 2026 - 01:57:07 EST


On Tue, Mar 24, 2026 at 02:45:25PM -0700, Kuppuswamy Sathyanarayanan wrote:
> On 3/23/2026 4:24 PM, Bjorn Helgaas wrote:
> > eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend")
> > cleared PCI_EXP_SLTCTL_HPIE so that when the link goes down, we
> > wouldn't get a PCI_EXP_SLTSTA_DLLSC interrupt and wake the system.
> >
> > I don't know the details of why the PCI_EXP_SLTSTA_DLLSC would cause
> > that wakeup. I would think pciehp should field that, and it should be
> > able to figure out whether to bring the port out of D3hot.
> >
> > Anyway, with this patch it looks like we'll leave PCI_EXP_SLTCTL_HPIE
> > set, and potentially get that PCI_EXP_SLTSTA_DLLSC interrupt again?
>
> I have tested this patch on Catlow Lake. Enabling HPIE does not result in
> spurious wakeups as mentioned in Mika's patch.

I believe Mika saw the wakeup issue on a Thunderbolt controller when
going into system sleep. (In 2018, only discrete controllers existed.
SoC-integrated controllers were introduced a few years later.)

And Catlow Lake is a server PCH or at least derived from server silicon,
right? Did you test system sleep or just runtime suspend?

> > If we know we got a PME interrupt, and we can wake up (maybe more
> > slowly without a Requester ID), why can't we just do the wakeup
> > independent of PCI_EXP_RTSTA_PME and PCI_EXP_RTSTA_PME_RQ_ID? Are
> > spurious PME interrupts a problem?
>
> Yes, I think we can call pcie_pme_walk_bus() even when PCI_EXP_RTSTA_PME
> is clear for ports with the quirk. This would work but be slower without
> the Requester_ID hint.

The problem is, PME not only shares the interrupt with hotplug
(PCIe r7.0 sec 6.7.3.4), but if INTx is used it also shares the
interrupt with link bandwidth management, AER and DPC. So there's
lots of potential for spurious PME interrupts and I fear waking up
the entire hierarchy below the Root Port on every interrupt may
result in much worse power consumption.
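
To illustrate the trade-off, here is a minimal userspace sketch of the
filtering decision. It is not the code in drivers/pci/pcie/pme.c.
pme_should_walk() and its quirk parameter are hypothetical names; only
the Root Status bit definitions mirror include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Root Status register fields (values as in pci_regs.h). */
#define PCI_EXP_RTSTA_PME_RQ_ID 0x0000ffffu /* PME Requester ID */
#define PCI_EXP_RTSTA_PME       0x00010000u /* PME Status */

/*
 * Hypothetical helper: decide whether a (possibly shared) interrupt
 * should trigger a walk of the hierarchy below the Root Port.
 */
bool pme_should_walk(uint32_t rtsta, bool quirk_ignore_pme_status)
{
	/* All ones means the Root Port itself did not respond. */
	if (rtsta == 0xffffffffu)
		return false;

	/* Normal case: PME Status tells us the interrupt was a PME. */
	if (rtsta & PCI_EXP_RTSTA_PME)
		return true;

	/*
	 * Assumed quirk behavior: with unreliable PME Status, every
	 * interrupt on the shared line leads to a walk, including
	 * those raised by hotplug, bandwidth management, AER or DPC.
	 */
	return quirk_ignore_pme_status;
}

/* The Requester ID hint is only meaningful if PME Status is set. */
uint16_t pme_requester_id(uint32_t rtsta)
{
	return (uint16_t)(rtsta & PCI_EXP_RTSTA_PME_RQ_ID);
}
```

Without the PME Status filter, the third branch is taken for every
unrelated interrupt on the shared line, which is where the extra
wakeups would come from.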

At least Switch Upstream and Downstream Ports below the Root Port
need to be woken to access config space of Endpoints. With Thunderbolt,
these may be in D3cold and waking them up consumes a non-trivial amount
of time and energy.

As an aside, I note that the code in drivers/pci/pcie/pme.c doesn't
take into account that there may be Switch Upstream and Downstream
Ports between the Root Port and the wakeup-signaling device and
those switch ports may be in D3hot or D3cold, in which case config
space of the wakeup-signaling device is inaccessible. pci_check_pme_status()
happens to be written in such a way that if it reads fabricated
"all ones" responses from the device, it assumes that the device
is signaling wakeup. The final pci_write_config_word() in
pci_check_pme_status() will be lost but there's a call to
pci_enable_wake(pci_dev, PCI_D0, false) upon runtime resume
which makes up for the lost write, so the code happens to work.
Just be aware of pitfalls there...
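
A minimal userspace model of that behavior, for illustration only.
check_pme_status() below paraphrases the logic of
pci_check_pme_status() and is not a verbatim copy. struct fake_dev
and the fake_*_pmcsr() accessors are hypothetical stand-ins for config
space access; the PMCSR bit values are those in
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_PM_CTRL_PME_ENABLE 0x0100 /* PME pin enable */
#define PCI_PM_CTRL_PME_STATUS 0x8000 /* PME pin status (write 1 to clear) */

/* Hypothetical device model, not a kernel structure. */
struct fake_dev {
	bool accessible; /* false: a switch port above is in D3hot/D3cold */
	uint16_t pmcsr;  /* real register contents */
};

/* Reads from an unreachable device return fabricated all ones. */
uint16_t fake_read_pmcsr(const struct fake_dev *dev)
{
	return dev->accessible ? dev->pmcsr : 0xffff;
}

/* Writes to an unreachable device are silently lost. */
void fake_write_pmcsr(struct fake_dev *dev, uint16_t val)
{
	uint16_t status;

	if (!dev->accessible)
		return;

	/* PME_Status is cleared by writing 1, otherwise it is kept. */
	status = dev->pmcsr & PCI_PM_CTRL_PME_STATUS;
	if (val & PCI_PM_CTRL_PME_STATUS)
		status = 0;
	dev->pmcsr = (uint16_t)((val & ~PCI_PM_CTRL_PME_STATUS) | status);
}

/* Paraphrase of the pci_check_pme_status() logic. */
bool check_pme_status(struct fake_dev *dev)
{
	uint16_t pmcsr = fake_read_pmcsr(dev);
	bool ret = false;

	if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
		return false;

	/* Clear PME status. */
	pmcsr |= PCI_PM_CTRL_PME_STATUS;
	if (pmcsr & PCI_PM_CTRL_PME_ENABLE) {
		/* Disable PME to avoid an interrupt flood. */
		pmcsr &= (uint16_t)~PCI_PM_CTRL_PME_ENABLE;
		ret = true;
	}

	fake_write_pmcsr(dev, pmcsr);
	return ret;
}
```

With accessible == false, the read yields 0xffff, both bits appear set
and the function reports a wakeup even though the device's real PMCSR
was never seen or modified.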

Thanks,

Lukas