Re: [PATCH v2] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
From: Rafael J. Wysocki
Date: Tue Feb 17 2026 - 13:08:44 EST
On Tue, Feb 17, 2026 at 5:54 PM Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
>
> Hi Rafael,
>
> On 2/13/2026 3:14 PM, Kuppuswamy Sathyanarayanan wrote:
> > On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
> > update PME status registers (PME Status and PME Requester_ID in the
> > Root Status register) during D3hot to D0 transitions, even though PME
> > interrupts are delivered correctly.
> >
> > This issue manifests during PCIe hotplug operations as follows:
> >
> > 1. After a hot-remove event, the PCIe port transitions to D3hot and
> > the hotplug interrupt enable (HPIE) flag is disabled as the port
> > enters low power state.
> >
> > 2. When a hot-add occurs while the port is in D3hot, a PME interrupt
> > fires as expected to wake the port.
> >
> > 3. However, the PME interrupt handler finds the PME_Status and
> > PME_Requester_ID registers unpopulated, preventing identification
> > of which device triggered the PME. The handler returns IRQ_NONE,
> > leaving the port in D3hot.

I think that you mean the

        if (PCI_POSSIBLE_ERROR(rtsta) || !(rtsta & PCI_EXP_RTSTA_PME))

check in pcie_pme_irq(). Or do you mean something else?
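
To make the question concrete, here is a rough user-space model of that
check. It is only a sketch: pme_irq_claims() is a made-up helper, not a
kernel function, and the two constants are copied by hand from what
PCI_EXP_RTSTA_PME and the all-ones error response look like in the
kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match PCI_EXP_RTSTA_PME (PME Status bit in Root Status). */
#define RTSTA_PME	0x00010000u
/* All-ones read, which is what PCI_POSSIBLE_ERROR() tests for on a u32. */
#define ERROR_RESPONSE	0xffffffffu

/*
 * Hypothetical helper: returns true if a handler using the check above
 * would go on to process the PME, false if it would bail out early
 * (the IRQ_NONE case).
 */
static bool pme_irq_claims(uint32_t rtsta)
{
	if (rtsta == ERROR_RESPONSE || !(rtsta & RTSTA_PME))
		return false;

	return true;
}
```

With the Root Status register reading back as 0, as described in step 3,
pme_irq_claims(0) is false, so the interrupt would be treated as not
belonging to the PME service.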

An alternative workaround might be to add a (new) "always poll PME"
flag for the port in question that would cause it to be added to
pci_pme_list in pci_pme_active() every time wakeup is enabled
(essentially, an override for pme_poll clearing).

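
The idea can be sketched with a small user-space model. Everything in it
is an assumption made for illustration: struct model_dev stands in for
struct pci_dev, pme_always_poll is the proposed (not existing) flag, and
on_poll_list stands in for membership of pci_pme_list. It does not
reproduce the real pci_pme_active().

```c
#include <stdbool.h>

/* Simplified stand-in for the relevant bits of struct pci_dev. */
struct model_dev {
	bool pme_poll;		/* existing flag, may get cleared elsewhere */
	bool pme_always_poll;	/* hypothetical new flag set by a quirk */
	bool on_poll_list;	/* stands in for being on pci_pme_list */
};

/* Model of the polling decision made when wakeup is enabled/disabled. */
static void model_pme_active(struct model_dev *dev, bool enable)
{
	/* The new flag overrides a cleared pme_poll. */
	if (!dev->pme_poll && !dev->pme_always_poll)
		return;

	dev->on_poll_list = enable;
}

/* Is the device polled after some other code path cleared pme_poll? */
static bool model_polled_after_clear(bool always_poll)
{
	struct model_dev dev = {
		.pme_poll = true,
		.pme_always_poll = always_poll,
	};

	dev.pme_poll = false;
	model_pme_active(&dev, true);

	return dev.on_poll_list;
}
```

In this model a port with the flag set keeps being polled even though
pme_poll was cleared, while a port without it does not.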
> > 4. Because the port remains in D3hot with HPIE disabled, the hotplug
> > driver ignores the hot-add event, resulting in the newly inserted
> > device not being recognized.
> >
> > The PME interrupt delivery mechanism itself works correctly;
> > interrupts arrive reliably. The problem is purely the missing status
> > register updates. Verification via IOSF-SideBand (IOSF-SB) backdoor
> > reads confirms that these registers remain empty when the PME
> > interrupt fires. Neither BIOS nor kernel code is clearing these
> > registers.
> >
> > This issue is present in all steppings of Catlow Lake PCH and affects
> > customers in production deployments. A public hardware errata document
> > is not yet available.
> >
> > Work around this issue by disabling runtime PM for affected ports,
> > keeping them in D0 during runtime operation. This ensures hotplug
> > events are handled via direct interrupts rather than relying on
> > unreliable PME-based wakeup.
> >
> > During system suspend/resume, PCIe ports are resumed unconditionally
> > when coming out of system sleep due to DPM_FLAG_SMART_SUSPEND set by
> > pcie_portdrv_probe(), and pciehp re-enables interrupts and checks slot
> > occupation status during resume.
> >
> > The quirk is applied only to Catlow PCH PCIe root ports (device IDs
> > 0x7a30 through 0x7a4b). Catlow CPU PCIe ports are not affected as
> > they are not hotplug-capable.
> >
> > Suggested-by: Lukas Wunner <lukas@xxxxxxxxx>
> > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> > ---
>
> Could you please review this patch and let us know if calling
> pm_runtime_disable() from a PCI quirk is acceptable?
>
> The quirk keeps specific Catlow Lake PCH PCIe root ports in D0 to
> work around a hardware bug where PME status registers are not reliably
> updated during D3hot to D0 transitions, causing hotplug events to be
> missed.
>
> System suspend/resume is unaffected as DPM_FLAG_SMART_SUSPEND ensures
> ports are resumed unconditionally and pciehp checks slot occupation
> on resume.
>
>
> >
> > Changes since v1:
> > * Removed hack in hotplug driver and disabled runtime PM on affected ports.
> > * Fixed the commit log and comments accordingly.
> >
> > drivers/pci/quirks.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 49 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 280cd50d693b..779cd65b1a8a 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -6340,3 +6340,52 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
> > #endif
> > +
> > +/*
> > + * Intel Catlow Lake PCH PCIe root ports have a hardware issue where
> > + * PME status registers (PME Status and PME Requester_ID in Root Status)
> > + * are not reliably updated during D3hot to D0 transitions, even though
> > + * PME interrupts are delivered correctly.
> > + *
> > + * When a hotplug event occurs while the port is in D3hot, the PME
> > + * interrupt fires but the status registers remain empty. This prevents
> > + * the PME handler from identifying the event source, leaving the port
> > + * in D3hot and causing the hotplug driver to miss the event.
> > + *
> > + * Disable runtime PM to keep these ports in D0, ensuring hotplug events
> > + * are handled via direct interrupts.
> > + */
> > +static void quirk_intel_catlow_pcie_no_pme_wakeup(struct pci_dev *dev)
> > +{
> > + pm_runtime_disable(&dev->dev);

Personally, I would use pm_runtime_get_sync() here instead, which would
really mean "never suspend".

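
For comparison, both calls block runtime suspend, but through different
counters. The sketch below is a user-space model, not kernel code:
struct rpm_model and the model_*() helpers are made up, and the check
order and error codes follow my reading of rpm_check_suspend_allowed(),
so treat them as an assumption.

```c
#include <errno.h>

/* Simplified stand-in for the relevant fields of struct dev_pm_info. */
struct rpm_model {
	int usage_count;	/* bumped by pm_runtime_get_sync() */
	int disable_depth;	/* bumped by pm_runtime_disable() */
};

/* Returns 0 if a runtime suspend would be allowed, else a negative errno. */
static int model_suspend_allowed(const struct rpm_model *m)
{
	if (m->disable_depth > 0)
		return -EACCES;	/* runtime PM is disabled for the device */
	if (m->usage_count > 0)
		return -EAGAIN;	/* somebody holds a usage reference */

	return 0;
}

/* State after a quirk that disables runtime PM. */
static int model_after_disable(void)
{
	struct rpm_model m = { 0, 0 };

	m.disable_depth++;
	return model_suspend_allowed(&m);
}

/* State after a quirk that takes a usage reference and never drops it. */
static int model_after_get(void)
{
	struct rpm_model m = { 0, 0 };

	m.usage_count++;
	return model_suspend_allowed(&m);
}
```

In the first case runtime PM is switched off for the device as a whole,
whereas in the second it stays enabled and the reference that is never
dropped keeps the device from being suspended.
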
> > + pci_info(dev, "Catlow PCH port: PME status unreliable, disabling runtime PM\n");
> > +}
> > +/* Apply quirk to Catlow Lake PCH root ports (0x7a30 - 0x7a4b) */
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a30, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a31, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a32, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a33, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a34, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a35, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a36, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a37, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a38, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a39, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3a, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3b, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3c, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3d, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3e, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3f, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a40, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a41, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a42, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a43, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a44, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a45, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a46, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a47, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a48, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a49, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4a, quirk_intel_catlow_pcie_no_pme_wakeup);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4b, quirk_intel_catlow_pcie_no_pme_wakeup);
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
>