Re: [PATCH v2] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
From: Rafael J. Wysocki
Date: Wed Feb 18 2026 - 12:33:40 EST
On Wed, Feb 18, 2026 at 5:27 PM Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
>
> On 2/17/2026 10:08 AM, Rafael J. Wysocki wrote:
> > On Tue, Feb 17, 2026 at 5:54 PM Kuppuswamy Sathyanarayanan
> > <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi Rafael,
> >>
> >> On 2/13/2026 3:14 PM, Kuppuswamy Sathyanarayanan wrote:
> >>> On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
> >>> update PME status registers (PME Status and PME Requester_ID in the
> >>> Root Status register) during D3hot to D0 transitions, even though PME
> >>> interrupts are delivered correctly.
> >>>
> >>> This issue manifests during PCIe hotplug operations as follows:
> >>>
> >>> 1. After a hot-remove event, the PCIe port transitions to D3hot and
> >>>    the hotplug interrupt enable (HPIE) flag is disabled as the port
> >>>    enters low power state.
> >>>
> >>> 2. When a hot-add occurs while the port is in D3hot, a PME interrupt
> >>>    fires as expected to wake the port.
> >>>
> >>> 3. However, the PME interrupt handler finds the PME_Status and
> >>>    PME_Requester_ID registers unpopulated, preventing identification
> >>>    of which device triggered the PME. The handler returns IRQ_NONE,
> >>>    leaving the port in D3hot.
> >
> > I think that you mean the
> >
> > if (PCI_POSSIBLE_ERROR(rtsta) || !(rtsta & PCI_EXP_RTSTA_PME))
> >
> > check in pcie_pme_irq(). Or do you mean something else?
>
> Yes, I was referring to the above check.
>
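
For reference, the effect of that check can be reproduced outside the
kernel. Below is a minimal user-space sketch, not the pcie_pme_irq() code
itself: the helper name pme_irq_would_be_ignored() is made up for
illustration, and the two constants are assumed to mirror
PCI_EXP_RTSTA_PME and the all-ones error response from the kernel headers.
With PME Status left clear, as described for these ports, the handler has
nothing to act on:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's definitions; restated here for illustration. */
#define PCI_EXP_RTSTA_PME	0x00010000u	/* PME Status bit in Root Status */
#define PCI_ERROR_RESPONSE_32	0xffffffffu	/* all-ones read: no response */

/*
 * Hypothetical helper, not a kernel function: returns true when a Root
 * Status value would fail the check quoted above, i.e. when the PME
 * interrupt handler would bail out early.
 */
static bool pme_irq_would_be_ignored(uint32_t rtsta)
{
	return rtsta == PCI_ERROR_RESPONSE_32 || !(rtsta & PCI_EXP_RTSTA_PME);
}
```

So a Root Status read with PME Status clear takes the same early-return
path as a read from a port that does not respond at all.
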
> >
> > An alternative workaround might be to add a (new) "always poll PME"
> > flag for the port in question that will cause it to go to pci_pme_list
> > in pci_pme_active() every time wakeup is enabled (essentially, an
> > override for pme_poll clearing).
>
> I will check whether this approach works. I want to make sure the poll
> logic eventually triggers the hotplug handler to detect slot state
> changes.
>
> But if you think there is no power-related issue with keeping these ports
> in D0, then we can adopt the pm_runtime_disable() approach. I think this
> approach looks clean and simple.
>
> What's your preference?

First, keeping the ports in D0 may gate runtime PC10. Does it not?

Second, I'd use pm_runtime_get_sync() in the quirk as I said because
pm_runtime_disable() generally breaks runtime PM dependency chains
between devices and may cause subtle side-effects to appear.
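
To illustrate the difference, here is a toy user-space model, not the
runtime PM core: the model_* names and the struct are invented for this
sketch, and they only mimic the usage counter and the disable depth. In
the quirk itself, the change would amount to calling
pm_runtime_get_sync(&dev->dev) where the patch calls
pm_runtime_disable(&dev->dev).

```c
#include <stdbool.h>

/*
 * User-space toy model, NOT the kernel's runtime PM core. It keeps only
 * the two counters that matter for this discussion.
 */
struct model_dev {
	int usage_count;	/* references that forbid suspend */
	int disable_depth;	/* > 0: runtime PM is switched off entirely */
	bool in_d0;
};

/* Models pm_runtime_get_sync(): take a reference and resume the device. */
static void model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	dev->in_d0 = true;
}

/* Models pm_runtime_disable(): runtime PM no longer handles the device. */
static void model_disable(struct model_dev *dev)
{
	dev->disable_depth++;
}

/* Models an idle-time suspend attempt; returns true if the port left D0. */
static bool model_try_suspend(struct model_dev *dev)
{
	if (dev->disable_depth > 0 || dev->usage_count > 0)
		return false;
	dev->in_d0 = false;
	return true;
}

/* Models whether runtime PM is still enabled for this device. */
static bool model_pm_enabled(const struct model_dev *dev)
{
	return dev->disable_depth == 0;
}
```

In both cases the port stays in D0, but only the reference-count variant
leaves runtime PM enabled for the device, which should keep the
dependency handling between devices intact.
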
> >
> >>> 4. Because the port remains in D3hot with HPIE disabled, the hotplug
> >>>    driver ignores the hot-add event, resulting in the newly inserted
> >>>    device not being recognized.
> >>>
> >>> The PME interrupt delivery mechanism itself works correctly;
> >>> interrupts arrive reliably. The problem is purely the missing status
> >>> register updates. Verification via IOSF-SideBand (IOSF-SB) backdoor
> >>> reads confirms that these registers remain empty when the PME
> >>> interrupt fires. Neither BIOS nor kernel code is clearing these
> >>> registers.
> >>>
> >>> This issue is present in all steppings of Catlow Lake PCH and affects
> >>> customers in production deployments. A public hardware errata document
> >>> is not yet available.
> >>>
> >>> Work around this issue by disabling runtime PM for affected ports,
> >>> keeping them in D0 during runtime operation. This ensures hotplug
> >>> events are handled via direct interrupts rather than relying on
> >>> unreliable PME-based wakeup.
> >>>
> >>> During system suspend/resume, PCIe ports are resumed unconditionally
> >>> when coming out of system sleep due to DPM_FLAG_SMART_SUSPEND set by
> >>> pcie_portdrv_probe(), and pciehp re-enables interrupts and checks slot
> >>> occupation status during resume.
> >>>
> >>> The quirk is applied only to Catlow PCH PCIe root ports (device IDs
> >>> 0x7a30 through 0x7a4b). Catlow CPU PCIe ports are not affected as
> >>> they are not hotplug-capable.
> >>>
> >>> Suggested-by: Lukas Wunner <lukas@xxxxxxxxx>
> >>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> >>> ---
> >>
> >> Could you please review this patch and let us know if calling
> >> pm_runtime_disable() from a PCI quirk is acceptable?
> >>
> >> The quirk keeps specific Catlow Lake PCH PCIe root ports in D0 to
> >> work around a hardware bug where PME status registers are not reliably
> >> updated during D3hot to D0 transitions, causing hotplug events to be
> >> missed.
> >>
> >> System suspend/resume is unaffected as DPM_FLAG_SMART_SUSPEND ensures
> >> ports are resumed unconditionally and pciehp checks slot occupation
> >> on resume.
> >>
> >>
> >>>
> >>> Changes since v1:
> >>> * Removed hack in hotplug driver and disabled runtime PM on affected ports.
> >>> * Fixed the commit log and comments accordingly.
> >>>
> >>> drivers/pci/quirks.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
> >>> 1 file changed, 49 insertions(+)
> >>>
> >>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> >>> index 280cd50d693b..779cd65b1a8a 100644
> >>> --- a/drivers/pci/quirks.c
> >>> +++ b/drivers/pci/quirks.c
> >>> @@ -6340,3 +6340,52 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
> >>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
> >>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
> >>> #endif
> >>> +
> >>> +/*
> >>> + * Intel Catlow Lake PCH PCIe root ports have a hardware issue where
> >>> + * PME status registers (PME Status and PME Requester_ID in Root Status)
> >>> + * are not reliably updated during D3hot to D0 transitions, even though
> >>> + * PME interrupts are delivered correctly.
> >>> + *
> >>> + * When a hotplug event occurs while the port is in D3hot, the PME
> >>> + * interrupt fires but the status registers remain empty. This prevents
> >>> + * the PME handler from identifying the event source, leaving the port
> >>> + * in D3hot and causing the hotplug driver to miss the event.
> >>> + *
> >>> + * Disable runtime PM to keep these ports in D0, ensuring hotplug events
> >>> + * are handled via direct interrupts.
> >>> + */
> >>> +static void quirk_intel_catlow_pcie_no_pme_wakeup(struct pci_dev *dev)
> >>> +{
> >>> +	pm_runtime_disable(&dev->dev);
> >
> > Personally, I would use pm_runtime_get_sync() here instead which would
> > really mean "never suspend".
> >
> >>> +	pci_info(dev, "Catlow PCH port: PME status unreliable, disabling runtime PM\n");
> >>> +}
> >>> +/* Apply quirk to Catlow Lake PCH root ports (0x7a30 - 0x7a4b) */
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a30, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a31, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a32, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a33, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a34, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a35, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a36, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a37, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a38, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a39, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3a, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3b, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3c, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3d, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3e, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3f, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a40, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a41, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a42, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a43, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a44, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a45, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a46, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a47, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a48, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a49, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4a, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4b, quirk_intel_catlow_pcie_no_pme_wakeup);
> >>
> >> --
> >> Sathyanarayanan Kuppuswamy
> >> Linux Kernel Developer
> >>