Re: [PATCH v2] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
From: Kuppuswamy Sathyanarayanan
Date: Wed Feb 18 2026 - 11:27:58 EST
On 2/17/2026 10:08 AM, Rafael J. Wysocki wrote:
> On Tue, Feb 17, 2026 at 5:54 PM Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
>>
>> Hi Rafael,
>>
>> On 2/13/2026 3:14 PM, Kuppuswamy Sathyanarayanan wrote:
>>> On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
>>> update PME status registers (PME Status and PME Requester_ID in the
>>> Root Status register) during D3hot to D0 transitions, even though PME
>>> interrupts are delivered correctly.
>>>
>>> This issue manifests during PCIe hotplug operations as follows:
>>>
>>> 1. After a hot-remove event, the PCIe port transitions to D3hot and
>>> the hotplug interrupt enable (HPIE) flag is disabled as the port
>>> enters low power state.
>>>
>>> 2. When a hot-add occurs while the port is in D3hot, a PME interrupt
>>> fires as expected to wake the port.
>>>
>>> 3. However, the PME interrupt handler finds the PME_Status and
>>> PME_Requester_ID registers unpopulated, preventing identification
>>> of which device triggered the PME. The handler returns IRQ_NONE,
>>> leaving the port in D3hot.
>
> I think that you mean the
>
> if (PCI_POSSIBLE_ERROR(rtsta) || !(rtsta & PCI_EXP_RTSTA_PME))
>
> check in pcie_pme_irq(). Or do you mean something else?
Yes, I was referring to the above check.
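
To make the failure mode concrete, here is a small userspace model of that
check. It is only a sketch: PCI_EXP_RTSTA_PME mirrors the kernel definition,
PCI_POSSIBLE_ERROR() is simplified to 32-bit values, and pme_irq_model() and
its enum are made-up names for illustration, not the real pcie_pme_irq().

```c
#include <stdint.h>

/* Mirrors include/uapi/linux/pci_regs.h: PME Status bit in Root Status. */
#define PCI_EXP_RTSTA_PME	0x00010000

/* Simplified 32-bit form of the kernel macro: an all-ones read is suspect. */
#define PCI_POSSIBLE_ERROR(val)	((uint32_t)(val) == 0xffffffffu)

/* Hypothetical stand-in for irqreturn_t. */
enum irq_model { IRQ_MODEL_NONE, IRQ_MODEL_HANDLED };

/*
 * Hypothetical model of the early exit in pcie_pme_irq(): without the
 * PME Status bit set, the handler does not claim the interrupt, so no
 * wakeup work is scheduled and the port stays in D3hot.
 */
static enum irq_model pme_irq_model(uint32_t rtsta)
{
	if (PCI_POSSIBLE_ERROR(rtsta) || !(rtsta & PCI_EXP_RTSTA_PME))
		return IRQ_MODEL_NONE;

	return IRQ_MODEL_HANDLED;
}
```

On the affected ports the Root Status read looks like the first case in the
model (PME Status clear), so the interrupt is reported as not handled.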
>
> An alternative workaround might be to add a (new) "always poll PME"
> flag for the port in question that will cause it to go to pci_pme_list
> in pci_pme_active() every time wakeup is enabled (essentially, an
> override for pme_poll clearing).
I will check whether this approach works. I want to make sure the poll
logic eventually triggers the hotplug handler to detect slot state
changes.
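
For discussion, this is how I read the suggestion, written as a userspace
model rather than a patch. pme_always_poll is a hypothetical new flag, and
struct port_model and should_poll_pme() are illustration-only names; the
real change would sit in pci_pme_active() and wherever pme_poll is cleared.

```c
#include <stdbool.h>

/* Hypothetical subset of struct pci_dev, for illustration only. */
struct port_model {
	bool pme_poll;		/* existing flag, may be cleared at runtime */
	bool pme_always_poll;	/* hypothetical new flag set by a quirk */
};

/*
 * Model of the decision made when wakeup is enabled: the device goes on
 * the poll list if polling is still wanted, or unconditionally if the
 * quirk flag overrides the clearing of pme_poll.
 */
static bool should_poll_pme(const struct port_model *port, bool enable)
{
	if (!enable)
		return false;

	return port->pme_poll || port->pme_always_poll;
}
```

The model only covers whether the port would be polled. It does not show
whether polling ends up waking the port so that pciehp sees the slot change,
which is the part I still need to verify on hardware.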

Alternatively, if you see no power-related issue with keeping these ports
in D0, we can keep the pm_runtime_disable() approach. I think it is clean
and simple.

What's your preference?
>
>>> 4. Because the port remains in D3hot with HPIE disabled, the hotplug
>>> driver ignores the hot-add event, resulting in the newly inserted
>>> device not being recognized.
>>>
>>> The PME interrupt delivery mechanism itself works correctly;
>>> interrupts arrive reliably. The problem is purely the missing status
>>> register updates. Verification via IOSF-SideBand (IOSF-SB) backdoor
>>> reads confirms that these registers remain empty when the PME
>>> interrupt fires. Neither BIOS nor kernel code is clearing these
>>> registers.
>>>
>>> This issue is present in all steppings of Catlow Lake PCH and affects
>>> customers in production deployments. A public hardware errata document
>>> is not yet available.
>>>
>>> Work around this issue by disabling runtime PM for affected ports,
>>> keeping them in D0 during runtime operation. This ensures hotplug
>>> events are handled via direct interrupts rather than relying on
>>> unreliable PME-based wakeup.
>>>
>>> During system suspend/resume, PCIe ports are resumed unconditionally
>>> when coming out of system sleep due to DPM_FLAG_SMART_SUSPEND set by
>>> pcie_portdrv_probe(), and pciehp re-enables interrupts and checks slot
>>> occupation status during resume.
>>>
>>> The quirk is applied only to Catlow PCH PCIe root ports (device IDs
>>> 0x7a30 through 0x7a4b). Catlow CPU PCIe ports are not affected as
>>> they are not hotplug-capable.
>>>
>>> Suggested-by: Lukas Wunner <lukas@xxxxxxxxx>
>>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
>>> ---
>>
>> Could you please review this patch and let us know if calling
>> pm_runtime_disable() from a PCI quirk is acceptable?
>>
>> The quirk keeps specific Catlow Lake PCH PCIe root ports in D0 to
>> work around a hardware bug where PME status registers are not reliably
>> updated during D3hot to D0 transitions, causing hotplug events to be
>> missed.
>>
>> System suspend/resume is unaffected as DPM_FLAG_SMART_SUSPEND ensures
>> ports are resumed unconditionally and pciehp checks slot occupation
>> on resume.
>>
>>
>>>
>>> Changes since v1:
>>> * Removed hack in hotplug driver and disabled runtime PM on affected ports.
>>> * Fixed the commit log and comments accordingly.
>>>
>>> drivers/pci/quirks.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 280cd50d693b..779cd65b1a8a 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -6340,3 +6340,52 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>>> #endif
>>> +
>>> +/*
>>> + * Intel Catlow Lake PCH PCIe root ports have a hardware issue where
>>> + * PME status registers (PME Status and PME Requester_ID in Root Status)
>>> + * are not reliably updated during D3hot to D0 transitions, even though
>>> + * PME interrupts are delivered correctly.
>>> + *
>>> + * When a hotplug event occurs while the port is in D3hot, the PME
>>> + * interrupt fires but the status registers remain empty. This prevents
>>> + * the PME handler from identifying the event source, leaving the port
>>> + * in D3hot and causing the hotplug driver to miss the event.
>>> + *
>>> + * Disable runtime PM to keep these ports in D0, ensuring hotplug events
>>> + * are handled via direct interrupts.
>>> + */
>>> +static void quirk_intel_catlow_pcie_no_pme_wakeup(struct pci_dev *dev)
>>> +{
>>> + pm_runtime_disable(&dev->dev);
>
> Personally, I would use pm_runtime_get_sync() here instead which would
> really mean "never suspend".
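
A toy userspace model of why holding a usage reference reads as "never
suspend". The struct and the *_model() helpers are invented for
illustration. They only mimic the rule that a runtime suspend needs runtime
PM to be enabled and the usage count to be zero, and they are not the
kernel implementation.

```c
#include <stdbool.h>

/* Invented, heavily simplified view of a device's runtime PM state. */
struct rpm_model {
	int usage_count;	/* references held via get-style calls */
	int disable_depth;	/* > 0 means runtime PM is disabled */
	bool suspended;
};

/* Models pm_runtime_get_sync(): take a reference and resume the device. */
static void get_sync_model(struct rpm_model *d)
{
	d->usage_count++;
	if (d->disable_depth == 0)
		d->suspended = false;
}

/* Models pm_runtime_disable(): block further runtime PM transitions. */
static void disable_model(struct rpm_model *d)
{
	d->disable_depth++;
}

/* Models an idle-triggered suspend attempt; returns true if it suspended. */
static bool try_suspend_model(struct rpm_model *d)
{
	if (d->disable_depth > 0 || d->usage_count > 0)
		return false;

	d->suspended = true;
	return true;
}
```

In this model both calls block a later suspend, but only the get-style call
also brings an already suspended device back to the active state; disabling
freezes the device in whatever state it was in.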
>
>>> + pci_info(dev, "Catlow PCH port: PME status unreliable, disabling runtime PM\n");
>>> +}
>>> +/* Apply quirk to Catlow Lake PCH root ports (0x7a30 - 0x7a4b) */
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a30, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a31, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a32, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a33, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a34, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a35, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a36, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a37, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a38, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a39, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3a, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3b, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3c, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3d, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3e, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3f, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a40, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a41, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a42, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a43, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a44, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a45, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a46, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a47, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a48, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a49, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4a, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4b, quirk_intel_catlow_pcie_no_pme_wakeup);
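
As an aside, the 28 entries above cover one contiguous device ID range, so
the match could also be written as a range test inside a single fixup
registered with PCI_ANY_ID. Below is a sketch of just the range test, in
plain C so it can be checked in userspace. The macro and function names are
made up, and whether one wildcard fixup is preferable to explicit entries
is a maintainer call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range taken from the patch comment: Catlow Lake PCH root ports. */
#define CATLOW_PCH_RP_FIRST	0x7a30
#define CATLOW_PCH_RP_LAST	0x7a4b

/* Hypothetical helper: true if the device ID falls in the listed range. */
static bool is_catlow_pch_root_port(uint16_t device)
{
	return device >= CATLOW_PCH_RP_FIRST && device <= CATLOW_PCH_RP_LAST;
}
```

A quirk using this would return early when the helper is false and apply
the workaround otherwise.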
>>
>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer