Re: [PATCH 2/2] PCI: Disable PCIE hotplug interrupts early when msi is disabled

From: Feng Tang
Date: Tue Feb 04 2025 - 22:58:25 EST


Hi Lukas,

On Tue, Feb 04, 2025 at 10:23:45AM +0100, Lukas Wunner wrote:
> On Tue, Feb 04, 2025 at 01:37:58PM +0800, Feng Tang wrote:
> > There was an irq storm bug when testing the "pci=nomsi" case, and the root
> > cause is: 'nomsi' disables MSI and makes devices and root ports use the
> > legacy INTx interrupt, which likely makes several devices/ports share one
> > interrupt. In the failure case, the BIOS doesn't disable the PCIe hotplug
> > interrupts, and actually asserts the command-complete interrupt.
> > As MSI is disabled, the ACPI initialization code will not enumerate the
> > root port's PCIe hotplug capability, and the pciehp service driver won't
> > be enabled for the root port to handle that interrupt. Later on, when the
> > interrupt line is shared and enabled by another device driver such as NVMe
> > or a NIC, the "nobody cared" irq storm happens.
>
> Is there a section in the PCI Firmware Spec which says ACPI doesn't
> enumerate the hotplug capability if MSI is disabled?

No, I didn't get it from the spec; I found the logic by reading the code
while debugging the irq storm issue. The related code is roughly:


#define ACPI_PCIE_REQ_SUPPORT	(OSC_PCI_EXT_CONFIG_SUPPORT \
				| OSC_PCI_ASPM_SUPPORT \
				| OSC_PCI_CLOCK_PM_SUPPORT \
				| OSC_PCI_MSI_SUPPORT)

acpi_pci_root_add
    negotiate_os_control
        calculate_support
            if (pci_msi_enabled())
                support |= OSC_PCI_MSI_SUPPORT;
        decode_osc_support
        os_control_query_checks
            if ((support & ACPI_PCIE_REQ_SUPPORT) != ACPI_PCIE_REQ_SUPPORT)
                return false
        acpi_pci_osc_control_set

With 'nomsi', os_control_query_checks() returns false, so OS control is
never requested. Later, in get_port_device_capability(), the pciehp
service bit won't be set, so the pciehp driver is not loaded.
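
To make the check above concrete, here is a rough userspace sketch of it.
This is not the kernel code: the model_* helpers are hypothetical names, the
bit values are illustrative (the real definitions are in
include/linux/acpi.h), and it assumes the non-MSI support bits are always
set, which the real calculate_support() does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; see include/linux/acpi.h for the real ones. */
#define OSC_PCI_EXT_CONFIG_SUPPORT	0x01u
#define OSC_PCI_ASPM_SUPPORT		0x02u
#define OSC_PCI_CLOCK_PM_SUPPORT	0x04u
#define OSC_PCI_MSI_SUPPORT		0x10u

#define ACPI_PCIE_REQ_SUPPORT	(OSC_PCI_EXT_CONFIG_SUPPORT \
				| OSC_PCI_ASPM_SUPPORT \
				| OSC_PCI_CLOCK_PM_SUPPORT \
				| OSC_PCI_MSI_SUPPORT)

/*
 * Hypothetical stand-in for calculate_support(). The msi_enabled argument
 * models the result of pci_msi_enabled(). Assumption: every other required
 * bit is unconditionally supported.
 */
uint32_t model_calculate_support(bool msi_enabled)
{
	uint32_t support = OSC_PCI_EXT_CONFIG_SUPPORT
			 | OSC_PCI_ASPM_SUPPORT
			 | OSC_PCI_CLOCK_PM_SUPPORT;

	if (msi_enabled)
		support |= OSC_PCI_MSI_SUPPORT;

	return support;
}

/*
 * Hypothetical stand-in for the check in os_control_query_checks():
 * OS control is only requested when all required bits are present.
 */
bool model_os_control_query_checks(uint32_t support)
{
	return (support & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT;
}

/* Small self-check showing both cases. */
void model_self_check(void)
{
	/* MSI enabled: all required bits present, control is requested. */
	assert(model_os_control_query_checks(model_calculate_support(true)));
	/* pci=nomsi: the MSI bit is missing, so the check fails. */
	assert(!model_os_control_query_checks(model_calculate_support(false)));
}
```

In this model, the 'nomsi' case fails the check only because
OSC_PCI_MSI_SUPPORT is missing from the support mask.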

Thanks,
Feng

> If so, it should be referenced in the commit message.
>
> If not, I'm wondering if it's safe to fiddle with the Slot Control
> register given the platform hasn't granted OSPM control of it.
>
> Of course if this is spec-defined behavior in the nomsi case,
> we could make the write to the Slot Control register conditional
> on that. But if this turns out to be platform-specific behavior,
> we can't deal with it in generic PCI code but only in a quirk.
>
> Thanks,
>
> Lukas