Re: [PATCH v2] pci: fix unavailable irq number 255 reported by BIOS

From: Chen Fan
Date: Tue Jan 26 2016 - 04:50:41 EST



On 01/26/2016 04:26 PM, Thomas Gleixner wrote:
On Mon, 25 Jan 2016, Bjorn Helgaas wrote:
On Mon, Jan 25, 2016 at 02:59:38PM +0800, Chen Fan wrote:
i801_smbus 0000:00:1f.3: PCI INT C: no GSI
i801_smbus 0000:00:1f.3: Failed to allocate irq 255: -16
i801_smbus: probe of 0000:00:1f.3 failed with error -16
The current code does not not fail when the interrupt request fails. It
reports it and clears the IRQ feature flag.

@@ -436,7 +437,15 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
* driver reported one, then use it. Exit in any case.
*/
if (gsi < 0) {
- if (acpi_isa_register_gsi(dev))
+#ifdef CONFIG_X86
+ /*
+ * The Interrupt Line value of 0xff is defined to mean "unknown"
+ * or "no connection" (PCI 3.0, Section 6.2.4, footnote on page
+ * 223), using ~0U as invalid IRQ.
+ */
And why would this be x86 specific? PCI3.0 is architecture independent, right?
quoting the spec document:
"For x86 based PCs, the values in this register correspond to IRQ numbers (0-15) of the standard dual
8259 configuration. The value 255 is defined as meaning "unknown" or "no connection" to the interrupt
controller. Values between 15 and 254 are reserved."


+ dev->irq = (dev->irq == 0xff) ? IRQ_INVALID : dev->irq;
It's much simpler and clearer to write:

if (dev->irq == 0xff)
dev->irq = IRQ_INVALID;
I do not understand that IRQ_INVALID business at all.

+#endif
+ if (!irq_is_valid(dev->irq) || acpi_isa_register_gsi(dev))
dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
pin_name(pin));
The existing code already drops into this place because
acpi_isa_register_gsi() fails.

i801_smbus 0000:00:1f.3: PCI INT C: no GSI
What extra value does that !irq_is_valid() provide?

And how does setting dev->irq to ~0U prevent that request_irq() is called in
the i801 device driver? Not at all, AFAICT. It will just fail with a different
error.

So the whole 'fix' relies on the fact that irq ~0U does not exist (at least
not today) and therefor the false sharing with some other driver using irq 255
will not happen.

Relying on undocumented behaviour is not a fix, that's voodoo programming.

The proper solution here is to flag that this device does not have an
interrupt connected and act accordingly in the device driver, i.e. do not call
request_irq() in the first place.
yes, this is what I thought in previous email, I has asked that
whether we can use a broken_irq flag in pci_dev to mark the device irq if invalid.
and then if the device broken_irq set, we could directly skip call the request_irq.
maybe we can set the broken_irq in pci_read_irq if the irq is 0xff.

Thanks,
Chen


+static inline bool irq_is_valid(unsigned int irq)
+{
+#ifdef CONFIG_X86
+ if (irq == IRQ_INVALID)
+ return false;
+#endif
+ return true;
+}
I don't like the x86 ifdef. I'd prefer:

static inline bool irq_valid(unsigned int irq)
{
if (irq < NR_IRQS)
return true;
return false;
}

This could be used in many of the places that currently use NR_IRQS.
No. NR_IRQS cannot be used at all if sparse irqs are enabled. Nothing in any
generic code is supposed to rely on NR_IRQS.

Thanks,

tglx



.