Re: MSI irqchip configured as IRQCHIP_ONESHOT_SAFE causes spurious IRQs

From: Thomas Gleixner
Date: Wed Jan 15 2020 - 20:39:53 EST


Ramon,

Ramon Fried <rfried.dev@xxxxxxxxx> writes:
> On Wed, Jan 15, 2020 at 12:54 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> Ramon Fried <rfried.dev@xxxxxxxxx> writes:
>> Due to the semantics of MSI this is perfectly fine and aside from your
>> problem this has worked perfectly fine so far and it's an actual
>> performance win because it avoids fiddling with the MSI mask which is
>> slow.
>>
> fiddling with MSI masks is a configuration space write, which is
> non-posted, so it does come with a price.
> The question is if a test was ever conducted to see if it's better
> than spurious IRQs.

The point is that there are no spurious interrupts in the sane cases and
the tests we did showed real performance improvements in high
frequency interrupt situations due to avoiding the config space access.

Please stop claiming that this spurious interrupt problem is there by
design. It's not. Read the MSI spec.

Also boot your laptop/workstation with 'threadirqs' on the kernel
command line and check how many spurious interrupts come in. On a test
machine which has that command line parameter set I see exactly ONE with
an uptime of several days and heavy MSI interrupt activity. The ONE is
even there without 'threadirqs' on the command line, so I really can't
be bothered to analyze that.

>> You still have not told which driver/hardware is affected by this. Can
>> you please provide that information so we can finally look at the actual
>> hardware/driver combo?
>>
> Sure,
> I'm writing an MSI IRQ controller, it's basically a MIPS GIC interrupt
> line onto which several MSIs are multiplexed.

I assume you write the driver, not the VHDL for the actual hardware,
right? If so, you still did not tell which hardware that is and where we
can find information about it.

I further assume that 'multiplexed' means that the hardware is something
like an MSI receiver on the CPU/chipset which handles multiple MSI
messages and forwards them to a single shared interrupt line on the MIPS
GIC. Right?

Can you please provide a pointer to the hardware documentation?

> It's configured with handle_level_irq() as the GIC is level IRQ.

Which is completely bonkers. MSI has edge semantics and sharing an
interrupt line for edge type interrupts is broken by design, unless the
hardware which handles the incoming MSIs and forwards them to the level
type interrupt line is designed properly and the driver does the right
thing.
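
To illustrate what "designed properly" and "does the right thing" could
look like, here is a minimal userspace model in C. It is not kernel code
and not a description of the hardware in question, which is still
unknown. It assumes a hypothetical MSI receiver that latches one pending
bit per incoming MSI and drives its level output as the OR of those
bits. All names (struct msi_mux, mux_post_msi(), mux_demux(), NR_MSI)
are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MSI 8	/* hypothetical number of MSI vectors behind the mux */

/* Hypothetical MSI receiver: one latched pending bit per MSI vector. */
struct msi_mux {
	uint32_t pending;		/* set when an MSI write (edge) arrives */
	unsigned int handled[NR_MSI];	/* per-vector handler invocation count */
};

/* The level output to the parent controller is the OR of all pending bits. */
static int mux_line_asserted(const struct msi_mux *m)
{
	return m->pending != 0;
}

/* A device posts an MSI: the edge is latched in the receiver. */
static void mux_post_msi(struct msi_mux *m, unsigned int vec)
{
	assert(vec < NR_MSI);
	m->pending |= 1u << vec;
}

/*
 * Demultiplex the shared line. Each latched bit is acked (cleared)
 * before its handler runs, so an MSI which arrives while the handler
 * is running is latched again and picked up by the outer loop instead
 * of being lost. Returns the number of vectors dispatched.
 */
static unsigned int mux_demux(struct msi_mux *m)
{
	unsigned int dispatched = 0;

	while (m->pending) {
		for (unsigned int vec = 0; vec < NR_MSI; vec++) {
			if (!(m->pending & (1u << vec)))
				continue;
			m->pending &= ~(1u << vec);	/* ack first */
			m->handled[vec]++;		/* stand-in for the per-MSI handler */
			dispatched++;
		}
	}
	return dispatched;
}
```

In this model the shared level line drops only after every latched MSI
has been acked and dispatched. Masking a single MSI at one PCI device
does nothing about bits which are already latched for other vectors.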

> The ack callback acks the GIC irq. The mask/unmask callbacks call
> pci_msi_mask_irq() / pci_msi_unmask_irq()

What? How is that supposed to work with multiple MSIs?

Either the hardware is a trainwreck or the driver or both.

I can't tell as I can't find my crystal ball. Maybe I should replace it
with a Mobileye :)

Thanks,

tglx