Re: [PATCH] irqchip: omap-intc: fix spurious irq handling
From: Tony Lindgren
Date: Tue Oct 20 2015 - 10:53:07 EST
* John Ogness <john.ogness@xxxxxxxxxxxxx> [151020 00:33]:
> On 2015-10-20, Sekhar Nori <nsekhar@xxxxxx> wrote:
> >> Do you know what really is causing the spurious interrupts in your
> >> case?
> >
> > No, not yet.
>
> According to the TRM this is normal behavior if conditions that might
> affect priority are changed during priority sorting.
>
> 6.2.5 ARM A8 INTC Spurious Interrupt Handling
>
> The spurious flag indicates whether the result of the sorting (a
> window of 10 INTC functional clock cycles after the interrupt
> assertion) is invalid. The sorting is invalid if:
>
> - The interrupt that triggered the sorting is no longer active
> during the sorting.
>
> - A change in the mask has affected the result during the sorting
> time.
>
> >> In all the cases I've seen, the spurious interrupts were caused by a
> >> missing flush of the posted write acking the IRQ at the device driver
> >> for the _previously triggered_ INTC interrupt.
> >>
> >> If you have a reproducible case, I suggest you test that by printing
> >> out the previous interrupt to check if that makes sense. And then see
> >> if adding the missing read back to that interrupt handler fixes the
> >> issue.
> >
> > Okay, that's good to know. Thanks for the hints and history of your debug
> > on OMAP3. The issue is not easily reproducible in my case. But if I try
> > hard enough, I can hit it, though. So I can surely try your hints.
>
> I can reproduce the situation very easily. After running a test for a
> few minutes and printing out the previous interrupt, I have the
> following list. These are the irq numbers seen by the handler before the
> spurious interrupt triggered.
>
> INT12 - EDMACOMPINT - TPCC (EDMA)
> INT41 - 3PGSWRXINT0 - CPSW (Ethernet)
> INT42 - 3PGSWTXINT0 - CPSW (Ethernet)
> INT68 - TINT2 - DMTIMER2
> INT72 - UART0INT - UART0
>
> From this I do not think we can put the blame on any single driver. I
> trigger this situation very easily by putting a load of 7,000+
> interrupts per second on the system. With a sorting window of 10 INTC
> clock cycles per interrupt, this means we have 70,000+ INTC clock
> cycles per second where a change in the interrupt priority conditions
> would cause the priority sorting to become invalid and thus cause the
> spurious interrupt.
>
> I'm not sure if we can/should do anything more than Sekhar's patch of
> acknowledging the spurious interrupt so the priority sorting algorithm
> can run again.
OK, thanks for testing. My guess from the above list would be EDMA
or CPSW missing a flush of a posted write. Maybe try adding a readback
of the related device revision register after acking the interrupt in
the TPCC interrupt handler and the CPSW interrupt handler(s)?
Timer2 and uart0 are most likely false positives here.
I would not rule out the "previous interrupt" theory until you have
tried that. We really want to know the root cause of the issue; just
printing out the spurious interrupt does not fix the problem :)
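To make the posted write theory concrete, here is a small userspace model
of it. This is only a sketch and not driver code: every name in it
(mock_dev, mock_writel, mock_readl, mock_handler, the register offsets) is
made up for illustration, and it assumes a write-1-to-clear status
register. In the real TPCC and CPSW handlers the equivalent would be a
readl() of the device revision register right after the writel() that acks
the interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
#define MOCK_REVISION   0x00
#define MOCK_IRQSTATUS  0x04

/* Toy device: one write may sit "posted" on the interconnect. */
struct mock_dev {
	uint32_t regs[2];       /* device-side register state */
	bool     write_pending; /* a posted write has not landed yet */
	uint32_t pending_off;
	uint32_t pending_val;
};

/* Let a posted write finally reach the device. */
static void mock_drain(struct mock_dev *d)
{
	if (!d->write_pending)
		return;
	if (d->pending_off == MOCK_IRQSTATUS)
		d->regs[1] &= ~d->pending_val; /* write-1-to-clear */
	d->write_pending = false;
}

/* A write returns immediately; it only gets queued. */
static void mock_writel(struct mock_dev *d, uint32_t val, uint32_t off)
{
	mock_drain(d); /* writes stay ordered among themselves */
	d->pending_off = off;
	d->pending_val = val;
	d->write_pending = true;
}

/* A read cannot complete before earlier writes to the device have. */
static uint32_t mock_readl(struct mock_dev *d, uint32_t off)
{
	mock_drain(d);
	return d->regs[off / 4];
}

/* The level the INTC would see on this device's interrupt line. */
static bool mock_irq_line(const struct mock_dev *d)
{
	return d->regs[1] != 0;
}

/*
 * Ack the interrupt, optionally flushing the ack with a readback.
 * Returns true if the line is still asserted when the handler returns.
 */
static bool mock_handler(struct mock_dev *d, bool readback)
{
	uint32_t status = mock_readl(d, MOCK_IRQSTATUS);

	mock_writel(d, status, MOCK_IRQSTATUS);
	if (readback)
		(void)mock_readl(d, MOCK_REVISION); /* flush posted write */

	return mock_irq_line(d);
}

/* Compare both variants starting from the same device state. */
static void mock_selfcheck(void)
{
	struct mock_dev a = { .regs = { 0x1, 0x4 } };
	struct mock_dev b = { .regs = { 0x1, 0x4 } };

	assert(mock_handler(&a, false)); /* ack still in flight */
	assert(!mock_handler(&b, true)); /* ack has landed */
}
```

Without the readback the line is still asserted when the handler returns,
and in this model that is the window where the INTC can sort an interrupt
that then goes away under it. With the readback the ack has landed before
the handler returns.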
Regards,
Tony