Re: [PATCH] irqchip: omap-intc: fix spurious irq handling

From: Tony Lindgren
Date: Thu Dec 03 2015 - 10:02:33 EST


* Sekhar Nori <nsekhar@xxxxxx> [151203 03:29]:
> On Tuesday 20 October 2015 08:22 PM, Tony Lindgren wrote:
> >
> > OK thanks for testing. My guess from the above list would be EDMA
> > or CPSW missing a flush of posted write. Maybe try adding a readback
> > of the related device revision register after acking the interrupt into
> > TPCC interrupt handler and CPSW interrupt handler(s)?
>
> I could get back to debugging this only now. I have converted
> __raw_writel to writel() and also added readback from the same register
> in both EDMA and CPSW drivers. But I am still able to reproduce the
> spurious irq reports.
>
> > The timer2 and uart0 seem to be false positives here naturally.
>
> I also added readback in 8250 driver. I haven't touched the timer
> driver, but I guess if that driver had an issue, it should have come out
> much earlier.
>
> I also saw that sometimes previous irq was the TI LCDC interrupt. Added
> readback there too. Did not help.

OK strange, so far all the ones we've seen have been fixable that way.

> > I would not yet rule out the "previous interrupt" theory until you have
> > tried that. We really want to know the root cause of the issue, just
> > printing out spurious interrupt does not fix the problem :)
>
> While we cannot rule out a software issue completely, the description in
> TRM around spurious interrupts suggests it can happen even with no role
> of software.

Yes maybe we more than one reason for them.

> May I suggest we go ahead and add this patch to the kernel after
> addressing Thomas's comment? At least it will prevent kernel from
> locking up with flood of prints when a spurious irq happens and allows
> easier debug by others too.

Yes we should naturally fix up the kernel locking.

Please also add something like "enable debug for more information"
to the warning. And then print out the current and previous interrupt
if DEBUG is enabled. And in the comments mention that often the spurious
interrupts has been fixed by adding a flush of the posted write to the
previous interrupt handler in the device driver.

Also, do you have a reproducable test case with mainline kernel I
could add to my collection of shell scripts?

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/