Re: 4.4.1 regression from 4.1.x: Soekris net5501 crash in IRQ after mfgpt timer initialization

From: Thomas Gleixner
Date: Tue Feb 02 2016 - 09:48:38 EST


On Tue, 2 Feb 2016, Nix wrote:

> [Cc:ed Thomas on the vague hope that maybe this is something to do with
> the IRQ subsystem in general, though I doubt it, since only the one
> machine is crashing for me: it's probably the CS5535's interactions
> with said subsystem at fault.]

Kinda. That driver does the following, in this order:

  1. set up the IRQ in the CS5535

  2. request the interrupt to install the handler

  3. register the clockevents device

> [ 1.589543] cs5535-clockevt: Registering MFGPT timer as a clock event, using IRQ 7
> [ 1.604921] BUG: unable to handle kernel NULL pointer dereference at (null)

> [ 1.605101] [<c02cf141>] ? mfgpt_tick+0x6e/0x77
...
> [ 1.605339] [<c039c1a9>] common_interrupt+0x29/0x30
...
> [ 1.605394] [<c013e4e7>] vprintk_emit+0x2b4/0x2be
> [ 1.605426] [<c013e5f2>] vprintk_default+0x12/0x14
> [ 1.605446] [<c0164cd6>] printk+0x11/0x13
> [ 1.605462] [<c04b625a>] cs5535_mfgpt_init+0xce/0xf1

So the interrupt hits before the clockevent device is registered and its event
handler is installed, and mfgpt_tick() will happily call through a NULL
event_handler pointer.
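
To make the ordering problem concrete, here is a minimal user-space sketch of
the race and of the guard that avoids it. It is not kernel code: every name in
it (fake_clockevent, fake_tick, fake_register, the FAKE_* states) is a
hypothetical stand-in, and it only assumes that a not-yet-registered device
sits in a "detached" state with no event handler installed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the clockevent device states. */
enum fake_state { FAKE_DETACHED, FAKE_SHUTDOWN, FAKE_PERIODIC };

struct fake_clockevent {
	enum fake_state state;
	/* Stays NULL until the device has been registered. */
	void (*event_handler)(struct fake_clockevent *evt);
};

/* Counts ticks that actually reached the event handler. */
int ticks_delivered;

void fake_event_handler(struct fake_clockevent *evt)
{
	(void)evt;
	ticks_delivered++;
}

/*
 * Models the interrupt handler. Returns 1 if the tick was forwarded to
 * the event handler, 0 if it was swallowed because the device is shut
 * down or not registered yet. Without the FAKE_DETACHED check, an early
 * interrupt would call through the NULL event_handler.
 */
int fake_tick(struct fake_clockevent *evt)
{
	if (evt->state == FAKE_SHUTDOWN || evt->state == FAKE_DETACHED)
		return 0;

	evt->event_handler(evt);
	return 1;
}

/* Models registration: only now does the handler become valid. */
void fake_register(struct fake_clockevent *evt)
{
	evt->event_handler = fake_event_handler;
	evt->state = FAKE_PERIODIC;
}
```

Calling fake_tick() on a zero-initialized device before fake_register() returns
0 instead of jumping to address zero; after fake_register() the same call
forwards the tick. The sketch only papers over the early interrupt, the same
way a state check in the handler does; it does not change the order in which
the steps run.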

The patch below should fix^Wwork around the issue.

Thanks,

tglx

8<----------------

--- a/drivers/clocksource/cs5535-clockevt.c
+++ b/drivers/clocksource/cs5535-clockevt.c
@@ -117,7 +117,8 @@ static irqreturn_t mfgpt_tick(int irq, void *dev_id)
 	/* Turn off the clock (and clear the event) */
 	disable_timer(cs5535_event_clock);

-	if (clockevent_state_shutdown(&cs5535_clockevent))
+	if (clockevent_state_shutdown(&cs5535_clockevent) ||
+	    clockevent_state_detached(&cs5535_clockevent))
 		return IRQ_HANDLED;

 	/* Clear the counter */