Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX 2c3 - null call

From: David Kozub
Date: Mon Oct 09 2017 - 15:33:57 EST


On Mon, 9 Oct 2017, Daniel Lezcano wrote:

On 07/10/2017 23:26, David Kozub wrote:
Hi all,

booting up kernel 4.14-rc3 with CS5535_CLOCK_EVENT_SRC on an ALIX 2c3
(http://pcengines.ch/alix2c3.htm) dies with:

[ÂÂÂ 2.313086] cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0'
registered
[ÂÂÂ 2.338711] cs5535-mfgpt cs5535-mfgpt: registered timer 0
[ÂÂÂ 2.355676] ledtrig-cpu: registered to indicate activity on CPUs
[ÂÂÂ 2.373745] cs5535-mfgpt cs5535-mfgpt: registered timer 1
[ÂÂÂ 2.389976] cs5535-clockevt: Registering MFGPT timer as a clock
event, using IRQ 7
[ÂÂÂ 2.412687] BUG: unable to handle kernel NULL pointer dereference
atÂÂ (null)
[ÂÂÂ 2.412698] IP:ÂÂ (null)
[ÂÂÂ 2.412702] *pde = 00000000
[ÂÂÂ 2.412713] Oops: 0000 [#1]
[ÂÂÂ 2.412716] Modules linked in:
[ÂÂÂ 2.412732] CPU: 0 PID: 1 Comm: swapper Not tainted
4.14.0-rc3-humel-test17 #36
[ÂÂÂ 2.412739] task: c0010000 task.stack: c008e000
[ÂÂÂ 2.412744] EIP:ÂÂ (null)
[ÂÂÂ 2.412749] EFLAGS: 00210093 CPU: 0
[ÂÂÂ 2.412758] EAX: c05fb8c0 EBX: c05fb880 ECX: 00000000 EDX: 0000620c
[ÂÂÂ 2.412769] ESI: c0009fd4 EDI: c014e14e EBP: c0009fac ESP: c0009fa8
[ÂÂÂ 2.412780]Â DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ÂÂÂ 2.412790] CR0: 80050033 CR2: 00000000 CR3: 0064f000 CR4: 00000090
[ÂÂÂ 2.412793] Call Trace:
[ÂÂÂ 2.412800]Â <IRQ>
[ÂÂÂ 2.412825]Â ? mfgpt_tick+0x5d/0x81
[ÂÂÂ 2.412845]Â __handle_irq_event_percpu+0x56/0xb6
[ÂÂÂ 2.412864]Â ? handle_fasteoi_irq+0xf3/0xf3
[ÂÂÂ 2.412878]Â handle_irq_event_percpu+0x17/0x3f
[ÂÂÂ 2.412892]Â handle_irq_event+0x1d/0x29
[ÂÂÂ 2.412905]Â handle_level_irq+0x57/0xc6
[ÂÂÂ 2.412924]Â handle_irq+0x47/0x52
[ÂÂÂ 2.412929]Â </IRQ>
[ÂÂÂ 2.412934]Â <SOFTIRQ>
[ÂÂÂ 2.412949]Â do_IRQ+0x32/0x9b
[ÂÂÂ 2.412963]Â ? __irqentry_text_end+0x7/0x7
[ÂÂÂ 2.412976]Â common_interrupt+0x2e/0x34
...

The problem seems to be in drivers/clocksource/cs5535-clockevt.c in
mfgpt_tick() on line 132:
...
ÂÂÂÂcs5535_clockevent.event_handler(&cs5535_clockevent);
...
cs5535_clockevent.event_handler is null.

Adding some more traces I see mfgpt_tick() gets called before
clockevents_config_and_register() finishes (invoked from
cs5535_mfgpt_init() on line 178). So when mfgpt_tick() accessess the
event_handler, it's NULL. Wrapping the event_handler call on line 132 in
a null pointer check results in a working system.

The issue is present at least also in kernel 4.13.5. In kernel versions
<= 4.1-rc6 the cs5535_clockevent worked OK. Kernels >= 4.1-rc7 never
booted at all on my ALIX 2c3 (last I tried 4.5, then gave up and tried
4.13.5 recently), so I don't know when exactly this issue appeared.

My guess is that the timer interrupt is enabled too early. If I change
the order of clockevents_config_and_register() and
cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val) in
cs5535_mfgpt_init() like this:
...
ÂÂÂÂ/* Set up the clock event */
ÂÂÂÂprintk(KERN_INFO DRV_NAME
ÂÂÂÂÂÂÂ ": Registering MFGPT timer as a clock event, using IRQ %d\n",
ÂÂÂÂÂÂÂ timer_irq);
ÂÂÂÂclockevents_config_and_register(&cs5535_clockevent, MFGPT_HZ,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 0xF, 0xFFFE);

ÂÂÂÂ/* Set the clock scale and enable the event mode for CMP2 */
ÂÂÂÂval = MFGPT_SCALE | (3 << 8);
ÂÂÂÂcs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);
...
the system boots and seems to be OK.

What happens if instead of inverting those two lines, you add
mfgpt_shutdown() early in the init function ?

Hi Daniel,

I can't call mfgpt_shutdown() before clockevents_config_and_register() as mfgpt_shutdown() wants a struct clock_event_device* which is only available after clockevents_config_and_register(). But I tried calling disable_timer():

@@ -152,6 +152,8 @@ static int __init cs5535_mfgpt_init(void)
}
cs5535_event_clock = timer;

+ disable_timer(cs5535_event_clock);
+
/* Set up the IRQ on the MFGPT side */
if (cs5535_mfgpt_setup_irq(timer, MFGPT_CMP2, &timer_irq)) {
printk(KERN_ERR DRV_NAME ": Could not set up IRQ %d\n",

which should do the same thing. Unfortunately I still get the same BUG.

I added some more traces and I think the IRQ comes just after:
ret = setup_irq(timer_irq, &mfgptirq);

(And my reordering of cs5535_mfgpt_write and clockevents_config_and_register was just lucky, because the confing and register occured sooner than the IRQ.)

Attached is a patch that shows where I added the traces. It produces the following output (I also added a bunch of sleeps in between to be sure which step triggers the mfgpt_tick call):

[ 3.229612] cs5535-clockevt: enabling verbose handler
[ 3.234699] cs5535-clockevt: sleeping(1)...
[ 4.320051] cs5535-clockevt: done(1)
[ 4.323641] cs5535-clockevt: calling cs5535_mfgpt_alloc_timer
[ 4.329402] cs5535-mfgpt cs5535-mfgpt: registered timer 1
[ 4.334835] cs5535-clockevt: sleeping(2)...
[ 5.360051] cs5535-clockevt: done(2)
[ 5.363632] cs5535-clockevt: calling disable_timer
[ 5.368425] cs5535-clockevt: sleeping(3)...
[ 6.400048] cs5535-clockevt: done(3)
[ 6.403631] cs5535-clockevt: calling cs5535_mfgpt_setup_irq
[ 6.409214] cs5535-clockevt: sleeping(4)...
[ 7.440044] cs5535-clockevt: done(4)
[ 7.443623] cs5535-clockevt: calling setup_irq
[ 7.448088] cs5535-clockevt: mfgpt_tick called
[ 7.452579] cs5535-clockevt: sleeping(5)...
[ 8.480049] cs5535-clockevt: done(5)
[ 8.483634] cs5535-clockevt: doing write
[ 8.487559] cs5535-clockevt: sleeping(6)...
[ 9.520051] cs5535-clockevt: done(6)
[ 9.523635] cs5535-clockevt: Registering MFGPT timer as a clock event, using IRQ 7
[ 9.531263] cs5535-clockevt: clockevents_config_and_register returned
[ 9.537706] cs5535-clockevt: disabling verbose handler

(I observed the boot 3 times and the sequence is always the same.)

Best regards,
Daviddiff --git a/drivers/clocksource/cs5535-clockevt.c b/drivers/clocksource/cs5535-clockevt.c
index a1df588343f2..446bb595030b 100644
--- a/drivers/clocksource/cs5535-clockevt.c
+++ b/drivers/clocksource/cs5535-clockevt.c
@@ -18,9 +18,12 @@
#include <linux/module.h>
#include <linux/cs5535.h>
#include <linux/clockchips.h>
+#include <linux/delay.h>

#define DRV_NAME "cs5535-clockevt"

+static int verbose_handler = 0;
+
static int timer_irq;
module_param_hw_named(irq, timer_irq, int, irq, 0644);
MODULE_PARM_DESC(irq, "Which IRQ to use for the clock source MFGPT ticks.");
@@ -129,7 +132,14 @@ static irqreturn_t mfgpt_tick(int irq, void *dev_id)
cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP,
MFGPT_SETUP_CNTEN | MFGPT_SETUP_CMP2);

- cs5535_clockevent.event_handler(&cs5535_clockevent);
+ if (verbose_handler)
+ printk(KERN_INFO DRV_NAME ": mfgpt_tick called\n");
+
+ if (!verbose_handler && cs5535_clockevent.event_handler == NULL)
+ printk(KERN_ERR DRV_NAME ": mfgpt_tick with null event_handler while not verbose!\n");
+
+ if (cs5535_clockevent.event_handler != NULL)
+ cs5535_clockevent.event_handler(&cs5535_clockevent);
return IRQ_HANDLED;
}

@@ -145,6 +155,14 @@ static int __init cs5535_mfgpt_init(void)
int ret;
uint16_t val;

+ printk(KERN_INFO DRV_NAME ": enabling verbose handler\n");
+ WRITE_ONCE(verbose_handler, 1);
+
+ printk(KERN_INFO DRV_NAME ": sleeping(1)...\n");
+ msleep(1000);
+ printk(KERN_INFO DRV_NAME ": done(1)\n");
+
+ printk(KERN_INFO DRV_NAME ": calling cs5535_mfgpt_alloc_timer\n");
timer = cs5535_mfgpt_alloc_timer(MFGPT_TIMER_ANY, MFGPT_DOMAIN_WORKING);
if (!timer) {
printk(KERN_ERR DRV_NAME ": Could not allocate MFGPT timer\n");
@@ -152,6 +170,18 @@ static int __init cs5535_mfgpt_init(void)
}
cs5535_event_clock = timer;

+ printk(KERN_INFO DRV_NAME ": sleeping(2)...\n");
+ msleep(1000);
+ printk(KERN_INFO DRV_NAME ": done(2)\n");
+
+ printk(KERN_INFO DRV_NAME ": calling disable_timer\n");
+ disable_timer(cs5535_event_clock);
+
+ printk(KERN_INFO DRV_NAME ": sleeping(3)...\n");
+ msleep(1000);
+ printk(KERN_INFO DRV_NAME ": done(3)\n");
+
+ printk(KERN_INFO DRV_NAME ": calling cs5535_mfgpt_setup_irq\n");
/* Set up the IRQ on the MFGPT side */
if (cs5535_mfgpt_setup_irq(timer, MFGPT_CMP2, &timer_irq)) {
printk(KERN_ERR DRV_NAME ": Could not set up IRQ %d\n",
@@ -159,6 +189,11 @@ static int __init cs5535_mfgpt_init(void)
goto err_timer;
}

+ printk(KERN_INFO DRV_NAME ": sleeping(4)...\n");
+ msleep(1000);
+ printk(KERN_INFO DRV_NAME ": done(4)\n");
+
+ printk(KERN_INFO DRV_NAME ": calling setup_irq\n");
/* And register it with the kernel */
ret = setup_irq(timer_irq, &mfgptirq);
if (ret) {
@@ -166,17 +201,30 @@ static int __init cs5535_mfgpt_init(void)
goto err_irq;
}

+ printk(KERN_INFO DRV_NAME ": sleeping(5)...\n");
+ msleep(1000);
+ printk(KERN_INFO DRV_NAME ": done(5)\n");
+
/* Set the clock scale and enable the event mode for CMP2 */
val = MFGPT_SCALE | (3 << 8);

+ printk(KERN_INFO DRV_NAME ": doing write\n");
cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);

+ printk(KERN_INFO DRV_NAME ": sleeping(6)...\n");
+ msleep(1000);
+ printk(KERN_INFO DRV_NAME ": done(6)\n");
+
/* Set up the clock event */
printk(KERN_INFO DRV_NAME
": Registering MFGPT timer as a clock event, using IRQ %d\n",
timer_irq);
clockevents_config_and_register(&cs5535_clockevent, MFGPT_HZ,
0xF, 0xFFFE);
+ printk(KERN_INFO DRV_NAME ": clockevents_config_and_register returned\n");
+
+ printk(KERN_INFO DRV_NAME ": disabling verbose handler\n");
+ WRITE_ONCE(verbose_handler, 0);

return 0;