Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX 2c3 - null call

From: Daniel Lezcano
Date: Mon Oct 09 2017 - 04:39:23 EST


On 07/10/2017 23:26, David Kozub wrote:
> Hi all,
>
> booting up kernel 4.14-rc3 with CS5535_CLOCK_EVENT_SRC on an ALIX 2c3
> (http://pcengines.ch/alix2c3.htm) dies with:
>
> [ÂÂÂ 2.313086] cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0'
> registered
> [ÂÂÂ 2.338711] cs5535-mfgpt cs5535-mfgpt: registered timer 0
> [ÂÂÂ 2.355676] ledtrig-cpu: registered to indicate activity on CPUs
> [ÂÂÂ 2.373745] cs5535-mfgpt cs5535-mfgpt: registered timer 1
> [ÂÂÂ 2.389976] cs5535-clockevt: Registering MFGPT timer as a clock
> event, using IRQ 7
> [ÂÂÂ 2.412687] BUG: unable to handle kernel NULL pointer dereference
> atÂÂ (null)
> [ÂÂÂ 2.412698] IP:ÂÂ (null)
> [ÂÂÂ 2.412702] *pde = 00000000
> [ÂÂÂ 2.412713] Oops: 0000 [#1]
> [ÂÂÂ 2.412716] Modules linked in:
> [ÂÂÂ 2.412732] CPU: 0 PID: 1 Comm: swapper Not tainted
> 4.14.0-rc3-humel-test17 #36
> [ÂÂÂ 2.412739] task: c0010000 task.stack: c008e000
> [ÂÂÂ 2.412744] EIP:ÂÂ (null)
> [ÂÂÂ 2.412749] EFLAGS: 00210093 CPU: 0
> [ÂÂÂ 2.412758] EAX: c05fb8c0 EBX: c05fb880 ECX: 00000000 EDX: 0000620c
> [ÂÂÂ 2.412769] ESI: c0009fd4 EDI: c014e14e EBP: c0009fac ESP: c0009fa8
> [ÂÂÂ 2.412780]Â DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ÂÂÂ 2.412790] CR0: 80050033 CR2: 00000000 CR3: 0064f000 CR4: 00000090
> [ÂÂÂ 2.412793] Call Trace:
> [ÂÂÂ 2.412800]Â <IRQ>
> [ÂÂÂ 2.412825]Â ? mfgpt_tick+0x5d/0x81
> [ÂÂÂ 2.412845]Â __handle_irq_event_percpu+0x56/0xb6
> [ÂÂÂ 2.412864]Â ? handle_fasteoi_irq+0xf3/0xf3
> [ÂÂÂ 2.412878]Â handle_irq_event_percpu+0x17/0x3f
> [ÂÂÂ 2.412892]Â handle_irq_event+0x1d/0x29
> [ÂÂÂ 2.412905]Â handle_level_irq+0x57/0xc6
> [ÂÂÂ 2.412924]Â handle_irq+0x47/0x52
> [ÂÂÂ 2.412929]Â </IRQ>
> [ÂÂÂ 2.412934]Â <SOFTIRQ>
> [ÂÂÂ 2.412949]Â do_IRQ+0x32/0x9b
> [ÂÂÂ 2.412963]Â ? __irqentry_text_end+0x7/0x7
> [ÂÂÂ 2.412976]Â common_interrupt+0x2e/0x34
> ...
>
> The problem seems to be in drivers/clocksource/cs5535-clockevt.c in
> mfgpt_tick() on line 132:
> ...
> ÂÂÂÂcs5535_clockevent.event_handler(&cs5535_clockevent);
> ...
> cs5535_clockevent.event_handler is null.
>
> Adding some more traces I see mfgpt_tick() gets called before
> clockevents_config_and_register() finishes (invoked from
> cs5535_mfgpt_init() on line 178). So when mfgpt_tick() accessess the
> event_handler, it's NULL. Wrapping the event_handler call on line 132 in
> a null pointer check results in a working system.
>
> The issue is present at least also in kernel 4.13.5. In kernel versions
> <= 4.1-rc6 the cs5535_clockevent worked OK. Kernels >= 4.1-rc7 never
> booted at all on my ALIX 2c3 (last I tried 4.5, then gave up and tried
> 4.13.5 recently), so I don't know when exactly this issue appeared.
>
> My guess is that the timer interrupt is enabled too early. If I change
> the order of clockevents_config_and_register() and
> cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val) in
> cs5535_mfgpt_init() like this:
> ...
> ÂÂÂÂ/* Set up the clock event */
> ÂÂÂÂprintk(KERN_INFO DRV_NAME
> ÂÂÂÂÂÂÂ ": Registering MFGPT timer as a clock event, using IRQ %d\n",
> ÂÂÂÂÂÂÂ timer_irq);
> ÂÂÂÂclockevents_config_and_register(&cs5535_clockevent, MFGPT_HZ,
> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 0xF, 0xFFFE);
>
> ÂÂÂÂ/* Set the clock scale and enable the event mode for CMP2 */
> ÂÂÂÂval = MFGPT_SCALE | (3 << 8);
> ÂÂÂÂcs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);
> ...
> the system boots and seems to be OK.

What happens if instead of inverting those two lines, you add
mfgpt_shutdown() early in the init function ?


--
<http://www.linaro.org/> Linaro.org â Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog