Re: Crashing 'kzm' target in next-20160913 due to 'gpio: mxc: shift gpio_mxc_init() to subsys_initcall level'

From: Guenter Roeck
Date: Thu Sep 15 2016 - 22:24:21 EST


On 09/15/2016 07:46 AM, Uwe Kleine-König wrote:
On Thu, Sep 15, 2016 at 07:35:16AM -0700, Guenter Roeck wrote:
On 09/15/2016 07:23 AM, Uwe Kleine-König wrote:
On Thu, Sep 15, 2016 at 04:35:04PM +0300, Vladimir Zapolskiy wrote:
Hi Guenter,

On 09/14/2016 06:20 AM, Guenter Roeck wrote:
Hi Vladimir,

your commit e188cbf7564f ("gpio: mxc: shift gpio_mxc_init() to subsys_initcall level")
in -next causes the following crash when running the 'kzm' target (and most likely
the real thing) with qemu.

[ 1.211426] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[ 1.211600] pgd = c0004000
[ 1.211680] [0000000c] *pgd=00000000
[ 1.212067] Internal error: Oops: 5 [#1] SMP ARM
[ 1.212245] Modules linked in:
[ 1.212542] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc6-next-20160913 #1
[ 1.212671] Hardware name: Kyoto Microcomputer Co., Ltd. KZM-ARM11-01
[ 1.212825] task: c6848000 task.stack: c683e000
[ 1.213231] PC is at platform_get_irq+0xc0/0xe8

See http://kerneltests.org/builders/qemu-arm-next/builds/525/steps/qemubuildcommand/logs/stdio
for a complete log.

Problem is quite subtle. The change causes the gpio driver to be installed later.
As a result, kzm_init_smsc9118() fails to initialize the gpio pins correctly.
gpio_request() in that function returns -EPROBE_DEFER, which is ignored,
gpio_to_irq() then returns -22 which is unconditionally assigned as interrupt number.
platform_get_irq(), as called from the smsc driver, gets this negative interrupt
number, and passes it unconditionally to irq_get_irq_data(), which returns NULL.
The NULL pointer is then passed to irqd_set_trigger_type() which, not entirely
surprisingly, crashes.

So, in other words, lots of bugs here. Nevertheless, I would suggest to keep using
postcore_initcall(), at least until it is sure that all gpio clients handle -EPROBE_DEFER
correctly.

I'm inviting Shawn and Uwe to the discussion.

The proper fix in this particular case should be like this one:

diff --git a/arch/arm/mach-imx/mach-kzm_arm11_01.c b/arch/arm/mach-imx/mach-kzm_arm11_01.c
index 31df4361996f..8288acfe7221 100644
--- a/arch/arm/mach-imx/mach-kzm_arm11_01.c
+++ b/arch/arm/mach-imx/mach-kzm_arm11_01.c
@@ -245,13 +245,17 @@ static void __init kzm_board_init(void)
mxc_iomux_setup_multiple_pins(kzm_pins,
ARRAY_SIZE(kzm_pins), "kzm");
- kzm_init_ext_uart();
- kzm_init_smsc9118();
kzm_init_imx_uart();
pr_info("Clock input source is 26MHz\n");
}
+static void __init kzm_late_init(void)
+{
+ kzm_init_ext_uart();
+ kzm_init_smsc9118();
+}
+
/*
* This structure defines static mappings for the kzm-arm11-01 board.
*/
@@ -291,5 +295,6 @@ MACHINE_START(KZM_ARM11_01, "Kyoto Microcomputer Co., Ltd. KZM-ARM11-01")
.init_irq = mx31_init_irq,
.init_time = kzm_timer_init,
.init_machine = kzm_board_init,
+ .init_late = kzm_late_init,
.restart = mxc_restart,
MACHINE_END

That + checking the return code of gpio_request and the other calls.
Or better, convert the machine to dt.

But I agree that there might be more legacy boards (i.MX31 only IMHO),
which may attempt to manipulate GPIO lines before subsys_initcall()
level.

I wouldn't revert anything for legacy boards. That's the chance to say
in the near future: They stopped working in September 2016, obviously
nobody cares, let's rip them. :-)

New kernel development philosophy ? Regressions are acceptable as long as
they affect a board older than X years ? What is your cut-off date for accepting
regressions like that ?

I would soften your statement about regressions to: Regressions like
these are an incentive to pre-dt platforms to modernize. Of course they
should ideally be prevented, and fixed when they happen, but I wouldn't
start reverting gpio-changes because of a regression on a machine that
is hardly used with modern kernel (ok, that's an assumption, but it
didn't receive enough love to be converted to dt), that fails to even
check return values of gpio_request and that can be easily fixed.

All 890 of them ? Really ?

Guenter