Re: Crashing 'kzm' target in next-20160913 due to 'gpio: mxc: shift gpio_mxc_init() to subsys_initcall level'
From: Uwe Kleine-König
Date: Thu Sep 15 2016 - 10:47:19 EST
On Thu, Sep 15, 2016 at 07:35:16AM -0700, Guenter Roeck wrote:
> On 09/15/2016 07:23 AM, Uwe Kleine-König wrote:
> > On Thu, Sep 15, 2016 at 04:35:04PM +0300, Vladimir Zapolskiy wrote:
> > > Hi Guenter,
> > >
> > > On 09/14/2016 06:20 AM, Guenter Roeck wrote:
> > > > Hi Vladimir,
> > > >
> > > > your commit e188cbf7564f ("gpio: mxc: shift gpio_mxc_init() to subsys_initcall level")
> > > > in -next causes the following crash when running the 'kzm' target (and most likely
> > > > the real thing) with qemu.
> > > >
> > > > [ 1.211426] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> > > > [ 1.211600] pgd = c0004000
> > > > [ 1.211680] [0000000c] *pgd=00000000
> > > > [ 1.212067] Internal error: Oops: 5 [#1] SMP ARM
> > > > [ 1.212245] Modules linked in:
> > > > [ 1.212542] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc6-next-20160913 #1
> > > > [ 1.212671] Hardware name: Kyoto Microcomputer Co., Ltd. KZM-ARM11-01
> > > > [ 1.212825] task: c6848000 task.stack: c683e000
> > > > [ 1.213231] PC is at platform_get_irq+0xc0/0xe8
> > > >
> > > > See http://kerneltests.org/builders/qemu-arm-next/builds/525/steps/qemubuildcommand/logs/stdio
> > > > for a complete log.
> > > >
> > > > Problem is quite subtle. The change causes the gpio driver to be installed later.
> > > > As a result, kzm_init_smsc9118() fails to initialize the gpio pins correctly.
> > > > gpio_request() in that function returns -EPROBE_DEFER, which is ignored,
> > > > gpio_to_irq() then returns -22 which is unconditionally assigned as interrupt number.
> > > > platform_get_irq(), as called from the smsc driver, gets this negative interrupt
> > > > number, and passes it unconditionally to irq_get_irq_data(), which returns NULL.
> > > > The NULL pointer is then passed to irqd_set_trigger_type() which, not entirely
> > > > surprisingly, crashes.
> > > >
> > > > So, in other words, lots of bugs here. Nevertheless, I would suggest to keep using
> > > > postcore_initcall(), at least until it is sure that all gpio clients handle -EPROBE_DEFER
> > > > correctly.
> > >
> > > I'm inviting Shawn and Uwe to the discussion.
> > >
> > > The proper fix in this particular case should be like this one:
> > >
> > > diff --git a/arch/arm/mach-imx/mach-kzm_arm11_01.c b/arch/arm/mach-imx/mach-kzm_arm11_01.c
> > > index 31df4361996f..8288acfe7221 100644
> > > --- a/arch/arm/mach-imx/mach-kzm_arm11_01.c
> > > +++ b/arch/arm/mach-imx/mach-kzm_arm11_01.c
> > > @@ -245,13 +245,17 @@ static void __init kzm_board_init(void)
> > > mxc_iomux_setup_multiple_pins(kzm_pins,
> > > ARRAY_SIZE(kzm_pins), "kzm");
> > > - kzm_init_ext_uart();
> > > - kzm_init_smsc9118();
> > > kzm_init_imx_uart();
> > > pr_info("Clock input source is 26MHz\n");
> > > }
> > > +static void __init kzm_late_init(void)
> > > +{
> > > + kzm_init_ext_uart();
> > > + kzm_init_smsc9118();
> > > +}
> > > +
> > > /*
> > > * This structure defines static mappings for the kzm-arm11-01 board.
> > > */
> > > @@ -291,5 +295,6 @@ MACHINE_START(KZM_ARM11_01, "Kyoto Microcomputer Co., Ltd. KZM-ARM11-01")
> > > .init_irq = mx31_init_irq,
> > > .init_time = kzm_timer_init,
> > > .init_machine = kzm_board_init,
> > > + .init_late = kzm_late_init,
> > > .restart = mxc_restart,
> > > MACHINE_END
> >
> > That + checking the return code of gpio_request and the other calls.
> > Or better, convert the machine to dt.
> >
> > > But I agree that there might be more legacy boards (i.MX31 only IMHO),
> > > which may attempt to manipulate GPIO lines before subsys_initcall()
> > > level.
> >
> > I wouldn't revert anything for legacy boards. That's the chance to say
> > in the near future: They stopped working in September 2016, obviously
> > nobody cares, let's rip them. :-)
> >
> New kernel development philosophy ? Regressions are acceptable as long as
> they affect a board older than X years ? What is your cut-off date for accepting
> regressions like that ?
I would soften your statement about regressions to: Regressions like
these are an incentive to pre-dt platforms to modernize. Of course they
should ideally be prevented, and fixed when they happen, but I wouldn't
start reverting gpio-changes because of a regression on a machine that
is hardly used with modern kernel (ok, that's an assumption, but it
didn't receive enough love to be converted to dt), that fails to even
check return values of gpio_request and that can be easily fixed.
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |