Re: 4.18.0-rc1-next-20180619 boot failed on beagle board x15

From: Samuel Morris
Date: Thu Jun 21 2018 - 10:32:28 EST


On Thu, Jun 21, 2018 at 3:58 AM, Roger Quadros <rogerq@xxxxxx> wrote:
> +Rafael
>
> On 20/06/18 18:30, Samuel Morris wrote:
>> On Wed, Jun 20, 2018 at 8:58 AM, Roger Quadros <rogerq@xxxxxx> wrote:
>>> Tony,
>>>
>>> On 20/06/18 13:29, Tony Lindgren wrote:
>>>> Hi,
>>>>
>>>> * Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> [180620 05:55]:
>>>>> Linux next (4.18.0-rc1-next-20180619) boot failed on beagle board x15.
>>>>
>>>> Bisect points to commit aece27a2f01b ("ata: ahci_platform: allow disabling of
>>>> hotplug to save power").
>>>>
>>>> Reverting the patch makes things work again. Any ideas what
>>>> might be going wrong here? Things clearly idle but then there
>>>> seems to be some register access with clocks disabled.
>>>
>>> The commit is doing this in probe.
>>>
>>> + pm_runtime_set_active(dev);
>>> pm_runtime_enable(dev);
>>> - pm_runtime_get_sync(dev);
>>> + pm_runtime_forbid(dev);
>>>
>>> On OMAP, the device is not guaranteed to be active at probe and so we can't
>>> say pm_runtime_set_active() and get rid of pm_runtime_get_sync().
>>
>> Okay, by calling set_active(), I'm preventing the rpm_resume from
>> completing that would normally happen in pm_runtime_forbid(). I assume
>> you mean that there are parent devices that need to be resumed before
>
> Actually, in the OMAP case, the AHCI controller device isn't active when probe is called.
> For other platforms this might not be the case. So we need to be careful here.
>
>> this device may be assumed active. I'm going to try removing the
>> set_active(), then move that clause to the end of
>> ahci_platform_init_host(). The pm_runtime_forbid() is effectively the
>> same as get_sync(), it just also sets the runtime_auto flag to false.
>> I don't think we should be saying the device is active until the host
>> is initialized, so that seems like a better, common place for the
>> pm_runtime init callbacks anyway. How does that sound?
>
> Device active and initialized are different things. If the device is powered up
> and can be accessed it is active, even if it is not yet initialized.
> I don't think we should club the two.
>
> Why do you need to call pm_runtime_set_active() at all in the probe sequence?

What I meant was, I'm not going to call pm_runtime_set_active()
anymore. The device status would be changed to RPM_ACTIVE at the end
of rpm_resume() in either runtime_get() or runtime_forbid(). It is
expected to be active after those have been called. Calling
set_active() before the runtime_forbid() before was to prevent the
device from actually calling the resume() callbacks, because the host
had not yet been initialized, and resuming the host at that point was
causing hangs. I was hoping that actually doing a runtime_resume()
after the host has been initialized would work, and make all parent
devices happy, though I am turning on a host that's already on and
initialized. It should be just about the same as what was there
before, just runtime_forbid() replacing get_sync(), and all that moved
to the ahci_platform_init_host() function.

Now that I think about it though, I want all the parent devices
resumed before I resume the current device, and probably before host
initialization, and I suspect that's why you need to call
runtime_get_sync(). Maybe I should just put the
ahci_platform_init_host() in the runtime_resume callback, and only run
it the first time that resume() is called. That would require
modifying all drivers that call ahci_platform_init_host() though. Or,
I could just not call ahci_platform_resume_host() on the first
runtime_resume(). That seems like the simplest solution. How about
that?

>
> Documentation for pm_runtime_set_active() says,
> "(it is only valid to use this function if 'power.runtime_error' is set
> or 'power.disable_depth' is greater than zero);"
>
>>
>>>
>>>
>>>>
>>>>> Full log of trace please refer this link,
>>>>> https://lkft.validation.linaro.org/scheduler/job/291535#L946
>>>>>
>>>>> Here is back trace,
>>>>> [ 3.561640] omap_rng 48090000.rng: Random Number Generator ver. 20
>>>>> [ 3.611949] brd: module loaded
>>>>> [ 3.641298] loop: module loaded
>>>>> [ 3.646435] at24 0-0050: 4096 byte 24c32 EEPROM, writable, 1 bytes/write
>>>>> [ 3.658481] ------------[ cut here ]------------
>>>>> [ 3.663196] WARNING: CPU: 0 PID: 1 at
>>>>> /srv/oe/build/tmp-rpb-glibc/work-shared/am57xx-evm/kernel-source/drivers/bus/omap_l3_noc.c:147
>>>>> l3_interrupt_handler+0x27c/0x39c
>>>>> [ 3.678052] 44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4_CFG
>>>>> (Read): Data Access in User mode during Functional access
>>>>> [ 3.689330] Modules linked in:
>>>>> [ 3.692426] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>>>>> 4.18.0-rc1-next-20180619 #1
>>>>> [ 3.699952] Hardware name: Generic DRA74X (Flattened Device Tree)
>>>>> [ 3.706099] [<c04156f0>] (unwind_backtrace) from [<c040eec4>]
>>>>> (show_stack+0x20/0x24)
>>>>> [ 3.713897] [<c040eec4>] (show_stack) from [<c116a668>]
>>>>> (dump_stack+0xb0/0xdc)
>>>>> [ 3.721171] [<c116a668>] (dump_stack) from [<c0450a14>] (__warn+0x118/0x130)
>>>>> [ 3.728270] [<c0450a14>] (__warn) from [<c0450a84>]
>>>>> (warn_slowpath_fmt+0x58/0x74)
>>>>> [ 3.735808] [<c0450a84>] (warn_slowpath_fmt) from [<c08be648>]
>>>>> (l3_interrupt_handler+0x27c/0x39c)
>>>>> [ 3.744742] [<c08be648>] (l3_interrupt_handler) from [<c04cd270>]
>>>>> (__handle_irq_event_percpu+0x90/0x45c)
>>>>> [ 3.754286] [<c04cd270>] (__handle_irq_event_percpu) from
>>>>> [<c04cd678>] (handle_irq_event_percpu+0x3c/0x8c)
>>>>> [ 3.764001] [<c04cd678>] (handle_irq_event_percpu) from
>>>>> [<c04cd710>] (handle_irq_event+0x48/0x6c)
>>>>> [ 3.772933] [<c04cd710>] (handle_irq_event) from [<c04d170c>]
>>>>> (handle_fasteoi_irq+0xb0/0x164)
>>>>> [ 3.781513] [<c04d170c>] (handle_fasteoi_irq) from [<c04cc344>]
>>>>> (generic_handle_irq+0x34/0x44)
>>>>> [ 3.790183] [<c04cc344>] (generic_handle_irq) from [<c04cca48>]
>>>>> (__handle_domain_irq+0x6c/0xc4)
>>>>> [ 3.798941] [<c04cca48>] (__handle_domain_irq) from [<c0402528>]
>>>>> (gic_handle_irq+0x68/0xac)
>>>>> [ 3.807347] [<c0402528>] (gic_handle_irq) from [<c04019bc>]
>>>>> (__irq_svc+0x5c/0x7c)
>>>>> [ 3.814874] Exception stack(0xee179c70 to 0xee179cb8)
>>>>> [ 3.819963] 9c60: fc140000
>>>>> ee3e2f10 ee3a9810 00000000
>>>>> [ 3.828189] 9c80: 00000000 ee3e2f10 fc140004 fc140000 ee3a9800
>>>>> ee3a9810 00000000 ee179cdc
>>>>> [ 3.836414] 9ca0: ee179ce0 ee179cc0 c0c3b364 c0c3b2ec 60000013 ffffffff
>>>>> [ 3.843079] [<c04019bc>] (__irq_svc) from [<c0c3b2ec>]
>>>>> (ahci_enable_ahci+0x24/0x70)
>>>>> [ 3.850789] [<c0c3b2ec>] (ahci_enable_ahci) from [<c0c3b364>]
>>>>> (ahci_save_initial_config+0x2c/0x388)
>>>>> [ 3.859894] [<c0c3b364>] (ahci_save_initial_config) from
>>>>> [<c0c3cbdc>] (ahci_platform_init_host+0x9c/0x4f4)
>>>>> [ 3.869609] [<c0c3cbdc>] (ahci_platform_init_host) from
>>>>> [<c0c3c33c>] (ahci_probe+0xa0/0xc0)
>>>>> [ 3.878018] [<c0c3c33c>] (ahci_probe) from [<c0ba40f0>]
>>>>> (platform_drv_probe+0x58/0xac)
>>>>> [ 3.885989] [<c0ba40f0>] (platform_drv_probe) from [<c0ba1aec>]
>>>>> (driver_probe_device+0x25c/0x338)
>>>>> [ 3.894915] [<c0ba1aec>] (driver_probe_device) from [<c0ba1cbc>]
>>>>> (__driver_attach+0xf4/0xf8)
>>>>> [ 3.903411] [<c0ba1cbc>] (__driver_attach) from [<c0b9f9ac>]
>>>>> (bus_for_each_dev+0x78/0xc4)
>>>>> [ 3.911643] [<c0b9f9ac>] (bus_for_each_dev) from [<c0ba13b8>]
>>>>> (driver_attach+0x2c/0x30)
>>>>> [ 3.919696] [<c0ba13b8>] (driver_attach) from [<c0ba0cfc>]
>>>>> (bus_add_driver+0x114/0x220)
>>>>> [ 3.927752] [<c0ba0cfc>] (bus_add_driver) from [<c0ba2e2c>]
>>>>> (driver_register+0x88/0x11c)
>>>>> [ 3.935896] [<c0ba2e2c>] (driver_register) from [<c0ba4044>]
>>>>> (__platform_driver_register+0x50/0x58)
>>>>> [ 3.945003] [<c0ba4044>] (__platform_driver_register) from
>>>>> [<c1aa9cf0>] (ahci_driver_init+0x24/0x28)
>>>>> [ 3.954195] [<c1aa9cf0>] (ahci_driver_init) from [<c0403a8c>]
>>>>> (do_one_initcall+0x9c/0x434)
>>>>> [ 3.962517] [<c0403a8c>] (do_one_initcall) from [<c1a01410>]
>>>>> (kernel_init_freeable+0x3b8/0x53c)
>>>>> [ 3.971276] [<c1a01410>] (kernel_init_freeable) from [<c118551c>]
>>>>> (kernel_init+0x18/0x128)
>>>>> [ 3.979597] [<c118551c>] (kernel_init) from [<c04010ac>]
>>>>> (ret_from_fork+0x14/0x28)
>>>>> [ 3.987210] Exception stack(0xee179fb0 to 0xee179ff8)
>>>>> [ 3.992298] 9fa0: 00000000
>>>>> 00000000 00000000 00000000
>>>>> [ 4.000524] 9fc0: 00000000 00000000 00000000 00000000 00000000
>>>>> 00000000 00000000 00000000
>>>>> [ 4.008750] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>>>> [ 4.015405] irq event stamp: 663130
>>>>> [ 4.018926] hardirqs last enabled at (663129): [<c118d9d4>]
>>>>> _raw_spin_unlock_irqrestore+0x40/0x6c
>>>>> [ 4.027939] hardirqs last disabled at (663130): [<c04019ac>]
>>>>> __irq_svc+0x4c/0x7c
>>>>> [ 4.035384] softirqs last enabled at (661486): [<c0402c98>]
>>>>> __do_softirq+0x2a8/0x5e8
>>>>> [ 4.043262] softirqs last disabled at (661449): [<c04594a4>]
>>>>> irq_exit+0x178/0x1a4
>>>>> [ 4.050826] ---[ end trace 095abac70926448f ]---
>>>>>
>>>>> Best regards
>>>>> Naresh Kamboju
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> cheers,
>>> -roger
>>>
>>> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
>>> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
>
> --
> cheers,
> -roger
>
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki