Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
From: Saravana Kannan
Date: Tue May 19 2020 - 02:48:46 EST
On Mon, May 18, 2020 at 11:25 PM Marek Szyprowski
<m.szyprowski@xxxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On 15.05.2020 07:35, Saravana Kannan wrote:
> > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > parsing of the device tree nodes when a lot of devices are added. This
> > will significantly cut down parsing time (as much as 1 second on some
> > systems). So, use them when adding devices for all the top level device
> > tree nodes in a system.
> >
> > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
>
> This patch recently landed in linux-next 20200518. Sadly, it causes a
> regression on the Samsung Exynos5433-based TM2e board:
>
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
>
> Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 50 Comm: kworker/0:1 Not tainted 5.7.0-rc5+ #701
> Hardware name: Samsung TM2E board (DT)
> Workqueue: events deferred_probe_work_func
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : samsung_i2s_probe+0x768/0x8f0
> lr : samsung_i2s_probe+0x688/0x8f0
> ...
> Call trace:
> samsung_i2s_probe+0x768/0x8f0
> platform_drv_probe+0x50/0xa8
> really_probe+0x108/0x370
> driver_probe_device+0x54/0xb8
> __device_attach_driver+0x90/0xc0
> bus_for_each_drv+0x70/0xc8
> __device_attach+0xdc/0x140
> device_initial_probe+0x10/0x18
> bus_probe_device+0x94/0xa0
> deferred_probe_work_func+0x70/0xa8
> process_one_work+0x2a8/0x718
> worker_thread+0x48/0x470
> kthread+0x134/0x160
> ret_from_fork+0x10/0x1c
> Code: 17ffffaf d503201f f94086c0 91003000 (88dffc00)
> ---[ end trace ccf721c9400ddbd6 ]---
> Kernel panic - not syncing: Fatal exception
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x090002,24006087
> Memory Limit: none
>
> ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Both issues, the missing DMA channel for the SPI devices and the
> synchronous abort in the I2S probe, are new after applying this patch.
> I'm trying to investigate which resources are missing and why. The latter
> issue typically means that the registers of the given device have been
> accessed without enabling the needed clocks or power domains.

Did you try this copy-pasta fix that I sent later?
https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@xxxxxxxxxx/

Not every system would need it (my test setup didn't), but it helps some cases.

If that fix doesn't help, here are some tips for debugging the failing drivers.

What this pause/resume patch effectively (not explicitly) does is:

1. Doesn't immediately probe the devices as they are added in
   of_platform_default_populate_init().
2. Adds them, in order, to the deferred probe list.
3. Then kicks off deferred probe on them in the order they were added.

These drivers are either not handling -EPROBE_DEFER correctly or are
assuming a particular probe order, and that's what is causing these issues.
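
To make the sequence above concrete, here is a rough userspace model of it. This is only a sketch, not kernel code: model_dev, model_probe() and model_run_deferred_probe() are names made up for this illustration, and the retry loop only approximates how the real deferred probe list gets re-triggered. It assumes each device has at most one supplier. The point is that a probe function which returns -EPROBE_DEFER when its supplier isn't ready ends up bound no matter what order the devices were added in.

```c
#include <stdbool.h>

/* EPROBE_DEFER is kernel-internal and not in userspace errno.h. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for a device; not a kernel structure. */
struct model_dev {
	const char *name;
	int supplier;	/* index of the device this one needs, or -1 */
	bool bound;
};

/*
 * A well-behaved probe: if the supplier has not bound yet, touch
 * nothing and ask to be retried later.
 */
static int model_probe(struct model_dev *devs, int i)
{
	int s = devs[i].supplier;

	if (s >= 0 && !devs[s].bound)
		return -EPROBE_DEFER;

	devs[i].bound = true;
	return 0;
}

/*
 * Walk the devices in the order they were added and retry the
 * deferred ones until a full pass binds nothing new.
 * Returns the number of devices that ended up bound.
 */
static int model_run_deferred_probe(struct model_dev *devs, int n)
{
	int bound = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		for (int i = 0; i < n; i++) {
			if (devs[i].bound)
				continue;
			if (model_probe(devs, i) == 0) {
				bound++;
				progress = true;
			}
		}
	}
	return bound;
}
```

With a list where a consumer is added before its supplier, the first pass defers the consumer and a later pass binds it. A driver that instead goes ahead and touches hardware when its supplier is missing is the kind of driver that breaks once the probe order changes.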

So, we can either fix the drivers, or you can try adding some code to flush
the deferred probe workqueue at the end of fw_devlink_resume().

Let me know how it goes.

Thanks,
Saravana