Re: [PATCH] mtd: spi-nor: Prefer asynchronous probe
From: Doug Anderson
Date: Sat Oct 03 2020 - 13:00:50 EST
Hi,
On Sat, Oct 3, 2020 at 9:54 AM Michael Walle <michael@xxxxxxxx> wrote:
>
> Hi Douglas,
>
> Am 2020-10-03 18:27, schrieb Doug Anderson:
> > Hi,
> >
> > On Sat, Oct 3, 2020 at 8:22 AM Michael Walle <michael@xxxxxxxx> wrote:
> >>
> >> Hi Douglas,
> >>
> >> > On my system the spi_nor_probe() took ~6 ms at bootup. That's not a
> >> > lot, but every little bit adds up to a slow bootup. While we can get
> >> > this out of the boot path by making it a module, there are times where
> >> > it is convenient (or even required) for this to be builtin the kernel.
> >> > Let's set that we prefer async probe so that we don't block other
> >> > drivers from probing while we are probing.
> >> >
> >> > This is a tiny little change that is almost guaranteed to be safe for
> >> > anything that is able to run as a module, which SPI_NOR is.
> >> > Specifically modules are already probed asynchronously. Also: since
> >> > other things in the system may have enabled asynchronous probe the
> >> > system may already be doing other things during our probe.
> >> >
> >> > There is a small possibility that some other driver that was a client
> >> > of SPI_NOR didn't handle -EPROBE_DEFER and was relying on probe
> >> > ordering and only worked when the SPI_NOR and the SPI bus were
> >> > builtin. In that case the other driver has a bug that's waiting to
> >> > hit and the other driver should be fixed.
> >>
> >> linux-next now triggers the following warning in kernel/kmod.c:136 on
> >> my
> >> board. I've bisected this to this patch.
> >>
> >> kmod.c:
> >> /*
> >> * We don't allow synchronous module loading from async.
> >> Module
> >> * init may invoke async_synchronize_full() which will end up
> >> * waiting for this task which already is waiting for the
> >> module
> >> * loading to complete, leading to a deadlock.
> >> */
> >> WARN_ON_ONCE(wait && current_is_async());
> >>
> >> [ 1.849801] ------------[ cut here ]------------
> >> [ 1.854271] mscc_felix 0000:00:00.5: device is disabled, skipping
> >> [ 1.858753] WARNING: CPU: 1 PID: 7 at kernel/kmod.c:136
> >> __request_module+0x3a4/0x568
> >> [ 1.858755] Modules linked in:
> >> [ 1.865028] fsl_enetc 0000:00:00.0: Adding to iommu group 1
> >> [ 1.872640] CPU: 1 PID: 7 Comm: kworker/u4:0 Not tainted
> >> 5.9.0-rc6-00001-g03edda0e1eda #113
> >> [ 1.872642] Hardware name: Kontron SMARC-sAL28 (Single PHY) on
> >> SMARC Eval 2.0 carrier (DT)
> >> [ 1.872647] Workqueue: events_unbound async_run_entry_fn
> >> [ 1.876013] spi-nor spi0.0: w25q32dw (4096 Kbytes)
> >> [ 1.881294] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
> >> [ 1.881297] pc : __request_module+0x3a4/0x568
> >> [ 1.881299] lr : __request_module+0x39c/0x568
> >> [ 1.881302] sp : ffff8000113a3920
> >> [ 1.925739] x29: ffff8000113a3920 x28: ffff800010c7b000
> >> [ 1.931068] x27: ffff00207ae05648 x26: ffff800010a41a88
> >> [ 1.936397] x25: 0000000000000000 x24: 0000000000000000
> >> [ 1.941727] x23: ffff800010c35140 x22: 0000000000000001
> >> [ 1.947055] x21: ffff800011149948 x20: ffff800010615bdc
> >> [ 1.952383] x19: 00000000ffffffff x18: 0000000000000000
> >> [ 1.957447] fsl_enetc 0000:00:00.0: enabling device (0400 -> 0402)
> >> [ 1.957711] x17: ffff800010a3e618 x16: ffff800010a3e5f8
> >> [ 1.964175] libphy: Freescale ENETC MDIO Bus: probed
> >> [ 1.969238] x15: ffffffffffffffff x14: ffff800011149948
> >> [ 1.969241] x13: ffff8000113a3918 x12: 0000000000000018
> >> [ 1.969245] x11: 0000000000000005 x10: 0101010101010101
> >> [ 1.975241] 10 fixed-partitions partitions found on MTD device
> >> 20c0000.spi
> >> [ 1.979550] x9 : ffff80001005f6a4 x8 : 0000000000000000
> >> [ 1.979553] x7 : 606f2c6364776865 x6 : 05041c090d431511
> >> [ 1.979556] x5 : 1115430d091c0405 x4 : 0000000000000000
> >> [ 1.979558] x3 : 6dac8d8d2dccae00 x2 : ffff800010c956e8
> >> [ 1.979561] x1 : ffff80001005fa58 x0 : 0000000000000001
> >> [ 1.979564] Call trace:
> >> [ 1.979571] __request_module+0x3a4/0x568
> >> [ 1.984914] Creating 10 MTD partitions on "20c0000.spi":
> >> [ 1.990227] parse_mtd_partitions+0x2ec/0x3c0
> >> [ 1.990232] mtd_device_parse_register+0xdc/0x1c8
> >> [ 1.997133] 0x000000000000-0x000000010000 : "rcw"
> >> [ 2.002454] spi_nor_probe+0x29c/0x2f0
> >> [ 2.002458] spi_mem_probe+0x74/0xb0
> >> [ 2.017759] 0x000000010000-0x000000100000 : "failsafe bootloader"
> >> [ 2.018433] spi_drv_probe+0x88/0xe8
> >> [ 2.018439] really_probe+0xec/0x3c0
> >> [ 2.033744] 0x000000100000-0x000000140000 : "failsafe DP firmware"
> >> [ 2.035555] driver_probe_device+0x60/0xc0
> >> [ 2.035559] __device_attach_driver+0x8c/0xd0
> >> [ 2.040455] 0x000000140000-0x0000001e0000 : "failsafe trusted
> >> firmware"
> >> [ 2.044642] bus_for_each_drv+0x84/0xd8
> >> [ 2.044645] __device_attach_async_helper+0xc4/0xe8
> >> [ 2.044648] async_run_entry_fn+0x4c/0x150
> >> [ 2.044653] process_one_work+0x1f4/0x4b8
> >> [ 2.057751] 0x0000001e0000-0x000000200000 : "reserved"
> >> [ 2.062814] worker_thread+0x50/0x480
> >> [ 2.062817] kthread+0x160/0x168
> >> [ 2.062821] ret_from_fork+0x10/0x34
> >> [ 2.073748] 0x000000200000-0x000000210000 : "configuration store"
> >> [ 2.076185] ---[ end trace 44224cc02e4e53d2 ]---
> >>
> >> -michael
> >
> > Thanks for your report! My vote would be to revert my patch and then
> > this would need to be resolved before it could be added back in.
> > Without doing tons of research, maybe the right answer here is that
> > mtd_device_parse_register() should be moved into a separate task so
> > it's not blocking probe? I probably won't try to tackle this
> > immediately, but the eventual goal is that async is default, so I
> > think this would need to be resolved before then.
>
> Ok. Vignesh, will you take care of that?
>
> While debugging another issue I also noticed that sometimes my
> /dev/mtdN devices were reordered. Note that I have two SPI flashes.
> Might this also be connected to the async probe?
It's likely. My guess is that you shouldn't really be depending on
the numbering. If you need to depend on the numbering, there should
be something that guarantees it like a device tree alias. We have
struggled with similar things on MMC for years and I guess Ulf finally
decided that we weren't going to get a better solution than the device
tree aliases. See:
https://lore.kernel.org/r/CAPDyKFoNFEa4ssv3TXbj4KM0UGjJgcp3dtypwicxJ-bExBL28A@xxxxxxxxxxxxxx
...for some details.
-Doug