Re: [PATCH 2/2] regulator: fixed: Set PROBE_PREFER_ASYNCHRONOUS
From: Doug Anderson
Date: Tue Mar 14 2023 - 12:36:30 EST
Hi,
On Tue, Mar 14, 2023 at 6:29 AM Mark Brown <broonie@xxxxxxxxxx> wrote:
>
> On Mon, Mar 13, 2023 at 12:49:37PM -0700, Doug Anderson wrote:
> > On Mon, Mar 13, 2023 at 12:05 PM Mark Brown <broonie@xxxxxxxxxx> wrote:
>
> > > This is going to be true for all regulators...
>
> > Sure, that's true. So what are you suggesting?
>
> That we shouldn't be making this change for just one driver, if you can
> write an identical commit log for most if not all drivers but are just
> making the change for one random driver then that suggests something is
> being missed somewhere. Either there's something special about this
> driver or we should do things consistently.
The "special" things are:
* It's been confirmed that this one random driver is involved in
slowing down boot a significant amount.
* The fixed regulator driver is among the simplest of the regulators
and doesn't have the complex interactions that are typically
associated with async probe problems.
For reference, some of the problems I'm aware of that have been seen
in the past when trying to enable async probe more broadly:
a) Apparently, on ACPI child devices aren't guaranteed to be probed
before parent devices once you turn on async probe. This is not
typically the case with device-tree probed devices where children show
up due to of_platform_populate() which is called by a parent device.
This can be handled properly but often isn't.
b) Some drivers try to poke directly at other devices and can get
confused if the other device is probing at the same time. One example
was dual-MIPI on Rockchip [1]. Again, this can be handled properly but
often isn't because nobody tested it.
c) Dynamically allocated ID numbers can change unpredictably from boot
to boot with async probe. This showed up on MMC where eMMC and SD
would change each boot between "mmcblk0" and "mmcblk1".
d) Async probe (obviously) changes timing and that can expose latent
bugs. Almost always those bugs should have been fixed anyway.
e) Async probe tends to stress out other driver's (consumers of the
device that's now probing async) error handling (since they are more
likely to see -EPROBE_DEFER) and can expose latent bugs there.
Let's take an example that I have a tiny bit of familiarity with:
max77686. This is not too strange of a device. It's implemented as a
MFD and has sub-drivers for PMIC (regulator), RTC, and clock.
I think we can be pretty confident that a) above isn't a problem. The
MFD driver (the parent) populates the regulator (the child) with
devm_mfd_add_devices() and it won't do that until the (MFD) parent is
ready. So far so good.
I think we can also be fairly certain that c) isn't a problem, and
probably it's not a problem anywhere for regulators. Nobody (that I'm
aware of) relies on the stability of a dynamically allocated number
for a regulator.
Issues d) and e) could be an issue for max77686, but in almost all
cases where they are we'd want to fix those latent bugs anyway since
they could have hit even without async probe.
...but b) _could_ be a problem. Specifically, today I think that
(unless some of its supplies aren't available at probe time) the PMIC
driver will _always_ finish probing before the RTC/clock driver
because of the order specified in the source code:
{ .name = "max77686-pmic", },
{ .name = "max77686-rtc", },
{ .name = "max77686-clk", },
One would need to do deeper code inspection (and, ideally, testing) to
find out if indeed the RTC driver and clock driver will have problems
if the regulator is not finished probing before their probes start.
Hopefully everything here is fine, but it's the kind of place where
maybe somebody was sloppy because, today, the order is guaranteed. In
general drivers for multi-function devices are more likely to interact
directly with sibling drivers outside of the normal driver framework
and (in my experience) are more likely to be relying on probe order.
If someone wanted to carefully inspect the max77686 code (especially
the interactions between all the MFD sub-drivers) and test enabling
async probe there then it would be a nice improvement. ...switching
over and crossing our fingers might work OK but it also might not.
If we look at the fixed regulator driver and compare, we're in a
simpler state. Sure, we could still expose latent timing bugs or
latent wrong error handling in regulator consumers, but those should
be fixed anyway. The fixed regulator driver doesn't reach directly
into other devices through private APIs and pretty much just relies on
the regulator core for most of its work. The regulator core should be
async-robust because of needing to deal with regulators that show up
as modules.
I guess the last thing I'll say is that PROBE_PREFER_ASYNCHRONOUS was
added specifically so that we could enable this on drivers that were
found to be slow to boot and that were tested to work with async
probe. Without being able to add PROBE_PREFER_ASYNCHRONOUS people were
open-coding solutions per driver to speed probe.
[1] https://lore.kernel.org/r/20221019170255.2.I6b985b0ca372b7e35c6d9ea970b24bcb262d4fc1@changeid