Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0

From: Thierry Reding
Date: Fri Aug 07 2020 - 08:26:23 EST


On Fri, Aug 07, 2020 at 01:02:44PM +0200, Thierry Reding wrote:
> On Thu, Aug 06, 2020 at 07:09:16PM -0700, John Stultz wrote:
> > On Thu, Aug 6, 2020 at 6:52 AM Thierry Reding <thierry.reding@xxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 22, 2020 at 08:32:43PM +0000, John Stultz wrote:
> > > > This patch addresses a regression in 5.7-rc1+
> > > >
> > > > In commit c8c43cee29f6 ("driver core: Fix
> > > > driver_deferred_probe_check_state() logic"), we both cleaned up
> > > > the logic and also set the default driver_deferred_probe_timeout
> > > > value to 30 seconds to allow for drivers that are missing
> > > > dependencies to have some time so that the dependency may be
> > > > loaded from userland after initcalls_done is set.
> > > >
> > > > However, Yoshihiro Shimoda reported that on his device that
> > > > expects to have unmet dependencies (due to "optional links" in
> > > > its devicetree), was failing to mount the NFS root.
> > > >
> > > > In digging further, it seemed the problem was that while the
> > > > device properly probes after waiting 30 seconds for any missing
> > > > modules to load, the ip_auto_config() had already failed,
> > > > resulting in NFS to fail. This was due to ip_auto_config()
> > > > calling wait_for_device_probe() which doesn't wait for the
> > > > driver_deferred_probe_timeout to fire.
> > > >
> > > > Fixing that issue is possible, but could also introduce 30
> > > > second delays in bootups for users who don't have any
> > > > missing dependencies, which is not ideal.
> > > >
> > > > So I think the best solution to avoid any regressions is to
> > > > revert back to a default timeout value of zero, and allow
> > > > systems that need to utilize the timeout in order for userland
> > > > to load any modules that supply misisng dependencies in the dts
> > > > to specify the timeout length via the exiting documented boot
> > > > argument.
> > > >
> > > > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > > > was failing in this case!
> > > >
> > > > Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
> > > > Cc: Alexey Kuznetsov <kuznet@xxxxxxxxxxxxx>
> > > > Cc: Hideaki YOSHIFUJI <yoshfuji@xxxxxxxxxxxxxx>
> > > > Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
> > > > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > > Cc: Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> > > > Cc: Rob Herring <robh@xxxxxxxxxx>
> > > > Cc: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> > > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@xxxxxxxxxxx>
> > > > Cc: Robin Murphy <robin.murphy@xxxxxxx>
> > > > Cc: Andy Shevchenko <andy.shevchenko@xxxxxxxxx>
> > > > Cc: Sudeep Holla <sudeep.holla@xxxxxxx>
> > > > Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
> > > > Cc: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> > > > Cc: Basil Eljuse <Basil.Eljuse@xxxxxxx>
> > > > Cc: Ferry Toth <fntoth@xxxxxxxxx>
> > > > Cc: Arnd Bergmann <arnd@xxxxxxxx>
> > > > Cc: Anders Roxell <anders.roxell@xxxxxxxxxx>
> > > > Cc: netdev <netdev@xxxxxxxxxxxxxxx>
> > > > Cc: linux-pm@xxxxxxxxxxxxxxx
> > > > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@xxxxxxxxxxx>
> > > > Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@xxxxxxxxxxx>
> > > > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > > > Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx>
> > > > ---
> > > > drivers/base/dd.c | 13 ++-----------
> > > > 1 file changed, 2 insertions(+), 11 deletions(-)
> > >
> > > Sorry for being a bit late to the party, but this breaks suspend/resume
> > > support on various Tegra devices. I've only noticed now because, well,
> > > suspend/resume have been broken for other reasons for a little while and
> > > it's taken us a bit to resolve those issues.
> > >
> > > But now that those other issues have been fixed, I've started seeing an
> > > issue where after resume from suspend some of the I2C controllers are no
> > > longer working. The reason for this is that they share pins with DP AUX
> > > controllers via the pinctrl framework. The DP AUX driver registers as
> > > part of the DRM/KMS driver, which usually happens in userspace. Since
> > > the deferred probe timeout was set to 0 by default this no longer works
> > > because no pinctrl states are assigned to the I2C controller and
> > > therefore upon resume the pins cannot be configured for I2C operation.
> >
> > Oof. My apologies!
> >
> > > I'm also somewhat confused by this patch and a few before because they
> > > claim that they restore previous default behaviour, but that's just not
> > > true. Originally when this timeout was introduced it was -1, which meant
> > > that there was no timeout at all and hence users had to opt-in if they
> > > wanted to use a deferred probe timeout.
> >
> > I don't think that's quite true, since the point of my original
> > changes were to avoid troubles I was seeing with drivers not loading
> > because once the timeout fired after init, driver loading would fail
> > with ENODEV instead of returning EPROBE_DEFER. The logic that existed
> > was buggy so the timeout handling didn't really work (changing the
> > boot argument wouldn't help, because after init the logic would return
> > ENODEV before it checked the timeout value).
> >
> > That said, looking at it now, I do realize the
> > driver_deferred_probe_check_state_continue() logic in effect never
> > returned ETIMEDOUT before was consolidated in the earlier changes, and
> > now we've backed the default timeout to 0, old user (see bec6c0ecb243)
> > will now get ETIMEDOUT where they wouldn't before.
> >
> > So would the following fix it up for you? (sorry its whitespace corrupted)
> >
> > diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
> > index c6fe7d64c913..c7448be64d07 100644
> > --- a/drivers/pinctrl/devicetree.c
> > +++ b/drivers/pinctrl/devicetree.c
> > @@ -129,9 +129,8 @@ static int dt_to_map_one_config(struct pinctrl *p,
> > if (!np_pctldev || of_node_is_root(np_pctldev)) {
> > of_node_put(np_pctldev);
> > ret = driver_deferred_probe_check_state(p->dev);
> > - /* keep deferring if modules are enabled
> > unless we've timed out */
> > - if (IS_ENABLED(CONFIG_MODULES) && !allow_default &&
> > - (ret == -ENODEV))
> > + /* keep deferring if modules are enabled */
> > + if (IS_ENABLED(CONFIG_MODULES) &&
> > !allow_default && ret < 0)
> > ret = -EPROBE_DEFER;
> > return ret;
> > }
> >
> > (you could probably argue calling driver_deferred_probe_check_state
> > checking ret at all is silly here, since EPROBE_DEFER is the only
> > option you want)
>
> Just by looking at it I would've said, yes, this looks like it would fix
> it. However, upon giving this a try, I see the same pinmux issues as I
> was seeing before. I'm going to have to dig a little deeper to see if I
> can find out why that is, because it isn't obvious to me right away.

Oh, nevermind. I managed to somehow mess up the device tree. With that
fixed the above also restores the pinmux attachment for the I2C
controller and suspend/resume works fine.

Thierry

Attachment: signature.asc
Description: PGP signature