Re: [PATCH v3 1/2] driver core: Rework logic in __driver_deferred_probe_check_state to allow EPROBE_DEFER to be returned for longer
From: John Stultz
Date: Tue Feb 18 2020 - 23:23:25 EST
On Tue, Feb 18, 2020 at 6:07 PM Rob Herring <robh@xxxxxxxxxx> wrote:
>
> On Tue, Feb 18, 2020 at 7:11 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
> >
> > On Tue, Feb 18, 2020 at 4:19 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 18, 2020 at 4:06 PM Andy Shevchenko
> > > <andy.shevchenko@xxxxxxxxx> wrote:
> > > > On Wednesday, February 19, 2020, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> > > >> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > >> index b25bcab2a26b..9d916a7b56a6 100644
> > > >> --- a/drivers/base/dd.c
> > > >> +++ b/drivers/base/dd.c
> > > >> @@ -237,13 +237,12 @@ __setup("deferred_probe_timeout=", deferred_probe_timeout_setup);
> > > >>
> > > >> static int __driver_deferred_probe_check_state(struct device *dev)
> > > >> {
> > > >> - if (!initcalls_done)
> > > >> - return -EPROBE_DEFER;
> > > >
> > > >
> > > > Why to touch this? Can't you simple add a new condition here 'if (deferred_probe_timeout > 0)'... ?
> > >
> > > I think that might work. I'll give it a spin later tonight and double check it.
> > >
> > > The main thing I wanted to do is fix the logic hole in the current
> > > code where after initcalls_done=true but before deferred_probe_timeout
> > > has expired we just fall through and return 0, which results in an
> > > ENODEV being returned from the calling function.
> >
> >
> > So on IRC Bjorn sort of clarified a point I think Rob was trying to
> > make on the earlier iteration of this patch, that it seems like
> > Thierry's patch here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62a6bc3a1e4f4ee9ae0076fa295f9af1c3725ce3
> > *seems* to be trying to address the exact same issue, and maybe we
> > should just have the genpd code use that instead?
>
> Looking at it some more, I think the change to the pinctrl code there
> breaks the case I care about (pinctrl described in DT and no driver on
> a system that previously worked without pinctrl). Maybe if I set the
> timeout to 0 it will still work (which would be fine).
>
> > The main question though, is why do we need both? As mentioned above,
> > the existing logic in __driver_deferred_probe_check_state() seems
> > wrong: Until late_initcall it returns EPROBE_DEFER, then after
> > initcalls_done==true returns 0 (in which case the caller then
> > translates to ENODEV), until the timeout expires which it then returns
> > ETIMEDOUT.
> >
> > I suspect what is really wanted is EPROBE_DEFER -> (0) ENODEV (when
> > timeout is not set) or EPROBE_DEFER -> ETIMEOUT (when the timeout is
> > set), instead of the two state transitions it currently makes.
>
> Yes. There's never any reason to return 0. It should be one of 3
> errnos. If we're moving to always having a timeout, then maybe ENODEV
> isn't even needed. I guess it's a stronger "we're done with init and
> there's never going to be another driver" which maybe we should do for
> !CONFIG_MODULES.
>
> > So I still think my patch is needed, but I also suspect a better fix
> > would be to kill driver_deferred_probe_check_state() and just replace
> > its usage with driver_deferred_probe_check_state_continue(). Or am I
> > still missing something?
>
> I think those should be merged. They now do almost the same thing.
> Only in the timeout==-1 case do they differ.
Well.. almost.. in addition to that different after late_initcall but
before the timeout driver_deferred_probe_check_state() will return
ENODEV, where as driver_deferred_probe_check_state_continue() returns
EPROBE_DEFER until the timeout happens.
> The original intent was that driver_deferred_probe_check_state()
> simply returned what state we're in and the caller would decide what
> to do with that. IOW, each caller could implement their own policy
> possibly based on other information. Pinctrl factored in a DT hint.
> IOMMU relied on everything was built-in.
I'm not super opinionated here, having the subsystem opt in and decide
what to do with the check_state() return value seems sane. But I
suspect that we can consolidate the two cases down and simplify some
of the logic here.
I'll take another spin on this.
thanks
-john