Re: [RFC][PATCH] driver core: Extend returning EPROBE_DEFER for two minutes after late_initcall

From: Bjorn Andersson
Date: Fri Feb 14 2020 - 00:15:53 EST


On Thu 13 Feb 20:05 PST 2020, John Stultz wrote:

> On Thu, Feb 13, 2020 at 6:19 PM Bjorn Andersson
> <bjorn.andersson@xxxxxxxxxx> wrote:
> > The purpose of 25b4e70dcce9 ("driver core: allow stopping deferred probe
> > after init") is to ensure that when the kernel boots with a DeviceTree
> > blob that references a resource (power-domain in this case) that either
> > hasn't been compiled in, or simply doesn't exist yet, it should continue
> > to boot - under the assumption that these resources probably aren't
> > needed to provide a functional system.
> >
> > I don't think your patch maintains this behavior, because when userspace
> > kicks in and loads kernel modules during the first two minutes they will
> > all end up in the probe deferral list. Past two minutes any event that
> > registers a new driver (i.e. manual intervention) will kick off a new
> > wave of probing, which will now continue as expected, ignoring any
> > power-domains that are yet to be probed (either because they don't exist
> > or they are further down the probe deferral list).
>
> Hmm. I'll have to look at that again. I worry the logic is overloaded
> a bit, because the logic in __driver_deferred_probe_check_state() will
> only return -EPROBE_DEFER before late_initcall otherwise it returns
> -ETIMEDOUT or 0. So if we call __genpd_dev_pm_attach() after
> late_initcall and the pd isn't ready, the driver probe will fail
> permanently and not function.
>

Correct. And the motivation for this is that if you use a dtb from the
future it might describe a power-domain provider that is not yet
implemented in the booting kernel and as such the purpose is to fail
fast - in a way that drivers can ignore, rather than probe deferring
indefinitely.
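
To make the three outcomes concrete, here is a rough userspace model of
the decision described above. It is a sketch, not the driver core code:
struct probe_state, its two flags and model_check_state() are names made
up for illustration, and the assumption that an expired timeout is what
selects -ETIMEDOUT over 0 is mine.

```c
#include <errno.h>
#include <stdbool.h>

/* EPROBE_DEFER is kernel-internal and absent from userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for the driver core's global state. */
struct probe_state {
	bool initcalls_done;	/* true once late_initcall has run */
	bool timeout_expired;	/* assumed: true once the probe timeout hit */
};

/*
 * Model of the check: defer before late_initcall, otherwise either
 * report a timeout or return 0 so the caller can ignore the dependency.
 */
static int model_check_state(const struct probe_state *s)
{
	if (!s->initcalls_done)
		return -EPROBE_DEFER;
	if (s->timeout_expired)
		return -ETIMEDOUT;
	return 0;
}
```

With both flags false the model returns -EPROBE_DEFER; once
initcalls_done is set it never returns -EPROBE_DEFER again, which is the
"fail fast" part.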

> I'd think in the case you describe (correct me if I'm misunderstanding
> you), modules that load in the first two minutes would hit
> EPROBE_DEFER only if a dependency is missing, and will continue to try
> to probe next round. But once the two minutes are up, they will catch
> ETIMEDOUT and fail permanently.
>

Your patch extends the time that probe deferral is functional from
late_initcall to 2 minutes from boot, which should solve all practical
problems you and I have with the current situation.

But the specific detail that your patch is missing is that drivers that
probe defer will end up on the deferral list and this list is only
processed whenever drivers are added or some driver probes successfully.
So before the 2 minutes are up the deferral dance will stop and you need
one of these events to kick off the dance again.
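
A toy userspace model of that stall, in case it helps. Everything here
(struct model_dev, struct model_core, model_defer(), model_trigger(), the
would_succeed flag) is invented for illustration and is not the driver
core's data structure; it only shows that a change in what a probe would
return has no effect until something walks the list again.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical device: would_succeed says what a retry would return now. */
struct model_dev {
	const char *name;
	bool would_succeed;
	bool bound;
};

/* Hypothetical deferral list: devices waiting for the next retry pass. */
struct model_core {
	struct model_dev *pending[MAX_DEVS];
	size_t npending;
};

/* Park a device on the deferral list (silently dropped if full). */
static void model_defer(struct model_core *c, struct model_dev *d)
{
	if (c->npending < MAX_DEVS)
		c->pending[c->npending++] = d;
}

/*
 * One retry pass over the list, standing in for the events that
 * reprocess it (a driver being added, a probe succeeding).
 * Returns the number of devices bound during this pass.
 */
static size_t model_trigger(struct model_core *c)
{
	struct model_dev *retry[MAX_DEVS];
	size_t n = c->npending, bound = 0;

	for (size_t i = 0; i < n; i++)
		retry[i] = c->pending[i];
	c->npending = 0;

	for (size_t i = 0; i < n; i++) {
		if (retry[i]->would_succeed) {
			retry[i]->bound = true;
			bound++;
		} else {
			model_defer(c, retry[i]);	/* back on the list */
		}
	}
	return bound;
}
```

Flipping would_succeed on a parked device changes nothing by itself; the
device only binds on the next model_trigger() call, which is the part the
timeout work function would have to provide.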

> > You can improve the situation somewhat by calling
> > driver_deferred_probe_trigger() in your
> > deferred_initcall_done_work_func(), to remove the need for human
> > intervention. But the outcome will still depend on the order in
> > deferred_probe_active_list.
>
> Ok. I'll take a look at that.
>

Cool

> Thanks so much for the feedback!

Thank you for working on this, I've spent days debugging subtle issues
because of this feature...

Regards,
Bjorn