Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe

From: Rafael J. Wysocki
Date: Tue Jun 30 2020 - 09:51:13 EST


On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
<geert@xxxxxxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > related to it.
> > > >
> > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > that it doesn't break suspend/resume?
> > >
> > > Yes, we should fix deferred probing.
>
> Please take into account that breakage is an actual regression.
>
> > > > Also, it's pretty easy to have
> > > > cases where one module probes multiple device instances and loading it
> > > > in one order would break dpm_list order for one device and loading it
> > > > in another order would break it for another device. And there would be
> > > > no "proper" order to load modules (because module order != device
> > > > order).
> > >
> > > I'm not saying that the current code is perfect. I'm saying that the
> > > fix as proposed adds too much cost for everybody who may not care IMO.
> >
> > Ok, how about I don't do this reordering until we see the first
> > deferred probe request? Will that work for you? In that case, systems
> > with no deferred probing will not incur any reordering cost. Or if
> > reordering starts only towards the end, all the previous probes won't
> > incur reordering cost.
>
> That first deferred probe request is more or less as of the first probe,
> since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> adding all top level devices"), at least on DT systems.

The deferred probe reordering of devices to the end of dpm_list
started in 2012, so it is nothing new, and it demonstrably works for
devices where the dependencies are known to the driver core.

That said, in the cases when the dependencies are known to the driver
core, it is also unnecessary to reorder dpm_list in
deferred_probe_work_func(), because the right ordering of it is going
to be determined elsewhere.

Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
suspend resume order") is not the source of the problem here, because
the problem would have still been there without it, due to the
device_pm_move_last() that was there before, so the Fixes: tag
pointing to that commit is misleading.

Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
batching fwnode parsing") is an optimization and the regression is
present because of it AFAICS, the best way to address it at that point
would be to revert commit 716a7a259690 for 5.8 and maybe do the
optimization more carefully.

Greg, what do you think?