Re: [PATCH] driver core: Ensure proper suspend/resume ordering

From: Rafael J. Wysocki
Date: Fri Sep 11 2015 - 18:10:30 EST


On Friday, September 11, 2015 02:03:46 PM Thierry Reding wrote:
> On Fri, Sep 11, 2015 at 12:08:02AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, September 10, 2015 12:19:03 PM Thierry Reding wrote:
> > > From: Thierry Reding <treding@xxxxxxxxxx>
> > >=20
> > > Deferred probe can lead to strange situations where a device that is a
> > > dependency for others will be moved to the end of the dpm_list. At the
> > > same time the dependers may not be moved because at the time they will
> > > be probed the dependee may already have been successfully reprobed and
> > > they will not have to defer the probe themselves.
> >=20
> > So there's a bug in the implementation of deferred probing IMO.
>
> Well, yeah. The root problem here is that we don't have dependency
> information and deferred probing is supposed to fix that. It does so
> fairly well, but it breaks in this particular case.
>
> > > One example where this happens is the Jetson TK1 board (Tegra124). The
> > > gpio-keys driver exposes the power key of the board as an input device
> > > that can also be used as a wakeup source. Commit 17cdddf0fb68 ("ARM:
> > > tegra: Add gpio-ranges property") results in the gpio-tegra driver
> > > deferring probe because one of its dependencies, the pinctrl-tegra
> > > driver, has not successfully completed probing. Currently the deferred
> > > probe code will move the corresponding gpio-tegra device to the end of
> > > the dpm_list, but by the time the gpio-keys device, depending on the
> > > gpio-tegra device, is probed, gpio-tegra has already been reprobed, so
> > > the gpio-keys device is not moved to the end of dpm_list itself.

At this point, when checking its dependencies, gpio-keys should also check
if their ordering with respect to it in dpm_list is correct and move itself
to the end of it otherwise.

There might be a helper for that I suppose.

> > > As a
> > > result, the suspend ordering becomes pinctrl-tegra -> gpio-keys ->
> > > gpio-tegra. That's problematic because the gpio-keys driver requests
> > > the power key to be a wakeup source. However, the programming of the
> > > wakeup interrupt registers happens in the gpio-tegra driver's suspend
> > > callback, which is now called before that of the gpio-keys driver. The
> > > result is that the wrong values are programmed and leaves the system
> > > unable to be resumed using the power key.
> > >=20
> > > To fix this situation, always move devices to the end of the dpm_list
> > > before probing them. Technically this should only be done for devices
> > > that have been successfully probed, but that won't work for recursive
> > > probing of devices (think an I2C master that instantiates children in
> > > its ->probe()). Effectively the dpm_list will end up ordered the same
> > > way that devices were probed, hence taking care of dependencies.
> > >=20
> > > Signed-off-by: Thierry Reding <treding@xxxxxxxxxx>
> > > ---
> > > Note that this commit is kind of the PM equivalent of 52cdbdd49853
> > > ("driver core: correct device's shutdown order) and that we have two
> > > lists that are essentially the same (dpm_list and devices_kset). I'm
> > > wondering if it would be worth looking into getting rid of one of
> > > them? I don't see any reason why the ordering for shutdown and
> > > suspend/resume should be different, and having a single list would
> > > help keep this in sync.
> >=20
> > We move away things from dpm_list during suspend and add them back to it
> > during resume to handle the situations in which some devices go away or
> > appear during suspend/resume. That makes this idea potentially problemat=
> ic.
>
> Okay, I see. If they are used for different purposes it's fine to keep
> them both.
>
> > > drivers/base/dd.c | 33 +++++++++++++++++++++++----------
> > > 1 file changed, 23 insertions(+), 10 deletions(-)
> > >=20
> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index be0eb4639128..56291b11049b 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -88,16 +88,6 @@ static void deferred_probe_work_func(struct work_str=
> uct *work)
> > > */
> > > mutex_unlock(&deferred_probe_mutex);
> > > =20
> > > - /*
> > > - * Force the device to the end of the dpm_list since
> > > - * the PM code assumes that the order we add things to
> > > - * the list is a good order for suspend but deferred
> > > - * probe makes that very unsafe.
> > > - */
> > > - device_pm_lock();
> > > - device_pm_move_last(dev);
> > > - device_pm_unlock();
> > > -
> > > dev_dbg(dev, "Retrying from deferred list\n");
> > > bus_probe_device(dev);
> > > =20
> > > @@ -312,6 +302,29 @@ static int really_probe(struct device *dev, struct=
> device_driver *drv)
> > > */
> > > devices_kset_move_last(dev);
> > > =20
> > > + /*
> > > + * Force the device to the end of the dpm_list since the PM code
> > > + * assumes that the order we add things to the list is a good order
> > > + * for suspend but deferred probe makes that very unsafe.
> > > + *
> > > + * Deferred probe can also cause situations in which a device that is
> > > + * a dependency for others gets moved further down the dpm_list as a
> > > + * result of probe deferral. In that case the dependee will end up
> > > + * getting suspended before any of its dependers.
> > > + *
> > > + * To ensure proper ordering of suspend/resume, move every device that
> > > + * is being probed to the end of the dpm_list. Note that technically
> > > + * only successfully probed devices need to be moved, but that breaks
> > > + * for recursively added devices because they would end up in the list
> > > + * in reverse of the desired order, so we simply do it unconditionally
> > > + * for all devices before they are being probed. In the worst case the
> > > + * list will be reordered a couple more times than necessary, which
> > > + * should be an insignificant amount of work.
> > > + */
> > > + device_pm_lock();
> > > + device_pm_move_last(dev);
> > > + device_pm_unlock();
> >=20
> > So I don't agree with doing that for every driver being probed against the
> > same device. That's just wasteful IMO.
>
> I don't understand. At this point driver matching has already taken
> place, so this will only every happen for one particular driver and the
> corresponding device. In the most common case this will happen exactly
> once, when the device is probed. Worst case it will happen a second or
> more times if a driver defers probing for a specific device. But in that
> case moving the device to the end of the list is absolutely required to
> keep it properly ordered for suspend/resume. Note that this is already
> done by the original code in deferred_probe_work_func() that is removed
> in the first hunk. The fix here is to do it for every device to ensure
> that inter-device dependencies are properly dealt with.
>
> I agree that it's a little wasteful, but that's completely in line with
> deferred probe. It's a simple solution to a very difficult problem, so
> naturally it comes at a cost. But it's also fairly elegant in how it
> correctly solves the ordering problem with very little code. The only
> other way that I can think of to avoid reordering the list for every
> device probe would be to sort it in advance using dependency information
> which we don't have. So we'd need to first add all that dependency
> information, and using that information is likely to be more work than
> simply reordering the list for each probe.

But that is problematic too as Alan pointed it out.

I'm always cautious about changes that make the core do something for every
device/driver, because they are very likely to step on corner cases that we
don't need to worry about otherwise.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/