Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezerremoval

From: Alan Stern
Date: Fri Feb 29 2008 - 13:42:29 EST


On Fri, 29 Feb 2008, Rafael J. Wysocki wrote:

> > > Suppose we add a mutex to dev_pm_info, say pm_mtx, and require it to be:
> > > (1) taken by suspend_device(dev) (at the beginning)
> > > (2) released by resume_device(dev) (at the end)
> > > (3) taken (and released) by device_pm_add() if dev is the parent of the device
> > > being added.
> > >
> > > In that case, device_pm_add() will block on attepmpts to register devices whose
> > > parents are suspended (or suspending) and we're done. At least so it would
> > > seem.
> >
> > No; this would repeat the same mistake we were struggling with last
> > week.
>
> Not exactly, because in that case we blocked all attempts to register devices,
> while I think we can only block those regarding the children of a suspending
> (or suspended) device.

It's the "suspending" case that causes trouble. Go back and look at
the race I diagrammed in

https://lists.linux-foundation.org/pipermail/linux-pm/2008-February/016763.html

> > Blocking registration attempts (especially if we start _before_
> > calling the device's suspend method or end _after_ calling the resume
> > method) will lead to deadlocks while suspending or resuming.
>
> I'm not really convinced that it'll happen.

It _did_ happen. Precisely this race occurred in Bug #10030 (the SD
card insertion/removal), although there the window was bigger because
we blocked registrations starting from the start of the system sleep
instead of from when the parent's suspend method was called.

> If we make the rule that
> registering children from the device's own ->suspend() method is forbidden
> (I don't really see why it should be allowed), something like this might work
> (it seems to me):

It's not that the suspend method itself will want to register children.
The problem is that the method has to wait for other threads that may
already have started to register a child. If those other threads are
blocked then suspend will deadlock.

> @@ -427,6 +433,13 @@ static int dpm_suspend(pm_message_t stat
> struct device *dev = to_device(entry);
>
> mutex_unlock(&dpm_list_mtx);
> + mutex_lock(&dev->power.lock);
> + mutex_lock(&dpm_list_mtx);
> + if (dev != to_device(dpm_active.prev)) {
> + mutex_unlock(&dev->power.lock);
> + continue;
> + }
> + mutex_unlock(&dpm_list_mtx);
> error = suspend_device(dev, state);
> mutex_lock(&dpm_list_mtx);
> if (error) {

This looks pretty awkward. Won't it cause lockdep to complain about
recursive locking of dev->power.lock?

> > If the blocking starts after the suspend method returns then it will be
> > safer. But what's the point? If a registration attempt is made at
> > that stage, it means there's a bug in the driver. So failing the
> > registration seems like a reasonable thing to do.
>
> That doesn't buy us anything if drivers don't check whether the registration
> succeeded. And they don't.

It buys us one thing: The system will continue to limp along instead of
locking up.

If drivers don't check whether registration succeeded... What can I
say? It's another bug.

> > One issue: This rule doesn't allow suspend_late or resume_early methods
> > to register any new devices.
>
> Not exactly. Just the children of suspended devices, which makes a difference
> IMO. :-)

During suspend_late and resume_early _every_ device is suspended,
including the fictitious "device-tree-root" device. Hence _every_
registration is for a child of a suspended device.

Besides, you don't want to allow new devices to be registered during
suspend_late, do you? They wouldn't get suspended before the system
went to sleep.

> > Will that cause problems with the CPU hotplug or ACPI subsystems? ACPI in
> > particular may need to freeze the kacpi_notify workqueue -- in fact, that
> > might solve the problem in Bugzilla #9874.
>
> Well, my impression is that we do this thing to prepare for removing the
> freezer in the future, so I'd rather solve issues in some other ways than just
> by freezing threads that get in the way. ;-)

Right now that may be the easiest solution. In fact, it may still be
the easiest solution even after we stop freezing user threads.

> > Remember too that the target state is set before any devices are
> > suspended. Hence, after the state is set there may still be runtime
> > resumes taking place. Those _must_ not fail, which means that this
> > resume ought to work also.
>
> They may be handled in a different way, not by ->resume().

I'm not sure I understand. Sure, autoresume may not involve calling
the driver's resume method. But it does involve actually setting the
device back to full power, so what's the difference?

> > > > + resume_device(dev);
> > > > + mutex_lock(&dpm_list_mtx);
> > > > + if (dev->power.sleeping != PM_GONE)
> > > > + dev->power.sleeping = PM_AWAKE;
> > > > + } else {
> >
> > > Well, I wish it could be simpler ...
> >
> > Me too. But until the API is changed, this seems to be the best we can
> > do. It's not quite as bad as it looks, since a fair amount of the new
> > code is just for reporting on and recovering from bugs in drivers.
>
> Still, it doesn't look very nice ...

Are you still considering adding separate methods for suspend and
hibernate (maybe also for freeze and prethaw)? Perhaps the
"prevent_new_children" and "allow_new_children" methods could be added
then. This would allow some of this complication to go away.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/